Prune gh-pages branches using GitHub Actions

This website is hosted on GitHub Pages, built using nix and hugo with a custom theme that I maintain. I have automated updates for the nix flake to get new versions of hugo and nixpkgs, which applies to both my theme and the website itself.

This automation generates numerous commits and deployments to the gh-pages branch, mainly due to minor version bumps of hugo and other changes to the theme. While this hasn’t been a problem for my static website, which primarily consists of text, a recent addition has created some challenges.

I added a page for my 3D models (./3d-models/) containing .3mf, .stl, and .glb files. These binary files are not always reproducible bit for bit, particularly across nixpkgs updates that affect openscad and blender, which I use to create glb files from openscad sources. Every build that produces different binaries commits fresh copies of them to gh-pages, so the repository size has been growing continuously.

Identifying the problem

Recently, while on a coffee shop WiFi, I attempted a git pull that took too long to complete before I had to leave. This event highlighted the growing size of my repository and prompted me to find a solution.

Solution: Pruning the gh-pages branch

To address this, I decided to prune the gh-pages branch by squashing commits. However, I wanted to retain the latest commits separately to easily track recent changes in case of unexpected issues.

Initially, I considered writing a script for this task, but I found an existing action, myactionway/branch-pruner-action, which simplifies the process.

Here’s my workflow for pruning the branch, saved as .github/workflows/squash.yml:

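---
name: 'Squash gh-pages'

env:
  # Commit that becomes the new root of the branch; older history is squashed into it.
  NEW_FIRST_COMMIT: HEAD~19
  # Branch to prune (the Pages deployment branch, despite the variable name).
  DEFAULT_BRANCH: 'gh-pages'

'on':
  workflow_dispatch:  # Allow manual runs.
  schedule:
    - cron: '47 7 * * 2'  # At 07:47 UTC on Tuesday.

jobs:
  squash-gh-pages-branch:
    runs-on: ubuntu-22.04
    permissions:
      contents: write  # Needed to push the rewritten branch.
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ env.DEFAULT_BRANCH }}
          fetch-depth: 0  # Fetch the full history so HEAD~19 can be resolved.

      - uses: myactionway/branch-pruner-action@v2.0
        with:
          new_first_commit: ${{ env.NEW_FIRST_COMMIT }}
          branch: ${{ env.DEFAULT_BRANCH }}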

This workflow runs weekly (Tuesdays at 07:47 UTC) and can also be triggered manually. It keeps the 19 most recent commits as they are and squashes everything older into a single commit that represents the entire previous history.
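For illustration, roughly the same result can be produced with plain git. The sketch below is an assumption about one way to do it, not a description of how branch-pruner-action works internally. It builds a throwaway repository with 25 commits (the repository, the commit messages, and the keep variable are made up for the example), creates a parentless commit from the tree of HEAD~19, and replays the 19 newer commits on top of it.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for the gh-pages branch (hypothetical data).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Simulate 25 deployments, each changing the same file.
for i in $(seq 1 25); do
    echo "deploy $i" > index.html
    git add index.html
    git commit -q -m "Deploy $i"
done

keep=19
before=$(git rev-list --count HEAD)

# Create a commit without parents that has exactly the tree of HEAD~19.
new_root=$(git commit-tree -m "Squashed history" "HEAD~${keep}^{tree}")

# Replay the 19 commits after HEAD~19 on top of the new root commit.
git rebase -q --onto "$new_root" "HEAD~${keep}"

after=$(git rev-list --count HEAD)
echo "commits before: $before, after: $after"
# prints "commits before: 25, after: 20"
```

The result is 20 commits: the squashed root plus the 19 recent ones. On a real branch the rewritten history would still have to be force-pushed, which is presumably why the workflow asks for the contents: write permission.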

Results

Before squashing, the repository size from this API call was:

$ curl -s https://api.github.com/repos/etu/etu.github.io | jq '.size'
386904

After squashing the gh-pages branch, it reduced to:

$ curl -s https://api.github.com/repos/etu/etu.github.io | jq '.size'
296830

This is a reduction from roughly 387 MB to 297 MB, or about 23%, since the API reports the size in kilobytes. However, I believe GitHub can still optimize the repository size further on its side.
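The saving can be checked with a little shell arithmetic. This is a sketch that uses only the two values returned above and takes the size field to be in kilobytes; the variable names are just for illustration.

```shell
#!/bin/sh
set -eu

# Values of the API's size field before and after the squash.
before=386904
after=296830

# Absolute saving and integer percentage relative to the original size.
saved=$((before - after))
percent=$((saved * 100 / before))

echo "saved ${saved} KB (${percent}%)"
# prints "saved 90074 KB (23%)"
```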

Update: After squashing some more commits, the repository size reported by the API is now about 161 MB.

My local clone of the repository was about 680 MiB before the squash. After running git gc --aggressive --prune=now, it shrank to about 354 MiB.
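The effect of the garbage collection can be reproduced on a small scale. The following is a sketch on a throwaway repository, not a measurement of the real one: the file names, the tmp branch, and the size_kib helper are made up for the example. It also expires the reflog first, on the assumption that a clone may otherwise still reference the old commits there, which would prevent git gc from deleting them.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for a local clone (hypothetical data).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo "keep" > keep.txt
git add keep.txt
git commit -q -m "Base"

# Commit 1 MiB of random data on a temporary branch, then delete that
# branch so the blob is no longer reachable from any branch.
git checkout -q -b tmp
head -c 1048576 /dev/urandom > model.bin
git add model.bin
git commit -q -m "Add binary"
git checkout -q -
git branch -q -D tmp

# Helper: size of the .git directory in KiB.
size_kib() { du -sk .git | cut -f1; }

before=$(size_kib)

# The HEAD reflog still points at the deleted commit; expire it so that
# gc is allowed to prune the unreachable objects.
git reflog expire --expire=now --all
git gc -q --aggressive --prune=now

after=$(size_kib)
echo "before: ${before} KiB, after: ${after} KiB"
```

On a real clone, git count-objects -vH gives a more detailed breakdown of loose and packed objects than du does.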