Hacking:repo-shrink of 2014-06

From ParabolaWiki


abslibre.git has gotten large (>150MB) because some files have been committed that shouldn't have been. It is slow to clone, and is taking too much disk space on the server.

To correct this, lukeshu wrote a filter-branch script to shrink the repo. If you have an existing checkout of abslibre.git, you will get a warning about the upstream changing. You have two options:

  1. Delete (or backup) your checkout, and clone the new version.
  2. Run the script yourself on your copy. On lukeshu's box, it took about 13 minutes. On the git server, it took about 8 minutes.

If you have unpushed commits in your copy and are not comfortable with git rebase, running the script on your copy may be the better option. If you are comfortable with git rebase, you don't need me to explain what to do.
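If you are not sure whether you have unpushed commits, you can count the commits on your local master that are not on its remote-tracking branch. This is a minimal sketch. It assumes your abslibre remote is named origin and that you have not fetched the rewritten upstream yet; after fetching, origin/master points at the new history and every old commit would be counted. The function name count_unpushed is hypothetical.

```shell
# Hypothetical helper: print the number of commits on a local branch that are
# not on its remote-tracking counterpart. Assumes the remote is named "origin"
# and that the rewritten upstream has NOT been fetched yet.
count_unpushed() {
	local branch=${1:-master}
	local upstream=${2:-origin/$branch}
	git rev-list --count "${upstream}..${branch}"
}
```

Run count_unpushed inside your abslibre checkout. A result of 0 means deleting the checkout (option 1) loses no commits, though uncommitted changes and stashes would still be lost.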

Running the script

The script is:

cleanup.sh
#!/bin/bash

files=(
	# sources
	libre-testing/hplip-libre/hplip-3.12.4.tar.gz
	libre-testing/hplip-libre/hplip-3.12.4.tar.gz.asc	
	pcr/ryzom-hg/.ryzom-hg-20131213

	# sources
	# I've verified that all of these PKGBUILD behave correctly without these files
	libre/blackbox-libre/blackbox-0.70.1.tar.gz
	libre/dvdrip-libre/dvdrip-0.98.11.tar.gz
	pcr/python2-sfml2/1.4.zip
	pcr/python2-sfml2/master.zip
	pcr/qtkeychain/qtkeychain-0.1.zip

	# compressed files
	# I would rather they were uncompressed, but I guess they can stay
	#libre/linux-libre/patch-3.14-gnu-3.14.1-gnu.xz
	#pcr/debootstrap-libre/debootstrap.8.gz

	# binaries
	pcr/wuala # non-free, .jar file checked into git

	# logs
	libre/p7zip-libre/p7zip-libre-9.13-2-i686-build.log

	# vim/kate swap files
	kernels/aufs3-libre/.PKGBUILD.kate-swp
	libre/gnu-ghostscript/.PKGBUILD.swp
	libre/grub2/.archlinux_grub2_mkconfig_fixes.patch.swp
	libre/grub2/.archlinux_grub_mkconfig_fixes.patch.swp
	libre/hplip-libre/.hplip.install.swp
	libre/iceweasel-libre/.libre.patch.swp
	libre/kdebase-konqueror-libre/.PKGBUILD.swp
	libre/linux-libre-tools/.PKGBUILD.swp
	social/hunspell-pt-br/.PKGBUILD.kate-swp
	'~emulatorman'/hunspell-pt-br/.PKGBUILD.kate-swp
)

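# Rewrite master's history. For every commit, the index filter removes the
# listed paths from that commit's index only (--cached), recursing into
# directories such as pcr/wuala (-r), and does not fail on commits that lack
# them (--ignore-unmatch). printf %q quotes each path so it survives being
# re-parsed by the shell that runs the filter. --prune-empty then drops
# commits that end up changing nothing.
git filter-branch --prune-empty --index-filter "git rm -r --cached --ignore-unmatch $(printf '%q ' "${files[@]}")" master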
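To see which of the listed paths actually occur in master's history, you can run a check like the one below. Before the rewrite it previews what the filter will remove, and on the rewritten master the output should be empty. This is a minimal sketch: paths_in_history is a hypothetical helper, meant to be used in the same shell as the files array from cleanup.sh.

```shell
# Hypothetical helper: list every path matching the given pathspecs that was
# added, modified, or deleted by some non-merge commit reachable from BRANCH.
# --full-history stops git from pruning side branches during history
# simplification, so files that only lived on a merged branch are found too.
paths_in_history() {
	local branch=$1
	shift
	git log --full-history --format= --name-only "$branch" -- "$@" |
		sed '/^$/d' |
		sort -u
}
```

For example: paths_in_history master "${files[@]}"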

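Before doing any of this, make sure you have no uncommitted changes. git filter-branch refuses to rewrite branches if the working tree or index has changes to tracked files.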
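One way to check this from a script is to confirm that git status --porcelain prints nothing. This is a minimal sketch, and require_clean_tree is a hypothetical name. It is stricter than filter-branch itself because it also flags untracked files. Those files would be left behind in abslibre.bak when you re-clone.

```shell
# Hypothetical helper: succeed only if the index and working tree have no
# uncommitted changes and there are no untracked (non-ignored) files.
require_clean_tree() {
	local changes
	changes=$(git status --porcelain) || return 1
	if [ -n "$changes" ]; then
		printf 'Uncommitted changes or untracked files found:\n%s\n' "$changes" >&2
		return 1
	fi
}
```

For example: require_clean_tree && time bash ../cleanup.sh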

My process of running the script was:

$ emacs cleanup.sh
( enter the above script )
$ cd abslibre
$ time bash ../cleanup.sh
( hundreds (thousands?) of lines of output omitted )
Rewrite 390443060f96a9599bdea9e18811d5a56e4c5b64 (6953/6954)rm 'pcr/qtkeychain/qtkeychain-0.1.zip'
Rewrite d07c850a109062459c30ac81a4097d13603872ee (6954/6954)rm 'pcr/qtkeychain/qtkeychain-0.1.zip'

Ref 'refs/heads/master' was rewritten

real    13m26.555s
user    3m31.543s
sys     1m44.310s
$ cd ..
$ mv abslibre abslibre.bak
$ git clone file:///${PWD}/abslibre.bak abslibre
Cloning into 'abslibre'...
remote: Counting objects: 39965, done.
remote: Compressing objects: 100% (28925/28925), done.
remote: Total 39965 (delta 14456), reused 27420 (delta 8807)
Receiving objects: 100% (39965/39965), 27.82 MiB | 13.71 MiB/s, done.
Resolving deltas: 100% (14456/14456), done.
Checking connectivity... done.
$ cp abslibre.bak/.git/config abslibre/.git/config
$ du -h --max-depth 0 abslibre{,.bak}
50M     abslibre
223M    abslibre.bak

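The reason for performing the clone is that git tries hard not to delete your data. filter-branch keeps the original history under refs/original/, and the reflog still references it, so the "large" version is still sitting in abslibre.bak. Cloning through a file:// URL transfers only the objects reachable from the cloned branches and leaves the old ones behind. Copying .git/config afterwards restores your original remote settings, because the fresh clone's origin would otherwise point at abslibre.bak. The old objects could be purged in the existing repository instead, but cloning is safer, especially if you aren't extremely familiar with git. You probably want to keep abslibre.bak for a while, just in case.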
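If you would rather purge the old objects in place than re-clone, the "Checklist for shrinking a repository" in the git-filter-branch manual describes the steps. The following is a minimal sketch of them as a bash function. The name purge_filter_branch_leftovers is hypothetical, and the function assumes it is run inside the rewritten repository. The purge is irreversible, and expiring every reflog also drops older stash entries. Remote-tracking branches such as origin/master keep the old history reachable until you fetch the rewritten upstream, so fetch before purging.

```shell
# Hypothetical helper: permanently drop the pre-rewrite history from the
# current repository, following the "Checklist for shrinking a repository"
# in git-filter-branch(1). Irreversible: run it only once you trust the rewrite.
purge_filter_branch_leftovers() {
	local ref
	# 1. Delete the backup refs that filter-branch saved under refs/original/.
	#    Ref names cannot contain whitespace or glob characters, so plain
	#    word splitting of the list is safe here.
	for ref in $(git for-each-ref --format='%(refname)' refs/original/); do
		git update-ref -d "$ref" || return 1
	done
	# 2. Expire every reflog entry, since reflogs also keep old commits alive.
	git reflog expire --expire=now --all || return 1
	# 3. Delete the now-unreachable objects right away and repack.
	git gc --prune=now
}
```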