Cleaning our repository history
In our daily work we all make mistakes in our git commits. Sometimes these errors can easily be repaired just by reverting our commits. But if we are working in a public repository and have accidentally pushed some sensitive information, we now have a problem.
That sensitive information is in our repository history, and anybody with enough time to explore it can gain access to it. Our clients, or even ourselves, are now dealing with a privacy issue.
We can always try to repair that commit in our local environment and push our code again using the --force parameter. But we know, when you do that, a kitten dies. And if your team members already pushed something, everything in the repository will be messed up.
So the best option is to try to fix this in a more elegant way that allows us to erase all traces of our mistake while preserving repository integrity.
Git provides the filter-branch command, but this powerful tool can be complicated and slow. While trying to find an easier way to do it, we finally came across the BFG Repo-Cleaner.
This tool is an alternative to git filter-branch that provides a faster and easier way to clean git repositories. It is written in Java, so you need to make sure you have a Java runtime installed (JRE 6.0 or above for the release used here; newer BFG releases require a more recent Java, so check the project page). To clean your repository you only have to follow the steps below:
Clone your repository using the --mirror option. Beforehand, you should manually fix your mistakes in the repository so that your latest commit is clean.
$ git clone --mirror git://example.com/my-repo.git
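A mirror clone is a bare repository (no working files) whose origin remote is flagged as a mirror, so every branch and tag is copied. As a minimal sketch, the check below confirms that a directory really is such a clone before BFG is run against it; the function name is_mirror_clone is made up for this example.

```shell
# is_mirror_clone REPO: succeed if REPO looks like the result of
# `git clone --mirror`, i.e. it is bare and origin is marked as a mirror.
# (Hypothetical helper name, not part of git or BFG.)
is_mirror_clone() {
  repo=$1
  [ "$(git -C "$repo" config --get core.bare)" = "true" ] &&
    [ "$(git -C "$repo" config --get remote.origin.mirror)" = "true" ]
}
```

For example, `is_mirror_clone my-repo.git && echo "ready for BFG"`.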
Now, download BFG and execute it against your cloned repository.
$ java -jar bfg.jar --strip-blobs-bigger-than 1M my-repo.git
This step will remove all the blobs bigger than 1MB from your repository.
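Before running BFG with this option, it can help to preview which blobs are over the limit. The sketch below uses only standard git plumbing; list_big_blobs is a name invented for this example, and the limit is given in bytes.

```shell
# list_big_blobs REPO BYTES: print "size path" for every blob reachable from
# any ref in REPO that is larger than BYTES, biggest first.
# (Hypothetical helper name, not part of git or BFG.)
list_big_blobs() {
  repo=$1
  limit=$2
  # rev-list prints "<sha> <path>" for each object; cat-file adds type and size.
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v limit="$limit" '$1 == "blob" && $2 > limit { sub(/^blob /, ""); print }' |
    sort -rn
}
```

For example, `list_big_blobs my-repo.git 1048576` lists blobs above 1048576 bytes, assuming that is how you read the 1M threshold.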
Once BFG has rewritten the history, examine your repository to make sure it has been updated, and then use the standard git gc command to strip out the unwanted dirty data, which Git will now recognise as surplus to requirements:
$ cd my-repo.git
$ git reflog expire --expire=now --all
$ git gc --prune=now --aggressive
Finally, once you're happy with the updated state of your repo, push it back up. Because the repository was cloned with --mirror, this push updates all the refs on the remote:
$ git push
If everything went well, your repository won't include any of the accidentally committed files.
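One way to check this is to ask Git whether any commit on any ref still touches the file. This is only a sketch: history_has_path is a name invented for this example, and the pathspec is whatever file you were trying to remove.

```shell
# history_has_path REPO PATHSPEC: succeed if any commit reachable from any ref
# in REPO touches a path matching PATHSPEC.
# (Hypothetical helper name, not part of git or BFG.)
history_has_path() {
  repo=$1
  pathspec=$2
  # --all walks every ref; "--" separates revisions from the pathspec.
  [ -n "$(git -C "$repo" log --all --oneline -- "$pathspec")" ]
}
```

For example, `history_has_path my-repo.git '*id_rsa' && echo "still in history"` should print nothing after a successful clean.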
Here are some common examples for Drupal projects:
Delete all files named 'id_rsa' or 'id_dsa':
$ java -jar bfg.jar --delete-files 'id_{dsa,rsa}' my-repo.git
Delete database dumps:
$ java -jar bfg.jar --delete-files '*{mysql,mysql.gz}' my-repo.git
Delete every folder named 'files':
$ java -jar bfg.jar --delete-folders files my-repo.git
Note that BFG assumes you have repaired your repository before running it: by default it does not modify the contents of your latest commit (HEAD), so you need to make sure your current commits are clean. This protects your current work and gives you peace of mind, knowing that BFG is only changing your repo history, not meddling with the current files of your project.
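As a rough sketch of that manual repair, the helper below stops tracking a file in a normal working clone (not the mirror), keeps the local copy, and ignores it from then on. The name untrack_file and the commit message are assumptions made for this example; you would still push the resulting commit before cloning with --mirror.

```shell
# untrack_file WORKTREE PATH: remove PATH from the index of the clone at
# WORKTREE, keep the file on disk, add it to .gitignore and commit.
# (Hypothetical helper name, not part of git or BFG.)
untrack_file() {
  worktree=$1
  path=$2
  git -C "$worktree" rm -q --cached -- "$path" &&
    printf '%s\n' "$path" >> "$worktree/.gitignore" &&
    git -C "$worktree" add .gitignore &&
    git -C "$worktree" commit -q -m "Stop tracking $path"
}
```

For example, `untrack_file ~/projects/my-repo id_rsa`, where the clone path is a placeholder.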
Finally, here are some useful related links:
- BFG repository in GitHub: https://github.com/rtyley/bfg-repo-cleaner
- BFG project page: http://rtyley.github.io/bfg-repo-cleaner
- GitHub doc - Remove Sensitive data: https://help.github.com/articles/remove-sensitive-data
- Git filter-branch documentation: https://www.kernel.org/pub/software/scm/git/docs/git-filter-branch.html
- Git Tools - Rewriting History: http://git-scm.com/book/en/Git-Tools-Rewriting-History