Branching out to Git
Drupal is switching from CVS to Git. That caused me to re-evaluate the mess of branching we're starting to experience with Subversion. The process in SVN of doing a copy to a branch, then checking out that branch to another directory, and setting up another settings.php and .htaccess file for the separate install is a pain for me. Whether I'm on the work LAN or at home, having to download 200M of files each time I want to branch is a pain. That pain causes our development process to cram fixes for several Open Atrium cases into one branch. By the time I get around to peer reviewing, merging the branch into trunk is either blocked by another case needing review in the same branch, or requires a manual merge of specific components in the branch into trunk.
From what I read about Git, branching is better. The commentary never really says how, simply that it's better. I set out to find whether it was even possible to retain all our SVN history in a conversion to Git. Git has an svn command-line interface, which I used to check out our SVN repository to a local Git one, make changes, then submit the changes from my local Git checkout back to the central SVN repository. I don't think another SVN user would be able to tell the difference between that Git-sourced commit and an SVN-checkout-sourced one.
Once I examined the SVN checkout to Git a bit closer, I found some issues which could easily have been overlooked until it mattered. One big one was how tags are interpreted as branches in the git-svn conversion. A second is that fetching SVN with Git creates a local working copy, not a bare repository. I learned in the investigation process that a "bare" Git repository is better suited for use as a central copy. A bare version consists of just the Git repository information, rather than a checkout of the files with the repository tucked into a hidden subdirectory. How else would I solve the issue of proper tagging into a bare repository other than with help from another Drupaler, John Albin? The answer is that there's simply no better way. Seriously, that's the best way I found. I also tried a Ruby script on GitHub named svn2git, but I had to boot a Linux VM to get Ruby installed quickly, and what it produced just didn't feel like what I was supposed to end up with.
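To make the bare versus non-bare distinction concrete, here is a minimal sketch, assuming Git is installed. It works in a throwaway temp directory, and the names central.git, checkout, and the example identity are made up for illustration. It is not the conversion script, only a picture of what a central bare repository looks like next to an ordinary clone.

```shell
set -e
work=$(mktemp -d)

# A bare repository holds only Git's database: no working copy of the files.
git init --quiet --bare "$work/central.git"

# An ordinary clone has the working files plus a hidden .git subdirectory.
git clone --quiet "$work/central.git" "$work/checkout" 2>/dev/null

is_bare_central=$(git -C "$work/central.git" rev-parse --is-bare-repository)
is_bare_checkout=$(git -C "$work/checkout" rev-parse --is-bare-repository)
echo "central.git is bare: $is_bare_central"
echo "checkout is bare:    $is_bare_checkout"

# Commit in the clone and push to the bare repository, the way a team
# would share work through a central copy.
cd "$work/checkout"
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.com"
echo "hello" > README.txt
git add README.txt
git commit --quiet -m "Add README"
git push --quiet origin HEAD

central_commits=$(git -C "$work/central.git" rev-list --count --all)
echo "commits in central.git: $central_commits"
```

The bare repository never has files checked out in it, which is why it suits the role the SVN server plays today.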
When fetching SVN with Git, there is an option to leave SVN metadata out of the commit notes. I read a lot of SVN conversion message boards, blogs, and documents, and most recommended adding the --no-metadata option when doing the conversion from SVN. They said that the only reason to keep the metadata would be to rebuild or sync up the Git repository with the SVN one again. In our organization, I see the conversion as a one-way street and that it should be all or none. We'll mothball the SVN server. Still, I have a problem with leaving the SVN metadata out entirely because we have years of case history that cite specific SVN changesets where a bug fix was made or feature was added, and we do refer back to them. I made a little change to John's script as part of my first GitHub fork so the conversion process keeps the metadata by default instead of assuming it should be left out. The result is that each commit note in the converted Git repository has a trailing line with SVN information, making the commit note look like:
Add session expire module
git-svn-id: svn+ssh://svn.cgraphics.com/var/svn/cg-projects/trunk@76 d32c1383-1abf-4ac8-9459-07b60b375d64
In the future, when I need to search for an old SVN changeset ID, that leaves at least one breadcrumb for me to find it, like searching for changeset 4 in this example:
deekayen-macbook:acmebank.git davidnorman$ git log | grep -B 6 @4
commit d72195d0cc0714e93f84bb3f90d5f0d15631ce2f
Author: David Norman <davidnhatesspam@cgraphics.com>
Date: Fri Apr 23 20:10:54 2010 +0000

    add seed db dump

    git-svn-id: svn+ssh://svn.cgraphics.com/var/svn/acmebank/trunk@4 6f85cd5b-73ed-4e50-a4a9-c7788a632756
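One caveat with a plain grep for @4 is that it would also match @40, @41, and so on once the history grows. A hedged alternative is to let git log do the filtering with its --grep option and include the space that follows the revision number. The sketch below assumes Git is installed and fakes three converted commits in a temp repository; the URL, UUID, and identity are invented for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.com"

# Fake three converted commits, each carrying a git-svn-id line
# (the URL and UUID here are made up).
for rev in 4 40 41; do
  echo "change $rev" > "file$rev.txt"
  git add "file$rev.txt"
  git commit --quiet -m "change for r$rev" \
    -m "git-svn-id: svn+ssh://svn.example.com/var/svn/demo/trunk@$rev 00000000-0000-0000-0000-000000000000"
done

# The trailing space after the 4 keeps r40 and r41 out of the results.
matches=$(git log --format=%s --grep="trunk@4 ")
echo "$matches"
```

Dropping --format=%s shows the full commit, including the hash to cite in new case notes.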
In any case, I was satisfied with the conversion process using my fork of John's script. Now I needed to address some of the concerns other developers have about Git. They are roughly as follows:
- What we have now [with SVN] is working, so why do we have to break it?
- Is branching and merging actually any easier than with SVN?
- Do we really have to reference commit d72195d0cc0714e93f84bb3f90d5f0d15631ce2f when documenting changes in a case instead of an incrementing changeset ID?
- Are our developers actually ever going to commit between each other before they commit to the central repository?
- Is someone going to show me how this works before it launches?
- Can the Drupal testing platform work with Git repositories?
At first, my answer to all this was, "Drupal is switching to Git, so we're all going to have to learn it anyway." However, I remembered cYu spent some time using Bazaar, so I needed to break out of my Git tunnel vision. What better place is there to start than the Wikipedia comparison of revision control software? There is no better place. They at least make an effort to keep bias out of the contents or document bias that does exist.
First, I narrowed the list down to free ones, then ones that are actively maintained, then ones which listed SSH access to the repository. At that point I had Bazaar, CVS, CVSNT, darcs, Git, Mercurial, and Subversion. I excluded CVS and CVSNT for their age and file-level revisioning. Then I excluded Subversion because we're trying to find something better than it. Having not heard much about darcs, I set out to research it. The documentation I found on conversion from SVN had a lot of phrasing like, "Still being sketched out," or "PROPOSAL A." Reading enough of that discouraged me into classifying it as "too much work."
Next I looked at Bazaar. Others, like the Ubercart team, spent time developing separately from drupal.org's CVS system with bzr. There seems to be enough Subversion to Bazaar documentation for me to work with, but when I looked for Mac OS X GUIs, I just couldn't find anything. I did see mentions of installing MacPorts or Fink to get any number of GUIs, but that's not the native Cocoa interface I had in mind. Even if I'm missing a Cocoa interface, I'd really like to have a selection of options to choose from. I like options. On the basis of a shortage of GUIs, I ignored how good or bad the underlying revision management is, and tossed it out as a choice.
If you're keeping track, that narrowed my choices down to just Git and Mercurial, both created originally for Linux kernel management. At this point, how do you not just base your decision on playing in the same sandbox as all the cool kids at drupal.org?
As for answering the questions developers had about making a switch away from SVN, I have only found the following answers.
-
Q: What we have now [with SVN] is working, so why do we have to break it?
A: I haven't read commentary from anyone that decided to go back to SVN after switching to Git.
-
Q: Is branching and merging actually any easier than with SVN?
A: I still don't know for sure. I haven't branched and merged yet, but watching a Git branch/merge demonstration video did show an appealing process of switching to a branch and back to the HEAD without having to do 200M checkouts and switch to alternate directories.
-
Q: Do we really have to reference commit d72195d0cc0714e93f84bb3f90d5f0d15631ce2f when documenting changes in a case instead of an incrementing changeset ID?
A: Sort of, yes. It also seems to be acceptable to use a 7-character shorthand like d72195d. Even if we picked what seems to be my second choice, Mercurial, it also tracks the code by SHA-1 hashes. I did read that it also has incrementing revision numbers, but those are specific to each repository instance, so 200 on my laptop might be different from 200 on a co-worker's laptop within the same repository. The downside of hashes is that it's harder to see the evolution of changes represented through commit IDs without actually going to a Git client to look at logs.
-
Q: Are our developers actually ever going to commit between each other before they commit to the central repository?
A: I think it's highly unlikely that I'll ever set up port forwarding on my home firewall to start accepting branch commits directly from someone else's desk in their house. That said, it is interesting that the possibility is available.
-
Q: Is someone going to show me how this works before it launches?
A: I downloaded the Kindle version of Version Control with Git: Powerful Tools and Techniques for Collaborative Software Development, which does a decent job of explaining the guts and background of Git. I think it fails in explaining how to pick up from SVN and continue getting work done because it goes too deep. If you want to understand how and why Git does what it does, it is a great book. I haven't finished it yet. That means I'll be messing with various Git interfaces like SmartGit, GitX, Gity, SourceTree, Git Gui, GitBox, git-cola, and others to find things to recommend to our team. Spoiler: I already like SmartGit and SourceTree.
-
Q: Can the Drupal testing platform work with Git repositories?
A: The Drupal Quality Assurance platform runs on CVS right now. It will need to be upgraded. The only interfaces currently in the PIFR repository support bzr and SVN, so I'll likely have to create the Git one if I want it soon for our internal use.
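For the branching and commit ID questions above, the flow can be sketched roughly as follows, assuming Git is installed. The repository lives in a temp directory, and the branch name case-1234, the file names, and the identity are hypothetical. The point is only that switching branches happens in place, in one directory, and that a 7-character ID can be asked for directly.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.com"

echo "base" > notes.txt
git add notes.txt
git commit --quiet -m "Initial commit"
trunk=$(git rev-parse --abbrev-ref HEAD)  # "master" or "main", depending on Git's config

# Branch in place: same directory, no second checkout or extra settings.php.
git checkout --quiet -b case-1234
echo "fix" > fix.txt
git add fix.txt
git commit --quiet -m "Fix for case 1234"

# Back on the trunk, the fix is not in the working directory...
git checkout --quiet "$trunk"
before_merge=$( [ -e fix.txt ] && echo present || echo absent )

# ...until the branch is merged.
git merge --quiet case-1234
after_merge=$( [ -e fix.txt ] && echo present || echo absent )

# Short form of the commit ID, suitable for citing in a case.
short_id=$(git rev-parse --short=7 HEAD)
echo "fix.txt before merge: $before_merge, after merge: $after_merge"
echo "short commit id: $short_id"
```

With one case per branch like this, a case that is still waiting on review does not block merging the others.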
Git also has the added benefit of having resources like GitHub, which, for a fee of course, is willing to license its platform so we can run it on our own infrastructure. I see this kind of thing potentially lessening our repository management burden someday in the future.
Go get started. Clone a contributed Drupal module on http://git.drupalcode.org/. See what the Drupal.org Git Migration Team is doing and how you can help.