orib.dev: Git9

Plan 9 is a non-POSIX system. Upstream git has been ported, but it feels distinctly un-plan9ish, and even in its native environment there are many complaints about its user experience.

So, in my hubris, I wrote a new git client. The goal was to provide a small, simple, easy to understand implementation that fits well into Plan 9. This means a minimum of special commands, configuration, or flags. It also means taking binary file formats and exposing them in a way that lets them be manipulated directly from the shell.

I would say that I succeeded. I have a git implementation that covers my day to day needs. As of several months ago, it's graduated to being my daily driver.

It provides the usual commands you'd expect:

git/clone:: clones a repository
git/pull:: fetches and updates a repository
git/add:: tracks a file
git/rm:: removes a file
git/diff:: shows the differences between the working tree and the current commit
and more:: check the manpage

Implementation

Git9 is built around five important binaries. Others are shipped, but are implemented in C for performance, not because they need to be. These are the core binaries:

git/fetch:: Negotiates and downloads a pack file
git/send:: Negotiates and uploads a pack file
git/save:: Updates a repository to include a new commit
git/query:: Implements a small query language to walk the commit graph
git/fs:: Serves repository history as a file system

The first three are unsurprising, and have analogs in unix git. Git/fetch is the core of git/clone and git/pull. For a provided set of branches, it downloads the data that is not currently in the repository, printing the branch names and their hashes for further processing by the wrapper script.

Git/send is the inverse of git/fetch. It computes the commits in the local repository, and sends the data that the remote repository does not have.

Finally, git/save creates a new commit from a given list of files, a message, and a parent commit. The commit info is prepared by git/commit, git/import, or other scripts that can produce commits.

So far, this is conventional. Where git9 differs most strongly is the addition of git/query and git/fs.

Git/query allows asking questions about the commit graph that are difficult to answer without walking the tree in a manner that would be very inefficient from a file system interface. The query language is vaguely inspired by mercurial revsets, but stripped down and converted to a mostly postfix form for ease of parsing.

Git/fs serves a file system interface to a git repository in the current directory. This file system provides a read-only view of the repository contents. By default, it is mounted on /mnt/git. It does not cache mutable data, so any changes to the git repository will immediately be reflected in git/fs.

The existence of a file system interface is extremely powerful, allowing scripts that operate on historical data to be written with ease. For example, git/diff is not strictly necessary. The same effect could be achieved with plain old diff(1), though with clunky file paths. The git/diff shipped with git9 exists simply to shorten the paths you have to type.

The file system looks something like:

/mnt/git
      +-- ctl
      +-- HEAD
      |    +-- tree
      |    |    +--files
      |    |    +--in
      |    |    +--head
      |    |
      |    +-- hash
      |    +-- msg
      |    +-- parent
      |
      +-- branch
      |      |
      |      +-- heads
      |      |      +-- master
      |      |            +-- [commit files, see HEAD]
      |      +-- remotes
      |             +-- origin
      |                     +-- master
      |                            +-- [commit files, see HEAD]
      +-- object
            +-- 00051fd3f066e8c05ae7d3cf61ee363073b9535f # blob contents
            +-- 00051fd3f066e8c05ae7d3cf61ee363073b9535c
                  +-- [tree contents, see HEAD/tree]
            +-- 3f5dbc97ae6caba9928843ec65fb3089b96c9283
                  +-- [commit files, see HEAD]

So, if you wanted to look at the commit message of the current branch, you could simply do:

	cat /mnt/git/HEAD/msg

This makes scripting easy.
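
Nothing about this is specific to rc: any program that can read files can inspect history. As a rough sketch, the Python below reads the hash, msg, and parent files of a commit directory laid out as in the tree above. The read_commit helper is hypothetical, and treating parent as a whitespace-separated list of hashes is an assumption about that file's contents, not something git9 is known to guarantee.

```python
import os

def read_commit(commit_dir="/mnt/git/HEAD"):
    """Read the plain-text files of one commit directory served by git/fs.

    Assumption: 'parent' holds zero or more hashes separated by whitespace.
    """
    def slurp(name):
        with open(os.path.join(commit_dir, name), encoding="utf-8") as f:
            return f.read()

    return {
        "hash": slurp("hash").strip(),   # drop the trailing newline, if any
        "msg": slurp("msg"),             # keep the message exactly as stored
        "parents": slurp("parent").split(),
    }
```

Calling read_commit() with no argument would look at the current commit; passing a path under /mnt/git/object would look at any other.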

Walkthrough

A good example of the benefits this brings is evident in the implementation of git/merge, which is attached below in full.

`git/merge`

#!/bin/rc -e
rfork ne
. /sys/lib/git/common.rc

fn merge{
	ourbr=$1/tree
	basebr=$2/tree
	theirbr=$3/tree

	all=`{walk -f $ourbr $basebr $theirbr | \
		subst -g '^('$ourbr'|'$basebr'|'$theirbr')/*' |\
		sort | uniq}
	for(f in $all){
		ours=$ourbr/$f
		base=$basebr/$f
		theirs=$theirbr/$f
		if(! test -f $ourbr/$f)
			ours=/dev/null
		if(! test -f $basebr/$f)
			base=/dev/null
		if(! test -f $theirbr/$f)
			theirs=/dev/null
		if(! ape/diff3 -m $ours $base $theirs > $f)
			echo merge needed: $f

		if(test -f $f)
			git/add $f
		if not
			git/rm $f
	}
}

gitup

flagfmt=''; args='theirs'
eval `''{aux/getflags $*} || exec aux/usage
if(! ~ $#* 1) exec aux/usage

theirs=`{git/query $1}
ours=`{git/query HEAD}
base=`{git/query $theirs ^ ' ' ^ $ours ^ '@'}

if(~ $base $theirs)
	die 'nothing to merge, doofus'
if(! git/walk -q)
	die 'dirty work tree, refusing to merge'
if(~ $base $ours){
	>[1=2] echo 'fast forwarding...'
	echo $theirs > .git/refs/`{git/branch}
	git/revert .
	exit ''
}
echo $ours >> .git/index9/merge-parents
echo $theirs >> .git/index9/merge-parents

ourpath=/mnt/git/object/$ours
basepath=/mnt/git/object/$base
theirpath=/mnt/git/object/$theirs
merge $ourpath $basepath $theirpath
>[1=2] echo 'merge complete: remember to commit'
exit ''

There's more code involved in deciding what to merge than there is in the merging itself. Walking through it in sections, we begin with the setup at the top of the script.

`setup`

The gitup function comes from the /sys/lib/git/common.rc shell library. It checks that we're in a git repository, and adds a few small utility functions like die.

From there, we parse the flags. We take no flags, so flagfmt is empty. The only argument is the branch we want to merge into the current one.

gitup

flagfmt=''; args='theirs'
eval `''{aux/getflags $*} || exec aux/usage
if(! ~ $#* 1) exec aux/usage

`commits`

Next, we need to figure out which commits we're merging. For a 3-way merge, we need the last common commit and the two heads we're bringing together. A few calls to git/query make short work of that.

theirs=`{git/query $1}
ours=`{git/query HEAD}
base=`{git/query $theirs ^ ' ' ^ $ours ^ '@'}

The only interesting thing to note is the @ operator in the git/query command line. What it does is find the least common ancestor of two commits. So, if you had this commit graph:

       o---o---T <-- theirs
      /
-----L     o---o
      \  /
        o---o---O <--ours

Then the @ operator would walk back to the point at which the two branches diverged, marked with L.
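
To make the idea concrete, here is a small Python sketch of finding the nearest common ancestors of two commits in a parent graph. It is not how git/query is implemented; the function names, the dictionary representation, and the commit names other than L, T, and O are all made up for illustration, and the side branch from the diagram is left out because neither head can reach it.

```python
def ancestors(parents, commit):
    """Return the set holding commit plus everything reachable via parent links."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        stack.extend(parents.get(c, []))
    return seen

def merge_bases(parents, a, b):
    """Return the common ancestors of a and b that are not ancestors of
    any other common ancestor (usually exactly one commit)."""
    common = ancestors(parents, a) & ancestors(parents, b)
    redundant = set()
    for c in common:
        # Every strict ancestor of a common ancestor is further back in history.
        redundant |= ancestors(parents, c) - {c}
    return common - redundant

# The graph from the diagram; 'root', 't1', 't2', 'o1', 'o2' are hypothetical names.
PARENTS = {
    "root": [],
    "L": ["root"],
    "t1": ["L"], "t2": ["t1"], "T": ["t2"],
    "o1": ["L"], "o2": ["o1"], "O": ["o2"],
}

print(merge_bases(PARENTS, "T", "O"))  # prints {'L'}
```

This brute-force version recomputes ancestor sets freely, which is fine for a toy graph but would be wasteful on a real repository.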

`preflight`

The next chunk of code ensures that we're in good shape to merge. If the least common ancestor of our current commit and their commit is their commit itself, then their branch has nothing we lack: we already have their commit in our history.

--o--o--T--o--o--O <--ours
        ^
        |
      theirs

It also checks that we don't have uncommitted work in the tree, so we don't make a mess of things that are in progress.

if(~ $base $theirs)
	die 'nothing to merge, doofus'
if(! git/walk -q)
	die 'dirty work tree, refusing to merge'

If, on the other hand, the base commit is the same as our commit, we can simply fast forward. The commit graph for that looks like:

--o--o--O--o--o--T <--theirs
        ^
        |
      ours

And we can simply move the ours pointer forward to point at their branch.
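
The three outcomes of the preflight can be summarized as a small decision function. The Python below is only a restatement of the checks in the script, in the same order; merge_action and its return strings are hypothetical, and dirty stands in for the result of the git/walk -q check.

```python
def merge_action(base, ours, theirs, dirty=False):
    """Decide what a merge should do, given the three commit hashes."""
    if base == theirs:
        # Their commit is already part of our history.
        return "nothing to merge"
    if dirty:
        # Uncommitted changes would be clobbered by the merge.
        return "refuse: dirty work tree"
    if base == ours:
        # We have no commits of our own since the branches split.
        return "fast forward"
    return "three-way merge"
```

With the graph from the commits section, merge_action("L", "O", "T") would fall through to the three-way merge.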

`merging`

After the setup and checks are complete, we're ready to merge. For this, we simply invoke ape/diff3 on the three versions of each file to do the merge. We first figure out which files we want to merge:

fn merge{
	ourbr=$1/tree
	basebr=$2/tree
	theirbr=$3/tree
	all=`{walk -f $ourbr $basebr $theirbr | \
		subst -g '^('$ourbr'|'$basebr'|'$theirbr')/*' |\
		sort | uniq}

In this snippet of code, we walk down the file trees to get the list of all files in all three branches. This is the list of files we want to invoke merge on. Since some of the files may exist in one branch but not another, it's necessary to substitute the ones that don't exist with /dev/null. That happens in the next snippet:

	for(f in $all){
		ours=$ourbr/$f
		base=$basebr/$f
		theirs=$theirbr/$f
		if(! test -f $ourbr/$f)
			ours=/dev/null
		if(! test -f $basebr/$f)
			base=/dev/null
		if(! test -f $theirbr/$f)
			theirs=/dev/null
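
For readers less fluent in rc, the two snippets above amount to roughly the following Python. This is a sketch of the same idea over ordinary directories rather than a translation of the script; relative_files and merge_inputs are hypothetical names.

```python
import os

def relative_files(root):
    """Yield every file under root as a path relative to root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)

def merge_inputs(ours_root, base_root, theirs_root):
    """Return [(relative_path, (ours, base, theirs)), ...] for every file
    present in at least one tree, using the null device for missing ones."""
    roots = (ours_root, base_root, theirs_root)
    # Union of the three file lists, like 'sort | uniq' in the script.
    all_files = sorted(set().union(*(relative_files(r) for r in roots)))
    plan = []
    for f in all_files:
        triple = tuple(
            os.path.join(r, f) if os.path.isfile(os.path.join(r, f)) else os.devnull
            for r in roots
        )
        plan.append((f, triple))
    return plan
```

Each triple is what would be handed to a three-way merge tool, in the same ours, base, theirs order that the script uses.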

Finally, the important bit happens, the merge itself:

		if(! ape/diff3 -m $ours $base $theirs > $f)
			echo merge needed: $f

We ensure the files are tracked or removed, as needed, and then we're done.

		if(test -f $f)
			git/add $f
		if not
			git/rm $f
	}

Most of the other tools in git9 are written as shell scripts, following similar principles. Examples include git/clone, git/log, git/commit, and git/revert.

All in all, I'm pretty happy with how it turned out.