
Git is built out of deltas. You're already storing all of them.


This isn't quite how I would describe it. Git does do delta compression in packfiles, but its fundamental primitives have no deltas. It's just:

* The contents of this directory are this list of files, whose contents have these SHAs.

This is called a "tree".

A tree is itself an object with a SHA, so it can appear as an entry in another tree.

To see this for yourself, in any git repository run `git cat-file -p HEAD`. You'll see the (more or less) raw commit object for HEAD, which will point at a tree SHA. To see the contents of that tree, run `git cat-file -p <the tree SHA>`. That tree object has a one-to-one correspondence with what you'll see on disk in the objects directory (if the object has not been put in a pack file).

Above I have more or less fully described the contents of the files found in `.git/objects`.
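For anyone who wants to poke at this without git installed, here is a rough Python sketch of how a loose object gets its name and its on-disk bytes. It assumes a SHA-1 repository (SHA-256 repositories exist too), and the function names `object_id`, `tree_body` and `loose_object` are mine, not git's.

```python
import hashlib
import zlib


def object_id(kind: str, body: bytes) -> str:
    """Hash '<kind> <size>\\0<body>', which is how a SHA-1 repo names an object."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """Serialise (mode, name, hex_id) entries into raw tree bytes."""
    def sort_key(entry):
        mode, name, _ = entry
        # Directories sort as if their name ended in '/'.
        return name.encode() + (b"/" if mode == "40000" else b"")

    return b"".join(
        f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
        for mode, name, hex_id in sorted(entries, key=sort_key)
    )


def loose_object(kind: str, body: bytes):
    """Return (path relative to .git, zlib-compressed file contents)."""
    hex_id = object_id(kind, body)
    raw = f"{kind} {len(body)}".encode() + b"\x00" + body
    return f"objects/{hex_id[:2]}/{hex_id[2:]}", zlib.compress(raw)


# A one-file directory: one blob, and one tree pointing at it.
blob_id = object_id("blob", b"hello\n")
tree_id = object_id("tree", tree_body([("100644", "hello.txt", blob_id)]))
path, data = loose_object("blob", b"hello\n")
print(blob_id)  # same value as `echo hello | git hash-object --stdin`
print(path)
```

Note there is no delta anywhere in this: a tree is just names, modes and the IDs of other objects.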

The delta'ing doesn't happen until later, if and when packfiles are constructed, and it is just a storage/bandwidth optimisation. AFAICT, these deltas have nothing to do with what you might think of as "git diff", which is just some fancy porcelain that compares objects.
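To illustrate that a diff can be computed on demand from two full snapshots, with no stored delta involved, here is a small Python sketch. It uses the standard library's `difflib`, which is not the algorithm git uses; the file contents and the `a/file` and `b/file` labels are made up for the example.

```python
import difflib

# Two complete versions of the same file, as they would be stored in two blobs.
old_blob = b"a\nb\nc\n"
new_blob = b"a\nB\nc\n"

# The diff is derived from the two snapshots at the moment someone asks for it.
diff_lines = list(
    difflib.unified_diff(
        old_blob.decode().splitlines(keepends=True),
        new_blob.decode().splitlines(keepends=True),
        fromfile="a/file",
        tofile="b/file",
    )
)
print("".join(diff_lines), end="")
```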

The nice property of the construction is that given a large tree, even if nested, if you change a single file in that tree, you will only change as many trees as the file is deep in the tree. Every other subtree keeps its SHA and can be skipped without looking inside, so computing changes between two nearby trees can usually be done quickly.
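A toy Python sketch of that property, assuming SHA-1 object IDs and a made-up directory layout held in nested dicts (the helper names `_oid` and `write_tree` are hypothetical, not git internals):

```python
import copy
import hashlib


def _oid(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>\\0<body>'."""
    return hashlib.sha1(f"{kind} {len(body)}".encode() + b"\x00" + body).hexdigest()


def write_tree(node: dict, seen: set) -> str:
    """Hash a nested dict (name -> bytes or dict); record every tree ID in `seen`."""
    entries = []
    for name, child in node.items():
        if isinstance(child, dict):
            entries.append(("40000", name, write_tree(child, seen)))
        else:
            entries.append(("100644", name, _oid("blob", child)))
    # Directories sort as if their name ended in '/'.
    entries.sort(key=lambda e: e[1].encode() + (b"/" if e[0] == "40000" else b""))
    body = b"".join(
        f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
        for mode, name, hex_id in entries
    )
    tree_id = _oid("tree", body)
    seen.add(tree_id)
    return tree_id


old = {
    "README": b"hi\n",
    "docs": {"guide.txt": b"g\n"},
    "src": {"a": {"deep.txt": b"v1\n"}, "b": {"other.txt": b"x\n"}},
}
new = copy.deepcopy(old)
new["src"]["a"]["deep.txt"] = b"v2\n"  # change one file, two directories down

old_trees, new_trees = set(), set()
write_tree(old, old_trees)
write_tree(new, new_trees)

# Only the root, src and src/a get new IDs; docs and src/b are shared as-is.
print(len(new_trees - old_trees), "trees changed,", len(new_trees & old_trees), "reused")
```

A comparison that walks both roots can stop descending as soon as two entries have the same ID, which is why nearby trees are cheap to compare.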


The problem is not with deltas between revisions (i.e. the commits themselves) but with the Git packs spanning multiple revisions. At the time of a `git pull`, a user's repository can be at any revision between initial and latest. Who is going to seed (and keep seeding) packs for all of those possible revision intervals?


Conceptually, git is built out of snapshots.



