Tim's Git Notes

Here are my own notes about how git works, and why it's so fast at some things.

git directories

 * All meta-information is kept apart from the working files, in a single .git directory at the top of the working tree
 * .git/refs/heads - has named heads (these are local branches)
 * each file has a filename with the branch name, and a sha1 commit id as the file contents
 * .git/objects = object repository
 * each object is either a zlib-compressed file of its own (a "loose" object) or part of a pack of delta-compressed objects
 * all sha1s refer to full-sized objects (that is, individual changes (like patch hunks) are not addressable in the object repository, only full items)
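A rough sketch of how the named heads could be read straight off disk. It assumes every branch is stored as a loose file under refs/heads (refs kept in .git/packed-refs are ignored), and it runs against a throwaway directory that only mimics the layout. The function names are made up for illustration.

```python
import os
import tempfile

def list_local_branches(git_dir):
    """Return {branch_name: sha1} for loose refs under refs/heads.

    Assumption: refs stored in .git/packed-refs are not considered.
    """
    heads_dir = os.path.join(git_dir, "refs", "heads")
    branches = {}
    for root, _dirs, files in os.walk(heads_dir):
        for name in files:
            path = os.path.join(root, name)
            # Branch names may contain slashes, e.g. "feature/x".
            branch = os.path.relpath(path, heads_dir).replace(os.sep, "/")
            with open(path) as f:
                branches[branch] = f.read().strip()
    return branches

def make_demo_git_dir():
    """Build a throwaway directory that mimics the .git/refs/heads layout."""
    git_dir = tempfile.mkdtemp()
    os.makedirs(os.path.join(git_dir, "refs", "heads", "feature"))
    sha = "fe42b357730f3b37e80b3664f9b761b26cee9f68"
    for branch in ("master", "feature/x"):
        ref_path = os.path.join(git_dir, "refs", "heads", *branch.split("/"))
        with open(ref_path, "w") as f:
            f.write(sha + "\n")
    return git_dir

demo_branches = list_local_branches(make_demo_git_dir())
print(demo_branches)
```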

git concepts

 * inside the object repository are 4 types of objects:
 * blob = file contents
 * tree = directory (references other trees and blobs, with permissions)
 * commit = structured (but flat) text describing a commit
 * tag = structured text describing an (annotated) tag
 * use git cat-file -p with an object's sha1 to pretty-print any particular object in the object repository
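To see what the object types look like underneath, here is a minimal sketch of decoding one loose object. It assumes the usual loose layout of a "type size" header, a NUL byte, then the body, all zlib-compressed. The sample bytes are built in memory instead of being read from a real repository, and parse_object is a hypothetical helper, not a git API.

```python
import zlib

def parse_object(compressed):
    """Split a zlib-compressed loose object into (type, body)."""
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type.decode(), body

# Stand-in for the bytes of a file under .git/objects/xx/yy...
sample = zlib.compress(b"blob 12\x00hello world\n")
obj_type, body = parse_object(sample)
print(obj_type, body)
```

For a tree object the body is binary, which is why git cat-file -p is the easier way to look at one.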


 * git uses content-addressable storage
 * the content of every item has a hash (sha1) that is used to reference the item
 * the item is stored under .git/objects/xx/yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy where xx is a directory named after the first two hex digits of the sha1, and yy... (the remaining 38 hex digits) is the filename. Note that the object repository does NOT know the original filename of the object it is storing (this is determined from 'tree' entries at runtime)
 * packs are compressed archives holding multiple objects, which are created by doing a garbage-collect or pack operation
 * a pack consists of two files: the pack itself (.pack) and an index (.idx) used to locate objects inside it
 * content in packs is delta-encoded and compressed

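The content-addressing scheme above can be sketched in a few lines. This assumes a blob's id is the sha1 of a "blob <size>" header, a NUL byte, and the raw contents. The two helper names are made up for illustration.

```python
import hashlib

def git_blob_id(content):
    """Return the sha1 used to address a blob with these contents."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def loose_object_path(sha1_hex):
    """First two hex digits name the directory, the other 38 name the file."""
    return ".git/objects/%s/%s" % (sha1_hex[:2], sha1_hex[2:])

empty_id = git_blob_id(b"")
print(empty_id)                     # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(loose_object_path(empty_id))
```

Note that the filename never enters the hash, which is why two files with identical contents share one object.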

 * The repository does not hold "changes" to files, it holds entire files and trees
 * pack files are delta-encoded, but this is an implementation detail.
 * Conceptually, every instance (version) of every file that ever existed in the project history, and every instance of every directory configuration that ever existed, is stored in the repository.
 * a commit ALWAYS includes more than just a blob object
 * a commit is plain text, with a reference to commit parents (previous source states) and tree (current source state)
 * a commit refers to the top-level tree object, which refers to the sub-tree objects and blobs making up the entire source tree state at the time of the commit


 * there are two tree spaces in a git repository:
 * a commit tree - a graph of commits from a head leading back to the beginning of the repository
 * a source tree - a graph of trees and blobs forming the source state at a particular time
 * refs (heads, branches, tags, remotes) are all just names, stored outside the object repository under .git/refs, that point into it (normally at a commit, which is the root of both graphs above)

pseudo-code for git operations

 * git add file.c
 * place file.c in object repository
 * calculate sha1 hash (over a short "blob" header plus the contents), zlib-compress file.c, and place the result under .git/objects/xx/yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
 * update index with hash for file.c object
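 * no tree object is written for the directory containing file.c at this point; the index only records paths and blob hashes, and tree objects are built from it at commit time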
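The git add steps above, as a hedged sketch. It writes into a temporary directory instead of a real .git, and it uses a plain dict as a stand-in for the index (the real .git/index is a binary file that also carries file modes and stat data). store_blob and sketch_add are hypothetical names.

```python
import hashlib
import os
import tempfile
import zlib

def store_blob(objects_dir, content):
    """Hash, compress and store content as a loose blob; return its sha1."""
    data = b"blob %d\x00" % len(content) + content
    sha1_hex = hashlib.sha1(data).hexdigest()
    obj_dir = os.path.join(objects_dir, sha1_hex[:2])
    os.makedirs(obj_dir, exist_ok=True)
    path = os.path.join(obj_dir, sha1_hex[2:])
    if not os.path.exists(path):  # same content gives same name: nothing to do
        with open(path, "wb") as f:
            f.write(zlib.compress(data))
    return sha1_hex

def sketch_add(objects_dir, index, filename, content):
    """Place the content in the object store, then record it in the index."""
    index[filename] = store_blob(objects_dir, content)

objects_dir = os.path.join(tempfile.mkdtemp(), "objects")
index = {}  # stand-in for .git/index: path -> blob sha1
source = b"int main(void) { return 0; }\n"
sketch_add(objects_dir, index, "file.c", source)
print(index)
```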


 * git branch test_branch - create a new branch called test_branch (but don't switch to it)
 * create file .git/refs/heads/test_branch with sha1 from the contents of the file pointed to by .git/HEAD
 * that is, if .git/HEAD has the contents "ref: refs/heads/master", and .git/refs/heads/master contains sha1 fe42b357730f3b37e80b3664f9b761b26cee9f68, then create the file .git/refs/heads/test_branch with that sha1
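The branch creation described above, sketched against a throwaway directory that mimics .git. It assumes loose ref files only. Real git also checks that the branch does not already exist, handles packed-refs and records a reflog entry, none of which is modelled here. The function names are made up.

```python
import os
import tempfile

def resolve_head(git_dir):
    """Follow .git/HEAD to the sha1 it currently names."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        with open(os.path.join(git_dir, head[len("ref: "):])) as f:
            return f.read().strip()
    return head  # detached HEAD: the file holds the sha1 directly

def sketch_branch(git_dir, name):
    """Create refs/heads/<name> pointing at the commit HEAD resolves to."""
    sha1_hex = resolve_head(git_dir)
    path = os.path.join(git_dir, "refs", "heads", name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(sha1_hex + "\n")
    return sha1_hex

# Throwaway stand-in for a .git directory, using the example values above.
git_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(git_dir, "refs", "heads"))
with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/master\n")
with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
    f.write("fe42b357730f3b37e80b3664f9b761b26cee9f68\n")

new_sha = sketch_branch(git_dir, "test_branch")
print(new_sha)
```

No objects are copied or created, which is a large part of why branching is so cheap.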


 * git revert HEAD - revert the last commit
 * ask for the commit message
 * create a commit object (plain text file), referring to the current HEAD commit
 * commit parent will be the current HEAD commit
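 * commit tree will be the same as HEAD parent's tree (taking us back to the state before the current HEAD commit); this holds when reverting HEAD itself, whereas for an older commit git revert applies the inverse of that commit's changes on top of the current state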
 * add that commit object to the object repository
 * move the HEAD reference to this commit (e.g. change the sha1 in .git/refs/heads/master to the sha1 of the new commit)
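A sketch of the revert-HEAD steps, using dicts as stand-ins for the refs and the object repository. It only covers the simple case described above (HEAD is a normal commit with one parent), and the ids and the author string are placeholders. build_commit and sketch_revert_head are hypothetical helpers.

```python
import hashlib

def build_commit(tree, parent, who, message):
    """Return (sha1, raw bytes) for a commit object with a single parent."""
    body = ("tree %s\nparent %s\nauthor %s\ncommitter %s\n\n%s\n"
            % (tree, parent, who, who, message)).encode()
    raw = b"commit %d\x00" % len(body) + body
    return hashlib.sha1(raw).hexdigest(), raw

def sketch_revert_head(refs, commits, branch, who):
    """Add a commit on top of HEAD whose tree is HEAD's parent's tree."""
    head = refs[branch]
    parent_of_head = commits[head]["parent"]
    tree = commits[parent_of_head]["tree"]   # state before the HEAD commit
    new_sha, _raw = build_commit(tree, head, who, "Revert " + head[:7])
    commits[new_sha] = {"tree": tree, "parent": head}
    refs[branch] = new_sha                   # move the branch head forward
    return new_sha

# Placeholder ids, not real objects.
C1, C2 = "1" * 40, "2" * 40
T1, T2 = "a" * 40, "b" * 40
commits = {C1: {"tree": T1, "parent": None}, C2: {"tree": T2, "parent": C1}}
refs = {"master": C2}

revert_sha = sketch_revert_head(
    refs, commits, "master", "A U Thor <author@example.com> 1200000000 +0000")
print(revert_sha, commits[revert_sha])
```

History only grows: the reverted commit stays in the graph as the parent of the new one.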

how does git status work?

 * git status compares the index with the head and the current working directory
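 * [need lots of detail here - some things that git status detects are: 1) untracked files (in the working directory but not in the index), 2) staged files (index differs from the head), 3) files edited in the working directory but not staged (working directory differs from the index)]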
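The three-way comparison can be sketched with plain dicts mapping path to content hash. This is a simplification under stated assumptions: ignore rules and renames are left out, and real git uses stat data cached in the index to avoid re-hashing unchanged files. The hash values below are just short placeholders.

```python
def sketch_status(head, index, worktree):
    """Classify paths; each argument maps path -> content hash."""
    # Staged: the index differs from the head commit (added, changed, removed).
    staged = sorted(p for p in set(head) | set(index)
                    if head.get(p) != index.get(p))
    # Not staged: the working directory differs from the index.
    unstaged = sorted(p for p in index if worktree.get(p) != index[p])
    # Untracked: present in the working directory but unknown to the index.
    untracked = sorted(p for p in worktree if p not in index)
    return {"staged": staged, "unstaged": unstaged, "untracked": untracked}

head = {"a.c": "h1", "b.c": "h2"}
index = {"a.c": "h1", "b.c": "h3", "c.c": "h4"}
worktree = {"a.c": "h9", "b.c": "h3", "c.c": "h4", "notes.txt": "h5"}

status = sketch_status(head, index, worktree)
print(status)
# {'staged': ['b.c', 'c.c'], 'unstaged': ['a.c'], 'untracked': ['notes.txt']}
```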

Random notes

 * index functionality
 * when you 'git add' a file, the full text is put in the object repository
 * un-staging a file does not remove its object from the object repository right away; the object just becomes unreferenced and is only removed later by a garbage-collect


 * staging is not atomic - you organize a bunch of changes (some on file boundaries and some at the individual hunk level), and then commit them atomically
 * a commit tracks the full tree state, as well as indicating ancestors (forming the commit graph)
 * However, the tree state and the commit history are completely (logically) separate

= git cheat sheet =
 * repository: init, clone, fetch, pull, push
 * branches: branch, checkout
 * info: status, log, show, diff, blame, gitk, describe
 * find problems: bisect, blame
 * adjust changesets: revert, rebase, cherry-pick, reset
 * commit-related: add, rm, mv, revert, status, commit
 * patch management: format-patch, send-email, apply, am
 * merge: merge, rebase, diff, add

= useful operations I keep forgetting how to do =
 * find the version of a kernel which includes a particular feature (commit)
 * git describe --contains 