Working with Branches
In module one we said that Git is a stupid content tracker, and we used the metaphor of an onion. Now we can move on to the next layer of the Git onion and look at the features that turn it into a full-fledged revision control system: features like branches and merges.
- As soon as you have a Git project, you also have a branch. Git creates this branch (master) for us when we do our first commit. Let's look at the list of branches in the project with `git branch`, without any arguments.
$ git branch
* master
And there it is, our default branch, the master branch.
- Git normally puts branches inside `.git`, in a directory called `refs` and its subdirectory called `heads`. Ignore the other subdirectory for now. And there it is, a small 41-byte file called `master`. This is our master branch. What's inside this file?
$ cat .git/refs/heads/master
007ffe9977176cdf4af8928073252ba50125504b
The file contains a single line: a SHA1 (40 hexadecimal characters plus a newline, hence the 41 bytes). As you probably expect, it's the SHA1 of the current commit.
-
A branch is nothing more than a simple reference: essentially, a pointer to a commit. That's why the directory that contains branches is called refs, for references.
-
Note that the master branch actually has no special status in Git. Yes, Git created it for us, but otherwise it's just a branch like any other, and all there is to it is this small file. I could actually delete or rename the master branch just by deleting or renaming this file. I could even create a new branch just by writing a new file into this folder containing the SHA1 of a commit. That would arguably be hacking, but it would work.
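To make the point concrete, here is a minimal Python sketch of branches as plain files. It works on a scratch directory that only imitates the `.git/refs/heads` layout; the helper names (`write_ref`, `read_ref`, `list_branches`) are hypothetical. Real Git may also keep refs in a `packed-refs` file, so in an actual repository `git branch` or `git update-ref` remains the safe way to do this.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

def write_ref(git_dir: Path, branch: str, sha1: str) -> Path:
    """Create or move a branch by writing a SHA1 plus a newline to refs/heads/<branch>."""
    ref = git_dir / "refs" / "heads" / branch
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_bytes((sha1 + "\n").encode("ascii"))  # bytes, so no CRLF translation
    return ref

def read_ref(git_dir: Path, branch: str) -> str:
    """Return the commit SHA1 that a branch points at."""
    return (git_dir / "refs" / "heads" / branch).read_text().strip()

def list_branches(git_dir: Path) -> list[str]:
    """List branch names (flat names only; names like 'feature/x' are not handled)."""
    return sorted(p.name for p in (git_dir / "refs" / "heads").iterdir())

# A throwaway directory imitates .git; no real repository is touched.
git_dir = Path(tempfile.mkdtemp()) / ".git"
master = write_ref(git_dir, "master", "007ffe9977176cdf4af8928073252ba50125504b")
print(master.stat().st_size)  # 41 bytes: 40 hex digits plus a newline

# "Hacking" a new branch into existence: copy master's SHA1 into a new file.
write_ref(git_dir, "lisa", read_ref(git_dir, "master"))
print(list_branches(git_dir))  # ['lisa', 'master']
```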
-
Let's create a new branch the right way, by using `git branch` with a branch name.
$ git branch lisa
- About this branch: imagine that we want to insert recipes in our cookbook, but we also get alternate recipes from a friend, and we want to keep those in a separate branch. Let's call this new branch `lisa`, after our friend. Our idea is that we'd put our own recipes in `master` and our friend's recipes in the `lisa` branch. There we are. We have a new branch: we can see it listed among the branches, and we can see it alongside master in the `refs/heads` folder. And if we look at it, we see that it has exactly the same content as master: the same commit.
So this is what we have now, two commits, two branches, and the branches are pointing at the same commit.
-
Now that we have two branches, if we look at the list of branches again, we see that one branch is marked with an asterisk because it's the current branch. How does Git know that master is our current branch? There must be some information, probably in the `.git` folder, that says which branch is the current one: some kind of file, maybe. And indeed there is such a file. If you look at the `.git` folder again, you will see a file named HEAD. If you look inside HEAD, you will see that it contains a reference to another file: `ref: refs/heads/master`.
This syntax is Git's way to reference files. It's saying that HEAD is currently pointing at `refs/heads/master`, the file representing the master branch. There is only one HEAD, so there is only one current branch. Let's add it to the diagram and move on.
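The lookup Git performs here can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it reads loose ref files only (no `packed-refs`, no brand-new branch without commits), and `resolve_head` is a hypothetical helper. In a real repository, `git symbolic-ref HEAD` and `git rev-parse HEAD` answer the same two questions.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

def resolve_head(git_dir: Path) -> tuple[str | None, str]:
    """Return (current branch, commit SHA1); the branch is None when HEAD holds a SHA1."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref_path = head[len("ref: "):]                   # e.g. "refs/heads/master"
        sha1 = (git_dir / ref_path).read_text().strip()  # follow the reference
        return ref_path.removeprefix("refs/heads/"), sha1
    return None, head                                    # HEAD contains a SHA1 directly

# Build a tiny imitation of .git in a scratch directory.
git_dir = Path(tempfile.mkdtemp()) / ".git"
(git_dir / "refs" / "heads").mkdir(parents=True)
(git_dir / "refs" / "heads" / "master").write_text("007ffe9977176cdf4af8928073252ba50125504b\n")
(git_dir / "HEAD").write_text("ref: refs/heads/master\n")

print(resolve_head(git_dir))  # ('master', '007ffe9977176cdf4af8928073252ba50125504b')
```

The second return path covers the case, met later in this walkthrough, where HEAD contains a commit's SHA1 instead of a reference.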
So now let's change the files in the project. I will add the list of ingredients to `apple pie.txt`. Here, let me add this file to Git and commit it.
$ vim "apple pie.txt"
--8 Apples added, Oranges Removed--
$ git add "apple pie.txt"
$ git commit -m "Updated ingredients"
- Okay, let's see what just happened inside Git step by step.
-
Git created a few new objects in the object database for this commit. In particular, it created the commit itself, `e268`. That's an object to remember. This commit has the previous commit as its parent. Then Git looked inside the HEAD file to find the current branch, and it moved that branch to point at the new commit. So the master branch moved, but notice that HEAD itself did not move. It was pointing at master before the commit, and it's still pointing at master. Master moves; HEAD just comes along for the ride.
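Here is a toy Python model of what `git commit` does to references. It is a sketch under simplifying assumptions: commits live in a plain dictionary, their IDs are shortened hashes of made-up content, and `commit` is a hypothetical helper, not Git's code. It shows the key point: the branch that HEAD names moves, while HEAD itself and every other branch stay put.

```python
import hashlib

# Toy object database and references (hypothetical model, not Git's storage format).
objects = {"007f": {"parent": None, "snapshot": "apple pie: original ingredients"}}
refs = {"refs/heads/master": "007f", "refs/heads/lisa": "007f"}
head = "ref: refs/heads/master"  # HEAD names a branch, not a commit

def commit(snapshot: str) -> str:
    """Store a commit whose parent is the current commit, then move the current branch."""
    branch = head[len("ref: "):]  # which branch is current?
    parent = refs[branch]
    commit_id = hashlib.sha1(f"{parent}\n{snapshot}".encode()).hexdigest()[:4]
    objects[commit_id] = {"parent": parent, "snapshot": snapshot}
    refs[branch] = commit_id      # the branch moves...
    return commit_id              # ...but HEAD is left untouched

new_id = commit("apple pie: 8 apples, oranges removed")
print(head)  # ref: refs/heads/master (unchanged)
```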
-
So far we didn't touch the new lisa branch. Lisa is still pointing at the previous commit where it was when we created it. Now let's make lisa the current branch.
-
$ git checkout lisa
-
When I run `git checkout lisa`, two things happen. First, Git changes HEAD to point at lisa. There, now HEAD is pointing at `refs/heads/lisa`.
-
Second, Git replaces the files and folders in our working area (the working directory) with the files and folders in this commit. So after the checkout, our working area has changed to the content of the commit pointed at by lisa. If I look at the content of the Apple Pie file, the new ingredients are gone: it is the previous version of the file.
$ cat "apple pie.txt"
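The two steps of a checkout can be sketched as a small Python function. It is a toy model under obvious assumptions: each commit maps directly to a full snapshot dictionary, the commit IDs are shortened placeholders, and `checkout` is a hypothetical helper that returns the new HEAD and working area instead of touching a real disk.

```python
from __future__ import annotations

# Hypothetical toy model: each commit ID maps to a complete snapshot of the project.
snapshots = {
    "007f": {"apple pie.txt": "original ingredients"},
    "e268": {"apple pie.txt": "8 apples, oranges removed"},
}
refs = {"refs/heads/master": "e268", "refs/heads/lisa": "007f"}

def checkout(branch: str) -> tuple[str, dict[str, str]]:
    """Return the new HEAD value and the rebuilt working area for `branch`."""
    new_head = f"ref: refs/heads/{branch}"     # step 1: move HEAD
    commit_id = refs[f"refs/heads/{branch}"]
    working_area = dict(snapshots[commit_id])  # step 2: replace the working area
    return new_head, working_area              # a copy, so edits can't alter the commit

head, working_area = checkout("lisa")
print(head)                           # ref: refs/heads/lisa
print(working_area["apple pie.txt"])  # original ingredients
```

Copying the snapshot mirrors the idea that committed objects never change: editing the working area afterwards leaves the stored commit intact.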
- So that's what checkout means: move HEAD and update the working area. Now let's modify the Apple Pie recipe again. I will paste in Lisa's version of the ingredients.
$ vim "apple pie.txt"
--10 Apples added, Oranges Added back--
$ git add "apple pie.txt"
$ git commit -m "Updated ingredients"
- Let's commit these changes. Git adds the commit to the object database, and it moves the current branch, lisa, to point at the new commit. HEAD didn't change, master didn't change of course, but lisa changed. Now it points at the new commit.
- First, let's move back to the master branch. I will check it out. There.
$ git checkout master
Remember, the branches didn't move, but HEAD did: it's now pointing at master. And if I look at the Apple Pie recipe, I will find my own version of the recipe, not Lisa's.
- Now, let's merge Lisa's changes from her branch, lisa, into the master branch.
$ git merge lisa
And there you are. We have a conflict. We want to have both our changes and Lisa's changes in master, but Git is warning us that at least some of those changes are conflicting. We need to solve the conflict manually.
- If we look inside the Apple Pie file, we will see that this line was changed in divergent ways in our recipe and in Lisa's recipe:
Master => --8 Apples added, Oranges Removed--
Lisa => --10 Apples added, Oranges Added back--
Let's go for a middle ground.
--9 Apples added, 5 Oranges Added--
Now if we run `git status`, we see that this file is still unmerged, so it is not staged for the next commit. We need to add it explicitly: this is our way to tell Git that the conflict has been fixed.
$ git status
$ git add "apple pie.txt"
$ git commit
There, that completes the merge. If we hadn't had conflicts, Git would have done this last step automatically, but because we did have conflicts, we have to say: okay Git, we are done fixing all the conflicts. And we do that with a commit. We don't even need to give it a commit message: Git knows that we are in the middle of a merge, so it pre-fills a suitable message for us.
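Why did this particular line conflict? Git compares both sides against their common ancestor, and a conflict appears when both sides changed the same region in different ways. The Python function below is a deliberately simplified, single-line sketch of that three-way rule (real Git merges whole files hunk by hunk). The function name and the base text are hypothetical; the `<<<<<<<`, `=======` and `>>>>>>>` markers mirror the ones Git writes into conflicted files.

```python
def merge_line(base: str, ours: str, theirs: str, their_name: str = "lisa") -> str:
    """Three-way merge of a single line (simplified sketch, not Git's real algorithm)."""
    if ours == theirs:  # both sides agree (or neither changed the line)
        return ours
    if ours == base:    # only their side changed the line
        return theirs
    if theirs == base:  # only our side changed the line
        return ours
    # Both sides changed the same line in different ways: write conflict markers.
    return f"<<<<<<< HEAD\n{ours}\n=======\n{theirs}\n>>>>>>> {their_name}"

base = "--original ingredients--"  # hypothetical common-ancestor version
print(merge_line(base, "--8 Apples added, Oranges Removed--",
                 "--10 Apples added, Oranges Added back--"))
```

Resolving the conflict means replacing the whole marked region with the line you actually want, here `--9 Apples added, 5 Oranges Added--`, before running `git add`.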
- If you look at the log now, you will see a brand new commit, and if you look inside this commit with `git cat-file -p`, there it is. It's just like any other commit we've seen so far, with one exception: it has two parents. That's what makes it a merge. A commit in Git usually has one parent, but it can actually have as many parents as you like. So let's update the diagram: Git created a new commit with two parents to represent the merge and moved master to point at the new commit. That's how merging works.
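To see that a merge is "just a commit with two parents", here is a Python sketch that builds the raw text of a merge commit and computes its ID the way Git does for any object: the SHA-1 of a header made of the type, a space, the content size and a NUL byte, followed by the content. The tree and parent SHA1s, the author line and the helper name `hash_object` are placeholders, not values from this repository.

```python
import hashlib

def hash_object(obj_type: str, body: bytes) -> str:
    """Git object ID: SHA-1 of b"<type> <size>\\0" followed by the object's content."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()

# Placeholder IDs; a real commit would reference existing tree and commit objects.
tree, ours, theirs = "a" * 40, "b" * 40, "c" * 40
body = (
    f"tree {tree}\n"
    f"parent {ours}\n"    # first parent: the branch we were on (master)
    f"parent {theirs}\n"  # second parent: the branch we merged in (lisa)
    "author Jane Doe <jane@example.com> 1700000000 +0000\n"
    "committer Jane Doe <jane@example.com> 1700000000 +0000\n"
    "\n"
    "Merge branch 'lisa'\n"
).encode()

# Only the header block (before the blank line) holds parent entries.
headers = body.decode().split("\n\n", 1)[0].splitlines()
parents = [line.split()[1] for line in headers if line.startswith("parent ")]
merge_id = hash_object("commit", body)
print(len(parents), merge_id)
```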
-
Let's talk about trees and blobs to show you in more detail how Git manages your working directory.
-
The objects in the database are commits, trees, and blobs and also annotated tags. All these objects are arranged in a graph. They reference each other. There are
- references from a commit to its parents
- references from a commit to its tree
- references from trees to blobs and other trees.
-
These references all look alike, but they are used in two different ways.
- References between commits are used to track history.
- All the other references are used to track content.
-
We've also seen that Git is good at reusing content so you can have objects that are reachable from more than one commit, like these ones here.
-
The point I want to make is that when you check out something, Git doesn't care about history. It doesn't look at the way commits connect to each other; it only cares about trees and blobs.
-
So if you check out this commit here (2.), Git ignores the link to the commit's parent. It looks at the tree in the commit and at all the objects that can be reached from there. That is the entire state of the project at the time of the commit: a complete snapshot of every file and every folder. Git uses this information to replace the content of your working directory. That's how you travel back and forth in time with Git, and it is the whole point of versioning.
-
And if you check out this commit here (3.), it's the same thing: it comes with a complete representation of the entire project.
-
You might think that merge commits must be more complicated than that, but they're not. Okay, they have multiple parents (that's the definition of a merge), but Git doesn't care about that when you check one out. It just goes into the commit and retrieves its tree as usual.
-
A merge commit will in general have its own tree, because the merge might contain objects that are not present in either parent: a file that has lines from both parents, for example. On the other hand, from the merge commit's tree you can probably reach objects that are also reachable from other commits. And once again, Git doesn't care which blob or tree was introduced by which commit. When it stores a commit, it just reuses the objects that are already there and creates the ones that are not. And when it checks out a commit, it just looks at the tree and rebuilds the state of the project from there.
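The reuse of content and the history-free checkout can both be sketched in Python. Blob IDs below follow Git's real scheme (SHA-1 of `blob <size>`, a NUL byte, then the content), so identical content always gets the same ID. Everything else is a simplifying assumption: the helpers are hypothetical, the file names are made up, and subtrees are nested dictionaries rather than hashed tree objects as in real Git.

```python
from __future__ import annotations

import hashlib

def blob_id(data: bytes) -> str:
    """Git-style blob ID: SHA-1 of b"blob <size>\\0" followed by the file content."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

db: dict[str, bytes] = {}  # toy object database holding blobs only

def store_blob(data: bytes) -> str:
    """Store content under its ID; identical content is stored only once."""
    oid = blob_id(data)
    db.setdefault(oid, data)
    return oid

def checkout_tree(tree: dict, prefix: str = "") -> dict[str, bytes]:
    """Rebuild a working area from a tree. Parents and history are never consulted."""
    files: dict[str, bytes] = {}
    for name, entry in tree.items():
        if isinstance(entry, dict):  # a subtree (a folder)
            files.update(checkout_tree(entry, prefix + name + "/"))
        else:                        # a blob ID (a file)
            files[prefix + name] = db[entry]
    return files

# Two commits' trees (hypothetical file names); they share the unchanged menu blob.
menu = store_blob(b"Apple pie\n")
tree_a = {"menu.txt": menu, "recipes": {"apple pie.txt": store_blob(b"8 apples\n")}}
tree_b = {"menu.txt": menu, "recipes": {"apple pie.txt": store_blob(b"9 apples, 5 oranges\n")}}

print(len(db))  # 3 blobs, not 4
print(checkout_tree(tree_b))
```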
-
Don't get confused by trees and blobs. Retrieving a past state in Git is a pretty simple affair: it's just a stupid content tracker. You should focus on history, on how commits connect to each other, and then trust Git to do the right thing with trees and blobs.
-
Git doesn't really care much about your working area. Remember, when you check out, Git just replaces the working area with the stuff from the object database. Git mostly cares about the objects in the database, not your working directory. The objects in the database are immutable and persistent, while the files in your working directory are as ephemeral as they get: they can change as quickly as you can run a checkout. That said, Git is not reckless with your working area. It will warn you before overwriting your files: for example, if you try to check out a commit while you have uncommitted changes that the checkout would overwrite, Git refuses and tells you so. But other than that, as far as Git is concerned, your working area is the least important part of your project. All the good stuff is in the `.git` directory.
-
Let's discuss the first special case of a merge. Let's check out the lisa branch. There, HEAD moves to point at lisa. Now we're in Lisa's mind again.
-
Imagine that we managed to convince Lisa that our version of the apple pie, the one in master, is tastier than her version. You know, one less apple can work miracles. So she decided to update her version of the recipe, the one in her branch.
-
Earlier on we merged lisa into master. Now we want to merge master into lisa.
-
Now, how does Git handle this merge? It could do it in the usual way, just like it did when we merged in the other direction. It could create a new commit with these two commits (`ecbe` and `007f`) as its parents, and then move lisa to point at the new commit. This new commit would be guaranteed not to have conflicts, because we already solved them when we merged in the other direction.
So it would be easy for Git to create this commit, but it would also be wasteful. Think about what we're trying to achieve here. We want a commit that contains the latest version of all the stuff in master and the latest version of all the stuff in lisa. That's all we want. But we already have such a commit: the latest commit of master (`ecbe`). It contains all the latest objects in master, of course, and also the latest objects in lisa, because lisa's latest commit is an ancestor of master's latest commit, and all the conflicts have already been solved in master.
$ git checkout lisa
$ git merge master
- We have learned by now that Git is frugal; it doesn't like waste, so it can spare a commit and do this instead: it moves lisa to point at the same commit as master. So Git didn't have to create a new commit. This trick happens all the time in practice, and it's called a fast-forward. Whenever you see "Fast-forward" in the output of a merge, this is Git bragging about being able to spare a few objects in the object database and keep your project's history simpler.
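The decision Git makes here boils down to an ancestry check, which can be sketched in Python. Assumptions: the commit graph uses descriptive placeholder names instead of SHA1s, and `is_ancestor` and `merge` are hypothetical helpers that only decide what to do without writing any commit. In a real repository, `git merge-base --is-ancestor` performs the same check, and `git merge --no-ff` forces a merge commit even when a fast-forward is possible.

```python
from __future__ import annotations

# Commit graph from the walkthrough (placeholder names instead of real SHA1s).
parents = {
    "initial": [],
    "master-8-apples": ["initial"],
    "lisa-10-apples": ["initial"],
    "merge-9-apples": ["master-8-apples", "lisa-10-apples"],
}
refs = {"master": "merge-9-apples", "lisa": "lisa-10-apples"}

def is_ancestor(ancestor: str, commit: str) -> bool:
    """True if `ancestor` can be reached from `commit` by following parent links."""
    stack, seen = [commit], set()
    while stack:
        c = stack.pop()
        if c == ancestor:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False

def merge(current: str, other: str) -> str:
    """Decide what merging `other` into `current` requires (sketch; no commit is written)."""
    ours, theirs = refs[current], refs[other]
    if is_ancestor(theirs, ours):
        return "Already up to date."
    if is_ancestor(ours, theirs):
        refs[current] = theirs  # fast-forward: just move the branch
        return "Fast-forward"
    return "needs a merge commit"

print(merge("lisa", "master"))  # Fast-forward
print(refs["lisa"])             # merge-9-apples
```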
-
Let's discuss a second special case, this time involving HEAD rather than a merge.
-
I will check out master and forget about the lisa branch for a while.
-
HEAD is a reference to a branch, which in turn is a reference to a commit. When you check out a branch, you change HEAD to point at that branch. However, you can also do something different: you can check out a commit directly instead of a branch. I will check out this commit, just by using the commit's SHA1.
$ git checkout ecbe
There. Now if you look inside HEAD, it's not pointing to a branch: it contains the commit's SHA1 directly. Indeed, there is no current branch at all; we're not on any branch. This situation is called a detached HEAD.
- How is that useful in practice? Let's make some experiments with the Apple Pie recipe, something that I'm not sure I want to keep around. It's good with 9 apples, so it must be even better with 20, right? And I will commit this.
$ vim "apple pie.txt"
--20 Apples added, 5 Oranges Added--
$ git add "apple pie.txt"
$ git commit -m "Updated ingredients to make it tastier"
-
What happens when I commit? Well, in this case Git cannot move the current branch as usual. There is no current branch, so it tracks the latest commit by moving HEAD directly. HEAD works exactly like a branch here.
-
Okay, let me hack in a few more changes. Let's make the pie sugar-free. It's healthy.
$ vim "apple pie.txt"
--20 Apples added, 5 Oranges Added, Remove Sugar--
$ git add "apple pie.txt"
$ git commit -m "Updated ingredients to make it healthier"
Another commit, another HEAD movement. Okay, now let's say that we've had enough of this. I tried cooking an apple pie with all these extra apples and no sugar. It tastes like cooked apples. I don't like that, so we'll abandon the experiment. I will checkout master again.
$ git checkout master
-
Okay, now HEAD is back where it belongs, on the master branch, and so are our files. Everything is business as usual. There, we rolled back the latest two commits. But there is a nagging question here: what happened to these commits? Well, they are still in the object database somewhere, together with all their trees and blobs, but unless I took note of their SHA1s, these commits and their connected objects are now unreachable.
-
They cannot be reached by starting from a branch or a tag and walking the objects in the database. They are effectively isolated. I can only reach them directly by their SHA1s, and I'm bound to forget those too.
-
If you have experience with garbage-collected languages, then you know what happens to an object when it can't be reached by any reference: it gets garbage collected. At some point the system decides that the object is wasting precious memory, deletes it, and recovers the memory. This is exactly what happens in Git.
-
Every now and then, in the course of other operations, Git decides that it's worth running a garbage collection. The garbage collector looks for objects in the database that cannot ultimately be reached from a branch, HEAD, or a tag, and it removes them to save disk space. Remember, each loose object is just a file in the object database, so removing it is as easy as deleting that file.
-
So these commits I created will likely stay in the database for some time and then disappear. If I want to save them, I must act now. How do I do that? One thing I can do is move back to the last commit. I can still do it, because I have their SHA1s here and the garbage collector hasn't run yet, so these objects are still in the database.
$ git checkout 7160d61
There, that was a last-minute save. And now that I have the commit, I can put a branch on it.
- Here, let's create a branch called `nogood`. Now I can check out master again, and this time around the commits are safe.
$ git branch nogood
$ git checkout master
There is now a branch that acts as the entry point to this section of the object graph, so these objects will never be garbage collected. And I can easily get back to them by checking out nogood if I wish.
- This is a common way to use a detached HEAD. When you want to try something out or run an experiment with your code, you don't have to give up the convenience of using Git. You can detach HEAD, do your experiment, commit as often as you wish so that you won't lose data, and then decide whether to keep the experiment or do away with it. Just remember to put a branch on the stuff that you care about before you leave it behind.
-
A Git repository is a bunch of objects linked to each other in a graph, they can be commits, blobs, trees, or tags.
-
Then there are branches that are references to a commit.
-
Finally, there is HEAD that's also a reference, but there is only one of it, and it marks our current position in the graph. It's usually pointing to a branch, but it could also be detached and pointing directly to a commit.
-
Then there are a few rules.
-
First rule. The current branch tracks new commits. So if you create a new commit with `git commit` or `git merge`, for example, the current branch moves to the new commit. If you are in a detached HEAD state, HEAD itself moves to the new commit.
Second rule. Your working directory is updated automatically. When you move to a commit, for example with git checkout, Git replaces the content of your working directory with the content that can be reached from that commit.
-
Rule three. Any commit, blob, or tree that cannot be reached from either a branch, HEAD, or a tag is considered dead and can be garbage collected.
-