History Fixing Mistakes
-
We'll see how to edit project history. Now of course, you are already editing your project history in a way. Every time you commit, for example, you are adding something to the history, and every time you use rebase you are changing your history.
-
You should never rebase shared commits. That is, once you push a commit to a shared repository, from that moment on you should avoid rebasing that commit. That's because rebase is a command that changes history. It copies the rebased commits to new commits. The new commits might look the same as the old ones, but they are actually different objects in the database. As a result, if you rebase commits that other people have in their repositories, you can create a lot of confusion.
-
You can introduce conflicts that cannot be fixed without a lot of manual tweaking, so in general, you shouldn't do that. It's all fine when you change your own local history, but changing shared history is not going to win you any friends on your team.
- The first way to change history is about fixing the latest commit. We're in the cookbook project on the master branch, so let's add a new recipe to menu.txt. Let me stage this change and commit it.
$ git add menu.txt
$ git commit -m "recipe added: Caesar Salad to menu file"
- Now after committing, I realize that I didn't quite finish the job. The rules of the cookbook say that whenever I have a recipe in the menu, I also need the matching file in the recipes folder. I don't have that file for Caesar salad. Let's create that file and populate it with a few ingredients. And I stage it.
$ git add recipe/README.txt
-
So now I'm in the process of fixing the problem, but I don't want to fix it by just creating yet another commit. If I did that, I would end up with two separate commits, where the first commit would still have a menu item without a corresponding recipe. That's an inconsistent state for my cookbook. It's the equivalent of code that doesn't compile in the repository. The second commit would fix that, but I would like my history to be cleaner than that.
-
I would like to go back and fix my latest commit by adding this file to it, so I would have only one commit, and that one would be good and clean. I can do that by amending the commit. It only takes one additional argument to my commit command.
$ git commit --amend -m "recipe add: Caesar Salad to menu file plus readme file"
-
I'm not creating a new commit from scratch. Instead, I'm amending the latest commit and now this commit will include both the modified menu and the new recipe file.
-
I save and quit this message file, and let's see what is happening in the diagram. Look at the current commit, 5de2465. That's the commit we're amending. Git cannot really change this latest commit, because commits are immutable. What Git really does when I finish amending is copy the current commit to another commit that also includes all of my amendments: the new file and the edited message. This is a brand new object with a new hash. Then Git moves the current branch to point to the new commit. The old commit will eventually be garbage collected, while the new commit stays.
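-
If you want to see this for yourself, here is a minimal sketch in a throwaway repository. The file names, the commit messages and the user identity are made up for the example; they are not the ones from the cookbook project. It records the hash before and after the amend, so you can check that the branch still has one commit but that it is a different object.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository, assumed for this sketch
cd "$repo"
git init -q
git config user.email "cook@example.com"   # hypothetical identity
git config user.name "Cook"

echo "Caesar Salad" > menu.txt
git add menu.txt
git commit -q -m "Add Caesar Salad to menu"
old_hash=$(git rev-parse HEAD)

# Add the forgotten file, then fold it into the latest commit.
mkdir recipes
echo "lettuce, croutons" > recipes/caesar_salad.txt
git add recipes/caesar_salad.txt
git commit -q --amend -m "Add Caesar Salad to menu, plus its recipe"
new_hash=$(git rev-parse HEAD)

commit_count=$(git rev-list --count HEAD)
echo "before amend: $old_hash"
echo "after amend:  $new_hash"
echo "commits on the branch: $commit_count"
```

The two hashes differ even though the branch still holds a single commit. The old object is still in the database for now, it just isn't reachable from the branch anymore.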
-
So amending the commit is a history changing operation. It's like a very small rebase. In this case, I amended a commit that I hadn't shared yet. I'd never pushed it to a shared repository, so it was okay to change it. And there we are.
$ git push
- If we look at the log, the commit has changed. This new commit has the new message that I fixed, and if we look at the details, we can see that it includes both the updated menu and the new recipe file. And that's commit --amend. It is quite useful, because whenever you want to fix something, more often than not it's something in the very latest commit.
-
The project's rules say that for each line in menu.txt, like apple pie here, you want a matching file in the recipes directory, like apple_pie.txt.
-
There are two items in the menu, cheesecake and chicken tikka masala, that have no matching files in the recipes folder. So what if I want to fix these mistakes? If the mistakes had happened in the latest commit, then I could just amend it, but unfortunately that's not the case. Let's use git log to see when these lines were added to the menu.
-
So here is the commit that added chicken tikka masala to the menu. It was a few commits ago. And here is the commit that added the cheesecake. This one happened way back in the past, in one of the very first commits. Both of these commits broke the rules. They added items to the menu without adding matching recipes in the recipes directory.
-
Now we have to be a bit careful. Look at the position of the remote branches here in the log. In particular, look at the remote master branch on origin (highlighted in blue). This is our shared repository. Here is where the master branch on GitHub is, or at least where it was the last time we communicated with the project on GitHub. So these two commits are both wrong, but this commit happened after the last time I communicated with GitHub, while this other commit happened way before that. It's already been shared, so I don't want to change the cheesecake commit, because of the golden rule: never rebase shared commits. I will just decide to live with it for now. Instead, I will fix the tikka masala commit, which is not shared yet. It's still a local commit in my repository, so I can change it.
-
So, long story short, here is the plan. I will fix this commit so that it includes a new recipe file that matches this menu item, tikka masala. Let's do it in two stages.
- First, I will create this recipe file and commit it as a brand new commit.
- Now second, I will change my project history so that this commit and the old tikka masala commit get squashed together in one single commit.
-
The first part is easy. Let's create a new recipe.
$ vim recipes/chicken_tikka_masala.txt
$ git add recipes/chicken_tikka_masala.txt
$ git commit -m "Adding missing chicken tikka masala file"
-
And that was the first part. Now let's finally get to the important point. I need to edit my history and do some serious surgery on it. How can I do that? This is where I'll show you one of the most powerful commands in Git, and strangely enough, it happens to be just a different flavor of a command: git rebase. Forget what you know about the standard rebase. If you do a git rebase --interactive, or simply -i for short, then rebase ceases to be a normal rebase and becomes a super powerful history editing command.
-
I need one more argument for this to work, and this is a reference to a commit. I can use the latest commit that was shared, the one pointed at by the remote origin/master branch. This means: let me edit history from this commit (excluded) onwards.
$ git rebase -i origin/master
-
And here we are in a text editor doing an interactive rebase. We have a list of commits here, and the order of the commits is the opposite of the log order: from the least recent to the most recent.
-
Here is what interactive rebase is about. What we're doing here is essentially writing a computer program. The program runs on the current commits in the history, and the output of the program is a brand new history. The first word in each line of the program is an instruction that applies to a commit, and it tells Git what to do with this specific commit.
-
Right now, every line is a pick. That means, just take this commit. So if we executed the program as it is now, Git would just compose the new history by picking all these commits one after the other, which means that the new history would be exactly like the history I have now. Nothing would change.
-
But we do want to change this history, so let's change the program. For example, before even looking at the tikka masala commits, look at this commit here, 5be5356. It's a valid commit, but it has a weird message that doesn't really match the convention. I want to change this message. If you read the comments down here, you will see that the instruction to change a commit's message is reword. So let me change this pick instruction to reword. Notice that I'm not rewording the message just yet. That will happen later, when the program runs.
-
Now I want to change the order of these commits. I will cut this commit here, 2c74ea2, the latest one, the one where we added the recipe for chicken tikka masala, and paste it right after the older commit that added tikka masala to the menu, 80f2a48. I also want to squash this commit and the previous commit together and make them one single commit.
-
We have a few commits about guacamole here and you might remember that these commits involve some branching and merging and I think that's unnecessary. I would like to squash all this guacamole stuff into a single commit. So we have two squash instructions in a row and that's it. The program is done.
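-
To make the idea of a "program" concrete, here is a small sketch that runs an interactive rebase without anybody typing in an editor. Everything in it is an assumption for the sake of the example: the throwaway repository, the file names, the commit messages, and the `base` variable, which stands in for origin/master. The program is written to a file, and the `GIT_SEQUENCE_EDITOR` environment variable tells Git to copy that file over its own list instead of opening an editor. `GIT_EDITOR=true` simply accepts the combined message that Git proposes for the squash.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository, assumed for this sketch
cd "$repo"
git init -q
git config user.email "cook@example.com"   # hypothetical identity
git config user.name "Cook"
mkdir recipes

echo "apple pie" > menu.txt
echo "apples" > recipes/apple_pie.txt
git add .
git commit -q -m "Add apple pie"
base=$(git rev-parse HEAD)        # stands in for origin/master

echo "chicken tikka masala" >> menu.txt
git commit -q -am "Add tikka masala to menu"
menu_commit=$(git rev-parse HEAD)

echo "guacamole" >> menu.txt
echo "avocado" > recipes/guacamole.txt
git add .
git commit -q -m "Add guacamole"
guac_commit=$(git rev-parse HEAD)

echo "chicken, spices" > recipes/chicken_tikka_masala.txt
git add recipes/chicken_tikka_masala.txt
git commit -q -m "Add missing tikka masala recipe"
recipe_commit=$(git rev-parse HEAD)

# The program: oldest commit first, one instruction per line.
# The recipe commit is moved up and squashed into the menu commit.
plan=$(mktemp)
cat > "$plan" <<EOF
pick $menu_commit Add tikka masala to menu
squash $recipe_commit Add missing tikka masala recipe
pick $guac_commit Add guacamole
EOF

GIT_SEQUENCE_EDITOR="cp '$plan'" GIT_EDITOR=true git rebase -i "$base"

commit_count=$(git rev-list --count "$base"..HEAD)
squashed_files=$(git diff-tree --no-commit-id --name-only -r HEAD~1)
echo "commits after the base: $commit_count"
echo "files in the squashed commit:"
echo "$squashed_files"
```

Three commits went in and two came out, and the squashed commit now carries both the menu change and the recipe file. In real life you would of course edit the list by hand, as I do here.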
-
Let's save and exit, and at that point Git starts executing this program. Let's start at the first instruction. It's a pick, so Git just picks this commit. But the second instruction is a squash, so Git has to squash these two commits together, and that requires a decision from me. In fact, if you look back at the terminal, Git just stopped the interactive rebase and is asking me: I have two separate commits and you want me to squash them into one commit, but what should the commit message of this new commit be? I can see both messages of the original commits here, plus comment lines and empty lines, which are ignored. I will just pick one of these messages as the new message, and this other line goes away. As soon as I save, Git squashes the two commits together, creating a brand new commit.
-
The next instruction is a pick, so Git can just pick this next commit and add it to the history. Actually, Git cannot literally add this commit to the history, because the commit includes a link to its parent, and Git cannot change that link without changing the entire commit, and commits are immutable. So this is once again a brand new commit, a copy of the existing commit. By the way, this step doesn't require any intervention from me, so Git just does the pick and moves to the next instruction.
-
Next, we have a reword and Git stops again. Now it's asking me to change the commit message. That's what reword is all about. Okay, let's do that and save.
-
Then we have the most complicated sequence of instructions so far: a pick followed by a squash and then yet another squash. This means that these three commits must be squashed all together. This doesn't seem hard. It seems to be the same as squashing two commits together, but this time we have a problem. Git stops and complains about a conflict. What conflict is this? Well, if you remember what happened a while ago, we had a merge in the guacamole recipe that resulted in a conflict, and we fixed the conflict by hand. Now we're getting rid of that merge, including the resolution of the conflict, and we're squashing everything into a single commit instead. So we have to go over that old conflict again and solve it again, this time for the sake of the interactive rebase instead of the merge. Once again, I reopen the guacamole recipe, put tomato and onion in the lines, solve the conflict, and then I can continue the rebase as Git is suggesting here.
-
Oh sorry, I forgot to tell Git that I solved the conflict, and just like in the merge, I can do that by adding the file to the index and now I continue again. There, now we've solved the conflicts and we're finally squashing these three commits together. We pick one commit message and we continue.
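-
The stop, fix, add, continue cycle can be reproduced in a few lines. This is only a sketch under my own assumptions: a throwaway repository, a made-up branch called spicy, and a much simpler conflict than the guacamole one above. The program is left untouched (`GIT_SEQUENCE_EDITOR=true` accepts the list as it is), but the single pick cannot be applied cleanly, so Git stops in the same way.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository, assumed for this sketch
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called master
git config user.email "cook@example.com"   # hypothetical identity
git config user.name "Cook"

mkdir recipes
echo "avocado" > recipes/guacamole.txt
git add recipes/guacamole.txt
git commit -q -m "Add guacamole recipe"

# Two branches change the same place in the same file.
git checkout -q -b spicy
echo "tomato" >> recipes/guacamole.txt
git commit -q -am "Add tomato"
git checkout -q master
echo "onion" >> recipes/guacamole.txt
git commit -q -am "Add onion"

# Replay the spicy commit on top of master. Git stops on the conflict.
git checkout -q spicy
GIT_SEQUENCE_EDITOR=true git rebase -i master || echo "Git stopped on a conflict"

# Solve the conflict by hand, tell Git about it with git add, then continue.
printf 'avocado\nonion\ntomato\n' > recipes/guacamole.txt
git add recipes/guacamole.txt
GIT_EDITOR=true git rebase --continue

final_recipe=$(cat recipes/guacamole.txt)
commit_count=$(git rev-list --count HEAD)
echo "commits on spicy: $commit_count"
```

If you skip the `git add` step, `git rebase --continue` refuses to go on, which is exactly the mistake I just made.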
-
And after this, we have instruction eight, which is just a pick, and nine, which is also a pick. So Git can just copy these commits to the new history. And now that this last operation is done, Git can move the current branch to the new history and leave the old history behind for the garbage collector and we're done. The interactive rebase is finished.
-
If I look at the log now, I can see my brand new history. We wielded a lot of power here. You can do a lot of stuff with interactive rebases. We've seen reordering commits, squashing commits and rewording them, but you can also remove commits by just deleting them from the program, and even split a commit into multiple smaller commits. Interactive rebases are really powerful and still easy to do, as you've just seen.
-
Actually, they are so easy that I make them a standard part of my workflow. Here is what I do. When I'm working on a project, I commit early and often. I commit, commit, commit all the time, every few minutes, and most of my commits are half broken. They have temporary commit messages, maybe typos in the message. I don't care much about that. By committing all the time, I'm sure that if I make a mistake, I can immediately backtrack to a previous state of my code. It's like having an undo operation always available as I write code. Then once I'm happy with the state of my code and I feel ready to share it, typically before pushing to origin, I stop and do an interactive rebase. I clean up my history. I refactor my history, so to speak.
-
It's like the refactoring that many of us do to our code after we make it run and before we commit it so that it's nice and clean and ready for production.
-
Whenever you do anything that changes history, like an interactive rebase, or even something as simple as amending a commit, Git has to copy information from old commits to new commits. The new commits might look like the old commits, but they are not the same objects, and the old commits are left behind and are usually unreachable. There is no branch or tag pointing at them anymore. So they stay in the object database for a while, until Git eventually decides to garbage collect them.
-
Now what if I change my mind and I want to recover one of those objects. This is not a common situation, but sometimes it can happen. For example, what if I do an interactive rebase and delete the commit by mistake. Now that commit and its associated data, they're not in the history anymore. I know they're still somewhere in the object database, but I don't know their hashes anymore, so I cannot recover them, so what do I do now?
-
Well, the good news is that there is a very easy way to recover the hashes of abandoned objects. Every time a reference moves in the repository, Git logs that move. For example, when you check out a branch, you are moving the HEAD reference, so Git logs that.
-
Let's checkout the spaghetti branch and then let's checkout master again. There, I just moved the head reference twice.
-
Git logged those movements into something called the reference log, or reflog for short. I can look at the reflog with git reflog, and I can give it the name of a reference. Let's look at HEAD. There we are.
$ git reflog HEAD
-
Look at the first two lines of the reflog. They tell us that the last two changes to HEAD were from master to spaghetti and then from spaghetti to master again. And if you keep reading, you can see all the changes to HEAD that happened when I did the interactive rebase earlier on. And even earlier: when I amended the latest commit, when I checked out branches, when I created a new commit, when I rebased branches a few moments ago, and so on.
-
And this information comes with the hashes of the objects that HEAD was pointing to. So for example, this commit here, 5de2465, is not in the history anymore. It was amended and replaced by a following commit, 5127436. But until it gets garbage collected, its hash stays in the reflog, so we can still look at it, either by referencing the hash directly or by using this syntax here, HEAD@{15}, which means the 15th previous position of HEAD. There it is.
And if you can see it, then you can recover it. For example, you can put a branch on it, and then it's not an abandoned commit anymore.
-
Just to make it clear, the information in the ref log is strictly local information. This ref log belongs to this repository and this repository alone. If I clone this repository again to another directory, then I will get a different ref log. But when it comes to this repository, every time head moves, Git is going to log it here. And the same goes for other references, such as the master branch.
-
There look at it moving around here all the way back to the moment I cloned this repository from GitHub. That's where the local master branch was created. And that's it about the ref log command. Hopefully, it will make you feel a bit safer when you're using it.
-
We have seen a few operations that are truly irreversible and destroy data, but those operations usually destroy data in your working area and maybe the index. When it comes to the repository, you can usually recover all the objects you left behind thanks to the reflog.
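-
Here is a minimal sketch of such a recovery, in a throwaway repository with made-up file names and messages. It abandons a commit by amending it, finds it again as `HEAD@{1}` (the position of HEAD one move ago), and puts a branch on it. The branch name rescued is just something I picked for the example.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository, assumed for this sketch
cd "$repo"
git init -q
git config user.email "cook@example.com"   # hypothetical identity
git config user.name "Cook"

echo "apple pie" > menu.txt
git add menu.txt
git commit -q -m "Add apple pie"

echo "cheesecake" >> menu.txt
git commit -q -am "Add cheescake"          # typo on purpose
abandoned=$(git rev-parse HEAD)

# Amending leaves the commit above behind, with no branch pointing at it.
git commit -q --amend -m "Add cheesecake"

# The reflog still remembers where HEAD was one move ago.
git --no-pager reflog HEAD
previous=$(git rev-parse 'HEAD@{1}')

# Put a branch on the abandoned commit so it is reachable again.
git branch rescued 'HEAD@{1}'
rescued=$(git rev-parse rescued)
echo "abandoned: $abandoned"
echo "rescued:   $rescued"
```

Once the rescued branch exists, the old commit is reachable again, so the garbage collector will leave it alone.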
-
Let's move to another branch, the lisa branch, and let's look at the log. We can see that it added a new item to the menu, cheesecake, but it doesn't add the matching recipe for the cheesecake in the recipes folder. See, no cheesecake recipe here.
-
I want to fix this situation for good. I want every menu item to have a matching recipe, but in this case I have the item and I don't have the recipe. So I can either finally add the recipe for the cheesecake, or I can remove the cheesecake from the menu altogether. After thinking about it, I decided that I would rather do the second. I want to delete the cheesecake from the menu. This commit is wrong.
-
How many ways do we have of deleting it?
- The first way would be the nuclear option. Go forth and make a big interactive rebase that is based on the very first commit in the history, and remove the commit that adds the cheesecake while you do the interactive rebase. This is exactly what I don't want to do, because it would create an entire line of new commits in our history and it would change shared history. So no, let's not do that.
- The second way to do it: edit the menu, remove this line, and create a new commit. The old broken commit stays in the history, but at least it gets fixed later on. I can do that.
-
But instead of doing it by hand, so to speak, by manually editing this file, I can ask Git to do it automatically with a command called git revert. All I have to do is say: look, Git, I want to revert this commit here, 5720fdf.
$ git revert 5720fdf
-
What happens is that Git automatically creates a new commit that contains changes that are exactly the opposite of the changes in this original commit, which is very useful if the changes are not just one line as in this trivial example, but say hundreds of lines of code.
-
I can even revert multiple commits at once. But in this case, it's just one small, one-line commit. And indeed, I have a new commit here with a nice message that explains exactly what is happening, and the only thing it contains is the reverse of the cheesecake commit. It removes this line.
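-
A small sketch of the same thing in a throwaway repository, with made-up commits: the cheesecake line is added, another unrelated commit follows, and then the cheesecake commit is reverted. The `--no-edit` option just accepts the message that Git generates, so no editor opens.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository, assumed for this sketch
cd "$repo"
git init -q
git config user.email "cook@example.com"   # hypothetical identity
git config user.name "Cook"
mkdir recipes

echo "apple pie" > menu.txt
echo "apples" > recipes/apple_pie.txt
git add .
git commit -q -m "Add apple pie"

echo "cheesecake" >> menu.txt
git commit -q -am "Add cheesecake to menu"
bad_commit=$(git rev-parse HEAD)

echo "cinnamon" >> recipes/apple_pie.txt
git commit -q -am "Add cinnamon to apple pie"

# Write a new commit that contains the opposite of the bad commit.
git revert --no-edit "$bad_commit"

menu_now=$(cat menu.txt)
revert_subject=$(git log -1 --format=%s)
commit_count=$(git rev-list --count HEAD)
echo "menu now: $menu_now"
echo "latest commit: $revert_subject"
echo "commits on the branch: $commit_count"
```

The history grows from three commits to four. The broken commit is still there, and the revert commit sits on top of it and undoes its change.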
-
One more thing. I told you that revert reverses changes in a way that is completely safe, in the sense that it doesn't touch existing commits. It just adds new commits. But this behavior has a catch. All that revert can do is revert your data by writing the opposite data. It can't revert the structure of your history. For example, if you try to revert a merge, then revert can remove all the data that was added by the merge, but it cannot remove the merge commit itself. The merge is still there, and this can cause some confusing situations down the line, especially if you try to merge again after that.
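-
Here is a sketch of that confusing situation, again in a throwaway repository with made-up content. A branch is merged and the merge is then reverted with `-m 1`, which tells revert to treat the first parent (the branch we merged into) as the side to keep. After that, merging the same branch again does nothing, because from Git's point of view the branch has already been merged.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository, assumed for this sketch
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called master
git config user.email "cook@example.com"   # hypothetical identity
git config user.name "Cook"
mkdir recipes

echo "apple pie" > menu.txt
git add menu.txt
git commit -q -m "Add apple pie"

git checkout -q -b lisa
echo "cream cheese" > recipes/cheesecake.txt
git add recipes/cheesecake.txt
git commit -q -m "Add cheesecake recipe"

git checkout -q master
git merge -q --no-ff -m "Merge branch 'lisa'" lisa
merge_commit=$(git rev-parse HEAD)

# Revert the merge: the data goes away, the merge commit stays in the history.
git revert --no-edit -m 1 "$merge_commit"
after_revert=$(git rev-parse HEAD)

# Merging again brings nothing back: lisa is already an ancestor of master.
git merge lisa
after_remerge=$(git rev-parse HEAD)
```

To get the cheesecake recipe back you would have to revert the revert, or create new commits on lisa. Merging again is not enough.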
-
Be careful when you revert merges. It's a special case, and you should be aware of how to deal with it specifically. In other words, don't think of a revert as if it were a generic undo operation in Git. In fact, the closest thing that Git has to an undo operation is probably a reset, when you use it to move a branch back to where it was in the past. Revert is much narrower in scope. All that it does is write a new commit with new data that is the opposite of existing data.