
Introduction to Git

Aakash Goplani edited this page Feb 25, 2018 · 1 revision

Topics Covered

A Brief History

Advantages of DVCs

About Git

Installing Git

Configuring Git

A Brief History

We'll start with a brief history of version control so that we can understand where we've come from and how we got to where we are now. The very first version control systems were developed in the early 70s, operated on a single file, and had no networking support. These were systems such as SCCS and RCS. Because they operated on a single file, you could have a file such as foo.c and have multiple versions of that file, but there was no correspondence between different files within a repository. There was no notion that version 1.1 of foo.c went with version 1.1 of bar.c; the pairing could be arbitrary.

This led to the obvious innovation of a multi-file system, the second generation, which is exemplified by centralized version control systems such as CVS, Visual SourceSafe, Subversion, Team Foundation Server, and Perforce. All of these are multi-file centralized systems, so you can check out into a working copy on your local system all the files necessary for a particular version of a repository.

Then came the third generation, the distributed version control systems such as Git, Mercurial, Bazaar, and BitKeeper. These work on changesets. Changesets can be shipped around, and both clients and servers can have the entire repository present, which allows us to do some interesting things.

So you can see a gradual evolution from single file to multi-file to changesets, and from no networking to centralized to distributed, with additional capabilities added in each new generation of version control systems. If you want to read more about the history of version control, I will refer you to Eric Sink's article on the subject.

Advantages of DVCs

Some of the advantages of a distributed version control system over a centralized one include the ability to have different topologies. If we want to use a centralized model, we still can, by having developers push their changes to one central repository. This is commonly done in enterprise environments. We can also use a hierarchical model, in which developers push their changes to a subsystem-based repository and those subsystem repositories are periodically merged into a main repository. This is done in Linux kernel development because the Linux kernel is too large for a single repository to be practical. There are separate subsystem repositories for graphics, networking, file systems, and other portions of the Linux kernel, and they are periodically merged into the main Linux kernel repository so that development can continue on its way. We can also use a distributed model, where developers push their changes to their own repository and the project maintainers pull those changes into the official repository if they're deemed valuable. This is very common in open source projects on GitHub: if you want to contribute changes, you fork the main repository, make your changes, and then issue a pull request to the project maintainer.

Another advantage of a DVCS is that backups are extremely easy. A backup is simply a clone of the repository. So your failover strategy, if something happens to your main server, is very straightforward: simply stand up another server and clone the repository to it.

Another advantage of a DVCS is reliable branching and merging. Branching and merging are very straightforward operations and don't entail the pain that you might be familiar with from large merges in centralized version control systems. This allows us to do things like feature branches or bug fix branches. If we're creating a new feature, we create a separate branch for it and eventually merge it back into our mainline of work.
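The feature-branch flow described above can be sketched in a few commands. This is a minimal illustration rather than a prescribed workflow: the temporary directory, the placeholder identity, the file names, and the branch name feature/greeting are all assumptions made for the example.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity (assumption)
git config user.email "user@example.com"

echo "version 1" > app.txt
git add app.txt
git commit -q -m "Initial commit"
mainline=$(git symbolic-ref --short HEAD)  # master or main, depending on local settings

# Do the new work on its own branch.
git checkout -q -b feature/greeting
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting feature"

# Merge the finished feature back into the mainline.
git checkout -q "$mainline"
git merge -q --no-edit feature/greeting
git log --oneline
```

Until the merge, the mainline never sees the half-finished feature, yet every step of the feature work is still committed and recoverable.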
Cheap local branching and committing also allow us to always work under version control. Even if we've got a set of changes that might not pass all of our tests at the moment, we can still commit it locally so that we are working under version control and have stable rollback points. Only then do we push that change up to our central server, or share it with the public when it's ready for public consumption. We can also easily apply fixes to different branches. If we've made a fix on a version one branch, we can pull it into our master or mainline work very easily by taking that patch and applying it to different branches. DVCSs make this operation easy.

We've also got the full local history of the repository, which allows us to do some very interesting things, like computing repository statistics on our local machines very quickly. We can also analyze regressions. Most DVCSs have the notion of a bisect command, which searches the repository looking for where a bug was introduced. You can search back through your repository and find the change that actually caused the bug to appear in the codebase, which gives you additional information on how to fix it. It's not necessarily the last change that introduced the bug; a bug might not be found for a while. So there are some interesting things that you can do by having a full local history present. Most operations in a DVCS are local operations.

You can also introduce new ideas, such as using your version control system for deployment. Heroku does this. You can actually do a git push heroku prod_branch, which pushes a local branch, which I'm calling prod_branch here, up to a server on Heroku. Heroku then looks at the repository and deploys our solution from there.
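To make the regression-hunting idea above concrete, here is a small sketch of git bisect on a throwaway repository. The five commits, the file names, and the check used to tell good from bad (status.txt must contain exactly ok) are invented for the illustration; in a real project the check would normally be a test script.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity (assumption)
git config user.email "user@example.com"

# Five commits; the "bug" (status.txt stops saying ok) is introduced in commit 3.
for i in 1 2 3 4 5; do
  if [ "$i" -ge 3 ]; then echo "bug" > status.txt; else echo "ok" > status.txt; fi
  echo "$i" > counter.txt
  git add status.txt counter.txt
  git commit -q -m "commit $i"
done

# Mark the newest commit as bad and the oldest as good, then let git search.
git bisect start HEAD HEAD~4 >/dev/null
git bisect run sh -c 'grep -qx ok status.txt' >/dev/null
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "First bad commit: $first_bad"
```

Because the whole history is local, the search needs no server round trips, and it tests only a handful of commits rather than every one.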
So, we can use DVCSs in new and interesting ways: reliable branching and merging, different server topologies, computing repository statistics, and performing deployments. There are a lot of interesting things that can be done because we have a complete repository at our fingertips.

About Git

Git was created by Linus Torvalds, who is also the creator of Linux. Git's creation was prompted by the Linux-BitKeeper separation. BitKeeper is a commercial DVCS that was used by the Linux kernel team from 2002 to 2005. When BitKeeper's owners decided to stop supplying the Linux kernel team with free licenses, Linus started the Git project in 2005 so that the team would have its own DVCS. Git is written primarily in C, with some Perl, and runs on a wide variety of operating systems, including Linux, Mac OS X, Windows, and many other commonly used operating systems available today. Its main design goals include speed, simplicity, strong branching and merging support, a fully distributed nature, and the ability to scale well for large projects. Remember, this was designed to be used on the Linux kernel, which is a very large piece of software.

Installing Git

Windows

Let's look at how we can install Git onto a variety of operating systems. First up is Windows, where I would recommend using the msysgit project. I can go to the Downloads tab of the msysgit project and download the very latest version, which is the full installer for Git 1.7.10. We will download that to our downloads folder and launch the installer.

Let's walk through the setup wizard and some of the options that you are going to want to set. We will select the default install folder. I personally do not like to have an icon on the Desktop, and I don't really need Git in the Quick Launch either. We can choose to have Windows Explorer integration, but if you want Windows Explorer integration I would highly recommend looking at Git Extensions instead. Stay away from TortoiseGit, as that is an older project that much more closely mimics TortoiseSVN and doesn't expose the full power of Git. We will leave the rest of these options on, and I'll choose to install the TrueType fonts for all console windows. The program group is fine.

Now, this is the important menu item. I can adjust my PATH to include the git commands so that I can use them from a normal Command Prompt or PowerShell. By default, they are only available in Git Bash. I personally don't mind including some Unix tools in my Windows Command Prompt. This is only going to replace, for instance, the find command, which isn't commonly used in Windows anyway, with the much more powerful Unix version, which I think is a good overall change. So yes, there's a big red warning, but I'm going to run both Git and the Unix tools from the Windows Command Prompt.

Next, we can choose the line-ending style. By default, Git stores only line feeds in the repository. On Windows, we use both a carriage return and a line feed as the line ending.
So I can choose how I want to deal with line endings. Some people will advocate checkout as-is, commit as-is, which means that your repository will contain carriage return, line feeds. It really depends on whom you're sharing with. If this is going to be a cross-OS project that is buildable on Windows, Mac, and Linux, you want to use the first option. If you are only going to be working on Windows, you do have the option of checkout as-is, commit as-is. These days, I would recommend the first option, which is the default.

I'll go ahead and install Git. All right, now that's done. I will click Finish and bring up PowerShell. Here I can type git version and see that msysgit 1.7.10 is in fact installed. If I change to my code directory, I can make a directory called test, change to test, and do a git init to create a repository. The repository is created successfully, which assures me that Git is now working on the system.
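The post-install check from the walkthrough looks roughly like this as a script. It is a sketch written for a POSIX-style shell such as Git Bash; the scratch directory created with mktemp is an assumption standing in for the code directory used in the demonstration, and the version printed will be whatever is installed on your machine.

```shell
set -eu
git --version              # same information as "git version"
workdir=$(mktemp -d)       # scratch location; any folder works
mkdir "$workdir/test"
cd "$workdir/test"
git init                   # creates the hidden .git directory
ls -a                      # .git should now be listed
```

If git init reports an initialized repository and a .git directory appears, the installation is usable.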

Mac OS X

If you're using Mac OS X, you can use Homebrew to install Git with brew install git. If you're not using Homebrew, you can also download a DMG package, which will allow you to install Git onto your system. Installing Git with Homebrew is very straightforward: just do a brew install git. To verify that it was successfully installed, we can do a git version and see that the correct version is now available at our command prompt.

Linux

If you're using Linux, you can use apt-get install git-core on Debian and Ubuntu distros, or yum install git-core on Fedora. Most other package managers have Git available. You'll just have to check your distro for instructions. Let's see how we can install Git onto Ubuntu. I'll do a sudo apt-get install git-core and agree to the prompt. If I now do a git version, we can see that Git has been installed and is now ready to use.
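As a rough aid for the package-manager options above, the following sketch only prints a suggested install command and does not run it. It is not part of the original walkthrough. The package name git is an assumption based on current Debian, Ubuntu, and Fedora releases, where the git-core name used in the walkthrough may no longer be what you want; check your distribution if in doubt.

```shell
# Print (do not run) an install command for whichever package manager is found.
if command -v brew >/dev/null 2>&1; then
  install_cmd="brew install git"
elif command -v apt-get >/dev/null 2>&1; then
  install_cmd="sudo apt-get install git"
elif command -v yum >/dev/null 2>&1; then
  install_cmd="sudo yum install git"
else
  install_cmd="(see your distribution's documentation)"
fi
echo "Suggested command: $install_cmd"
```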

Configuring Git

Now that Git is installed on your system, let's look at how we can configure it. Git provides three different configuration stores. The first of these is the system-level configuration, which is stored in /etc/gitconfig or, if you're on Windows, in Program Files\Git\etc\gitconfig. This configuration applies to the entire computer that Git is installed on, and you access it by using git config --system. The second level is user-level. We use git config --global; it's global for a particular user, and it's stored in the user's home directory in a file called .gitconfig. The last is the repository-level configuration. You access this by using git config without any specifier, and it's stored in the .git/config file in each repo.
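The three stores can be explored with a short sketch like the one below. To avoid touching real settings it points HOME at a temporary directory, which is purely an assumption of this example, as are the demo repository name and the sample values. The system-level command is left as a comment because writing to the system file usually requires administrator rights.

```shell
set -eu
export HOME=$(mktemp -d)      # sandbox so the real ~/.gitconfig is left alone
unset XDG_CONFIG_HOME         # make sure git only looks inside the sandbox
git init -q "$HOME/demo"
cd "$HOME/demo"

# System level (whole machine), shown for reference only:
#   git config --system core.autocrlf input

git config --global user.name "Global User"   # user level: $HOME/.gitconfig
git config core.autocrlf input                # repository level: .git/config

cat "$HOME/.gitconfig"
cat .git/config
```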

Let's see how we can configure Git. Right now I don't have a configuration file; it doesn't exist yet. It is not very common to modify the system-level configuration, but it is much more common to modify the global (user-level) configuration and the repository-level one. So let's start off with git config --global --list to list all the global options. It says that the file does not exist, which we already knew.

I'm now going to run git config with the --global option, because I'm going to configure some global user options: git config --global user.name set to James Kovacs, and then git config --global user.email. Now if I run git config --global --list, you will see that we've got both user.name and user.email set. If I cat the .gitconfig file, you'll see that it is a simple file of name-value pairs, with a header called user and individual properties called name and email.

I can add additional properties. Another common one that you are going to want to set is core.editor, using git config --global core.editor. Your core editor is the default editor that you want to use when editing commit messages or viewing diffs and other pieces of information from Git. If you're an Emacs user, you can use Emacs. If you are a Vim user like me, you can use Vim. If you want to use Notepad or Notepad++, all of these are possible.

I'll also add another config option called help.autocorrect and set it to 1. To see what help.autocorrect 1 does, let's go to the git fundamentals directory and run git status, a command that we'll see in just a second, but misspelled. With autocorrect, Git does a fuzzy match on the command name and guesses which command you meant. By setting it to 1, Git waits 0.1 seconds before actually executing the command, so you are basically saying do it immediately. If you set help.autocorrect to 0, Git doesn't do any autocorrecting.
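Put together, the global settings discussed so far look something like this. The sketch runs against a temporary HOME so that it does not overwrite a real .gitconfig; remove the first two commands after set -eu to apply it for real. The email address is a placeholder, since the one typed in the demonstration is not shown.

```shell
set -eu
export HOME=$(mktemp -d)      # sandbox (assumption of this example)
unset XDG_CONFIG_HOME

git config --global user.name "James Kovacs"
git config --global user.email "user@example.com"   # placeholder address
git config --global core.editor vim                 # or emacs, notepad, ...
git config --global help.autocorrect 1              # run the corrected command after 0.1 s

git config --global --list     # one name=value pair per line
cat "$HOME/.gitconfig"         # the same data, grouped under [section] headers
```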
If you set help.autocorrect to a higher number, Git waits that many tenths of a second before performing the corrected command. I find it helpful, especially when typing quickly: if you make a minor spelling error in a git command, it will use a fuzzy match to determine which command you wanted to use.

Another option that we are going to want to set is color.ui, using git config --global color.ui auto. With auto, Git uses colors to show a lot of its information, so when we're doing diffs or showing status, it will colorize the output. By setting it to auto, Git tries to detect whether it's running within a script. If it is, it will not emit any color escape sequences, so that logs are easier to parse; if it detects that it's running within a terminal, it will output the escape codes to colorize the output.

The last option that we're going to look at is the global core.autocrlf option. What should Git do when it encounters carriage return, line feeds? There are a variety of options: we can use true, false, or input, and we'll talk about each of these in turn. True means to convert carriage return, line feeds into solely line feeds. When you commit to the repository, Git changes the carriage return, line feed combination, which is typically used on Windows, into solely a line feed, which is then stored in the repository. When you check those files out, it converts those text files back. It only performs this action on text files, not on binary files, so you are not going to corrupt your binary files. Another option is false, which says do nothing. That means commit carriage return, line feeds to the repository, store them there, and don't do anything when you pull them back out. If you're only doing Windows development with Git, then this option is fine.
But if you're doing cross-platform development, with false you will end up having carriage return, line feeds in your repository, which will then be checked out onto other platforms like Linux, BSD, and Mac OS X. Those platforms don't generally use carriage return, line feeds; they use line feeds only. The last option is input, which means convert carriage return, line feeds into line feeds when you put files into the repository, but don't do any conversion on the way back out.

So, that said, where should you use each of these options? If you're on Windows, I would recommend using true: store solely line feeds in your repository for text files and convert them to carriage return, line feeds when you're pulling them out of the repository. On Mac or Linux, I would recommend using input, so that if you do happen to grab a Windows text file that has carriage return, line feeds, it will be properly converted into a line-feed-only version in your repository. If you are doing Windows-only development and don't want Git messing around with your line endings, then you can use false. This can have consequences if you ever do check the project out on a Linux or Mac system later on, as you'll have unexpected carriage return, line feeds. This demonstration is running on Mac OS X, so I'm going to use input.

Let's check the result of all these configuration options. I'm going to change back out of this repository and run git config --global --list to see all of the options that have been set. I can also see these options by catting my .gitconfig, and there they are. If you're going to use diff tools, which we will talk about in module 4, there are some configuration options that you can specify for configuring your own personal favorite diff tool for performing diffs and merges. You'll just have to look up in the documentation for your diff tool how to configure Git for it. If you can't find it within the diff tool's documentation, you will often be able to find it on Stack Overflow or another information source.
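The effect of core.autocrlf input can be observed directly by staging a file with Windows line endings and comparing sizes. This is a small sketch under a few assumptions: it sets the option at repository level in a throwaway repository rather than with --global, and the file name notes.txt and its contents are invented for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config core.autocrlf input     # repository-level here, to keep the demo isolated

printf 'line one\r\nline two\r\n' > notes.txt   # two lines ending in CR LF
git add notes.txt                  # git may warn that CRLF will be replaced by LF

working_bytes=$(wc -c < notes.txt | tr -d ' ')
stored_bytes=$(git cat-file -p :notes.txt | wc -c | tr -d ' ')
echo "working copy: $working_bytes bytes, stored in repository: $stored_bytes bytes"
```

The stored copy is two bytes smaller than the working copy, one carriage return per line, while the file on disk is left untouched, which is the "convert on the way in, not on the way out" behavior described above.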
Let's change to our git fundamentals directory. Here we have our .git folder, and it's got a config file as well, which specifies all the configuration for this repository. Now, I can run git config and change my user.name to something else, John Smith. If I now do a git config --list, we can see that user.name has been overwritten with John Smith at the very bottom. So these config sources are hierarchical: user-level settings override any system-level settings, and repo-level settings override user-level settings. If you wanted to change how, for instance, line endings are handled, you could do a git config core.autocrlf true and change it just for this repository.

Now, if you want to remove something, you can do a git config --unset core.autocrlf to remove that setting, and we'll do the same thing for user.name. If I do a git config --list, we can see that those settings have been stripped back out again. You can also simply edit the config files themselves. Here, for example, there's an empty heading left over from those changes, so I'll go ahead and remove that by hand. It has the same effect. It's up to you whether you prefer doing it in a text editor or using the git config commands.
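The override-and-unset sequence above can be replayed with the sketch below. As before, a temporary HOME keeps it away from real settings, and the repository path is a stand-in for the git fundamentals directory; both are assumptions of the example.

```shell
set -eu
export HOME=$(mktemp -d)                  # sandbox (assumption of this example)
unset XDG_CONFIG_HOME
git config --global user.name "James Kovacs"

git init -q "$HOME/git-fundamentals"      # hypothetical repository path
cd "$HOME/git-fundamentals"

git config user.name "John Smith"         # repo-level value overrides the global one
effective=$(git config user.name)

git config --unset user.name              # remove the repo-level value again
restored=$(git config user.name)

echo "with override: $effective"
echo "after unset:   $restored"
```

With the override in place the effective name is the repository's; once it is unset, the lookup falls back to the user-level value.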
