Git is a source control system in wide and growing use. It began life as something that only serious programming geeks used, but its popularity has grown rapidly, due in part to GitHub, which uses git in new and baffling ways to do things that git was almost certainly never intended to do.
Perhaps it is true that git is called ‘git’ for the following reason:
Linus Torvalds has quipped about the name “git”, which is British English slang for a stupid or unpleasant person: “I’m an egotistical bastard, and I name all my projects after myself. First Linux, now git.”
But then again, perhaps not. There are other stories out there on the internet. It is true, however, that Linus Torvalds is responsible for git and Linux.
The purpose of git is to allow a large group of generally anti-social programmers to work together on very large and complicated projects. This is a hard problem socially as well as technically. With lots of smart people programming a lot and communicating a little, things can easily wind up not fitting together. Add to that the need to always have a current “production” version ready for users, and you have something like the Challenger launch every day.
While git’s main purpose is as a tool for collaboration, it turns out to be quite useful even if you only collaborate with your own bad self, and in combination with Rstudio it’s potentially awesome. Unfortunately, at present the awesomeness of the Rstudio+git nexus is only 56.3 percent realized. Git is fundamentally a command line tool, and Rstudio has yet to figure out how to implement the (vast) majority of git’s powers in its GUI. It is thus both beneficial and unavoidable that we become familiar with command line git as well as with Rstudio’s git interface.
Even though our goal is to use just a small subset of git’s features, we are still going to have to come to terms with some nomenclature and ideology.
Or there may be just one repo if the project is your own private thing. If there are many repos of the same project, then users can “push” and “pull” updates back and forth in order to share and absorb each other’s code. It is common to treat one particular repo as the central repository, but technically all repos are created equal.
*commit*: (noun) a snapshot of a point in time in the history of your project. (verb) to use the git commit command to create such a snapshot. You explicitly “commit” when you reach a point that you might want to visit again. How often you commit is a matter of style.
*branch*: (noun) an alternative, parallel history. (verb) to create a new alternative history. You create a branch when you want to explore an intellectual cul-de-sac or experiment with some new feature that may or may not later become part of your project. If your project is a dissertation chapter, you might create a branch to work on a related journal article. Every project has at least one branch, traditionally called “master” (many newer repos call it “main”).
*clone*: (verb) to create a fresh copy of an existing repo on your local machine. Once cloned, the new repo is equal in git’s moral universe to the original. However, it’s a little more complicated than that when it comes to branches.
*checkout*: (verb) to make an existing local or “remote” branch the current one. A remote branch is one that exists in another repository, almost always the repo from which your repo was originally cloned. Changes you make while a “remote” branch is checked out don’t really stick: git puts you in a “detached HEAD” state, and any commits you make there belong to no branch and are easy to lose. You can easily create a local branch from a remote branch if you want to work with it for real.
*pull*: (verb) the command by which the owner of a repo brings work from other users’ repos into her own. git pull fetches new commits from the remote repo and merges them into the current local branch.
## Time travel with git
The point of git is to be able to reconstruct your work at any point in time. Even though this sort of time travel only goes backwards, it’s still much more complicated than it sounds. If you have a bunch of collaborators who are all working on different aspects of the project, it just will not do to casually erase things that happened in the past. Git knows how to deal with this, and since we’re keeping it simple and working on our own, it is not a huge problem, just something to keep in mind.
In what follows, we’ll learn several different ways to move backwards in time.
Rstudio works great with git, but it can be rather confusing if you don’t already understand how git works. As noted earlier, git is really a command line tool. Rstudio presents us with menus, but it just translates those menu choices into a subset of git command line calls. It’s kind of like the safety guards on power tools: yes, they keep your fingers attached, but every now and then they also prevent you from doing useful things. Fortunately, because Rstudio provides an excellent terminal window, we can have it both ways.
When Rstudio finishes creating your new project, you should note the following:
Since I promised you that this repo holds the works of your ancestors, you might expect that it would contain some actual files. And in fact it does, but as the README tells you, you’ll need to figure out how git uses “branches” in order to find the Easter eggs.
The aRt repo has several branches: one is local and the others are remote.
In a terminal window, type:
git branch -a
to see a list of remote and local branches. Hopefully there is only one local branch, shown in green and marked with the asterisk (*) of currency. That would be cohort40. It’s pretty much empty, as the Files tab indicates.
Now, in the terminal window, type:
git checkout origin/cohort39
Your Files tab should change dramatically, and git prints some useful information in the terminal window, including a note that you are now in “detached HEAD” state.
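If you want to commit on top of cohort39 rather than just look at it, make a local branch from the remote one. In your own aRt repo that is a single command, git checkout -b cohort39 origin/cohort39, which creates a local cohort39 that tracks origin/cohort39 and also takes you out of the detached HEAD state. The sketch below is meant to be saved and run as a script. It rehearses this in a throwaway temporary directory: origin.git is a hypothetical stand-in for the course repo, and its branch contents are made up.

```shell
#!/bin/sh
# Sketch: turn a remote branch into a local branch you can commit to.
# Everything happens in a throwaway temp directory; origin.git is a
# hypothetical stand-in for the course repo, not the real aRt repo.
set -eu

cd "$(mktemp -d)"

# Build a tiny seed repo with two branches and publish it to a bare "origin".
git init -q seed
cd seed
git config user.name "Sketch User"
git config user.email "sketch@example.com"
echo "nearly empty" > README.txt
git add README.txt
git commit -q -m "start cohort40"
git branch -M cohort40                 # rename whatever the default branch was
git checkout -q -b cohort39
echo "ancestral art" > piece.txt
git add piece.txt
git commit -q -m "cohort39 art"
git init -q --bare ../origin.git
git push -q ../origin.git cohort40 cohort39
git --git-dir=../origin.git symbolic-ref HEAD refs/heads/cohort40
cd ..

# Now play the student: clone, look around, then make cohort39 local.
git clone -q origin.git aRt
cd aRt
git branch -a                              # one local branch (cohort40) plus remotes
git checkout -b cohort39 origin/cohort39   # local branch that tracks the remote one
git branch -a                              # cohort39 is now local too
```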
Until now in your life, your projects have seemed to have only one path along which progress has been made. Sometimes science actually works that way, but more often, in the pursuit of truth, we find it necessary to explore many alternative high-reward, low-probability paths. The way you have generally done this in the past is to comment out a bunch of stuff in one or more files and maybe add some if statements that let your code execute differently. Then you run it this way and that way, decide which is better, and then maybe comment other things out and uncomment the stuff you commented, and so on. Ugly code results, and mistakes creep in.
A neater, cleaner and more elegant approach is to use git’s branching capability (which we just saw evidence of in our aRt repo) to let your project’s history follow more than one path. Branching makes it easy to abandon dead ends and go back, or to take off on a new path and never look back.
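Here is a minimal sketch of the dead-end pattern. It is meant to be saved and run as a script, so it works in a throwaway temporary directory rather than in your real project. The branch name dead-end and the R files are hypothetical. It uses git checkout - to return to the previously checked-out branch, and git branch -D to force-delete a branch whose work was never merged. The deletion step is optional if you would rather keep the dead end around.

```shell
#!/bin/sh
# Sketch: explore a cul-de-sac on a branch, then abandon it.
# Runs in a throwaway temp directory; branch and file names are hypothetical.
set -eu

cd "$(mktemp -d)"
git init -q project
cd project
git config user.name "Sketch User"
git config user.email "sketch@example.com"

echo 'x <- 1' > analysis.R
git add analysis.R
git commit -q -m "main line of work"

# Take off down a high-reward, low-probability path.
git checkout -q -b dead-end
echo 'y <- risky_idea(x)' > idea.R
git add idea.R
git commit -q -m "try a risky idea"

# It didn't pan out: return to the branch we came from.
git checkout -q -
ls                          # idea.R is gone; analysis.R is untouched

# Optional: throw the dead end away for good (-D because it was never merged).
git branch -q -D dead-end
git branch                  # only the original branch remains
```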
Since commits made on a checked-out remote branch like the one we just explored don’t stick, let’s create a new, nearly empty branch in which to experiment with git. When you create a new branch, it starts from your current branch and includes all of its files. We don’t want to include cohort39 in our new branch, so first we’ll make the nearly empty cohort40 our current branch.
git checkout cohort40
(Note the disappearance of many files.)
Create a new branch:
git branch experimental
Now make the new branch current:
git checkout experimental
Remove the extraneous file from the branch:
git rm README.txt
Commit the change:
git commit -m "deleted README.txt"
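Oops, let’s get that file back.
git log
git checkout asd86087x6d0vasasd8asdcoase7 -- README.txt
where asd86087x6d0vasasd8asdcoase7 stands for the “commit hash” of a commit in which README.txt still existed, copied from the output of git log. The commit just before “deleted README.txt” will do. The deletion commit itself won’t, because README.txt is already gone there. The restored file comes back staged, so commit it again if you want to keep it.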
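The whole experiment above can be rehearsed as a self-contained script in a throwaway temporary directory, which is a safe way to try it before touching your aRt repo. This is a sketch with made-up file contents. Instead of copying a hash by hand, it asks git log for the hash of the second-newest commit (git log --skip=1 -n 1), which is the commit just before the deletion. The shell variable name before is our own choice, not anything git defines.

```shell
#!/bin/sh
# Sketch: delete a file on a new branch, commit, then restore it from history.
# Runs in a throwaway temp directory; file contents are made up.
set -eu

cd "$(mktemp -d)"
git init -q scratch
cd scratch
git config user.name "Sketch User"
git config user.email "sketch@example.com"
echo "Find the easter eggs." > README.txt
git add README.txt
git commit -q -m "add README.txt"

git checkout -q -b experimental
git rm -q README.txt
git commit -q -m "deleted README.txt"

git log --oneline                  # newest first: the deletion, then the add

# Hash of the commit *before* the deletion, where README.txt still exists.
before=$(git log --format=%H --skip=1 -n 1)
git checkout "$before" -- README.txt   # restores the file into the working tree and index
git status --short                     # README.txt shows up as a staged addition
git commit -q -m "restored README.txt"
```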
Suppose that you suddenly realize that everything you’ve done for the last month is a waste of time. But you’re not certain. Going back in time is appealing in this situation, but you want to make sure that you’ll be able to go forward in time again if it turns out you were wrong about having wasted all that time.
For this, git has the command revert. It is quite different from “Revert” as Rstudio uses it: Rstudio’s Revert discards uncommitted changes to your files, while git revert makes a new commit that undoes an earlier commit.
In order to do this cool thing, we’re going to have to type some commands at the command line. Let’s do something like the following together and figure out what it does.
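In a terminal window:
git status
git log
git log --oneline
git revert HEAD
git revert HASH
(where HASH stands for the hash, taken from the git log --oneline output, of the next commit you want to undo)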
What’s happened now is that, one commit at a time, we have rolled back changes, and each git revert created a new commit that undoes an earlier one. In other words, the state of the project is identical to what it was two commits ago, but there are new commits in the project’s history that record what we just did and allow us to go back to the state before we went back in time.
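Here is a self-contained sketch of that revert sequence, assuming a hypothetical project with three commits (A, B, C) that each rewrite one file. Two git revert calls undo C and then B. This leaves the file exactly as it was at A while keeping all five commits in the history. The --no-edit flag accepts git’s default commit message. Without it, git opens a text editor for the message, which can be disorienting in a terminal.

```shell
#!/bin/sh
# Sketch: step back two commits with git revert without erasing history.
# Runs in a throwaway temp directory; commits A, B, C are made up.
set -eu

cd "$(mktemp -d)"
git init -q timeline
cd timeline
git config user.name "Sketch User"
git config user.email "sketch@example.com"

for state in A B C; do
  echo "$state" > notes.txt
  git add notes.txt
  git commit -q -m "state $state"
done

git revert --no-edit HEAD      # undo C: notes.txt goes back to "B"
git revert --no-edit HEAD~2    # HEAD~2 is now B; undoing it leaves "A"

cat notes.txt                  # A
git log --oneline              # five commits: nothing was erased
```

To go forward in time again, revert the two revert commits the same way: git revert --no-edit HEAD, then git revert --no-edit HEAD~2.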
For lone ranger scientists, branches are best thought of as cul-de-sacs to explore. If, however, your project is a collaboration, then while you’re off working on your branch, your colleagues might continue to work on the “master” branch. This sort of thing happens in software projects all the time. With progress happening on different branches at the same time, it can be tricky to put all that progress back together into something that makes sense. Git has a “merge” process for taking care of this, even when two or more contributors have changed the same file. It’s not always a simple, automatic thing, and we’re not going to do it this week, but now you know about it.
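We won’t merge this week, but for the curious, here is a minimal sketch of the simplest case. Two hypothetical contributors change different files on different branches, and git merge combines the work automatically. When both sides edit the same lines of the same file, git instead stops and asks you to resolve the conflict by hand, which is the not-so-simple part. Branch and file names here are made up.

```shell
#!/bin/sh
# Sketch: parallel work on two branches, then a merge.
# Runs in a throwaway temp directory; branch and file names are hypothetical.
set -eu

cd "$(mktemp -d)"
git init -q collab
cd collab
git config user.name "Sketch User"
git config user.email "sketch@example.com"
echo "# Chapter" > chapter.md
git add chapter.md
git commit -q -m "start chapter"
main=$(git rev-parse --abbrev-ref HEAD)   # "master" or "main", depending on setup

# You go off on a branch...
git checkout -q -b article
echo "# Article" > article.md
git add article.md
git commit -q -m "draft article"

# ...while a colleague keeps working on the main branch.
git checkout -q "$main"
echo "More text." >> chapter.md
git commit -q -am "extend chapter"

# Bring the branch's work back into the main line (creates a merge commit).
git merge --no-edit article
ls                              # article.md and chapter.md are both here
```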
The second part of this week’s effort, and the last assignment of this class, is to produce a work of lasting beauty for our second-floor aRt gallery. The only rules are: