Git is a source control system in wide and growing use. It began life as something that only serious programming geeks used, but its popularity has grown rapidly, due in part to GitHub, which uses git in new and baffling ways to do things that git was almost certainly never intended to do.

Perhaps it is true that git is called ‘git’ for the following reason:

Linus Torvalds has quipped about the name “git”, which is British English slang for a stupid or unpleasant person: “I’m an egotistical bastard, and I name all my projects after myself. First Linux, now git.”

But then again, perhaps not. There are other stories out there on the internet. It is true, however, that Linus Torvalds is responsible for git and Linux.

The purpose of git is to allow a large group of generally anti-social programmers to work together on very large and complicated projects. This is a hard problem socially as well as technically. With lots of smart people programming a lot and communicating a little, things can easily wind up not fitting together. Add to that the need to always have a current “production” version ready for users, and you have something like the Challenger launch every day.

While git’s main purpose is as a tool for collaboration, it turns out to be quite useful even if you only collaborate with your own bad self, and in combination with Rstudio, it’s potentially awesome. Unfortunately, at present the awesomeness of the Rstudio+git nexus is only 56.3 percent realized. Git is fundamentally a command line tool and Rstudio has yet to figure out how to implement in its GUI the (vast) majority of git’s powers. It is thus both beneficial and unavoidable that we become familiar with command line git as well as Rstudio git.

Even though our goal is to use just a small subset of git’s features, we are still going to have to come to terms with some nomenclature and ideology.

A git project lives in a repository (“repo”), and there may be many copies of that repo, one for each collaborator. Or there may be just one repo if the project is your own private thing. If there are many repos of the same project, then users can “push” and “pull” updates back and forth in order to share and absorb each other’s code. It is common to treat one particular repo as the central repository, but technically all repos are created equal.
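To make “push” and “pull” concrete, here is a minimal sketch that builds three repos of the same project in a scratch directory. The names central.git, alice, bob and notes.txt are invented for illustration, as is the demo identity; the point is only to show updates travelling from one repo to another by way of a repo that everyone agrees to treat as central.

```shell
# Placeholder identity so commits work on a fresh machine (names are made up).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)               # scratch directory, safe to delete afterwards
cd "$work"

# The "central" repo. It is bare: history only, no working files.
git init -q --bare central.git

# Alice clones it (git warns that the repo is empty, which is expected here),
# makes the first commit, and pushes it.
git clone -q central.git alice
cd alice
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "alice: start notes"
git push -q -u origin HEAD      # -u remembers where this branch pushes and pulls
cd ..

# Bob clones, adds to the same file, and pushes his update.
git clone -q central.git bob
cd bob
echo "second line" >> notes.txt
git commit -q -am "bob: add a line"
git push -q origin HEAD
cd ..

# Alice pulls to absorb Bob's work.
cd alice
git pull -q
cat notes.txt                   # now holds both lines
```

Nothing makes central.git special except that Alice and Bob both chose to push to it, which is what “all repos are created equal” means in practice.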

The point of git is to be able to reconstruct your work at any point in time. Even though this sort of time travel only goes backwards, it’s still much more complicated than it sounds. If you have a bunch of collaborators who are all working on different aspects of the project, it just will not do to casually erase things that happened in the past. Git knows how to deal with this, but since we’re keeping it simple and working on our own, this is not a huge problem, just something to keep in mind.

In what follows, we’ll learn to move backwards in time in several different ways:

  1. To discard what we have done in a particular file since the most recent commit (easy in Rstudio)
  2. To “revert” the entire project to a simpler time (aka “commit”) – but without destroying the history of all the stupid things you have done since that simpler time.
  3. To start over again by visiting a past “commit” and creating a new future (“branch”) from it.
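The third way in the list can be sketched in a few lines. This is only an illustration in a scratch repo: the file name analysis.R, the branch name fresh-start and the demo identity are all made up, and HEAD~1 is git shorthand for “the commit before the current one” (a commit hash copied from git log would work just as well).

```shell
repo=$(mktemp -d)                       # scratch repo
cd "$repo"
git init -q
git config user.name "Demo User"        # placeholder identity
git config user.email "demo@example.com"

echo "good idea" > analysis.R           # analysis.R is a made-up file name
git add analysis.R
git commit -q -m "a simpler time"
old_branch=$(git rev-parse --abbrev-ref HEAD)

echo "dubious idea" >> analysis.R
git commit -q -am "something regrettable"

# Visit the past and start a new future from it.
git checkout -q -b fresh-start HEAD~1

cat analysis.R                          # the file as it was in the simpler time
git log --oneline "$old_branch"         # the old branch still has both commits
```

The regrettable commit is not destroyed. It stays on the old branch in case it turns out to have been a good idea after all.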

Rstudio and git

Rstudio works great with git, but it can be rather confusing if you don’t already understand how git works. As noted earlier, git is really a command line tool. Rstudio presents us with menus, but it just translates those menu choices into a subset of git command line calls. It’s kind of like the safety guards on power tools: yes, they keep your fingers attached, but every now and then they also prevent you from doing useful things. Fortunately, because Rstudio provides an excellent terminal window, we can have it both ways.

Initializing a git repository for an Rstudio project

  1. If you have not already done so, launch rstudio.demog.berkeley.edu, open any project, and then “clone” a project containing aRt works from your ancestors:
  • File -> New Project -> Version Control -> Git
  • Repository URL: /hdir/0/carlm/aRt

When Rstudio completes the construction of your new project you should note the following:

  • A new menu choice: git.
  • A file called README.txt explaining what the repository is for.

Since I promised you that this repo holds the works of your ancestors, you might expect that it would contain some actual files. And in fact it does, but as the README tells you, you’ll need to figure out how git uses “branches” in order to find the Easter eggs.

Exploring the repository

The aRt repo has several branches: one is local, the others are remote.

In a terminal window type:

git branch -a

to see a list of remote and local branches. Hopefully there is only one local branch shown in green with the asterisk (*) of currency. That would be cohort40. It’s pretty much empty as the Files tab indicates.

Now in the terminal window type:

git checkout origin/cohort39

Dramatic changes to your Files tab should suddenly take place along with some useful information in the terminal window.
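If you want to see what is going on without the real aRt repo, the following sketch builds a small stand-in and repeats the two commands. The directory names upstream and myclone, the file masterpiece.R and the demo identity are invented; only the branch names cohort39 and cohort40 come from the text above.

```shell
work=$(mktemp -d)                         # scratch directory
cd "$work"

# Build a stand-in for the aRt repo with two branches (contents are made up).
git init -q upstream
cd upstream
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "nothing to see here" > README.txt
git add README.txt
git commit -q -m "cohort40: just a README"
git branch -M cohort40                    # name the first branch cohort40
git checkout -q -b cohort39
echo "plot(1:10)" > masterpiece.R
git add masterpiece.R
git commit -q -m "cohort39: add art"
git checkout -q cohort40
cd ..

# Clone it, as the New Project dialog does, and look around.
git clone -q upstream myclone
cd myclone
git branch -a                             # one local branch, several remote ones
git checkout origin/cohort39              # git explains what just happened
git status                                # reports a detached HEAD
ls                                        # the cohort39 files have appeared
```

Git calls the state you land in a “detached HEAD”: you are looking at a commit, but you are not on any local branch, which is part of why the next section starts by going back to cohort40.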

Creating branches

Until now in your life, your projects have seemed to have only one path along which progress has been made. Sometimes science actually works that way, but more often, in the pursuit of truth, we find it necessary to explore many alternate high reward – low probability paths. The way you have generally done this in the past is to comment out a bunch of stuff in one or more files and maybe have some if statements that allow your code to execute differently. Then you run it this way and that way and decide which is better and then maybe you comment other things out and uncomment the stuff you commented and so on. Ugly code results; and mistakes creep in.

A neater, cleaner and more elegant approach is to use git’s branching capability (which we just saw evidence of in our aRt repo) in order to allow your project’s history to follow more than just one path. Branching makes it easy to abandon dead ends and go back, or to take off on a new path and never look back.

Creating a new (empty) local branch

Since commits are generally meaningless in remote branches such as we have explored, let’s create a new empty branch in order to experiment with git. When you create a new branch, the files in your current branch become part of it. We don’t want to include cohort39 in our new branch, so first we’ll make the nearly empty cohort40 our current branch.

git checkout cohort40

(Note the disappearance of many files.)

create new branch …

git branch experimental

now make the new branch current…

git checkout experimental

remove the extraneous files from the branch

git rm README.txt

commit the change

git commit -m "deleted README.txt"

Oops, let’s get that file back.

git log

git checkout asd86087x6d0vasasd8asdcoase7 -- README.txt

where asd86087x6d0vasasd8asdcoase7 stands for the “commit hash”, shown in the output of git log, of the last commit that still contains README.txt (that is, the commit just before the deletion).
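Here is the same rescue as a self-contained sketch in a scratch repo, so that the hash is a real one. The README contents and the demo identity are made up. The one thing worth noticing is that the commit to ask for is the one before the deletion, because the deleting commit itself no longer contains the file.

```shell
repo=$(mktemp -d)                        # scratch repo
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity
git config user.email "demo@example.com"

echo "what this repo is for" > README.txt
git add README.txt
git commit -q -m "add README.txt"

git rm -q README.txt
git commit -q -m "deleted README.txt"

git log --oneline                        # two commits; the newest one deleted the file

# Ask for the file as it was one commit before the current one.
last_good=$(git rev-parse HEAD~1)        # the full hash of that commit
git checkout "$last_good" -- README.txt

git status --short                       # README.txt is back and staged for commit
```

The file comes back both in the working directory and in the staging area, so a plain git commit afterwards records its return.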

How about git with Rstudio menus

Making a commit

Every Rstudio-mediated git operation, not surprisingly, can be done under that new git menu.

  1. Click on the git menu item and behold … a new window appears.
  • click on all the files present in the left panel. These should include .gitignore and your .Rproj file
  • type a clever comment in the message box on the right.
  2. Hit the “commit” button and see the message box that informs you that git has guessed your email address and is keeping track of who you are.
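For comparison, this is roughly what those clicks amount to at the command line. It is a sketch in a scratch repo, not a record of what Rstudio actually runs; the file contents and the name and email address are placeholders. Setting user.name and user.email yourself saves git from having to guess who you are.

```shell
repo=$(mktemp -d)                         # scratch repo
cd "$repo"
git init -q

# Tell git who you are, for this repo only (placeholder values).
git config user.name "Demo User"
git config user.email "demo@example.com"

echo ".Rhistory" > .gitignore
echo "Version: 1.0" > demo.Rproj          # demo.Rproj is a made-up project file

git add .gitignore demo.Rproj             # like ticking the files in the left panel
git commit -m "a clever comment"          # like the message box plus the commit button
git log --oneline                         # one commit so far
```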

Now that we have some commits, we can experiment with git’s ability to show you past versions of files.

Select git -> History and see what you get.

Add a few more commits to make this interesting

  1. Create a new file or two and type silly but identifiable things in them.
  2. Select git -> commit and create a new “commit”

Do this again a few times so that you have a repo with several “commits”.
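The terminal can show past versions of files too. The sketch below is an illustration in a scratch repo, with a made-up file silly.txt and a placeholder identity: git log lists the commits that touched a file, git show prints the file as it was at a given commit, and git diff shows what changed in between.

```shell
repo=$(mktemp -d)                         # scratch repo
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "version one" > silly.txt            # silly.txt is a made-up file
git add silly.txt
git commit -q -m "first silly thing"

echo "version two" > silly.txt
git commit -q -am "second silly thing"

git log --oneline -- silly.txt            # commits that touched this file
git show HEAD~1:silly.txt                 # the file as of the previous commit
git diff HEAD~1 HEAD -- silly.txt         # what changed between the two commits
```

Neither git show nor git diff changes anything in the working directory; they only look.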

‘reverting’ a file (Rstudio)

  • Having just completed a commit, edit one of the files in your repo – just type a few lines.
  • Save the file.
  • Now think better of those lines and decide that you would like this file to return to its state as of the most recent “commit”.

Notice that you cannot simply Edit->undo (or maybe you can but suppose that you can’t because what you’ve done since the last commit is as complicated as it is pointless).

Notice also, that ‘revert’ as used by Rstudio is NOT the same as ‘revert’ in git.
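At the command line, discarding the uncommitted edits to a single file looks something like the sketch below. The file name paper.txt and the demo identity are made up. Recent versions of git also offer git restore paper.txt for the same job.

```shell
repo=$(mktemp -d)                         # scratch repo
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "careful work" > paper.txt           # paper.txt is a made-up file
git add paper.txt
git commit -q -m "careful work"

echo "pointless rambling" >> paper.txt    # edits made since the last commit
git diff --stat                           # see what would be thrown away

git checkout -- paper.txt                 # return the file to its last committed state
cat paper.txt
```

Be warned that there is no undo for this: edits that were never committed are gone for good once they are discarded.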

Going back in time without losing track of the things you have done in the meantime

Suppose that you suddenly realize that everything you’ve done for the last month is a waste of time. But you’re not certain. Going back in time is appealing in this situation, but you want to make sure that you’ll be able to go forward in time again – if you turn out to be wrong about having wasted all that time.

For this, git has the command ‘revert’ (quite different from “revert” as Rstudio uses it).

In order to do this cool thing, we’re going to have to type some commands at the command line. Let’s do something like the following together and figure out what it does.

In a terminal window

type the following git commands one at a time, with a pause for discussion.

git status
git log
git log --oneline
git revert HEAD
git revert HEAD~2

What’s happened now is that, one “commit” at a time, we have rolled back changes, and each rollback created a new commit that undoes an earlier one. In other words, the state of the project is identical to what it was two commits ago, but there are new commits in the project’s history that record what we just did and allow us to go back to the state before we went back in time.
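A self-contained way to watch two reverts in a row is sketched below, in a scratch repo with three small commits. The file log.txt and the demo identity are made up, and --no-edit is used only so that git does not stop to open an editor for each revert message. After the first revert, the commit we want to undo next sits two steps behind HEAD, which is why the second command says HEAD~2.

```shell
repo=$(mktemp -d)                         # scratch repo
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# Three commits, each adding one line to a made-up file.
for n in one two three; do
  echo "$n" >> log.txt
  git add log.txt
  git commit -q -m "add $n"
done

git revert --no-edit HEAD                 # new commit that undoes "add three"
git revert --no-edit HEAD~2               # new commit that undoes "add two"

cat log.txt                               # back to the state after "add one"
git log --oneline                         # five commits: nothing was erased
```

Because the reverts are themselves commits, going forward in time again is just a matter of reverting the reverts.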

Merging

For lone ranger scientists, branches are best thought of as cul-de-sacs to explore. If, however, your project is a collaboration, then while you’re off working on your branch, your colleagues might continue to work on the “master” branch. This sort of thing happens in software projects all the time. With progress happening on different branches at the same time, it can be tricky to put all that progress back together into something that makes sense. Git has a “merge” process for taking care of this, even when two or more contributors have changed the same file. It’s not a simple automatic thing, and we’re not going to do it this week, but now you know about it.
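For the curious, here is the easy case in miniature: two branches that changed different files, merged without any argument. It is only a sketch in a scratch repo; the branch name side-trip, the file names and the demo identity are invented, and the hard case (two people editing the same lines) is deliberately left out.

```shell
repo=$(mktemp -d)                         # scratch repo
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "intro" > paper.txt
git add paper.txt
git commit -q -m "start paper"
main_branch=$(git rev-parse --abbrev-ref HEAD)   # whatever the first branch is called

# You wander off on a branch of your own...
git checkout -q -b side-trip
echo "wild idea" > idea.txt
git add idea.txt
git commit -q -m "explore a wild idea"

# ...while a colleague keeps working on the main branch.
git checkout -q "$main_branch"
echo "methods" >> paper.txt
git commit -q -am "colleague: add methods"

# Put the two lines of progress back together.
git merge -q --no-edit side-trip
ls                                        # both idea.txt and paper.txt are here
git log --oneline --graph                 # the history forks and rejoins
```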