Versioning your code and data with git

Author(s) Anton Nekrutenko avatar Anton Nekrutenko
Reviewers Björn Grüning avatar
Overview
Creative Commons License: CC-BY Questions:
  • What is git

  • How is git different from GitHub?

  • What is branch?

  • How to uise git as a time machine

Objectives:
  • Get a basic understanding of git and version control

Time estimation: 1 hour
Supporting Materials:
Published: Mar 12, 2024
Last modification: Mar 12, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
purl PURL: https://gxy.io/GTN:T00425
version Revision: 1

XKCD1597.

Setup and Introduction

Lecture setup

  1. Start JupyterLab
  2. Start terminal within JupyterLab instance

Git resources

Version control

  • Manages changes over time
  • Enables collaboration
  • Provides complete history

Git versus GitHub

  • Git - version control system
  • GitHub - hosting service

Initial fundamentals

Logic

Why do we need version control? Well … to control versions and avoid mess:

Why version control?Open image in new tab

Figure 1: From tutorials by Kyle Bradbury

Main commands

Here are some of the most fundamental commands in git repertoire:

Main commands. Open image in new tab

Figure 2: From tutorials by Kyle Bradbury

Branches

A repository may have multiple branches:

Branch flow. Open image in new tab

Figure 3: From GitBetter

Let’s do it!

Clone a repo

Let’s clone a sample repo from GitHub

git clone https://github.com/nekrut/git_foo_bar.git

Check history of this repo using git log

$git log
commit ee99d64890f3b1cf0637a14269ab8738357e8dd8 (HEAD -> main, origin/main, origin/HEAD)
Author: Anton Nekrutenko <anekrut@gmail.com>
Date:   Mon Mar 11 17:38:42 2024 -0400

    Update README.md

commit de89f51d8e124665713f6fd94cd46447d172033b
Author: Anton Nekrutenko <anekrut@gmail.com>
Date:   Tue Feb 21 08:02:50 2023 -0500

    Create file2.txt

commit bc3e5c4a9d54203739bb90f29d92e20082b9e5d4
Author: Anton Nekrutenko <anekrut@gmail.com>
Date:   Tue Feb 21 08:02:25 2023 -0500

    Create file1.txt

commit 99dd43783be37b08c1ca80cdc881eae537526396
Author: Anton Nekrutenko <anekrut@gmail.com>
Date:   Tue Feb 21 08:00:34 2023 -0500

    Updated readme

commit 18ebdabfe6f92f33ea626994ce6c9998cbe63522
Author: Anton Nekrutenko <anekrut@gmail.com>
Date:   Tue Feb 21 07:59:46 2023 -0500

    Initial commit

Check status of the repo

$git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

Let’s change and stage a file

Use editor to modify file1.txt. Once it is saved, we can see what is happening by using git status again:

$git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   file1.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        .ipynb_checkpoints/

no changes added to commit (use "git add" and/or "git commit -a")

You can see, that the file is modified, but it is not staged. To stage it for commit you need to explicitly add it to staging:

git add file1.txt 

If you run git status now you will get:

$git status
On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        modified:   file1.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        .ipynb_checkpoints/

Oh 💩 get reset

Running git reset will restore the status quo as it was prior to git add:

$git reset
Unstaged changes after reset:
M       file1.txt

and git status will look as it did prior to git add:

git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   file1.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        .ipynb_checkpoints/

no changes added to commit (use "git add" and/or "git commit -a")

Let’s stage and commit

If you are indeed ready to go ahead let’s add:

git add file1.txt

and commit

$git commit -m 'modified file1'
[main 476adb7] modified file1
 1 file changed, 1 insertion(+), 1 deletion(-)

If run git log you will this addition commit:

$git log
commit 476adb7b64a9330199fea13ca2e7523f7fe90189 (HEAD -> main)
Author: nekrut <anekrut@gmail.com>
Date:   Tue Feb 21 13:23:24 2023 +0000

    modified file1

or alternatively you can use --oneline flag:

$git log --oneline
476adb7 (HEAD -> main) modified file1
de89f51 (origin/main, origin/HEAD) Create file2.txt
bc3e5c4 Create file1.txt
99dd437 Updated readme
18ebdab Initial commit

Oh 💩 get revert

To roll everything back you can do this:

$git revert 476adb7
[main 1f13d5f] Revert "modified file1"
 1 file changed, 1 insertion(+), 1 deletion(-)

This will bring vim editor with a pre-filled rollback message:

vim image.

To save this and get out:

  • press ESC
  • type :wq
  • hit Enter

Now, let’s look at the log (showing just two lat commits):

 git log
commit 1f13d5fcf38057db9a90066b99aa475a6eeb1bae (HEAD -> main)
Author: nekrut <anekrut@gmail.com>
Date:   Tue Feb 21 13:31:24 2023 +0000

    Revert "modified file1"
    
    This reverts commit 476adb7b64a9330199fea13ca2e7523f7fe90189.

commit 476adb7b64a9330199fea13ca2e7523f7fe90189
Author: nekrut <anekrut@gmail.com>
Date:   Tue Feb 21 13:23:24 2023 +0000

    modified file1

Let’s actually do make changes and highlight them with diff

First, let’s modify, say, file2.txt by adding a line. In this example I modified it from:

This
is
another
file
I've
made

to

This
is
file2
I've
made

Now add and commit:

$ git status
On branch main
Your branch is ahead of 'origin/main' by 2 commits.
  (use "git push" to publish your local commits)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   file2.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        .ipynb_checkpoints/

no changes added to commit (use "git add" and/or "git commit -a")

$ git add file2.txt 

$ git commit -m 'changed file2'
[main a947686] changed file2
 1 file changed, 1 insertion(+), 2 deletions(-)

Let’s now compare the changes between two commits:

$ git diff a947686 1f13d5f
diff --git a/file2.txt b/file2.txt
index 1f383ac..e13e461 100644
--- a/file2.txt
+++ b/file2.txt
@@ -1,5 +1,6 @@
 This
 is
-file2
+another
+file
 I've
 made

here:

  • a and b = tags of files being compared
  • --- and +++ marked to indiciate differences between a and b
  • @@ -1,5 +1,6 @@ file chunk header which has the following format:
    • @@ [file a range][file b range] @@
    • File ranges are: <start line><number of lines>

Branches

git branches.

From GitBetter

To enable collaborations and to give the ability to develop major features without disrupting production (master or main) branch Git allows creation of multiple branches.

Create a new branch and switch to it

Let’s create branch dev:

$ git branch dev

to see existing branches:

$ git branch
  dev
* main

The main branch is active (tagged with *). To actually switch branches:

$ git checkout dev
Switched to branch 'dev'

Make changes

Let’s make some changes, say, to file1.txt from this:

This
is
a 
file

to this:

This
is
a 
file1

and then get status, add, and commit:

$ git status
On branch dev
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   file1.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        .ipynb_checkpoints/

no changes added to commit (use "git add" and/or "git commit -a")

$ git add file1.txt 

$ git commit -m 'changes to file1.txt'
[dev 05ce64d] changes to file1.txt
 1 file changed, 1 insertion(+), 1 deletion(-)

Switch to another branch and look at the modified file

If we switch back to main and look at the content of file1.txt we will see the “old” content:

This
is
a 
file

Merge branches

To incorporate changes from dev to master we need to do a merge:

$ git merge dev main
Updating a947686..05ce64d
Fast-forward
 file1.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

and if we look at file1.txt again we will see the change:

$ more file1.txt This is a file1

We can now delete the branch:

$ git branch -d dev
Deleted branch dev (was 05ce64d).

Merging with conflicts

To simulate a merging conflict we need to make different changes to the same line of a file in different branches.

Modify file1.txt in main

First, let’s checkout main and make the following change to file1.txt, say, from

This
is
a 
file1

to

This
is
a 
file1 - the first file we created

now we add and commit:

$ git add file1.txt 

$ git commit -m 'main changes to file1'
[main dfd0598] main changes to file1
 1 file changed, 1 insertion(+), 1 deletion(-)

You can use git diff to see the differences between branches such as, for example:

$ git diff main dev
diff --git a/file1.txt b/file1.txt
index f8e8191..e82ee85 100644
--- a/file1.txt
+++ b/file1.txt
@@ -1,4 +1,4 @@
 This
 is
 a 
-file1 - the first file we created
+file1

Modify file1.txt in dev

Checkout dev:

$ git checkout dev
Switched to branch 'dev'

Change file1.txt, say, from:

This
is
a 
file1

to

This
is
a 
file1 - the first file

now we add and commit:

$ git add file1.txt 

$ git commit -m 'dev changes to file1'
[dev b2b8b9e] dev changes to file1
 1 file changed, 1 insertion(+), 1 deletion(-)

Checkout main and try to merge

 $ git merge main dev
 Auto-merging file1.txt
 CONFLICT (content): Merge conflict in file1.txt
 Automatic merge failed; fix conflicts and then commit the result.

If then actually look at the content of the file, you will see this:

file1.

so here you need to decide which version you will keep and edit the file correspondingly. For example, I edited it to this form and saved:

file1_edited.

Now I need to add and commit:

 $ git add file1.txt
 jovyan@jupyter-jupyterlab-2djupyterlab-2ddemo-2dryet2dsd:~/bmmb554_foo_bar$ git commit -m 'merged dev into main'