Content from Introduction


Last updated on 2025-11-24

Estimated time: 10 minutes

Overview

Questions

  • What do I do when I need to make complex decisions with my git repository?
  • How do I collaborate on a software project with others?

Objectives

  • Understand the range of functionality that exists in git.
  • Understand the different challenges that arise with collaborative projects.

Introduction


Version control systems are a way to keep track of changes in text-based documents. We start with a base version of the document and then record the changes we make at each step along the way. You can think of it as a recording of your progress: you can rewind to the base document and play back each change you made, eventually arriving at your most recent version.

The git version control system, used to manage the code in many millions of software projects, is one of the most widely adopted. It uses a distributed version control model (the “beautiful graph theory tree model”), meaning that there is no single central repository of code. Instead, users share code back and forth to synchronise their repositories, and it is up to each project to define processes and procedures for managing the flow of changes into a stable software product.

Challenges


Git is powerful and flexible enough to fit a wide range of use cases and workflows, from simple projects written by a single contributor to projects with millions of lines of code and hundreds of co-authors. Furthermore, the task it performs is quite complex. As a result, many users find it challenging to navigate this complexity. For instance, while committing and sharing changes is fairly straightforward, recovering from situations such as accidental commits, pushes or bad merges is difficult without a solid understanding of git’s rather large and complex conceptual model. Case in point: three of the top five highest-voted questions on Stack Overflow ask how to carry out relatively simple tasks: undoing the last commit, changing the last commit message, and deleting a remote branch.

An XKCD comic about the git version control system.

Mouse-over text: If that doesn’t fix it, git.txt contains the phone number of a friend of mine who understands git. Just wait through a few minutes of ‘It’s really pretty simple, just think of branches as…’ and eventually you’ll learn the commands that will fix everything.

With this lesson our goal is to give you a more in-depth understanding of the conceptual model of git, to guide you through increasingly complex workflows and to give you the confidence to participate in larger projects.

Review of Intro Git Commands


First, let’s review the concepts and commands that constitute the basic git workflow.

BASH

git init

A diagram showing the relationship between the working directory, staging area, and repository in git.
Staging Area

BASH

git add file.txt
git commit -m "Message"

A commit, or “revision”, is an individual change to a file or set of files. It’s like saving a file, except that with git every save creates a unique ID (a.k.a. the “SHA” or “hash”) that lets you keep a record of what changes were made, when, and by whom. Each commit contains several key pieces of information that uniquely define its state:

  • Commit message: A description provided by the user explaining the purpose or details of the commit.

  • Committer: The person who added the commit to the repository.

  • Commit date: The date and time when the commit was added to the repository.

  • Author: The original creator of the changes in the commit, which may differ from the committer.

  • Authoring date: The date and time when the changes were originally made by the author.

  • Parent commit(s): Reference to the previous commit(s), which allows Git to trace the history and create a chain of commits.

  • Tree hash: A hash of the snapshot (the “tree”) of all tracked files as they were staged at the time of the commit.

A diagram showing the components that make up a git commit.
What is in a Commit

All these elements together generate a unique commit hash, which identifies the commit across the Git repository.
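To look at these fields directly, you can print a raw commit object with git cat-file -p. The sketch below is a minimal illustration, assuming a throwaway repository in a temporary directory; the file name, commit messages and the “Example User” identity are hypothetical.

```shell
#!/usr/bin/env bash
set -eu
# Throwaway repository in a temporary directory (names and identity are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q -b main            # -b needs git 2.28 or newer
git config user.name "Example User"
git config user.email "user@example.com"

echo "first line" > file.txt
git add file.txt
git commit -q -m "Add file.txt"
echo "second line" >> file.txt
git commit -q -a -m "Extend file.txt"

# Print the latest commit object: its tree hash, parent hash,
# author and committer lines (each with a timestamp), then the message.
git cat-file -p HEAD
```

In the output, the tree line identifies the recorded snapshot and the parent line links the commit to its predecessor; changing any of these fields would produce a different commit hash.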

BASH

git log
git status
git diff
git checkout HEAD file.txt
git revert <SHA>

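git checkout HEAD file.txt discards uncommitted changes to file.txt by restoring the version from the last commit, whereas git revert undoes the changes of an existing commit by adding a new commit; once that commit is pushed, the change is also undone in the shared project repository.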
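A small sketch contrasting the two, run in a throwaway repository (the file name, contents and identity are hypothetical):

```shell
#!/usr/bin/env bash
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main            # -b needs git 2.28 or newer
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "Add file.txt"

# An uncommitted edit is thrown away by checking the file out from HEAD.
echo "oops" > file.txt
git checkout HEAD file.txt
cat file.txt                 # prints: v1

# A committed change is undone with git revert, which adds a new commit.
echo "v2" > file.txt
git commit -q -a -m "Change file.txt to v2"
git revert --no-edit HEAD
cat file.txt                 # prints: v1
git log --oneline            # the "Revert ..." commit sits on top of the history
```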

BASH

git clone http://....

BASH

git push

BASH

git pull

Finally, the git fetch command downloads commits, files, and refs from a remote repository into your local repo. Both git fetch and git pull download content from a remote repository. You can consider git fetch the ‘safe’ version of the two commands: it downloads the remote content but does not touch your local branches or working directory, leaving your current work intact. git pull is the more aggressive alternative: it downloads the remote content for the active local branch and immediately integrates it into that branch, by default with git merge, which creates a merge commit unless a fast-forward is possible. If your local commits and the incoming commits change the same lines, this kicks off the merge-conflict resolution flow. The following command brings down all the changes from the remote:

BASH

git fetch

It is sometimes useful to fetch only the changes from a certain branch, e.g., main. In a repository with many contributors and branches, fetching all the changes may be unnecessary and overwhelming:

BASH

git fetch origin main
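To see the difference between git fetch and git pull without a hosting service, the sketch below uses a local bare repository as a stand-in for the remote; the names server.git, alice and bob are hypothetical.

```shell
#!/usr/bin/env bash
set -eu
# "server.git" plays the remote; "alice" and "bob" are two collaborators (all hypothetical).
work=$(mktemp -d)
cd "$work"
git init -q --bare -b main server.git   # -b needs git 2.28 or newer

# Alice creates the project and publishes it.
git init -q -b main alice
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git remote add origin ../server.git
git push -q -u origin main

# Bob clones it; afterwards Alice pushes one more commit.
cd ..
git clone -q server.git bob
cd alice
echo "v2" >> notes.txt
git commit -q -a -m "Update notes"
git push -q

# Bob fetches only main: origin/main moves, but his own main and files do not.
cd ../bob
git fetch -q origin main
git log --oneline main..origin/main   # the fetched commit, not yet integrated
cat notes.txt                          # still only "v1"

# Pulling integrates it; a fast-forward is enough here, so no merge commit is made.
git pull -q --ff-only
cat notes.txt                          # "v1" then "v2"
```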

https://www.atlassian.com/git/tutorials/syncing/git-fetch

Review 2
Key Points
  • Git version control records text-based differences between files.
  • Each git commit records a change relative to the previous state of the documents.
  • Git has a range of functionality that allows users to manage the changes they make.
  • This complex functionality is especially useful when collaborating on projects with others.

Content from Branches


Last updated on 2025-11-21

Estimated time: 0 minutes

Overview

Questions

  • What are branches?
  • How do I view the current branches?
  • How do I manipulate branches?

Objectives

  • Understand how branches are created.
  • Learn the key commands to view and manipulate branches.

Branching is a feature available in most modern version control systems. Branching in other version control systems can be an expensive operation in both time and disk space. In git, branches are a part of your everyday development process. When you want to add a new feature or fix a bug, no matter how big or how small, you spawn a new branch to encapsulate your changes. This makes it harder for unstable code to get merged into the main code base, and it gives you the chance to clean up your feature’s history before merging it into the main branch.

A diagram showing branching in a theoretical git repository.
Git Branching

The diagram above visualizes a repository with two isolated lines of development, one for a little feature, and one for a longer-running feature. By developing them in branches, it’s not only possible to work on both of them in parallel, but it also keeps the main branch free from questionable code.

The implementation behind Git branches is much more lightweight than other version control system models. Instead of copying files from directory to directory, Git stores a branch as a reference to a commit. In this sense, a branch represents the tip of a series of commits—it’s not a container for commits. The history for a branch is extrapolated through the commit relationships.

(https://www.atlassian.com/git/tutorials/using-branches)

What is a branch?


In git, a branch is effectively a pointer to a snapshot of your changes. It’s important to understand that branches are just pointers to commits. When you create a branch, all Git needs to do is create a new pointer; it doesn’t change the repository in any other way. If you start with a repository that looks like this:

A diagram showing several commits on a single branch in a theoretical git repository.
Git Branching

Then, you create a branch using the following command:

BASH

git branch crazy-experiment

The repository history remains unchanged. All you get is a new pointer to the current commit:

Git Branching

Note that this only creates the new branch. To start adding commits to it, you need to select it with git checkout, and then use the standard git add and git commit commands.
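The following sketch, assuming a throwaway repository with three hypothetical commits, checks that a new branch starts out as nothing more than a second name for the current commit:

```shell
#!/usr/bin/env bash
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main            # -b needs git 2.28 or newer
git config user.name "Example User"
git config user.email "user@example.com"
for n in 1 2 3; do
  echo "change $n" >> story.txt
  git add story.txt
  git commit -q -m "Commit $n"
done

git branch crazy-experiment
# Both names print the same SHA: the new branch is just another pointer.
git rev-parse main crazy-experiment
git branch            # the asterisk shows we are still on main

# Only after switching do new commits move crazy-experiment (and not main).
git switch -q crazy-experiment
echo "wild idea" >> story.txt
git commit -q -a -m "Try something"
git log --oneline --graph --all
```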

A branch also represents an independent line of development. Branches serve as an abstraction for the edit/stage/commit process. New commits are recorded in the history of the current branch, and once two branches each gain new commits, the history of the project forks. However, it is really important to remember that each commit only records one incremental change to the documents and NOT the full history of changes; the history is reconstructed by following each commit back to its parent. Therefore, while we think of a branch as a sequence of commits, each commit is an individual unit of change.

Branching Commands


Creating, deleting, and modifying branches is quick and easy; here’s a summary of the commands:

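To list local branches (the current one is marked with an asterisk):

BASH

git branch

To list all branches, including remote-tracking ones, with the latest commit on each and the upstream branch it tracks:

BASH

git branch -avv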

To create a new branch named <branch>, which references the same point in history as the current branch.

BASH

git branch <branch>

To create a new branch named <branch>, referencing <start-point>, which may be specified any way you like, including using a branch name or a tag name:

BASH

git branch <branch> <start-point>

To delete the branch <branch>; if the branch is not fully merged in its upstream branch or contained in the current branch, this command will fail with a warning:

BASH

git branch -d <branch>

To delete the branch <branch> irrespective of its merged status:

BASH

git branch -D <branch>

To switch to a different branch <branch>, updating the working directory to reflect the version referenced by <branch>.

BASH

git switch <branch>

To create a new branch <new> referencing <start-point>, and check it out.

BASH

git switch -c <new> <start-point>
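A quick run through these commands in a throwaway repository, including the safety check of git branch -d in action; the branch names old-idea and fix-typo are hypothetical.

```shell
#!/usr/bin/env bash
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main            # -b needs git 2.28 or newer
git config user.name "Example User"
git config user.email "user@example.com"
echo "a" > notes.txt && git add notes.txt && git commit -q -m "First"
echo "b" >> notes.txt && git commit -q -a -m "Second"

# Create a branch at an older commit, without switching to it.
git branch old-idea HEAD~1

# Create and switch to a branch starting at main, then commit on it.
git switch -q -c fix-typo main
echo "c" >> notes.txt && git commit -q -a -m "Fix typo"
git switch -q main

# fix-typo has a commit that main lacks, so -d refuses to delete it...
if git branch -d fix-typo; then echo "deleted"; else echo "refused: fix-typo is not fully merged"; fi
# ...while -D deletes it anyway (its commit becomes unreachable).
git branch -D fix-typo
# old-idea only points at a commit already contained in main, so -d succeeds.
git branch -d old-idea
git branch
```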

The special symbol "HEAD" can always be used to refer to the current branch. In fact, Git uses a file named HEAD in the .git directory to remember which branch is current:

BASH

$ cat .git/HEAD
ref: refs/heads/master

Renaming a branch can be done with the -m option:

BASH

git branch -m <old-branch-name> <new-branch-name>
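A short sketch showing how HEAD follows the current branch, including through a rename. It assumes a throwaway repository, the hypothetical branch names draft and feature/readme, and git’s default file-based ref storage (which is what makes .git/HEAD readable with cat):

```shell
#!/usr/bin/env bash
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main            # -b needs git 2.28 or newer
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > README.md && git add README.md && git commit -q -m "Add README"

git switch -q -c draft
cat .git/HEAD                         # prints: ref: refs/heads/draft
git branch -m draft feature/readme    # rename the current branch
cat .git/HEAD                         # prints: ref: refs/heads/feature/readme
git symbolic-ref --short HEAD         # the same information via a git command
```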
Key Points
  • A branch represents an independent line of development.
  • git branch creates a new pointer to the current state of the repository and allows you to make subsequent changes from that state.
  • Subsequent changes are considered to belong to that branch.
  • The latest commit on a given branch is its tip; HEAD points to the tip of the branch you currently have checked out.

Content from Remote Repositories


Last updated on 2025-11-21

Estimated time: 0 minutes

Overview

Questions

  • How do I connect my code to other versions of it?

Objectives

  • Learn about remote repositories.

https://www.atlassian.com/git/tutorials/syncing

Git uses a distributed collaboration model, which gives every developer their own copy of the repository, complete with its own local history and branch structure. As a result, users typically need to share a series of commits rather than a single “changeset”: instead of committing a changeset from a working copy to a central repository, Git lets you share entire branches between repositories.

Git remote


The git remote command lets you create, view, and delete connections to other repositories. Remote connections are more like bookmarks than direct links into other repositories. Instead of providing real-time access to another repository, they serve as convenient names that can be used to reference a not-so-convenient URL.

A diagram showing a local git repository with remote connections to two other repositories.
Remote Schematic

For example, the diagram above shows two remote connections from your repo into the central repo and another developer’s repo. Instead of referencing them by their full URLs, you can pass the origin and john shortcuts to other Git commands.

The git remote command is essentially an interface for managing a list of remote entries that are stored in the repository’s ./.git/config file. The following commands are used to view the current state of the remote list.

Git is designed to give each developer an entirely isolated development environment. This means that information is not automatically passed back and forth between repositories. Instead, developers need to manually pull upstream commits into their local repository or manually push their local commits back up to the central repository. The git remote command is really just an easier way to pass URLs to these “sharing” commands.

View Remote Configuration


To list the remote connections of your repository to other repositories you can use the git remote command:

BASH

git remote

If you test this in our training repository, you should get only one connection, origin:

BASH

origin

When you clone a repository with git clone, git automatically creates a remote connection called origin pointing back to the cloned repository. This is useful for developers creating a local copy of a central repository, since it provides an easy way to pull upstream changes or publish local commits. This behaviour is also why most Git-based projects call their central repository origin.

We can ask git for a more verbose (-v) answer which gives us the URLs for the connections:

BASH

git remote -v

For our training repository this should return:

BASH

origin	https://github.com/user_name/advanced-git-training.git (fetch)
origin	https://github.com/user_name/advanced-git-training.git (push)

As expected these point to the original repository we cloned.

Create and Modify Connections


The git remote command also lets you manage connections with other repositories. The following commands will modify the repo’s ./.git/config file. The result of the following commands can also be achieved by directly editing the ./.git/config file with a text editor.

Create a new connection to a remote repository. After adding a remote, you’ll be able to use <name> as a convenient shortcut for <url> in other Git commands.

BASH

git remote add <name> <url>

Remove the connection:

BASH

git remote rm <name>

Rename a connection:

BASH

git remote rename <old-name> <new-name>

To get high-level information about the remote <name>:

BASH

git remote show <name>
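The sketch below exercises these commands against two local bare repositories that stand in for hosted ones (central.git and john.git are hypothetical paths), and shows where the result ends up in .git/config:

```shell
#!/usr/bin/env bash
set -eu
work=$(mktemp -d)
cd "$work"
# Two local bare repositories stand in for hosted ones (hypothetical).
git init -q --bare -b main central.git   # -b needs git 2.28 or newer
git init -q --bare -b main john.git

git init -q -b main project
cd project
git remote add origin "$work/central.git"
git remote add john "$work/john.git"
git remote -v                  # each name is listed with its fetch and push URL

git remote rename john john-dev
git remote show john-dev       # contacts the remote and summarises it
git remote rm john-dev
git remote                     # only "origin" is left

# The remaining bookmark as stored in the repository's config file.
grep -A2 '\[remote "origin"\]' .git/config
```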

Exercise: Add a connection to your neighbour’s repository. Having this kind of access to individual developers’ repositories makes it possible to collaborate outside of the central repository. This can be very useful for small teams working on a large project.

BASH

git remote add john http://dev.example.com/john.git

Starting a branch from the main repository state:


Remember that when you create a new branch without specifying a starting point, it starts from the commit you currently have checked out, on whatever branch you happen to be. In order to avoid confusion, ALWAYS branch from the stable version. Here is how you would branch from your own origin/main branch:

BASH

git fetch origin main
git checkout -b <branch> origin/main

You must fetch first so that you have the most recent state of the repository.

If another repository holds the “true” (stable) version of the project, for example the original repository you forked from, you can add a connection to it as well. The conventional name for such a stable repository is upstream, and the sequence then becomes:

BASH

git fetch upstream main
git checkout -b <branch> upstream/main
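A sketch of why fetching first matters, using a local bare repository as a hypothetical upstream. For brevity our copy is cloned from that same repository, so origin and upstream point to the same place here; in practice origin would usually be your fork.

```shell
#!/usr/bin/env bash
set -eu
work=$(mktemp -d)
cd "$work"
# A local bare repo plays the stable "upstream" project; all names are hypothetical.
git init -q --bare -b main upstream.git   # -b needs git 2.28 or newer
git init -q -b main maintainer
cd maintainer
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
echo "v1" > code.txt && git add code.txt && git commit -q -m "Release v1"
git push -q "$work/upstream.git" main

# Our copy is cloned while upstream only has v1...
cd "$work"
git clone -q upstream.git mine
# ...then upstream moves on.
cd maintainer
echo "v2" >> code.txt && git commit -q -a -m "Release v2"
git push -q "$work/upstream.git" main

# In our copy: register upstream, fetch, and branch from its latest main.
cd "$work/mine"
git remote add upstream "$work/upstream.git"
git fetch -q upstream main
git checkout -q -b my-feature upstream/main
git log --oneline -1            # starts at "Release v2", not at the stale local main
git branch -vv                  # my-feature tracks upstream/main
```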

Now we can set the MPIA version of our repository as the upstream for our local copy.

Challenge

Setting the upstream repository

Set the https://github.com/mpi-astronomy/advanced-git-training as the upstream locally.

Then, examine the state of your repository with git branch, git remote -v and git remote show upstream.

BASH

git remote add upstream https://github.com/mpi-astronomy/advanced-git-training.git
git fetch upstream
git checkout -b develop upstream/develop
Key Points
  • The git remote command allows us to create, view and delete connections to other repositories.
  • Remote connections are like bookmarks to other repositories.
  • Other git commands (git fetch, git push, git pull) use these bookmarks to carry out their syncing responsibilities.

Content from Undoing Changes


Last updated on 2025-11-21

Estimated time: 0 minutes

Overview

Questions

  • How do I undo changes?

Objectives

  • Learn how to roll back a single change.
  • Learn how to get back to a specific state.
Challenge

Exercise: Creating a branch.

Create a new branch called hotfix. Create a new file and make 3-4 commits in that file or create 3-4 new files. Check the log to see the SHA of the last commit.

You can use the touch command to create new files quickly.

Use git add and git commit -m "your message" to save your changes.

BASH

git checkout -b hotfix
touch a.txt
git add . && git commit -m "1st git commit: 1 file"
touch b.txt
git add . && git commit -m "2nd git commit: 2 file"
touch c.txt
git add . && git commit -m "3rd git commit: 3 file"
git status
git log --oneline

Git Revert


Reverting undoes a commit by creating a new commit. This is a safe way to undo changes, as it has no chance of re-writing the commit history. For example, the following command will figure out the changes contained in the 2nd to last commit, create a new commit undoing those changes, and tack the new commit onto the existing project.

BASH

git revert HEAD~1
ls

A diagram showing a repository before and after reverting the last commit.
git revert

Note that, by default, revert only backs out the changes of ONE specific commit. You can also give it a range of commits, but we are not going to do that here (see git revert --help).

git revert does not rewrite history which is why it is the preferred way of dealing with issues when the changes have already been pushed to a remote repository.
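A sketch reproducing the revert above in a throwaway repository. The file names follow the hotfix exercise; the initial README commit on main is an added assumption so that hotfix has something to branch from.

```shell
#!/usr/bin/env bash
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main            # -b needs git 2.28 or newer
git config user.name "Example User"
git config user.email "user@example.com"
echo "# Demo" > README.md && git add README.md && git commit -q -m "Add README"

git switch -q -c hotfix
for f in a b c; do
  touch "$f.txt"
  git add "$f.txt"
  git commit -q -m "Add $f.txt"
done

# Undo the second-to-last commit ("Add b.txt") by adding a new commit.
git revert --no-edit HEAD~1
ls                       # a.txt and c.txt remain; b.txt is gone
git log --oneline        # nothing was rewritten: the revert is simply added on top
```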

Git Reset


Resetting is a way to move the tip of a branch to a different commit. This can be used to remove commits from the current branch. For example, the following command moves the hotfix branch backwards by one commit, dropping the revert commit we just made.

BASH

git checkout hotfix
git reset HEAD~1

A diagram showing a repository before and after resetting the last commit.
git reset

The commit that was on the end of hotfix is now dangling, or orphaned. This means it will eventually be deleted when git performs garbage collection. In other words, you’re saying that you want to throw away this commit.

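By default, git reset removes the commits from the branch but keeps their changes in your working directory as uncommitted changes. Here, b.txt now shows up as deleted but not staged, and git restore brings it back: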

BASH

git status
git restore b.txt

git reset is a simple way to undo changes that haven’t been shared with anyone else. It’s your go-to command when you’ve started working on a feature and find yourself thinking, “Oh crap, what am I doing? I should just start over.”

In addition to moving the current branch, you can also get git reset to alter the staged snapshot and/or the working directory by passing it one of the following flags:

  • --soft: The staged snapshot and working directory are not altered in any way.
  • --mixed: The staged snapshot is updated to match the specified commit, but the working directory is not affected. This is the default option.
  • --hard: The staged snapshot and the working directory are both updated to match the specified commit.

It’s easier to think of these modes as defining the scope of a git reset operation.
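To see the three modes side by side, the sketch below undoes the same kind of commit three times in a throwaway repository (file name and messages hypothetical) and shows git status --short after each reset:

```shell
#!/usr/bin/env bash
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main            # -b needs git 2.28 or newer
git config user.name "Example User"
git config user.email "user@example.com"
echo "one" > file.txt && git add file.txt && git commit -q -m "One"
echo "two" >> file.txt && git commit -q -a -m "Two"

# --soft: the branch moves back, but the undone change stays staged.
git reset -q --soft HEAD~1
git status --short          # "M  file.txt" (staged)
git commit -q -m "Two, again"

# --mixed (the default): the change stays in the file but is unstaged.
git reset -q HEAD~1
git status --short          # " M file.txt" (modified, not staged)
git commit -q -a -m "Two, once more"

# --hard: branch, staging area and working directory all go back.
git reset -q --hard HEAD~1
git status --short          # nothing: the change is gone
cat file.txt                # prints: one
```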

To unstage changes, i.e. take them out of the staging area without discarding them from the working directory:

BASH

git status
echo "an edit" >> c.txt
git add c.txt
git status
git reset HEAD
git status

You can add and commit the changes or restore the file.

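git reset can also work on a single file. With a path, it does not move the branch; instead it copies that file’s version from the given commit into the staging area, leaving the working directory untouched: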

BASH

git reset HEAD~2 foo.txt
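A sketch of the single-file form, assuming a throwaway repository where the hypothetical foo.txt went through three versions:

```shell
#!/usr/bin/env bash
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main            # -b needs git 2.28 or newer
git config user.name "Example User"
git config user.email "user@example.com"
for v in 1 2 3; do
  echo "version $v" > foo.txt
  git add foo.txt
  git commit -q -m "foo.txt version $v"
done

# Copy foo.txt as it was two commits ago into the staging area only.
git reset -q HEAD~2 foo.txt
git status --short             # "MM foo.txt": old version staged, newer one still on disk
git show :foo.txt              # the staged copy: prints "version 1"
cat foo.txt                    # the working copy: prints "version 3"
git log --oneline -1           # the branch still points at "foo.txt version 3"
```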

Git Checkout: A Gentle Way


We already saw that git checkout is used to move to a different branch, but it can also be used to update the state of the repository to a specific point in the project’s history.

BASH

git checkout hotfix
git checkout HEAD~2

A diagram showing a repository before and after using git checkout to move to a previous commit.
git checkout

This puts you in a detached HEAD state. AGHRRR!

Most of the time, HEAD points to a branch name. When you add a new commit, your branch reference is updated to point to it, but HEAD remains the same. When you change branches, HEAD is updated to point to the branch you’ve switched to. All of that means that, in these scenarios, HEAD is synonymous with “the last commit in the current branch.” This is the normal state, in which HEAD is attached to a branch.

The detached HEAD state is when HEAD is pointing directly to a commit instead of a branch. This is really useful because it allows you to go to a previous point in the project’s history. You can also make changes here and see how they affect the project.

BASH

echo "Welcome to the alternate timeline, Morty!" > new-file.txt
git add .
git commit -m "Create new file"
echo "Another line" >> new-file.txt
git commit -a -m "Add a new line to the file"
git log --oneline

If you haven’t made any changes, or you have made changes but want to discard them, you can recover by switching back to your branch:

BASH

git checkout hotfix

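Alternatively, if you want to keep the changes, create a branch at the current commit (while still in the detached HEAD state) and switch to it: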

BASH

git branch alt-history
git checkout alt-history
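A sketch of the whole detached HEAD round trip in a throwaway repository with hypothetical commits. Here git switch -c alt-history does the same as the git branch plus git checkout pair above, in one step:

```shell
#!/usr/bin/env bash
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main            # -b needs git 2.28 or newer
git config user.name "Example User"
git config user.email "user@example.com"
for n in 1 2 3; do
  echo "line $n" >> story.txt
  git add story.txt
  git commit -q -m "Commit $n"
done
git switch -q -c hotfix

# Point HEAD at a commit instead of a branch: detached HEAD.
git checkout -q HEAD~2
git symbolic-ref -q HEAD || echo "HEAD is detached at $(git rev-parse --short HEAD)"

# Commits made here belong to no branch yet...
echo "Welcome to the alternate timeline, Morty!" > new-file.txt
git add new-file.txt
git commit -q -m "Create new file"

# ...so give them a branch name before leaving.
git switch -q -c alt-history
git log --oneline --graph --all
```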

https://www.atlassian.com/git/tutorials/resetting-checking-out-and-reverting Also OMG: http://blog.kfish.org/2010/04/git-lola.html

Exercise: Undoing Changes


Challenge

Exercise: Undoing Changes

  1. Create a new branch called hotfix. Create a new file and make 3-4 commits in that file. Check the log to see the SHA of the last commit.
  2. Revert the last commit that we just inserted. Check the history.
  3. Completely throw away the last two commits [DANGER ZONE!!!]. Check the status and the log.
  4. Undo another commit but leave it in the staging area. Check the status and log.
  5. Wrap it up: add and commit the changes.

Step 1:

BASH

git checkout -b hotfix
touch my_file.txt
echo "First line" > my_file.txt
git add my_file.txt
git commit -m "First commit"
echo "Second line" >> my_file.txt
git add my_file.txt
git commit -m "Second commit"
echo "Third line" >> my_file.txt
git add my_file.txt
git commit -m "Third commit"
git status
git log --oneline

Step 2:

BASH

git revert <SHA>
git log

Step 3:

BASH

git reset --hard HEAD~2
git status
git log

Step 4:

BASH

git reset --soft HEAD~1
git status
git log

Step 5:

BASH

git add .
git commit -m "Message"
Key Points
  • git reset rolls back the commits and leaves the changes in the files
  • git reset --hard rolls back and deletes all changes
  • git reset does alter the history of the project.
  • You should use git reset to undo local changes that have not been pushed to a remote repository.
  • git revert undoes a commit by creating a new commit.
  • git revert should be used to undo changes on a public branch or changes that have already been pushed remotely.
  • git revert only backs out a single commit or a range of commits.

Content from Merging


Last updated on 2025-11-21

Estimated time: 0 minutes

Overview

Questions

  • How do I merge changes from a branch?

Objectives

  • Learn about git merge.

When you are collaborating, you will need to merge branches, whether or not your branch has diverged from the main branch. Most Git hosting platforms, such as GitHub or GitLab, let you merge a branch from their web interface, but you can also merge branches on your own machine using git merge.

There are 2 ways to merge:

  • non-fast-forward merge (recommended)

  • fast-forward merge

A diagram showing different types of Git merges.
Merging diagram.

Reminder: when starting work on a new feature, be careful where you branch from!

BASH

git remote add upstream https://github.com/mpi-astronomy/advanced-git-training.git
git fetch upstream
git checkout -b develop upstream/develop

Non-fast-forward Merge


Merges branch by creating a merge commit. Prompts for merge commit message. Ideal for merging two branches.

BASH

git checkout main
git merge --no-ff -m "Message" <branch>

The --no-ff flag causes the merge to always create a new commit object, even if the merge could be performed with a fast-forward. This avoids losing information about the historical existence of a feature branch and groups together all commits that together added the feature.

Challenge

Exercise: Creating a non-fast-forward merge.

Create a new Git repository that has the following tree.

*   69fac81 (main) Merge branch 'gitignore'
|\
| * 5537012 (gitignore) Add .gitignore
|/
* 6ec7c0f Add README

BASH

git init -b main
touch README.md
git add README.md
git commit -m 'Add README'
git checkout -b gitignore
touch .gitignore
git add .gitignore
git commit -m "Add .gitignore"
git checkout main
git merge --no-ff gitignore

Fast-forward Merge


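A “fast-forward” merge is possible when the current branch (here main) has no commits that <branch> lacks, i.e. when main has not moved since <branch> was created from it. git then simply moves the main pointer forward to the tip of <branch>, so this does NOT create a merge commit. With --ff-only, git aborts the merge if a fast-forward is not possible. This makes it ideal for updating a branch from its remote.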

BASH

git checkout main
git merge --ff-only <branch>

When a fast-forward merge is used, it is impossible to see from the git history which commit objects together implemented a feature; you would have to read all the log messages manually. Reverting a whole feature (i.e. a group of commits) is a true headache in that situation, whereas it is easily done if the --no-ff flag was used.
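A sketch contrasting the two kinds of merge in a throwaway repository; the branches feature-a and feature-b are hypothetical:

```shell
#!/usr/bin/env bash
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main            # -b needs git 2.28 or newer
git config user.name "Example User"
git config user.email "user@example.com"
echo "# Demo" > README.md && git add README.md && git commit -q -m "Add README"

# Two feature branches, each with one commit on top of main.
git switch -q -c feature-a
echo "a" > a.txt && git add a.txt && git commit -q -m "Add a.txt"
git switch -q -c feature-b main
echo "b" > b.txt && git add b.txt && git commit -q -m "Add b.txt"
git switch -q main

# Fast-forward: main simply moves to feature-a's tip; no merge commit.
git merge -q --ff-only feature-a
git rev-parse main feature-a        # the same SHA twice

# No fast-forward: an explicit merge commit records where feature-b came in.
git merge -q --no-ff -m "Merge branch 'feature-b'" feature-b
git log --oneline --graph
```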

For a good illustration of fast-forward merge (and other concepts), see this thread: https://stackoverflow.com/questions/9069061/what-effect-does-the-no-ff-flag-have-for-git-merge

Challenge

Exercise: Creating a fast-forward merge.

Consider the following Git tree

BASH

* a78b99f (main) Add title
| * 3d88062 (remote) Add .gitignore
|/
* 86c4247 Add README

Is it possible to run a fast-forward merge to incorporate the branch remote into main?

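No, it is not possible to run a fast-forward merge. Commit a78b99f exists on main but not on remote, so the two branches have diverged and main cannot simply be moved forward to the tip of remote.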
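You can check this answer by rebuilding the tree above in a throwaway repository (commit hashes will differ) and asking for a fast-forward:

```shell
#!/usr/bin/env bash
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main            # -b needs git 2.28 or newer
git config user.name "Example User"
git config user.email "user@example.com"
touch README.md && git add README.md && git commit -q -m "Add README"

git switch -q -c remote
touch .gitignore && git add .gitignore && git commit -q -m "Add .gitignore"
git switch -q main
echo "# Title" > README.md && git commit -q -a -m "Add title"

# main and remote have diverged, so git refuses to fast-forward.
if git merge --ff-only remote; then
  echo "fast-forwarded"
else
  echo "refused: main has a commit (Add title) that remote lacks"
fi
git log --oneline --graph --all
```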

Three-way Merge

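Similar to --no-ff, but there may be dragons. A three-way merge is forced upon you when the target branch has gained new commits since you branched off it: git combines the two branch tips using their common ancestor and may prompt you to resolve conflicts manually.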

BASH

git merge <branch> [-s <strategy>]

See https://git-scm.com/docs/merge-strategies for a zillion options, from strategies such as “octopus” to strategy options such as “patience”. But git is only so smart, and you are probably smarter.
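A sketch of the most common dragon, a conflict, in a throwaway repository; the file settings.txt and both edits are hypothetical:

```shell
#!/usr/bin/env bash
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main            # -b needs git 2.28 or newer
git config user.name "Example User"
git config user.email "user@example.com"
echo "colour = blue" > settings.txt && git add settings.txt && git commit -q -m "Add settings"

git switch -q -c feature
echo "colour = green" > settings.txt && git commit -q -a -m "Use green"
git switch -q main
echo "colour = red" > settings.txt && git commit -q -a -m "Use red"

# Both branches changed the same line since they split, so git stops and asks for help.
git merge feature || echo "merge stopped with a conflict"
cat settings.txt        # shows the <<<<<<< HEAD ... ======= ... >>>>>>> feature markers

# Resolve by writing the version you want, stage it, and conclude the merge.
echo "colour = green" > settings.txt
git add settings.txt
git commit -q --no-edit
git log --oneline --graph
```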



https://nvie.com/posts/a-successful-git-branching-model/

Note: there are a number of external tools that have a graphical interface to allow for merge conflict resolution. Some of these include: kdiff3 (Windows, Mac, Linux), Meld (Windows, Linux), P4Merge (Windows, Mac, Linux), opendiff (Mac), vimdiff (for Vim users), Beyond Compare, GitHub web interface. We do not endorse any of them and use at your own risk. In any case, using a graphical interface does not substitute for understanding what is happening under the hood.

Key Points
  • git merge --no-ff is the best way to merge changes
  • git merge --ff-only is a good way to pull down changes from remote

Content from Tags


Last updated on 2025-11-21

Estimated time: 0 minutes

Overview

Questions

  • How can I flag a specific state of the project?

Objectives

  • Learn about the git tag command.

A tag is a marker of a specific commit in the project history. You can think of it as a permanent bookmark. Tags can be created to point to a release version, a major code change, a state of the code that was used to produce a paper or a data release, or any other event you (or the development team) may want to reference in the future.

Once a tag has been created, no other changes can be added to it. But you can delete it and create a new one with the same name.

Don’t name your tags the same as your branches. Or the other way around. git fetch can get a tag or a branch and that can be confusing.

The command that allows you to handle git tags is just git tag. Without any flags it simply lists the existing tags:

BASH

git tag

You can create a new tag based on the current state of the repository by providing a tag name to the git tag command:

BASH

git tag 1.0.0

This however creates what is called a lightweight tag. Lightweight tags are like a branch that doesn’t change.

You can get information on a tag via git show:

BASH

git show 1.0.0

Lightweight tags are not recommended in most use cases because they do not save all the information. Instead, use annotated tags (https://git-scm.com/book/en/v2/Git-Basics-Tagging). They are stored as full objects in the Git database: they’re checksummed; contain the tagger name, email, and date; have a tagging message; and can be signed and verified with GNU Privacy Guard (GPG).

To create an annotated tag from the current commit:

BASH

git tag -a 2.0.0 -m <message>

It is also possible to tag a past commit by providing that commit’s SHA:

BASH

git tag -a <tag> [<SHA>] -m <message>
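A sketch, assuming a throwaway repository with two hypothetical commits, showing that a lightweight tag is just a name for a commit while an annotated tag is an object of its own; it also tags a past commit by its SHA:

```shell
#!/usr/bin/env bash
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main            # -b needs git 2.28 or newer
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > code.txt && git add code.txt && git commit -q -m "Prototype"
echo "v2" >> code.txt && git commit -q -a -m "First release"

# Annotated tag on a past commit, identified by its SHA.
first=$(git rev-parse HEAD~1)
git tag -a 0.1.0 -m "Prototype used for the poster" "$first"

git tag 1.0.0                          # lightweight: just a name for the current commit
git tag -a 2.0.0 -m "Release 2.0.0"    # annotated: a tag object of its own

git cat-file -t 1.0.0      # prints: commit  (the name points straight at the commit)
git cat-file -t 2.0.0      # prints: tag     (a separate object with tagger, date, message)
git tag -n                 # lists tags with the first line of their message
```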

For an annotated tag, git show displays the tagger, date and message before the tagged commit, while for a lightweight tag it shows only the commit. The related git tag -v command verifies a tag’s GPG signature and displays the tag’s details, so it only succeeds for signed tags:

BASH

git show 2.0.0
git tag -v <tag>

A tag allows you to switch to the version of the code that was tagged, to use that version of the code, or to see what the code looked like at that tag. Checking out a tag puts you in a detached HEAD state. Here is how to check out a state of the code that has been tagged:

BASH

git checkout <tag>

Push a tag to origin:

BASH

git push origin <tag>

And of course you can delete a tag. This does not delete the commit; it just removes the marker/label. Delete a tag:

BASH

git tag -d <tag>

Since tags are frequently used for releases, it is useful to be aware that codebases and languages have standards for how release versions should be labeled. If you are working with an existing code base, follow the standard set by the dev team. If you are developing a library by yourself, follow the standards for the language. For example, the Python Packaging Authority’s version specifiers (https://packaging.python.org/en/latest/specifications/version-specifiers/#version-specifiers), previously PEP 440 (https://peps.python.org/pep-0440/), specify the scheme for identifying versions of Python libraries.

Key Points
  • git tag allows us to mark a point we can return to.
  • A tag is tied to a commit.

Content from Branching Models


Last updated on 2025-11-21

Estimated time: 0 minutes

Overview

Questions

  • What is a branching model?
  • Why do you need one?
  • What are the most common branching models?

Objectives

  • Learn about the importance of a branching model.

What is a branching model/strategy?


Branches are primarily used as a means for teams to develop features, giving them a separate workspace for their code. These branches are usually merged back to a master branch upon completion of work. In this way, features (and any bugs and bug fixes) are kept apart from each other, allowing you to fix mistakes more easily.

This means that branches protect the mainline of code and any changes made to any given branch don’t affect other developers.

A branching strategy, therefore, is the strategy that software development teams adopt when writing, merging and deploying code when using a version control system.

It is essentially a set of rules that developers can follow to stipulate how they interact with a shared codebase.

Such a strategy is necessary because it keeps repositories organized, helping to avoid errors in the application and the dreaded merge hell when multiple developers are working simultaneously and adding their changes at the same time. Left unmanaged, such merge conflicts make it hard to combine contributions from multiple developers.

Thus, adhering to a branching strategy will help solve this issue so that developers can work together without stepping on each other’s toes. In other words, it enables teams to work in parallel to achieve faster releases and fewer conflicts by creating a clear process when making changes to source control.

When we talk about branches, we are referring to independent lines of code that branch off the main branch, allowing developers to work independently before merging their changes back into the code base.

In this and the following episodes, we will outline some of the branching strategies that teams use to organize their workflow, look at their pros and cons, and discuss which strategy to choose based on your needs, objectives and your team’s capabilities.

Why do you need a branching model?


As mentioned above, having a branching model is necessary to avoid conflicts when merging and to allow for easier integration of changes into the main branch.

A branching model aims to:

  • enhance productivity by ensuring proper coordination among developers,
  • enable parallel development,
  • help organize a series of planned, structured releases,
  • map a clear path when making changes to software through to production, and
  • keep the code as bug-free as possible, so that developers can quickly fix issues and get these changes back to production without disrupting the development workflow.

Git Branching Models


Some version control systems are Very Opinionated about the branching models that can be used. git is very much (fortunately or unfortunately) not. This means that there are many different ways to do development in a team, and the team needs to explicitly agree on how and when to merge contributions into the main branch. So the first rule of git branching is: “Talk about your branching model.” The second rule is: “Talk about your branching model.” If in doubt, do what other people around you are doing. If they don’t do anything, call a friend.

That said, there are a number of established (and less so) branching models that are used with git. These include, but are not limited to:

  • Centralized workflow: enables all team members to make changes directly to the main branch. Every change is logged into the history. In this workflow, the contributors do not use other branches; instead, they all make changes on the main branch directly and commit to it. This works for individual developers or small teams that communicate very well, but can be tricky for larger teams: the code is in a constant state of flux and developers keep changes local until they are ready to release.

  • Trunk-based development (cactus flow?): is somewhat similar to the centralized workflow. The development happens on a single branch called trunk. When changes need to be merged, each developer pulls and rebases from the trunk branch and resolves conflicts locally. This can work if small merges are made frequently, and it is more successful with continuous integration and delivery (CI/CD).

  • Feature branch workflow: every small change or “feature” gets its own branch where the developers make changes. Once the feature is done, they submit a merge/pull request and merge it into the main branch. Feature branches should be relatively short-lived. The benefit of this model is that the main branch is not polluted by unfinished features. Good for teams.

  • Gitflow: is a model where the main development happens in a develop branch with feature branches. When the develop branch is ready for a release (or to go into production), a team member creates a release branch, which is tested and eventually merged into both develop and main.

  • GitHub flow (https://docs.github.com/en/get-started/quickstart/github-flow): similar to the feature branch workflow.

  • GitLab flow: is a simplified version of Gitflow (https://about.gitlab.com/topics/version-control/what-is-gitlab-flow/)

  • Oneflow: is similar to Gitflow but relies on the maintenance of one long-lived branch. It is meant to be simpler, without a develop branch, but feature branches still exist (https://www.endoflineblog.com/oneflow-a-git-branching-model-and-workflow).

  • Forking workflow (e.g. astropy): is a model where each contributor creates a fork, or complete copy, of the repository. Every contributor effectively has two repositories: their own and the main (upstream) one. Changes are proposed as pull requests against the main repository. This model is popular with open source projects because the vast majority of contributors do not need to have privileges in the main repository.
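As an example, one round of the feature branch workflow can be sketched locally as follows. This is a minimal sketch in a throwaway repository: the branch name feature/greeting and the file greeting.py are hypothetical, and on a hosting platform the merge step would usually happen through a pull/merge request rather than on your own machine.

```bash
#!/usr/bin/env bash
set -euo pipefail

# A throwaway repository so the sketch can run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Example User"        # hypothetical identity for this demo
git config user.email "user@example.com"
echo "# My project" > README.md
git add README.md
git commit -q -m "Initial commit"

# 1. Start a short-lived feature branch from main.
git checkout -q -b feature/greeting
echo 'print("Hello!")' > greeting.py
git add greeting.py
git commit -q -m "Add greeting script"

# 2. When the feature is done, merge it back into main.
#    --no-ff always records a merge commit, so the feature stays visible in the history.
git checkout -q main
git merge -q --no-ff -m "Merge feature/greeting" feature/greeting

# 3. Delete the feature branch once it has been merged.
git branch -d feature/greeting

git log --oneline --graph
```

Using --no-ff is a design choice: Git would otherwise simply fast-forward main, and the history would no longer show which commits belonged to the feature. Teams that prefer a linear history often rebase or squash instead.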

A longer description of some of these can be found here: https://about.gitlab.com/topics/version-control/what-is-git-workflow/#feature-branching-git-workflow

In summary, there are many different ways to collaborate on a project. Look at the pros and cons and select one that fits the needs and organization of your team and project. In the following several sections we look at some of these models in more detail.

Key Points
  • A branching model is a pre-agreed way of merging branches into the main branch.
  • A branching model is needed when multiple contributors are making changes to a single project.

Content from Forking Workflow


Last updated on 2025-11-21

Estimated time: 0 minutes

Overview

Questions

  • What are the common workflows of the forking branching model?

Objectives

  • First learning objective. (FIXME)

Preparation: Make sure that your main branch is clean, i.e. everything is committed.

The forking workflow is popular among open source software projects and often used in conjunction with a branching model.

FIXME: Why?

The focus of this workflow is to keep the “upstream main” stable while allowing anyone to work on their own contributions independently. Contributions are then suggested and accepted via pull requests. There is not necessarily a develop branch, but you may have release branches.


In order to understand the forking workflow, let’s first take a look at some special words and roles needed:

  • upstream: the remote repository containing the “true copy”
  • origin: the remote repository containing your forked copy
  • Pull request (PR): a merge request from the fork to upstream (a request to add your suggestions to the “original copy”)
  • Maintainer: someone with write access to upstream who vets PRs
  • Contributor: someone who contributes to upstream via PRs
  • Release manager: a maintainer who also oversees releases
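In a local clone of your fork, origin is set up automatically by git clone, while upstream has to be added by hand. The following minimal sketch shows this, assuming that two local bare repositories stand in for the project’s repository and your fork on GitHub so that it runs offline; with GitHub, you would pass the repositories’ URLs instead of the paths.

```bash
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)

# Stand-ins for the two repositories on GitHub (assumption: local bare repositories).
git init -q --bare "$work/upstream.git"                    # the project's "true copy"
git clone -q --bare "$work/upstream.git" "$work/fork.git"  # your fork of it

# Clone YOUR fork: git clone names this remote "origin" automatically.
git clone -q "$work/fork.git" "$work/code"
cd "$work/code"

# Add the original project as a second remote called "upstream".
git remote add upstream "$work/upstream.git"

# List both remotes with their fetch and push URLs.
git remote -v
```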

Examples:
  • Example release workflow for the astropy Python package
  • Spacetelescope (STScI) style guide for release workflow

Figure: A brief refresher from Git Training. Your local computer (“You”) has branch1 with three files, one of them marked as removed. The branch is pushed to origin (you/code, branch1) on GitHub, and a pull request (the “merge pull request” button in the GitHub web interface) brings the same change into upstream (spacetelescope/code, main).

FIXME: Remove text from image and add as caption, source?


FIXME: Alt text. Remove text from image and add as caption, source?

Exercises


FIXME: More description about what is happening at each step in the solution

Challenge

Exercise 1: Create and push a feature branch

You will be assigned a number by the instructor/helper. Create a feature branch based on upstream main. Then create a file in the trainees folder called hello_NNN.txt using the number you just got (replace NNN with your number, e.g. 007). Then push your feature branch out to GitHub.

BASH

git fetch upstream main
git checkout -b myforkfeature upstream/main
touch ./trainees/hello_NNN.txt
git add ./trainees/hello_NNN.txt
git commit -m "adding my textfile"
git push origin myforkfeature
Challenge

Exercise 2: Suggest your changes via pull request

Go to your repository (your fork) on GitHub and find the tab called “Pull requests”. Click the green “New pull request” button. Then find and click the blue link under “Compare changes” called “compare across forks”. Select your username and branch name from the menus on the right. Then click the big green button under the menus called “Create pull request”.

Key Points
  • First key point. Brief Answer to questions. (FIXME)

Content from Data Science Workflow


Last updated on 2025-11-21

Estimated time: 0 minutes

Overview

Questions

  • What are the common workflows of the Data Science branching model?

Objectives

  • First learning objective. (FIXME)

Key Points
  • First key point. Brief Answer to questions. (FIXME)

Content from Large Files


Last updated on 2025-11-21

Estimated time: 0 minutes

Overview

Questions

  • Why are (large) binary files a problem in Git?
  • What is Git LFS?
  • What are the problems with Git LFS?

Objectives

  • Understanding that Git is not intended for (large) binary files
  • Learning about the git lfs commands
  • Understanding the disadvantages of git lfs

Sometimes, you might want to add non-textual data to your Git repositories. Examples of such use cases in a software project include:

  • assets for the project documentation like images
  • test data for your test suite

However, such data is stored in binary formats most of the time. Git’s line-based approach of tracking changes is not suited for this type of data. While Git will work with binary data without any errors, it will internally treat each binary file as a file with one (very long) single line of content. Consequently, if you apply changes to such a file, Git will store the entire file in the commit even if there was a lot of similarity between the two versions of the file. As Git does not “forget” about previous versions of the file, doing this repeatedly and/or with very large files will quickly make your repository grow in size. At some point this will severely impact the performance of all your Git operations from git clone to even git status. It is therefore generally discouraged to use Git to track (large) binary files.
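A small experiment makes this visible. The following sketch assumes 1 MiB of random bytes as a stand-in for a binary file: it commits the file, appends a single byte, and commits again. The two versions are stored as two different objects (blobs), and the loose objects take up roughly twice the size of the file.

```bash
#!/usr/bin/env bash
set -euo pipefail

# A throwaway repository so the experiment can run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"    # hypothetical identity for this demo
git config user.email "user@example.com"

# 1 MiB of random bytes stands in for a binary file.
head -c 1048576 /dev/urandom > data.bin
git add data.bin
git commit -q -m "Add binary data"

# Append a single byte and commit again.
printf 'x' >> data.bin
git commit -q -am "Append one byte"

# Each version of the file is a separate object (blob) with its own hash,
echo "old blob: $(git rev-parse HEAD~1:data.bin)"
echo "new blob: $(git rev-parse HEAD:data.bin)"
# and the loose objects now take roughly twice the file size (reported in KiB).
git count-objects -v | grep '^size:'
```

Git can later pack these objects (for example during git gc) and try to delta-compress similar versions, so the exact growth depends on the file format and on how the file changed. Formats that re-encode or compress their whole content on every save tend to benefit little from this.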

However, the problem of binary files in Git repositories cannot simply be ignored: there is a lot of value for a software project in keeping together things that belong together. Documentation assets, for example, belong with the documentation they are part of. Therefore, we will now explore some options for integrating large file handling into Git.

The git lfs subcommand is part of an extension to Git. LFS stands for Large File Storage. It allows you to mark individual files as being large. Git does not apply its normal, line-based approach to tracking changes to these large files, instead they are stored separately and only referenced in the Git data model. During push and pull operations, large files are transmitted separately - requiring the server to support this operation.

For the sake of demonstration, we create a file called report.pdf. We assume that it is a large, binary file in order to show how to handle it with git lfs:

BASH

echo "This is a very large report." > report.pdf

Next, we tell Git that this file should be handled by LFS:

BASH

git lfs track report.pdf
Tracking "report.pdf"

If we now inspect the repository, we see that a new file, .gitattributes, has been created in the working directory.

BASH

git status
On branch main

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.gitattributes
	report.pdf

BASH

cat .gitattributes
report.pdf filter=lfs diff=lfs merge=lfs -text

Like .gitignore, this file is part of the repository itself, so that it is shared with all your collaborators on this project. We therefore craft a commit that contains it:

BASH

git add .gitattributes
git commit -m "Setup LFS tracking"

Now, we are ready to add the large file to the repository the same way we would with any other file:

BASH

git add report.pdf
git commit -m "Add final report to the repository"

When we push our commits to the remote repository, the console output shows that our LFS data is transferred to the remote server separately.

BASH

git push origin main
Uploading LFS objects: 100% (1/1), 17 B | 0 B/s, done.
Callout

Tracking with wildcard patterns

LFS tracking is not limited to explicitly spelled-out filenames: wildcard patterns can also be passed to git lfs track. However, you should be careful to quote these patterns, as your shell might otherwise expand them to the names of existing files. For example, all PDFs can be tracked with LFS with the following command:

BASH

git lfs track "*.pdf"
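Because a pattern in .gitattributes can match more (or fewer) files than intended, it can be worth checking which files it applies to, and Git itself can answer this with git check-attr. In the following minimal sketch, the line that git lfs track "*.pdf" adds to .gitattributes is written by hand, so the sketch also runs on machines without Git LFS installed; the file names are hypothetical and do not need to exist.

```bash
#!/usr/bin/env bash
set -euo pipefail

# A throwaway repository so the sketch can run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# The line `git lfs track "*.pdf"` would write, added by hand here.
echo '*.pdf filter=lfs diff=lfs merge=lfs -text' > .gitattributes

# Ask Git which files the LFS filter applies to.
# A pattern without a slash matches at any directory level.
git check-attr filter -- report.pdf docs/manual.pdf notes.txt
# Expected output:
#   report.pdf: filter: lfs
#   docs/manual.pdf: filter: lfs
#   notes.txt: filter: unspecified
```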
Caution

Disadvantages of Git LFS

Although git lfs by design solves the problem of storing large files in Git repositories, there are some practical hurdles that you should consider before introducing LFS into your project:

  • The git lfs command is a separately maintained extension to the Git core. It is therefore not part of most Git distributions, but needs to be installed separately. Using it in your project will require you to educate your users about LFS and how to install it. Depending on your target audience, you should carefully consider whether the benefits outweigh this disadvantage.
  • Users who do not have git lfs installed will not be notified by Git. They will see the files, but each file will contain a small text pointer (LFS metadata) instead of the actual content. Trying to work with those files will typically produce cryptic error messages.
  • Some hosting providers, most notably GitHub, apply restrictive quotas to LFS storage. On the free plan, GitHub currently allows 1 GB of storage and 1 GB of bandwidth per month. As the bandwidth quota counts every single clone by users, LFS should currently be considered unusable on the GitHub free plan.
Key Points
  • (Large) binary files can grow the repository size immensely and make it unusable
  • git lfs is an extension that stores large files outside the Git data model
  • Use of Git LFS is discouraged in many scenarios.

Content from Undo, Move, Cherrypick


Last updated on 2025-11-24

Estimated time: 0 minutes

Overview

Questions

  • How to undo changes to a repository?
  • How to rename a branch?
  • How to incorporate specific changes from one branch into another?

Objectives

  • Learn how to undo a specific commit.
  • Learn to rename an existing branch.
  • Learn to pick and incorporate specific changes into a different branch.
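These three objectives correspond to three Git commands. The following minimal sketch runs them in a throwaway repository (file names, branch names and commit messages are hypothetical): git revert undoes a specific commit by adding a new commit that reverses it, git branch -m renames a branch, and git cherry-pick copies a single commit from one branch onto another.

```bash
#!/usr/bin/env bash
set -euo pipefail

# A throwaway repository so the sketch can run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Example User"        # hypothetical identity for this demo
git config user.email "user@example.com"

echo "line 1" > file.txt
git add file.txt
git commit -q -m "Add file"

# 1. Undo a specific commit. git revert records a NEW commit that reverses it,
#    so history is not rewritten; this also works for commits already pushed.
echo "bad line" >> file.txt
git commit -q -am "Add a bad line"
git revert --no-edit HEAD                  # any commit can be named here, not only HEAD

# 2. Rename a branch (here: fix a typo in the branch name).
git branch feture
git branch -m feture feature

# 3. Cherry-pick: copy one specific commit from "feature" onto "main".
git checkout -q feature
echo "helper" > helper.txt
git add helper.txt
git commit -q -m "Add helper"
git checkout -q main
git cherry-pick feature                    # "feature" resolves to the branch's latest commit

git log --oneline
```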


Key Points
  • A local repository can still be changed.
  • Once pushed to a remote, changing history can create complications.


Content from Rebase, Squash, Bisect, Patch


Last updated on 2025-11-21

Estimated time: 0 minutes

Overview

Questions

  • What are rebase, squash, bisect and patch?

Objectives

  • First learning objective. (FIXME)


Key Points
  • First key point. Brief Answer to questions. (FIXME)


Content from Hooks and Actions


Last updated on 2025-11-21

Estimated time: 0 minutes

Overview

Questions

  • How do I automate my work locally?
  • How do I add automations to GitHub?

Objectives

  • First learning objective. (FIXME)

Git hooks are scripts that are run when a specific event occurs in Git. The scripts can be written in any language and do anything you like, so any executable script can be a hook.

Git hooks can be triggered by events on the server side or locally. Examples of local events that can trigger hooks include committing (pre- or post-commit hooks), checking out or rebasing. Pre-commit hooks are perhaps the most common and useful ones: they run before the code is committed, and if the hook script exits with a non-zero status, the commit is aborted. This can be very powerful: you can, for example, automatically run linters before the code is even committed.

List of pre-written pre-commit hooks: https://github.com/pre-commit/pre-commit-hooks

The executable files are stored in the .git/hooks/ directory in your project directory. A pre-commit hook is an executable file in this directory with the special name pre-commit. If you check the directory, you will find several examples already there (files ending in .sample). Let’s create a new one:

BASH

touch .git/hooks/pre-commit
nano .git/hooks/pre-commit
# Git only runs hooks that are executable
chmod +x .git/hooks/pre-commit

And add the following text to it:

#!/usr/bin/env bash

set -eo pipefail
flake8 hello.py
echo "flake8 passed!"

Now let’s make hello.py:

BASH

touch hello.py
nano hello.py

And add some text to it:

PYTHON

print('Hello world!'')

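The typo is on purpose. Add and commit the file to the repository: flake8 reports the syntax error, the hook script exits with a non-zero status, and Git aborts the commit.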
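The following minimal sketch shows the same mechanism without depending on flake8 being installed. It swaps in a check that is built into Git, git diff --cached --check, which fails if the staged changes contain whitespace errors such as trailing spaces. The repository and file names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# A throwaway repository so the sketch can run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"    # hypothetical identity for this demo
git config user.email "user@example.com"

# An initial commit, made before the hook exists.
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# Install the hook under the magic name pre-commit and make it executable.
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Exit with a non-zero status if the staged changes contain whitespace errors.
exec git diff --cached --check
EOF
chmod +x .git/hooks/pre-commit

# Stage a line with trailing spaces: the hook fails, so Git aborts the commit.
printf 'bad line   \n' >> notes.txt
git add notes.txt
if git commit -q -m "Add a bad line"; then
    blocked=no
else
    blocked=yes
fi
echo "first attempt blocked: $blocked"

# Remove the trailing spaces and try again: the hook passes and the commit succeeds.
printf 'first line\ngood line\n' > notes.txt
git add notes.txt
git commit -q -m "Add a good line"
git log --oneline
```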

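GitHub Actions are, roughly, the equivalent of server-side hooks on GitHub: workflows that GitHub runs on its own machines in response to events such as a push or a pull request.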

There are lots of things that can be done with GitHub actions: https://docs.github.com/en/actions
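As an illustration, a minimal workflow that runs the same flake8 check as the pre-commit hook above on every push and pull request could look like the following. The file name lint.yml, the workflow and job names, and the Python version are assumptions; actions/checkout and actions/setup-python are the standard actions published by GitHub. The command writes the file relative to the top of your repository.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Workflow files live in .github/workflows/ at the top of the repository.
mkdir -p .github/workflows
cat > .github/workflows/lint.yml <<'EOF'
name: Lint
on: [push, pull_request]
jobs:
  flake8:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4        # check out the repository
      - uses: actions/setup-python@v5    # install Python
        with:
          python-version: "3.12"
      - run: pip install flake8
      - run: flake8 hello.py
EOF
```

Once this file is committed and pushed, GitHub runs the workflow on its own machines, and the results appear in the repository’s Actions tab.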

Here is an example of a simple cron job: https://github.com/mpi-astronomy/XarXiv


Materials: https://verdantfox.com/blog/how-to-use-git-pre-commit-hooks-the-hard-way-and-the-easy-way

Key Points
  • First key point. Brief Answer to questions. (FIXME)


Content from Setting up the Command Prompt


Last updated on 2025-11-21

Estimated time: 5 minutes

Overview

Questions

  • How can I visualize Git information at a glance in the command prompt?

Objectives

  • Improve your command prompt for working with Git

When working with Git on the command line, it may prove helpful to keep some information about the repository available at a glance. As Unix shells allow you to modify the prompt, a natural approach is to integrate such information into the prompt itself.

Challenge

What would be useful information to integrate into the prompt?

Take a minute to think about which information might be helpful to be shown as part of the prompt.

  • The active branch. As you can switch between different branches of the same repository, it can sometimes be confusing to know which branch your working copy currently reflects. Showing the branch name next to the name of the directory you are currently in may help as a reminder.
  • The state of the branch. An indicator of whether there are modified or uncommitted files may help you notice uncommitted changes in the repository.

Setting up the query infrastructure


Individual shells have specific ways to define the prompt and the information shown. Select the appropriate code snippet according to the shell you are running. If you are unsure which shell you are using, try the following code to identify the shell you are running.

BASH

you@computer:~$ basename $SHELL
bash

As the idea of augmenting the command prompt with Git information is not new, the Git repository on GitHub (i.e., the repository hosting the source code of Git itself) provides shell code to query this information. You can download it to your home directory with the following command.

BASH

you@computer:~$ curl https://raw.githubusercontent.com/git/git/refs/heads/master/contrib/completion/git-prompt.sh -o $HOME/.git-prompt.sh
Callout

bash

To use the git-prompt.sh in bash add the following line to $HOME/.bashrc.

BASH

source ~/.git-prompt.sh
Callout

zsh

To use the git-prompt.sh in zsh add the following line to $HOME/.zshrc.

BASH

source ~/.git-prompt.sh


Some shells, such as fish, xonsh, and others already have support for displaying Git repository information built-in.

Now you have the infrastructure set up to augment the command prompt with desired information about your Git repository.

Modifying the prompts


With the code to query the information now available in the shell session, we still need to use that information in the definition of our prompt.

Callout

bash

Add the following to your $HOME/.bashrc.

BASH

PROMPT_COMMAND='__git_ps1 "\u@\h:\w" "\\\$ "'

This shows your username, an @ sign, the host name, a colon and the current working directory, followed by the Git status string and finally a $ and a space, as your prompt.

Callout

zsh

Add the following to your $HOME/.zshrc.

BASH

setopt PROMPT_SUBST
PS1='[%n@%m %c$(__git_ps1 " (%s)")]\$ '

This shows, inside square brackets, your username, an @ sign, the host name and the name of the current directory, followed by the Git status string in parentheses; the prompt ends with a $ and a space.

Tweaking the information shown


Using the git-prompt.sh script you can now tweak the information shown in the prompt by setting specific environment variables.

Indicating unstaged and uncommitted changes in the working copy

By setting the environment variable GIT_PS1_SHOWDIRTYSTATE to a non-empty value, the prompt will indicate unstaged changes in the working copy with a * character next to the branch name (and staged but uncommitted changes with a + character).

BASH

user@computer:my_repo (main *)> git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   a_file.py

no changes added to commit (use "git add" and/or "git commit -a")
user@computer:my_repo (main *)>

Indicating untracked files in the working copy

By setting the environment variable GIT_PS1_SHOWUNTRACKEDFILES to a non-empty value, the prompt will indicate the presence of untracked files in the working copy with a % character next to the branch name.

BASH

user@computer:my_repo (main %)> git status
On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        untracked.pdf

nothing added to commit but untracked files present (use "git add" to track)
user@computer:my_repo (main %)>

Indicating a stash in the working copy

Git supports saving modifications to the working copy in a so-called stash that can later be reapplied to the working copy. By setting the environment variable GIT_PS1_SHOWSTASHSTATE to a non-empty value, the prompt will indicate whether something is stashed, with a $ next to the branch name.

BASH

you@computer:my_repo (main)> git stash
Saved working directory and index state WIP on main: 33528c7 Added report
you@computer:my_repo (main $)> git stash clear
you@computer:my_repo (main)>
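To see where these markers come from, the following sketch computes similar markers with plain Git commands: * for unstaged changes, + for staged changes, $ for a stash and % for untracked files. The function mini_git_markers is hypothetical and a simplified illustration, not the actual code of git-prompt.sh, which handles many more cases.

```bash
#!/usr/bin/env bash
set -euo pipefail

# mini_git_markers is a hypothetical helper, NOT part of git-prompt.sh.
# It prints prompt-style markers for the repository in the current directory.
mini_git_markers() {
    local markers=""
    git diff --quiet || markers+="*"              # unstaged changes
    git diff --cached --quiet || markers+="+"     # staged changes
    if git rev-parse --verify --quiet refs/stash >/dev/null; then
        markers+='$'                              # something is stashed
    fi
    if [ -n "$(git ls-files --others --exclude-standard)" ]; then
        markers+="%"                              # untracked files
    fi
    printf '%s\n' "$markers"
}

# Try it out in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"    # hypothetical identity for this demo
git config user.email "user@example.com"
echo "first" > a_file.py
git add a_file.py
git commit -q -m "Initial commit"

echo "clean:     [$(mini_git_markers)]"    # []
echo "second" >> a_file.py
echo "modified:  [$(mini_git_markers)]"    # [*]
git add a_file.py
echo "staged:    [$(mini_git_markers)]"    # [+]
touch untracked.pdf
echo "untracked: [$(mini_git_markers)]"    # [+%]
git stash push -q                          # stashes the staged change, not the untracked file
echo "stashed:   [$(mini_git_markers)]"    # [$%]
```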

Indicating the name of and difference to the upstream repository

The environment variable GIT_PS1_SHOWUPSTREAM can be set to a space-separated list of options to show the relation of the local branch to its upstream branch. For basic use, you can select from the following options.

  • verbose shows the number of commits ahead (+) or behind (-) the upstream branch, or = if they are equal.
  • name, combined with verbose, also shows the abbreviated name of the upstream branch (e.g. origin/main).
  • auto shows whether you are behind (<), ahead (>), diverged from (<>) or equal to (=) the upstream branch.

BASH

you@computer:my_repo (main *)> echo $GIT_PS1_SHOWUPSTREAM

you@computer:my_repo (main *)> GIT_PS1_SHOWUPSTREAM="verbose"
you@computer:my_repo (main *|u+3)> GIT_PS1_SHOWUPSTREAM="verbose name"
you@computer:my_repo (main *|u+3 origin/main)> GIT_PS1_SHOWUPSTREAM="auto"
you@computer:my_repo (main *>)>
Callout

There are more options for advanced usage available. Check inside of git-prompt.sh for documentation.

Colorizing the output

If the environment variable GIT_PS1_SHOWCOLORHINTS is set to a non-empty value, the Git-related part of the prompt will be colorized. If the variable is not set, the output will not be colorized.
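Putting these options together, the following lines could go into $HOME/.bashrc (or $HOME/.zshrc) next to the prompt definition. Which options you enable is a matter of taste; this selection is only an example.

```bash
# Possible additions to $HOME/.bashrc (or $HOME/.zshrc); pick the ones you find useful.
GIT_PS1_SHOWDIRTYSTATE=1         # * for unstaged, + for staged changes
GIT_PS1_SHOWUNTRACKEDFILES=1     # % if there are untracked files
GIT_PS1_SHOWSTASHSTATE=1         # $ if something is stashed
GIT_PS1_SHOWUPSTREAM="auto"      # <, >, <> or = relative to the upstream branch
GIT_PS1_SHOWCOLORHINTS=1         # colorize the Git part of the prompt
```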

Key Points
  • Use available scripts for common shell environments.
  • Indicate changes stashed, pending, or committed to the local working copy.
  • Indicate current branch name to aid in multi-branch workflows.

Content from Additional Resources


Last updated on 2025-11-21

Estimated time: 0 minutes

Overview

Questions

  • What didn’t we cover?

Objectives

  • Provide pointers to additional topics that are not currently covered
  • Provide pointers to additional resources available


Key Points
  • GUIs can help with making some aspects of working with Git easier.