Content from Introduction
Last updated on 2025-11-24
Estimated time: 10 minutes
Overview
Questions
- What do I do when I need to make complex decisions with my git repository?
- How do I collaborate on a software project with others?
Objectives
- Understand the range of functionality that exists in git.
- Understand the different challenges that arise with collaborative projects.
Introduction
Version control systems are a way to keep track of changes in text-based documents. You start with a base version of the document and then record the changes you make each step of the way. You can think of it as a recording of your progress: you can rewind to start at the base document and play back each change you made, eventually arriving at your most recent version.
The git version control system, used to manage the code in many millions of software projects, is one of the most widely adopted. It uses a distributed version control model (the “beautiful graph theory tree model”), meaning that there is no single central repository of code. Instead, users share code back and forth to synchronise their repositories, and it is up to each project to define processes and procedures for managing the flow of changes into a stable software product.
Challenges
Git is powerful and flexible enough to fit a wide range of use cases and workflows, from simple projects written by a single contributor to projects that are millions of lines long and have hundreds of co-authors. Furthermore, the task it performs is quite complex. As a result, many users find it challenging to navigate this complexity. While committing and sharing changes is fairly straightforward, recovering from situations such as accidental commits, pushes or bad merges is difficult without a solid understanding of the rather large and complex conceptual model. Case in point: three of the top five highest voted questions on Stack Overflow are questions about how to carry out relatively simple tasks: undoing the last commit, changing the last commit message, and deleting a remote branch.
Mouse-over text: If that doesn’t fix it, git.txt contains the phone number of a friend of mine who understands git. Just wait through a few minutes of ‘It’s really pretty simple, just think of branches as…’ and eventually you’ll learn the commands that will fix everything.
With this lesson our goal is to give you a more in-depth understanding of the conceptual model of git, to guide you through increasingly complex workflows and to give you the confidence to participate in larger projects.
Review of Intro Git Commands
First, let’s review the concepts and commands that constitute the basic git workflow.

A commit, or “revision”, is an individual change to a file or set of
files. It’s like when you save a file, except that with git,
every save creates a unique ID (a.k.a. the “SHA” or “hash”)
that allows you to keep a record of what changes were made, when, and by
whom. Each commit contains several key pieces of information that
uniquely define its state:
- Commit message: A description provided by the user explaining the purpose or details of the commit.
- Committer: The person who added the commit to the repository.
- Commit date: The date and time when the commit was added to the repository.
- Author: The original creator of the changes in the commit, which may differ from the committer.
- Authoring date: The date and time when the changes were originally made by the author.
- Parent commit(s): Reference to the previous commit(s), which allows Git to trace the history and create a chain of commits.
- Working directory hash: A unique hash representing the state of all tracked files in the working directory at the time of the commit.

All these elements together generate a unique commit hash, which identifies the commit across the Git repository.
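As a minimal sketch of where these pieces of information live, the commands below build a throwaway repository and print the raw commit object. The temporary directory, the file name `notes.txt`, and the identity are placeholders chosen for illustration, and `git init -b` assumes Git 2.28 or newer.

```shell
# Throwaway repository; the identity below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Raw commit object: tree (snapshot) hash, author, committer, message.
# A first commit has no "parent" line.
git cat-file -p HEAD

# The same fields through log placeholders: commit hash, author and
# author date, committer and commit date.
git log -1 --format='%H%n%an %ad%n%cn %cd'
```

Changing any one of these fields, even just a date, produces a different commit hash.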
git checkout returns files with uncommitted changes in
the local repository to a previously committed state, whereas
git revert reverses changes that have already been committed to the local and
project repositories.
Finally, the git fetch command downloads commits, files,
and refs from a remote repository into your local repo. When downloading
content from a remote repo, git pull and
git fetch commands are available to accomplish the task.
You can consider git fetch the ‘safe’ version of the two commands. It
will download the remote content but not update your local repo’s
working state, leaving your current work intact. git pull
is the more aggressive alternative; it will download the remote content
for the active local branch and immediately execute
git merge to integrate it, creating a merge commit if the two
histories have diverged. If you have pending changes in progress this can cause
conflicts and kick off the merge conflict resolution flow. Given only
the name of a remote, either command brings down all the changes from that
remote.
It is sometimes useful to only pull the changes from a certain
branch, e.g., main, by naming the branch after the remote. For a repository that has a lot of
contributors and branches, all the changes may be unnecessary and
overwhelming.
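The difference between the two commands can be tried out without a hosting service. The sketch below uses a local bare repository as a stand-in for the remote; the directory names `remote.git`, `alice`, and `bob`, the file name, and the identity are all placeholders.

```shell
# Local stand-in for a hosted remote; all paths and names are placeholders.
work=$(mktemp -d)
cd "$work"
git init -q --bare -b main remote.git

# One repository publishes a first commit...
git init -q -b main alice
cd alice
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/remote.git"
echo "v1" > data.txt
git add data.txt
git commit -q -m "Add data"
git push -q origin main
cd ..

# ...a second clone starts from it, then the first repository moves ahead.
git clone -q remote.git bob
cd alice
echo "v2" >> data.txt
git commit -q -am "Update data"
git push -q origin main
cd ../bob

# fetch: origin/main advances, the local branch and files do not change.
git fetch origin main
git log --oneline main..origin/main   # commits not yet merged locally

# pull: fetch plus merge (a fast-forward here, as bob has no new commits).
git pull -q origin main
cat data.txt
```

Running `git fetch origin` without a branch name would download every branch of the remote instead of only main.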
https://www.atlassian.com/git/tutorials/syncing/git-fetch

- Git version control records text-based differences between files.
- Each git commit records a change relative to the previous state of the documents.
- Git has a range of functionality that allows users to manage the changes they make.
- This complex functionality is especially useful when collaborating on projects with others.
Content from Branches
Last updated on 2025-11-21
Estimated time: 0 minutes
Overview
Questions
- What are branches?
- How do I view the current branches?
- How do I manipulate branches?
Objectives
- Understand how branches are created.
- Learn the key commands to view and manipulate branches.
Branching is a feature available in most modern version control
systems. Branching in other version control systems can be an expensive
operation in both time and disk space. In git, branches are
a part of your everyday development process. When you want to add a new
feature or fix a bug—no matter how big or how small—you spawn a new
branch to encapsulate your changes. This makes it harder for unstable
code to get merged into the main code base, and it gives you the chance
to clean up your feature’s history before merging it into the main
branch.

The diagram above visualizes a repository with two isolated lines of development, one for a little feature, and one for a longer-running feature. By developing them in branches, it’s not only possible to work on both of them in parallel, but it also keeps the main branch free from questionable code.
The implementation behind Git branches is much more lightweight than other version control system models. Instead of copying files from directory to directory, Git stores a branch as a reference to a commit. In this sense, a branch represents the tip of a series of commits—it’s not a container for commits. The history for a branch is extrapolated through the commit relationships.
(https://www.atlassian.com/git/tutorials/using-branches)
What is a branch?
In git a branch is effectively a pointer to a snapshot
of your changes. It’s important to understand that branches are just
pointers to commits. When you create a branch, all Git needs to do is
create a new pointer; it doesn’t change the repository in any other way.
If you start with a repository that looks like this:

Then, you create a branch by passing the new branch’s name to git branch.
The repository history remains unchanged. All you get is a new pointer to the current commit:

Note that this only creates the new branch. To start adding commits
to it, you need to select it with git checkout, and then
use the standard git add and git commit
commands.
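A minimal sketch of this behaviour, assuming a throwaway repository: the branch name `crazy-experiment`, the file name, and the identity are made up for illustration.

```shell
# Throwaway repository; names and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Creating a branch adds a pointer; no new commit is made.
git branch crazy-experiment
git rev-parse main crazy-experiment   # both print the same commit hash

# Select the branch, then commit: only crazy-experiment moves forward.
git checkout -q crazy-experiment
echo "idea" >> file.txt
git commit -q -am "Try an idea"
git log --oneline --all --decorate
```

The final log shows main still pointing at the initial commit, with crazy-experiment one commit ahead.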
A branch also means an independent line of development. Branches serve as an abstraction for the edit/stage/commit process. New commits are recorded in the history for the current branch, which results in a fork in the history of the project. However, it is really important to remember that each commit only records the incremental change in the document and NOT the full history of changes. Therefore, while we think of a branch as a sequence of commits, each commit is an independent unit of change.
Branching Commands
Creating, deleting, and modifying branches is quick and easy; here’s a summary of the commands:
To list all branches:
To create a new branch named <branch>, which
references the same point in history as the current branch.
To create a new branch named <branch>, referencing
<start-point>, which may be specified any way you
like, including using a branch name or a tag name:
To delete the branch <branch>; if the branch is
not fully merged in its upstream branch or contained in the current
branch, this command will fail with a warning:
To delete the branch <branch> irrespective of its
merged status:
To switch to a different branch <branch>, updating
the working directory to reflect the version referenced by
<branch>.
To create a new branch <new> referencing
<start-point>, and check it out.
The special symbol "HEAD" can always be used to refer to
the current branch. In fact, Git uses a file named HEAD in
the .git directory to remember which branch is current:
Renaming a branch can be done with the -m flag:
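The commands summarised above can be tried in one pass. This is a sketch in a throwaway repository; the branch names `feature`, `bugfix`, `topic`, and `hotfix` are placeholders, as is the identity.

```shell
# Throwaway repository; branch names and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > file.txt
git add file.txt
git commit -q -m "Initial commit"

git branch                      # list all local branches
git branch feature              # new branch at the current commit
git branch bugfix main          # new branch at an explicit start point
git checkout -q feature         # switch to an existing branch
git checkout -q -b topic main   # create a branch and check it out
cat .git/HEAD                   # prints: ref: refs/heads/topic
git branch -m bugfix hotfix     # rename bugfix to hotfix
git branch -d feature           # delete; refuses if not fully merged
git branch -D hotfix            # delete regardless of merged status
git branch
```

Here `git branch -d feature` succeeds because feature points at the same commit as the current branch, so nothing would be lost.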
- A branch represents an independent line of development.
- git branch creates a new pointer to the current state of the repository and allows you to make subsequent changes from that state.
- Subsequent changes are considered to belong to that branch.
- The final commit on a given branch is its HEAD.
Content from Remote Repositories
Last updated on 2025-11-21
Estimated time: 0 minutes
Overview
Questions
- How do I connect my code to other versions of it?
Objectives
- Learn about remote repositories.
https://www.atlassian.com/git/tutorials/syncing
Git’s distributed collaboration model gives every developer their own copy of the repository, complete with its own local history and branch structure. Users typically need to share a series of commits rather than a single “changeset”. Instead of committing a “changeset” from a working copy to the central repository, Git lets you share entire branches between repositories.
Git remote
The git remote command lets you create, view, and delete connections to other repositories. Remote connections are more like bookmarks than direct links into other repositories. Instead of providing real-time access to another repository, they serve as convenient names that can be used to reference a not-so-convenient URL.

For example, the diagram above shows two remote connections from your repo into the central repo and another developer’s repo. Instead of referencing them by their full URLs, you can pass the origin and john shortcuts to other Git commands.
The git remote command is essentially an interface for
managing a list of remote entries that are stored in the repository’s
./.git/config file. The following commands are used to view
the current state of the remote list.
Git is designed to give each developer an entirely isolated
development environment. This means that information is not
automatically passed back and forth between repositories. Instead,
developers need to manually pull upstream commits into their local
repository or manually push their local commits back up to the central
repository. The git remote command is really just an easier
way to pass URLs to these “sharing” commands.
View Remote Configuration
To list the remote connections of your repository to other
repositories you can use the git remote command:
If you test this in our training repository, you should get only one
connection, origin:
When you clone a repository with git clone,
git automatically creates a remote connection called
origin pointing back to the cloned repository. This is
useful for developers creating a local copy of a central repository,
since it provides an easy way to pull upstream changes or publish local
commits. This behaviour is also why most Git-based projects call their
central repository origin.
We can ask git for a more verbose (-v)
answer which gives us the URLs for the connections:
For our training repository this should return:
BASH
origin https://github.com/user_name/advanced-git-training.git (fetch)
origin https://github.com/user_name/advanced-git-training.git (push)
As expected these point to the original repository we cloned.
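The same check can be rehearsed offline. In this sketch a local bare repository named `central.git` stands in for the GitHub repository, so the URLs printed are local paths rather than the ones shown above; both directory names are placeholders.

```shell
# Local stand-in for a central repository; paths are placeholders.
work=$(mktemp -d)
cd "$work"
git init -q --bare -b main central.git

# Cloning creates the origin connection automatically.
# (git warns that the cloned repository is empty; that is fine here.)
git clone -q central.git local
cd local

git remote                          # prints: origin
git remote -v                       # fetch and push URLs for origin
git config --get remote.origin.url  # the same URL, read from .git/config
```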
Create and Modify Connections
The git remote command also lets you manage connections
with other repositories. The following commands will modify the repo’s
./.git/config file. The result of the following commands
can also be achieved by directly editing the ./.git/config
file with a text editor.
Create a new connection to a remote repository. After adding a
remote, you’ll be able to use <name> as a convenient
shortcut for <url> in other Git commands.
Remove the connection:
Rename a connection:
To get high-level information about the remote
<name>:
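The four operations described above might look as follows. This is a sketch only: the remote names `john` and `colleague` and the example.com URL are placeholders, not a real repository.

```shell
# Throwaway repository; remote names and URL are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main

git remote add john https://example.com/john/project.git   # create
git remote -v
git remote rename john colleague                            # rename
git remote get-url colleague
git remote remove colleague                                 # remove
git remote                          # prints nothing: no remotes left

# git remote show <name> contacts the remote, so it needs a reachable
# URL and is left commented out in this sketch:
# git remote show colleague
```

Each of these commands only edits the remote entries in `.git/config`; nothing is downloaded until you fetch or pull.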
Exercise: Add a connection to your neighbour’s repository. Having this kind of access to individual developers’ repositories makes it possible to collaborate outside of the central repository. This can be very useful for small teams working on a large project.
Starting a branch from the main repository state:
Remember that when you create a new branch without specifying a starting point, the starting point will be the current state and branch. In order to avoid confusion, ALWAYS branch from the stable version, for example from your own origin/main branch.
You must fetch first so that you have the most recent state of the repository.
If there is another “true” version/state of the project, then this
connection may be set as upstream (or something else).
Upstream is a common name for the stable repository; in that case,
fetch from upstream and branch from the corresponding upstream branch instead.
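A sketch of the fetch-then-branch sequence, using a local repository named `stable` as a stand-in for the remote. The branch names `scratch` and `new-feature` and the identity are placeholders.

```shell
# Local stand-in for the stable repository; names are placeholders.
work=$(mktemp -d)
cd "$work"
git init -q -b main stable
cd stable
git config user.name "Example User"
git config user.email "user@example.com"
echo "stable" > file.txt
git add file.txt
git commit -q -m "Stable state"
cd ..

git clone -q stable local
cd local
git config user.name "Example User"
git config user.email "user@example.com"

# Some unrelated local work on another branch.
git checkout -q -b scratch
echo "wip" >> file.txt
git commit -q -am "Work in progress"

# Fetch first, then branch from the remote-tracking branch,
# not from wherever HEAD currently is.
git fetch origin
git checkout -q -b new-feature origin/main
git log --oneline -1     # "Stable state", not "Work in progress"

# With a remote named upstream the sequence is the same:
#   git fetch upstream
#   git checkout -b new-feature upstream/main
```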
Now we can set the MPIA version of our repository as the upstream for our local copy.
Setting the upstream repository
Set the https://github.com/mpi-astronomy/advanced-git-training as the upstream locally.
Then, examine the state of your repository with
git branch, git remote -v,
git remote show upstream
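One possible way to do this is sketched below in a throwaway repository, so that it runs anywhere; in your own clone you would skip the first three lines. The commands that contact GitHub are left commented out.

```shell
# Throwaway repository; in your own clone, skip these three lines.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main

git remote add upstream https://github.com/mpi-astronomy/advanced-git-training.git
git branch            # prints nothing until the first commit exists
git remote -v         # upstream listed for both fetch and push

# These need network access, so they are commented out here:
# git fetch upstream
# git remote show upstream
```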
- The git remote command allows us to create, view and delete connections to other repositories.
- Remote connections are like bookmarks to other repositories.
- Other git commands (git fetch, git push, git pull) use these bookmarks to carry out their syncing responsibilities.
Content from Undoing Changes
Last updated on 2025-11-21
Estimated time: 0 minutes
Overview
Questions
- How do I undo changes?
Objectives
- How do I roll back a single change?
- How do I get back to a specific state?
Exercise: Creating a branch.
Create a new branch called hotfix. Create a new file and
make 3-4 commits in that file or create 3-4 new files. Check the log to
see the SHA of the last commit.
You can use the touch command to create new files
quickly.
Use git add and
git commit -m "your message" to save your changes.
Git Revert
Reverting undoes a commit by creating a new commit. This is a safe way to undo changes, as it has no chance of re-writing the commit history. For example, the following command will figure out the changes contained in the 2nd to last commit, create a new commit undoing those changes, and tack the new commit onto the existing project.

Note that, by default, revert only backs out the atomic changes of the ONE specific commit. (You can also give it a range of commits, but we are not going to do that here; see the help.)
git revert does not rewrite history which is why it is
the preferred way of dealing with issues when the changes have already
been pushed to a remote repository.
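A minimal sketch of reverting the second-to-last commit, written here as `HEAD~1`. Each commit adds its own file so that the revert applies cleanly; the file names and identity are placeholders.

```shell
# Throwaway repository; file names and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

# Three commits, each adding one file.
for n in 1 2 3; do
  echo "content $n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "Add file$n.txt"
done

# Undo only the second-to-last commit by adding a new commit.
git revert --no-edit HEAD~1

git log --oneline    # four commits: the history only grows
ls                   # file2.txt is gone, file1.txt and file3.txt remain
```

The original "Add file2.txt" commit is still in the log, which is what makes revert safe on shared branches.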
Git Reset
Resetting is a way to move the tip of a branch to a different commit.
This can be used to remove commits from the current branch. For example,
the following command moves the hotfix branch backwards by
two commits.

The two commits that were on the end of hotfix are now
dangling, or orphaned commits. This means they will eventually be deleted
when git performs a garbage collection. In other words,
you’re saying that you want to throw away these commits.
With its default mode, git reset removes the commits from the branch but leaves
their changes in your files as uncommitted modifications.
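The sketch below moves a hotfix branch back by two commits with the default mode. To keep it short, the throwaway repository starts directly on a branch named hotfix; file names and identity are placeholders.

```shell
# Throwaway repository that starts on a branch named hotfix.
repo=$(mktemp -d)
cd "$repo"
git init -q -b hotfix
git config user.name "Example User"
git config user.email "user@example.com"

for n in 1 2 3; do
  echo "content $n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "Add file$n.txt"
done
tip=$(git rev-parse HEAD)   # remember the old tip of the branch

# Move hotfix back by two commits (default mode: --mixed).
git reset HEAD~2
git log --oneline           # only "Add file1.txt" is left
git status --short          # file2.txt and file3.txt are now untracked

# The dropped commits are not deleted right away: the reflog still
# references them until garbage collection eventually removes them.
git reflog -3
git cat-file -t "$tip"      # prints: commit
```

As long as the old tip is still in the reflog, `git reset --hard <hash>` with that hash would restore the branch.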
git reset is a simple way to undo changes that haven’t
been shared with anyone else. It’s your go-to command when you’ve
started working on a feature and find yourself thinking, “Oh crap, what
am I doing? I should just start over.”
In addition to moving the current branch, you can also get
git reset to alter the staged snapshot and/or the working
directory by passing it one of the following flags:
--soft: The staged snapshot and working directory are not altered in any way.
--mixed: The staged snapshot is updated to match the specified commit, but the working directory is not affected. This is the default option.
--hard: The staged snapshot and the working directory are both updated to match the specified commit.
It’s easier to think of these modes as defining the scope of a git reset operation.
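To compare the three modes side by side, this sketch undoes the same commit three times, restoring it in between. The repository, file name, and identity are placeholders.

```shell
# Throwaway repository; file name and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First"
echo "two" >> file.txt
git commit -q -am "Second"
second=$(git rev-parse HEAD)

git reset -q --soft HEAD~1     # branch moves; index and file keep "two"
git status --short             # "M  file.txt": the change is staged

git reset -q --hard "$second"  # restore the Second commit
git reset -q --mixed HEAD~1    # branch and index move; file keeps "two"
git status --short             # " M file.txt": the change is unstaged

git reset -q --hard "$second"  # restore the Second commit again
git reset -q --hard HEAD~1     # branch, index, and file all move
cat file.txt                   # only "one" remains
```

Only the last variant discards work, which is why --hard deserves extra care.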
To just undo any uncommitted changes:
You can add and commit the changes or restore the file.
git reset can also work on a single file:
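One way these two uses could look is sketched here, assuming that `git reset --hard` is the command meant for discarding uncommitted changes. The file names and identity are placeholders.

```shell
# Throwaway repository; file names and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"
echo "a" > a.txt
echo "b" > b.txt
git add a.txt b.txt
git commit -q -m "Add two files"

# Edit and stage both files.
echo "edit" >> a.txt
echo "edit" >> b.txt
git add a.txt b.txt

# Single file: unstage b.txt only; its edit stays in the file.
git reset -q -- b.txt
git status --short          # "M  a.txt" and " M b.txt"

# Everything: discard all uncommitted changes to tracked files.
git reset -q --hard
git status --short          # prints nothing
```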
Git Checkout: A Gentle Way
We already saw that git checkout is used to move to a
different branch, but it can also be used to update the state of the
repository to a specific point in the project’s history.

This puts you in a detached HEAD state. AGHRRR!
Most of the time, HEAD points to a branch name. When you add a new commit, your branch reference is updated to point to it, but HEAD remains the same. When you change branches, HEAD is updated to point to the branch you’ve switched to. All of that means that, in these scenarios, HEAD is synonymous with “the last commit in the current branch.” This is the normal state, in which HEAD is attached to a branch.
The detached HEAD state is when HEAD is pointing directly to a commit instead of a branch. This is really useful because it allows you to go to a previous point in the project’s history. You can also make changes here and see how they affect the project.
BASH
echo "Welcome to the alternate timeline, Morty!" > new-file.txt
git add .
git commit -m "Create new file"
echo "Another line" >> new-file.txt
git commit -a -m "Add a new line to the file"
git log --oneline
If you haven’t made any changes or you have made changes but you want to discard them you can recover by switching back to your branch:
Alternatively, you want to keep the changes:
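Both ways out of the detached HEAD state are sketched below in a throwaway repository. The branch name `alternate-timeline`, the file names, and the identity are placeholders.

```shell
# Throwaway repository; names and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "First"
echo "two" >> file.txt
git commit -q -am "Second"
first=$(git rev-parse HEAD~1)

git checkout -q "$first"    # detached HEAD at the older commit
git status | head -n 1      # "HEAD detached at <short hash>"

# Option 1: nothing worth keeping, so switch back to the branch.
git checkout -q main

# Option 2: commit while detached, then keep the work on a new branch.
git checkout -q "$first"
echo "experiment" > new-file.txt
git add new-file.txt
git commit -q -m "Experiment from an old state"
git checkout -q -b alternate-timeline
git branch
```

Creating the branch before switching away matters: without it, the experimental commit would be left dangling.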
https://www.atlassian.com/git/tutorials/resetting-checking-out-and-reverting Also OMG: http://blog.kfish.org/2010/04/git-lola.html
Exercise: Undoing Changes
- Create a new branch called hotfix. Create a new file and make 3-4 commits in that file. Check the log to see the SHA of the last commit.
- Revert the last commit that we just inserted. Check the history.
- Completely throw away the last two commits [DANGER ZONE!!!]. Check the status and the log.
- Undo another commit but leave it in the staging area. Check the status and log.
- Wrap it up: add and commit the changes.
Step 1:
BASH
git checkout -b hotfix
touch my_file.txt
echo "First line" > my_file.txt
git add my_file.txt
git commit -m "First commit"
echo "Second line" >> my_file.txt
git add my_file.txt
git commit -m "Second commit"
echo "Third line" >> my_file.txt
git add my_file.txt
git commit -m "Third commit"
git status
git log --oneline
Step 2:
Step 3:
Step 4:
Step 5:
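Steps 2 to 5 have no commands listed above. The script below is one possible sequence that satisfies the task descriptions, not necessarily the intended solution. It repeats a condensed Step 1 in a throwaway repository so that it runs on its own; the commit messages and identity are placeholders.

```shell
# Throwaway repository with a condensed version of Step 1.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"
echo "readme" > README.md
git add README.md
git commit -q -m "Initial commit"
git checkout -q -b hotfix
for n in First Second Third; do
  echo "$n line" >> my_file.txt
  git add my_file.txt
  git commit -q -m "$n commit"
done

# Step 2: revert the last commit with a new commit.
git revert --no-edit HEAD
git log --oneline

# Step 3: throw away the last two commits and their changes.
git reset --hard HEAD~2
git status
git log --oneline

# Step 4: undo one more commit but keep its changes staged.
git reset --soft HEAD~1
git status
git log --oneline

# Step 5: commit the staged changes again.
git commit -q -m "Second commit, redone"
git log --oneline
```

After Step 3 the revert commit and "Third commit" are both gone, so my_file.txt holds the first two lines throughout Steps 4 and 5.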
- git reset rolls back the commits and leaves the changes in the files.
- git reset --hard rolls back the commits and deletes all changes.
- git reset does alter the history of the project.
- You should use git reset to undo local changes that have not been pushed to a remote repository.
- git revert undoes a commit by creating a new commit.
- git revert should be used to undo changes on a public branch or changes that have already been pushed remotely.
- git revert only backs out a single commit or a range of commits.
Content from Merging
Last updated on 2025-11-21
Estimated time: 0 minutes
Overview
Questions
- How do I merge a branch’s changes?
Objectives
- Learn about git merge.
When you are collaborating, you will have to merge a branch
regardless of whether your branch has diverged from the main
branch. Most Git hosting platforms, like GitHub or GitLab, allow you
to merge a branch from their web interface, but you can also merge
branches from your machine using git merge.
There are 2 ways to merge:
- non-fast-forward merge (recommended)
- fast-forward merge

Reminder: when starting work on a new feature, be careful where you branch from!
BASH
git remote add upstream https://github.com/mpi-astronomy/advanced-git-training.git
git fetch upstream
git checkout -b develop upstream/develop
Non-fast-forward Merge
Merges branch by creating a merge commit. Prompts for merge commit message. Ideal for merging two branches.
The --no-ff flag causes the merge to always create a new
commit object, even if the merge could be performed with a fast-forward.
This avoids losing information about the historical existence of a
feature branch and groups together all commits that together added the
feature.
Exercise: Creating a non-fast-forward merge.
Create a new Git repository that has the following tree.
* 69fac81 (main) Merge branch 'gitignore'
|\
| * 5537012 (gitignore) Add .gitignore
|/
* 6ec7c0f Add README
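One possible way to build this tree is sketched below. The commit hashes will differ from the ones shown; the file contents and identity are placeholders.

```shell
# Throwaway repository; file contents and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

echo "# Project" > README.md
git add README.md
git commit -q -m "Add README"

git checkout -q -b gitignore
echo "*.log" > .gitignore
git add .gitignore
git commit -q -m "Add .gitignore"

# main has not moved, so a plain merge would fast-forward.
# --no-ff forces a merge commit; -m supplies its message up front.
git checkout -q main
git merge --no-ff -m "Merge branch 'gitignore'" gitignore

git log --graph --oneline --decorate
```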
Fast-forward Merge
If the main branch has not gained any commits since you branched, so there is nothing to reconcile, a “fast-forward” merge can be executed with git merge --ff-only. This will NOT create a merge commit! It aborts the merge if a fast-forward is not possible. Ideal for updating a branch from remote.
If using the fast-forward merge, it is impossible to see from the
git history which of the commit objects together have
implemented a feature. You would have to manually read all the log
messages. Reverting a whole feature (i.e. a group of commits), is a true
headache in the latter situation, whereas it is easily done if the
--no-ff flag was used.
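The sketch below shows a fast-forward merge succeeding and then being refused once the branches have diverged. The branch name `feature`, the file names, and the identity are placeholders.

```shell
# Throwaway repository; names and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git checkout -q main

# main has not moved, so the merge is a fast-forward: no merge commit.
git merge --ff-only feature
git log --oneline --graph      # a straight line of two commits

# Let the branches diverge: one new commit on each side.
echo "main work" > main.txt
git add main.txt
git commit -q -m "Work on main"
git checkout -q feature
echo "more" >> feature.txt
git commit -q -am "More feature work"
git checkout -q main

# Now --ff-only refuses and leaves main untouched.
git merge --ff-only feature || echo "fast-forward not possible"
```

After the first merge nothing in the log records that "Add feature" was developed on a separate branch.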
For a good illustration of fast-forward merge (and other concepts), see this thread: https://stackoverflow.com/questions/9069061/what-effect-does-the-no-ff-flag-have-for-git-merge
No, it is not possible to run a fast-forward merge because of commit
a78b99f.
Three-way Merge
Similar to --no-ff, but there may be dragons. Forced
upon you when there’s an intermediate change since you branched. May
prompt you to manually resolve conflicts.
See https://git-scm.com/docs/merge-strategies for a zillion options (“patience”, “octopus”, etc.). But also, git is only so smart and you are probably smarter.
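A small sketch of a three-way merge that needs manual resolution: both branches change the same line, so git stops and waits for you. The file `settings.txt`, its contents, and the identity are placeholders.

```shell
# Throwaway repository; file contents and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"
echo "colour = red" > settings.txt
git add settings.txt
git commit -q -m "Add settings"

git checkout -q -b feature
echo "colour = blue" > settings.txt
git commit -q -am "Use blue"

git checkout -q main
echo "colour = green" > settings.txt
git commit -q -am "Use green"

# Both sides changed the same line, so git cannot decide on its own.
git merge feature || echo "conflict: resolve by hand"
git status --short       # "UU settings.txt"
cat settings.txt         # shows the <<<<<<<, =======, >>>>>>> markers

# Resolve by editing the file, then stage it and conclude the merge.
echo "colour = blue" > settings.txt
git add settings.txt
git commit -q -m "Merge branch 'feature'"
git log --graph --oneline
```

If the two branches had edited different files, the same `git merge feature` would have completed without any prompt.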
Merging strategies: https://git-scm.com/docs/merge-strategies
https://nvie.com/posts/a-successful-git-branching-model/
Note: there are a number of external tools that have a graphical interface to allow for merge conflict resolution. Some of these include: kdiff3 (Windows, Mac, Linux), Meld (Windows, Linux), P4Merge (Windows, Mac, Linux), opendiff (Mac), vimdiff (for Vim users), Beyond Compare, GitHub web interface. We do not endorse any of them and use at your own risk. In any case, using a graphical interface does not substitute for understanding what is happening under the hood.
- git merge --no-ff is the best way to merge changes.
- git merge --ff-only is a good way to pull down changes from remote.
Content from Branching Models
Last updated on 2025-11-21
Estimated time: 0 minutes
Overview
Questions
- What is a branching model?
- Why do you need one?
- What are the most common branching models?
Objectives
- Learn about the importance of a branching model.
What is a branching model/strategy?
Branches are primarily used as a means for teams to develop features, giving them a separate workspace for their code. These branches are usually merged back to a master branch upon completion of work. In this way, features (and any bugs and bug fixes) are kept apart from each other, allowing you to fix mistakes more easily.
This means that branches protect the mainline of code and any changes made to any given branch don’t affect other developers.
A branching strategy, therefore, is the strategy that software development teams adopt when writing, merging and deploying code when using a version control system.
It is essentially a set of rules that developers can follow to stipulate how they interact with a shared codebase.
Such a strategy is necessary as it helps keep repositories organized to avoid errors in the application and the dreaded merge hell when multiple developers are working simultaneously and are all adding their changes at the same time. Such merge conflicts would eventually deter the combination of contributions from multiple developers.
Thus, adhering to a branching strategy will help solve this issue so that developers can work together without stepping on each other’s toes. In other words, it enables teams to work in parallel to achieve faster releases and fewer conflicts by creating a clear process when making changes to source control.
When we talk about branches, we are referring to independent lines of code that branch off the master branch, allowing developers to work independently before merging their changes back to the code base.
In this and the following episodes, we will outline some of the branching strategies that teams use in order to organize their workflow where we will look at their pros and cons and which strategy you should choose based on your needs, objectives and your team’s capabilities.
Why do you need a branching model?
As mentioned above, having a branching model is necessary to avoid conflicts when merging and to allow for the easier integration of changes into the master trunk.
A BRANCHING MODEL AIMS TO:
- Enhance productivity by ensuring proper coordination among developers
- Enable parallel development
- Help organize a series of planned, structured releases
- Map a clear path when making changes to software through to production
- Maintain a bug-free code where developers can quickly fix issues and get these changes back to production without disrupting the development workflow
Git Branching Models
Some version control systems are Very Opinionated about the branching
models that can be used. git is very much (fortunately or
unfortunately) not. This means that there are many different ways to do
development in a team and the team needs to explicitly agree on how and
when to merge contributions to the main branch. So the first rule of
git branching is: “Talk about your branching model.” The
second rule is: “Talk about your branching model.” If in doubt, do what
other people around you are doing. If they don’t do anything, call a
friend.
That said, there are a number of established (and less so) branching
models that are used with git. These include, but are not
limited to:
- Centralized workflow: enables all team members to make changes directly to the main branch. Every change is logged into the history. In this workflow, the contributors do not use other branches. Instead they all make changes on the main branch directly and commit to it. This works for individual developers or small teams which communicate very well, but can be tricky for larger teams: the code is in a constant state of flux and developers keep changes local until they are ready to release.
- Trunk-based development (cactus flow?): is somewhat similar to the centralized workflow. The development happens on a single branch called trunk. When changes need to be merged, each developer pulls and rebases from the trunk branch and resolves conflicts locally. This can work if small merges are made frequently and is more successful if there is CI/CD.
- Feature branch workflow: every small change or “feature” gets its own branch where the developers make changes. Once the feature is done, they submit a merge/pull request and merge it into the main branch. Feature branches should be relatively short-lived. The benefit of this model is that the main branch is not polluted by unfinished features. Good for teams.
- Gitflow: is a model where the main development happens in a develop branch with feature branches. When the develop branch is ready for a release (or to go into production), a team member creates a release branch which is tested and eventually merged into the develop branch and eventually the main branch.
- GitHub flow (https://docs.github.com/en/get-started/quickstart/github-flow): similar to the branching workflow.
- GitLab flow: is a simplified version of Gitflow (https://about.gitlab.com/topics/version-control/what-is-gitlab-flow/)
- Oneflow: is similar to Gitflow but relies on the maintenance of one long-lived branch. It is meant to be simpler, without a develop branch, but feature branches still exist (https://www.endoflineblog.com/oneflow-a-git-branching-model-and-workflow).
- Forking workflow (e.g. astropy): is a model where each contributor creates a fork, or a complete copy, of the repository. Every contributor effectively has two repositories: their own and the main (upstream) one. Changes are made as pull requests against the main repository. This model is popular with open source projects because the vast majority of contributors do not need to have privileges in the main repository.
A longer description of some of these can be found here: https://about.gitlab.com/topics/version-control/what-is-git-workflow/#feature-branching-git-workflow
In summary, there are many different ways to collaborate on a project. Look at the pros and cons and select one that fits the needs and organization of your team and project. In the following several sections we look at some of these models in more detail.
- A branching model is a pre-agreed way of merging branches into the main branch.
- A branching model is needed when multiple contributors are making changes to a single project.
Content from Forking Workflow
Last updated on 2025-11-21
Estimated time: 0 minutes
Overview
Questions
- What are the common workflows of the forking branching model?
Objectives
- First learning objective. (FIXME)
Preparation: Make sure that the main is clean, everything is committed.
The forking workflow is popular among open source software projects and often used in conjunction with a branching model.
FIXME: Why?
The focus of this workflow is to keep the “upstream main” stable while allowing anyone to work on their own contributions independently. Contributions are then suggested and accepted via pull requests. There is not necessarily a develop branch, but you may have release branches.

In order to understand the forking workflow, let’s first take a look at some special words and roles needed:
- upstream: Remote repository containing the “true copy”
- origin: Remote repository containing the forked copy
- Pull request (PR): Merge request from fork to upstream (a request to add your suggestions to the “original copy”)
- Maintainer: Someone with write access to upstream who vets PRs
- Contributor: Someone who contributes to upstream via PRs
- Release manager: A maintainer who also oversees releases
- Example release workflow for the astropy Python package
- Spacetelescope (STScI) style guide for release workflow
{Alt: A brief
refresher from Git Training: The figure shows the local computer (“You”)
with branch1 that includes three files, of which one is indicated as
removed. An arrow from the local computer points to the cloud in which
origin and upstream are located, with a picture of GitHub’s Octocat. The
arrow from local points to origin with you/code (branch1), also with
three files, of which the same one is indicated as removed. Origin has an
arrow pointing to upstream with “PR” written on top of it and a
screenshot of the “merge pull request” button from the GitHub
web interface. Upstream has spacetelescope/code (main) with the same
three files, of which the same file is indicated as removed as in local
and origin.}
FIXME: Remove text from image and add as caption, source?
{Alt: …}
FIXME: Alt text. Remove text from image and add as caption, source?
Exercises
FIXME: More description about what is happening at each step in the solution
Exercise 1: Create and push a feature branch
You will be assigned a number by the instructor/helper. Create a
feature branch based on upstream main. Then create a file in the
trainees folder called hello_NNN.txt using the
number you just got (replace NNN with your number, e.g. 007). Then push
your feature branch out to GitHub.
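A minimal sketch of the steps in Exercise 1, under stated assumptions: two local bare repositories stand in for upstream and for your fork (origin), the trainee number is 007, and the branch name hello-007 is made up for illustration. In the workshop you would skip the setup part and use your real remotes.

```shell
set -eu

# --- Setup (hypothetical stand-ins so the sketch runs anywhere) ---
workdir=$(mktemp -d)
git init --quiet --bare "$workdir/upstream.git"   # plays the "true copy"
git init --quiet --bare "$workdir/origin.git"     # plays your fork
git init --quiet "$workdir/work"
cd "$workdir/work"
git symbolic-ref HEAD refs/heads/main             # name the first branch main
git config user.name "Trainee"
git config user.email "trainee@example.com"
mkdir trainees
echo "Trainee files live here" > trainees/README.txt
git add trainees
git commit --quiet -m "Add trainees folder"
git remote add upstream "$workdir/upstream.git"
git remote add origin "$workdir/origin.git"
git push --quiet upstream main

# --- Exercise 1 ---
git fetch --quiet upstream                        # get the latest upstream main
git checkout --quiet -b hello-007 upstream/main   # feature branch based on it
echo "Hello from trainee 007" > trainees/hello_007.txt
git add trainees/hello_007.txt
git commit --quiet -m "Add hello_007.txt"
git push --quiet -u origin hello-007              # publish the branch to the fork
```

The -u option records origin/hello-007 as the branch to push to and pull from, so later pushes from this branch need no extra arguments.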
Exercise 2: Suggest your changes via pull request
Go to your repository (your fork) on GitHub and find the tab called “Pull requests”. Click the green “new pull request” button. Then find and click the blue link under “Compare changes” called “compare across forks”. Select your username and branch name from the right menus. Then click the big green button under the menus called “create pull request”.

- First key point. Brief Answer to questions. (FIXME)
Content from Data Science Workflow
Last updated on 2025-11-21
Estimated time: 0 minutes
Overview
Questions
- What are the common workflows of the Data Science branching model?
Objectives
- First learning objective. (FIXME)




- First key point. Brief Answer to questions. (FIXME)
Content from Large Files
Last updated on 2025-11-21
Estimated time: 0 minutes
Overview
Questions
- Why are (large) binary files a problem in Git?
- What is Git LFS?
- What are the problems with Git LFS?
Objectives
- Understanding that Git is not intended for (large) binary files
- Learning about the git lfs commands
- Understanding the disadvantages of git lfs
Sometimes, you might want to add non-textual data to your Git repositories. Examples of such use cases in a software project include:
- assets for the project documentation like images
- test data for your test suite
However, such data is stored in binary formats most of the time.
Git’s line-based approach of tracking changes is not suited for this
type of data. While Git will work with binary data without any errors,
it will internally treat each binary file as a file with one (very long)
single line of content. Consequently, if you apply changes to such a
file, Git will store the entire file in the commit even if there was a
lot of similarity between the two versions of the file. As Git does not
“forget” about previous versions of the file, doing this repeatedly
and/or with very large files will quickly make your repository grow in
size. At some point this will severely impact the performance of all
your Git operations from git clone to even
git status. It is therefore generally discouraged
to use Git to track (large) binary files.
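To make the growth concrete, here is a small sketch, assuming a scratch repository in a temporary directory and a hypothetical 1 MiB file data.bin filled with random bytes. It commits the file, appends a single byte, commits again, and then reports the size of the object store.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init --quiet .
git config user.name "Trainee"
git config user.email "trainee@example.com"

# Hypothetical binary file: 1 MiB of random (incompressible) bytes.
head -c 1048576 /dev/urandom > data.bin
git add data.bin
git commit --quiet -m "Add binary file"

# Append a single byte and commit again.
printf 'x' >> data.bin
git commit --quiet -am "Append one byte"

# "size" is the disk space used by loose objects, in KiB.
git count-objects -v
size_kib=$(git count-objects -v | awk '/^size:/ {print $2}')
echo "Object store after two commits: ${size_kib} KiB"
```

Both versions of data.bin end up in the object store in full, although they differ by one byte. A later git gc may pack the objects and recover some of that space, but this is an optimisation and not something to rely on for large binary files.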
However, the problem of binary files in Git repositories cannot be fully neglected: there is a lot of value for a software project in keeping things together that belong together. Documentation assets, for example, belong to the documentation they are part of. Therefore, we will now explore some options for integrating large file handling into Git.
The git lfs subcommand is part of an extension to Git.
LFS stands for Large File
Storage. It allows you to mark individual files as
being large. Git does not apply its normal, line-based approach
to tracking changes to these large files; instead, they are stored
separately and only referenced in the Git data model. During push and
pull operations, large files are transmitted separately - requiring the
server to support this operation.
For the sake of demonstration, we create a file called
report.pdf. We assume that it is a large, binary file in
order to show how to handle it with git lfs:
Next, we tell Git that this file should be tracked with LFS:
Tracking "report.pdf"
Having done so, we can inspect the repository and we learn that a new
file .gitattributes was added to the repository.
On branch main
Untracked files:
(use "git add <file>..." to include in what will be committed)
.gitattributes
report.pdf
report.pdf filter=lfs diff=lfs merge=lfs -text
Similar to .gitignore this file is part of the
repository itself in order to share it with all your collaborators on
this project. We therefore craft a commit that contains it:
Now, we are ready to add the large file to the repository the same way we would with any other file:
Pushing our commits to the remote repository, we can see in the console output that our LFS data was transferred to the remote server separately.
Uploading LFS objects: 100% (1/1), 17 B | 0 B/s, done.
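The walkthrough above can be sketched end to end as follows. This is an illustration under assumptions, not the lesson's original commands: it works in a scratch repository, the content written to report.pdf is a placeholder, and the commit messages are made up. The LFS steps only run if git lfs is installed, and the push is left as a comment because it needs a remote with LFS support.

```shell
set -eu
# Hypothetical scratch repository; in the lesson you would use your own.
repo=$(mktemp -d)
cd "$repo"
git init --quiet .
git config user.name "Trainee"
git config user.email "trainee@example.com"

# Placeholder content; we only pretend that this is a large binary file.
echo "placeholder data" > report.pdf

if git lfs version >/dev/null 2>&1; then
    git lfs install --local          # enable the LFS filters for this repository
    git lfs track report.pdf         # prints: Tracking "report.pdf"
    git status                       # .gitattributes and report.pdf are untracked
    cat .gitattributes               # report.pdf filter=lfs diff=lfs merge=lfs -text
    git add .gitattributes
    git commit --quiet -m "Track report.pdf with Git LFS"
    git add report.pdf
    git commit --quiet -m "Add report"
    git lfs ls-files                 # lists the files that are stored via LFS
    # git push origin main           # requires a remote that supports LFS
else
    echo "git lfs is not installed, skipping the LFS steps"
fi
```

Committing .gitattributes before the large file keeps the tracking rule and the data in separate commits, which mirrors the order used in the text.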
Tracking with wildcard patterns
LFS tracking is not limited to explicitly spelled out filenames.
Instead, wildcard patterns can be passed to git lfs track.
However, you should be careful to quote these patterns, as they might
otherwise be expanded by your shell to the names of existing files. For example,
tracking all PDFs with LFS could be achieved with the following
command:
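A short sketch of why the quoting matters, assuming a scratch directory with two hypothetical files a.pdf and b.pdf. The echo commands show what the shell would hand to any command, and the quoted pattern is then passed to git lfs track if git lfs is installed.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
touch a.pdf b.pdf                    # hypothetical existing files

unquoted=$(echo *.pdf)               # the shell expands the pattern first
quoted=$(echo "*.pdf")               # the pattern is passed on literally
echo "unquoted: $unquoted"
echo "quoted:   $quoted"

if git lfs version >/dev/null 2>&1; then
    git init --quiet .
    git lfs track "*.pdf"            # track every PDF, present and future
    cat .gitattributes
fi
```

With the unquoted form, git lfs track would only ever see the two existing file names, so PDFs added later would not be covered by the rule.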
Disadvantages of Git LFS
Although git lfs by design solves the problem of storing
large files in Git repositories, there are some practical hurdles that
you should consider before introducing LFS into your project:
- The git lfs command is a separately maintained extension to the Git core. It is therefore not part of most Git distributions, but needs to be installed separately. Using it in your project will require you to educate your users about LFS and how to install it. Depending on your target audience, you should carefully consider whether the benefits outweigh this disadvantage.
- Users who do not have git lfs installed will not be notified by Git. They will see the files, but the content will be Git metadata instead of the actual content. Trying to work with those files will typically produce cryptic error messages.
- Some hosting providers - most notably GitHub - apply restrictive quotas to LFS storage. On the free plan, GitHub currently allows 1 GB of storage and 1 GB of bandwidth per month. As the bandwidth quota counts every single clone by users, LFS should currently be considered unusable on the GitHub free plan.
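The "Git metadata" mentioned above is a small text file called a pointer, whose first line names the LFS specification. The following sketch, with a hypothetical helper is_lfs_pointer and made-up demonstration files, shows one way to check whether a file in your working copy is such a pointer rather than the real content.

```shell
# Succeeds if the given file looks like a Git LFS pointer, i.e. if its
# first line starts with the LFS specification marker.
is_lfs_pointer() {
    head -n 1 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Hypothetical demonstration files in a scratch directory.
demo=$(mktemp -d)
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%064d\nsize 17\n' 0 > "$demo/pointer.pdf"
printf 'real content\n' > "$demo/real.pdf"

for f in "$demo/pointer.pdf" "$demo/real.pdf"; do
    if is_lfs_pointer "$f"; then
        echo "$f: LFS pointer (actual content not downloaded)"
    else
        echo "$f: regular content"
    fi
done
```

If a file that should be a PDF or an image turns out to be a pointer, installing git lfs and fetching the LFS content is the likely remedy.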
- (Large) binary files can grow the repository size immensely and make it unusable
- git lfs is an extension that stores large files outside the Git data model
- Use of Git LFS is discouraged in many scenarios.
Content from Undo, Move, Cherrypick
Last updated on 2025-11-24
Estimated time: 0 minutes
Overview
Questions
- How to undo changes to a repository?
- How to rename a branch?
- How to incorporate specific changes in one branch into another?
Objectives
- Learn how to undo a specific commit.
- Learn to rename an existing branch.
- Learn to pick and incorporate specific changes into a different branch.



- A local repository can still be changed.
- Once pushed to a remote, changing history can create complications.
Content from Rebase, Squash, Bisect, Patch
Last updated on 2025-11-21
Estimated time: 0 minutes
Overview
Questions
- What are rebase, squash, bisect and patch?
Objectives
- First learning objective. (FIXME)




- First key point. Brief Answer to questions. (FIXME)
Content from Hooks and Actions
Last updated on 2025-11-21
Estimated time: 0 minutes
Overview
Questions
- How do I automate my work locally?
- How do I add automations to GitHub?
Objectives
- First learning objective. (FIXME)
Git hooks are scripts that get run when a specific event occurs in git. The scripts can be written in any language and do anything you like, so any executable script can be a hook.
Git hooks can be triggered by events on the server side or locally. Examples
of local events that can trigger hooks include commit (pre-
or post-commit hooks), checkout or rebase.
Pre-commit hooks are perhaps the most common and useful ones: they
trigger actions before the code is committed and if the hook script
fails, then the command is aborted. This can be very powerful - you can
automatically run linters before the code is even committed.
List of pre-written pre-commit hooks: https://github.com/pre-commit/pre-commit-hooks
The executable files are stored in the .git/hooks/
directory in your project directory. A pre-commit hook is an
executable file in this directory stored under the magic name
pre-commit. Check the directory; it already contains several
examples. Let’s create a new one:
And add the following text to it:
#!/usr/bin/env bash
set -eo pipefail
flake8 hello.py
echo "flake8 passed!"
Now let’s make hello.py:
And add some text to it:
The typo is on purpose. Add and commit it to the repository.
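The steps above can be sketched in one go. This is an illustration under assumptions: it uses a scratch repository, and the content of hello.py is hypothetical, with prnt as the deliberate typo. The hook file must be executable, since Git ignores hook files that are not.

```shell
set -u
repo=$(mktemp -d)
cd "$repo"
git init --quiet .
git config user.name "Trainee"
git config user.email "trainee@example.com"

# Create the hook with the script shown above and make it executable.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/usr/bin/env bash
set -eo pipefail
flake8 hello.py
echo "flake8 passed!"
EOF
chmod +x .git/hooks/pre-commit

# Hypothetical hello.py: "prnt" is a deliberate typo for "print".
echo 'prnt("Hello, world!")' > hello.py

git add hello.py
if git commit -m "Add hello.py"; then
    commit_status="accepted"
else
    commit_status="rejected"
fi
echo "Commit was $commit_status by the pre-commit hook"
```

With flake8 installed, the hook reports the undefined name and the commit is aborted. If flake8 is missing, the hook fails as well, this time with a "command not found" message, so the commit is rejected in both cases until the problem is fixed.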
GitHub Actions are the equivalent of server-side hooks on GitHub.
There are lots of things that can be done with GitHub actions: https://docs.github.com/en/actions
Here is an example of a simple cron job: https://github.com/mpi-astronomy/XarXiv

Materials: https://verdantfox.com/blog/how-to-use-git-pre-commit-hooks-the-hard-way-and-the-easy-way
- First key point. Brief Answer to questions. (FIXME)
Content from Setting up the Command Prompt
Last updated on 2025-11-21
Estimated time: 5 minutes
Overview
Questions
- How can I visualize Git information at a glance in the command prompt?
Objectives
- Improve your command prompt for working with Git
When working with Git at the command prompt, it may prove helpful to keep some information about the repository available at a glance. As Unix shells allow you to modify the prompt, a natural approach is to integrate such information into the prompt itself.
What would be useful information to integrate into the prompt?
Take a minute to think about which information might be helpful to be shown as part of the prompt.
- The active branch. As you can switch between different branches of the same repository, it can sometimes be hard to know which branch your working copy currently reflects. Presenting the branch name next to the name of the directory you are currently in may help as a reminder.
- The state of the branch. An indicator of whether there are modified or uncommitted files in the repository may help you notice uncommitted changes.
Setting up the query infrastructure
Individual shells have specific ways to define the prompt and the information shown. Select the appropriate code snippet according to the shell you are running. If you are unsure which shell you are using, try the following code to identify the shell you are running.
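One possible way to identify the shell is sketched below. It asks ps for the name of the current process and falls back to $0 if ps is unavailable; the variable name current_shell is only used for illustration.

```shell
# Name of the shell running this command.
current_shell=$(ps -p $$ -o comm= 2>/dev/null || echo "$0")
echo "Current shell: $current_shell"

# Your login shell, which may differ from the one currently running.
echo "Login shell:   ${SHELL:-unknown}"
```

The two values differ if, for example, your login shell is zsh but you started bash by hand in the current terminal.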
As the idea of augmenting the command prompt with Git information is not new, the Git repository on GitHub (i.e., the repository hosting the source code for Git itself) also provides shell code to query this information. You can download it to your home directory with the following command.
BASH
you@computer:~$ curl https://raw.githubusercontent.com/git/git/refs/heads/master/contrib/completion/git-prompt.sh -o $HOME/.git-prompt.sh
Some shells, such as fish, xonsh, and others already have support for displaying Git repository information built-in.
Now you have the infrastructure set up to augment the command prompt with desired information about your Git repository.
Modifying the prompts
With the code to query the information available in the shell session, we still need to use that information in the definition of our prompt.
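A minimal sketch for Bash, assuming the script was saved as ~/.git-prompt.sh as in the download step. The function __git_ps1 is defined by git-prompt.sh; the layout of the prompt itself (user, host, directory, then the Git information) is just one possible choice, picked to resemble the examples in this episode.

```shell
# Load the helper functions if the script has been downloaded.
if [ -f "$HOME/.git-prompt.sh" ]; then
    . "$HOME/.git-prompt.sh"
fi

# Bash-style prompt: user@host:directory, followed by the Git information.
# " (%s)" is the format string in which __git_ps1 inserts the branch state.
PS1='\u@\h:\W$(__git_ps1 " (%s)")> '
```

Adding these lines to ~/.bashrc makes the prompt available in every new session. Zsh uses different prompt escapes and needs the PROMPT_SUBST option; the comments at the top of git-prompt.sh describe the variants for each shell.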
Tweaking the information shown
Using the git-prompt.sh script you can now tweak the
information shown in the prompt by setting specific environment
variables.
Indicating unstaged and uncommitted changes in the working copy
By setting the environment variable
GIT_PS1_SHOWDIRTYSTATE to a non-empty value, the prompt
will indicate modified files in the working copy with an *
character.
BASH
user@computer:my_repo (main *)> git status
On branch main
Your branch is up to date with 'origin/main'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: a_file.py
no changes added to commit (use "git add" and/or "git commit -a")
user@computer:my_repo (main *)>
Indicating untracked files in the working copy
By setting the environment variable
GIT_PS1_SHOWUNTRACKEDFILES to a non-empty value, the prompt
will indicate the presence of untracked files in the working copy with a
% character next to the branch name.
BASH
user@computer:my_repo (main %)> git status
On branch main
Your branch is up to date with 'origin/main'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
untracked.pdf
nothing added to commit but untracked files present (use "git add" to track)
user@computer:my_repo (main %)>
Indicating a stash in the working copy
Git supports saving modifications to the working copy in a so-called
stash that can later be reapplied to the working copy. By setting the
environment variable GIT_PS1_SHOWSTASHSTATE to a non-empty
value, the prompt will indicate whether something is stashed, with a
$ next to the branch name.
Indicating the name of and difference to the upstream repository
The environment variable GIT_PS1_SHOWUPSTREAM can be set
to a space-separated list of options to show the relation of the local
working copy to an upstream repository. For basic use, you can select
between the following options.
- verbose show the number of commits behind (-) or ahead (+) if not equal (=) to upstream.
- name show the abbreviated name of the upstream repository
- auto chooses a sensible set of information depending on the status of the working copy.
BASH
you@computer:my_repo (main *)> echo $GIT_PS1_SHOWUPSTREAM
you@computer:my_repo (main *)> GIT_PS1_SHOWUPSTREAM="verbose"
you@computer:my_repo (main *|u+3)> GIT_PS1_SHOWUPSTREAM="verbose name"
you@computer:my_repo (main *|u+3 origin/main)> GIT_PS1_SHOWUPSTREAM="auto"
you@computer:my_repo (main *>)>
There are more options for advanced usage available. Check inside of
git-prompt.sh for documentation.
Colorizing the output
If the environment variable GIT_PS1_SHOWCOLORHINTS is
set to a non-empty value, the Git-related part of the output in the prompt will
be colorized. If the variable is not set, the output will not be
colorized.
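Taken together, the settings from this episode could look like the following sketch, for example in ~/.bashrc after git-prompt.sh has been loaded. The particular values are assumptions chosen for illustration; any non-empty value switches a feature on.

```shell
# Each variable only needs a non-empty value to switch the feature on.
GIT_PS1_SHOWDIRTYSTATE=1        # * for unstaged and + for staged changes
GIT_PS1_SHOWUNTRACKEDFILES=1    # % when untracked files are present
GIT_PS1_SHOWSTASHSTATE=1        # $ when something is stashed
GIT_PS1_SHOWUPSTREAM="auto"     # relation to the upstream branch
GIT_PS1_SHOWCOLORHINTS=1        # colorize the Git part of the prompt
```

To switch a feature off again, unset the variable or assign it an empty value. The comments in git-prompt.sh indicate that, in Bash, color hints depend on how __git_ps1 is invoked, so check the script if the colors do not show up.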
- Use available scripts for common shell environments.
- Indicate changes stashed, pending, or committed to the local working copy.
- Indicate current branch name to aid in multi-branch workflows.
Content from Additional Resources
Last updated on 2025-11-21
Estimated time: 0 minutes
Overview
Questions
- What didn’t we cover?
Objectives
- Provide pointers to additional topics that are not currently covered
- Provide pointers to additional resources available



- GUIs can help with making some aspects of working with Git easier.
