Introduction


  • Git version control records text-based differences between files.
  • Each git commit records a change relative to the previous state of the documents.
  • Git has a range of functionality that allows users to manage the changes they make.
  • This complex functionality is especially useful when collaborating on projects with others.

Branches


  • A branch represents an independent line of development.
  • git branch creates a new pointer to the current state of the repository and allows you to make subsequent changes from that state.
  • Subsequent changes are considered to belong to that branch.
  • The most recent commit on a branch is its tip; HEAD points to the tip of the branch that is currently checked out.
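
The points above can be tried out in a throwaway repository. The following is a minimal sketch; the file name, branch name and user identity are placeholders chosen for illustration.

```shell
set -e
# Throwaway repository; the identity below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git branch -M main              # make sure the starting branch is called main

git branch feature              # new pointer at the current commit
git checkout -q feature         # HEAD now follows the feature branch
echo "second line" >> notes.txt
git commit -q -am "Extend notes on feature"

# main still points at the first commit; feature is one commit ahead.
git log --oneline --all --decorate
```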

Remote Repositories


  • The git remote command allows us to create, view and delete connections to other repositories.
  • Remote connections are like bookmarks to other repositories.
  • Other git commands (git fetch, git push, git pull) use these bookmarks to carry out their syncing responsibilities.
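
As a rough illustration of creating, viewing and deleting a connection, the sketch below uses a local bare repository as a stand-in for a hosted one. The directory layout and the remote name upstream are assumptions made for the example.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/upstream.git"   # stands in for a hosted repository
git init -q "$work/local"
cd "$work/local"

git remote add upstream "$work/upstream.git"   # create a connection
git remote -v                                  # view name and URL of each connection
remotes_before=$(git remote)

git remote remove upstream                     # delete the connection again
remotes_after=$(git remote)
```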

Undoing Changes


  • git reset rolls back commits and, by default, leaves their changes in the working files.
  • git reset --hard rolls back commits and deletes their changes from the working files as well.
  • git reset alters the history of the project.
  • You should use git reset to undo local changes that have not been pushed to a remote repository.
  • git revert undoes a commit by creating a new commit.
  • git revert should be used to undo changes on a public branch or changes that have already been pushed remotely.
  • git revert only backs out a single commit or a range of commits.
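
The difference between the three commands can be seen in a small experiment. This is a sketch in a scratch repository; the file name and commit messages are made up for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

for n in 1 2 3; do
    echo "change $n" >> log.txt
    git add log.txt
    git commit -q -m "Change $n"
done

# Default (mixed) reset: drop the last commit but keep its edit in the file.
git reset -q HEAD~1
git status --short              # log.txt is listed as modified

# Hard reset: discard that leftover edit as well.
git reset -q --hard HEAD

# Revert: undo "Change 2" by adding a new commit; earlier history is untouched.
git revert --no-edit HEAD
git log --oneline
```

After the last step the history contains "Change 1", "Change 2" and a revert commit, while log.txt is back to the state of "Change 1".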

Merging


  • git merge --no-ff is the recommended way to merge changes because it always records a merge commit.
  • git merge --ff-only is a good way to pull down changes from a remote because it refuses to merge unless the branch can simply be moved forward.
  • Merge conflicts happen when the same part of the same file has been modified in both branches.
  • Merge conflicts must be resolved manually.
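
The two merge options can be compared side by side. The sketch below assumes a scratch repository with two short-lived branches, feature and hotfix, whose names are illustrative only.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "Base"
git branch -M main

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# --no-ff records a merge commit even though a fast-forward would be possible.
git checkout -q main
git merge -q --no-ff -m "Merge feature" feature
git log --oneline --graph

# --ff-only only moves main forward and never creates a merge commit.
git checkout -q -b hotfix
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Add fix"
git checkout -q main
git merge -q --ff-only hotfix
```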

Tags


  • git tag allows us to mark a point we can return to.
  • A tag is tied to a commit.
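
A short sketch of marking a commit and returning to it later. The tag name v1.0 and the file contents are assumptions for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "Release candidate"
git tag -a v1.0 -m "First release"   # annotated tag on the current commit

echo "v2" >> file.txt
git commit -q -am "Later work"

git checkout -q v1.0                 # return to the tagged state (detached HEAD)
cat file.txt
```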

Branching Models


  • A branching model is a pre-agreed way of merging branches into the main branch.
  • A branching model is needed when multiple contributors are making changes to a single project.

Forking Workflow


  • The forking workflow allows third parties to prepare and propose changes without write access to the upstream repository.
  • The main branch of the fork is not modified directly; it is only updated from the upstream main branch.
  • Branch off main to a feature branch and push it to the forked repository (origin).
  • Update the forked main branch using git pull upstream main, where upstream is the name of the upstream remote.
  • Update your local feature branch with git pull --rebase upstream main.
  • Force push to origin branch for pull request updates.
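
The whole cycle can be rehearsed locally. In the sketch below, plain directories stand in for the hosted upstream repository and the fork (an assumption; in practice both live on a hosting service), and the helper function ident is a hypothetical convenience for setting a placeholder identity.

```shell
set -e
work=$(mktemp -d)
ident() {   # hypothetical helper: placeholder identity for this sketch
    git config user.name "Example User"
    git config user.email "user@example.com"
}

# Local stand-ins for the hosted repositories.
git init -q "$work/upstream"
cd "$work/upstream"
ident
echo "v1" > README
git add README
git commit -q -m "Initial commit"
git branch -M main
git clone -q --bare "$work/upstream" "$work/fork.git"   # the "fork"

# Clone the fork (origin) and register the original project as upstream.
git clone -q "$work/fork.git" "$work/local"
cd "$work/local"
ident
git remote add upstream "$work/upstream"

# Work on a feature branch and publish it to the fork.
git checkout -q -b feature
echo "idea" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git push -q -u origin feature

# Meanwhile, upstream main moves on.
(cd "$work/upstream" && echo "v2" >> README && git commit -q -am "Upstream change")

# Update local main, then replay the feature branch on top of upstream main.
git checkout -q main
git pull -q upstream main
git checkout -q feature
git pull -q --rebase upstream main

# The rebase rewrote the feature commit, so the published branch is force-pushed.
git push -q --force origin feature
```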

Data Science Workflow


Large Files


  • (Large) binary files can grow the repository size immensely and make it unusable.
  • git lfs is an extension that stores large files outside the Git data model.
  • Use of Git LFS is discouraged in many scenarios.

Undo, Move, cherry-pick: What is a cherry-pick?


  • You can cherry-pick specific commits from one branch to another using git cherry-pick <commit-hash>.
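
A minimal sketch of picking one commit out of two from another branch. File and branch names are illustrative, and the hash is looked up with git rev-parse rather than typed by hand.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
git branch -M main

git checkout -q -b feature
echo "a" > a.txt
git add a.txt
git commit -q -m "Add a"
echo "b" > b.txt
git add b.txt
git commit -q -m "Add b"

wanted=$(git rev-parse HEAD~1)   # hash of the "Add a" commit

# Apply only that commit to main; "Add b" stays on feature.
git checkout -q main
git cherry-pick "$wanted"
```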

Interactive Rebase and Squash


  • Use an interactive rebase to clean up merge requests before the merge.
  • Rebased branches need to be force-pushed due to history changes.
  • Squashing can be used to combine multiple commits.
  • Depending on the project policy, merge requests may need to be cleaned up before they are allowed upstream.
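
Normally an interactive rebase opens the list of commits in an editor, where pick can be changed to squash or fixup by hand. So that the sketch below runs unattended, it instead marks the second commit with --fixup and lets --autosquash arrange the list; setting GIT_SEQUENCE_EDITOR to true simply accepts that list unchanged. File and branch names are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
git branch -M main

git checkout -q -b feature
echo "draft" > report.txt
git add report.txt
git commit -q -m "Add report"
echo "draft, typo fixed" > report.txt
git commit -q -a --fixup HEAD        # recorded as "fixup! Add report"

# Combine the fixup commit into "Add report" without opening an editor.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

git log --oneline main..feature      # a single combined commit remains
```

If the feature branch had already been pushed, it would now need a force push because its history has changed.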

Hooks


  • Git provides a range of hooks that let you run tasks at specific points in the commit workflow.
  • Use the pre-commit hook to check that changes conform to project rules before they are committed.
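
As an illustration, the sketch below installs a pre-commit hook that rejects staged changes containing the marker "DO NOT COMMIT". The marker and the check itself are hypothetical; a real project would substitute its own conformity checks. It assumes hooks are read from the default .git/hooks directory.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Install the hook; Git runs it before every commit and aborts on a non-zero exit.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Hypothetical rule: staged changes must not contain the marker below.
if git diff --cached | grep -q "DO NOT COMMIT"; then
    echo "pre-commit: remove the DO NOT COMMIT marker first" >&2
    exit 1
fi
EOF
chmod +x .git/hooks/pre-commit

echo "DO NOT COMMIT: debugging output" > scratch.txt
git add scratch.txt
if git commit -q -m "Add scratch file"; then
    result=accepted
else
    result=rejected
fi
echo "commit was $result"
```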

Setting up the Command Prompt


  • Use available scripts for common shell environments.
  • Indicate changes stashed, pending, or committed to the local working copy.
  • Indicate current branch name to aid in multi-branch workflows.
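
One commonly available script is git-prompt.sh, which ships with Git in contrib/completion and provides the __git_ps1 function. The fragment below is a sketch for a Bash startup file; the location ~/.git-prompt.sh is an assumption and depends on where the script was copied or installed.

```shell
# Assumed location of git-prompt.sh; adjust to your installation.
if [ -f ~/.git-prompt.sh ]; then
    . ~/.git-prompt.sh
    GIT_PS1_SHOWDIRTYSTATE=1      # mark unstaged (*) and staged (+) changes
    GIT_PS1_SHOWSTASHSTATE=1      # mark stashed changes ($)
    # Show the working directory followed by the current branch in parentheses.
    PS1='\w$(__git_ps1 " (%s)")\$ '
fi
```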

Additional Resources


  • GUIs can help with making some aspects of working with Git easier.