Introduction
- Git version control records changes to files over time and shows them as text-based (line-by-line) differences.
- Each Git commit records a snapshot of the files together with a link to the previous commit, so every commit captures a change relative to the previous state of the documents.
- Git has a range of functionality that allows users to manage the changes they make.
- This functionality is especially useful when collaborating on projects with others.
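A minimal sketch of the commit cycle, assuming a throwaway repository in a temporary directory, a hypothetical file notes.txt, and a placeholder author identity. Each commit stores the state of the files, and git diff shows the change relative to the previous state.

```shell
#!/bin/sh
# Throwaway demo: record two commits and inspect the difference between them.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
# Placeholder identity so commits work without any global configuration.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q
printf 'line one\n' > notes.txt           # hypothetical text file
git add notes.txt
git commit -q -m "Add notes"
printf 'line two\n' >> notes.txt
git diff                                  # uncommitted change: "+line two"
git commit -q -am "Extend notes"
git diff HEAD~1 HEAD                      # the change recorded by the latest commit
git log --oneline                         # two commits, newest first
```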
Branches
- A branch represents an independent line of development.
- The git branch command creates a new pointer to the current state of the repository; after switching to the new branch (git switch <name> or git checkout <name>), you make subsequent changes from that state.
- Subsequent commits are considered to belong to that branch.
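- The most recent commit on a given branch is its tip, also called the head of the branch; HEAD (in capitals) refers to the branch that is currently checked out.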
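The branch steps can be sketched in a throwaway repository. This assumes Git 2.23 or later (for git switch), a placeholder identity, and a hypothetical branch name feature; the first branch is explicitly named main so the script does not depend on local defaults.

```shell
#!/bin/sh
# Create a branch, switch to it, and commit on it; main is left untouched.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q
git symbolic-ref HEAD refs/heads/main     # name the first branch "main"
echo "base" > file.txt
git add file.txt
git commit -q -m "Base commit"
git branch feature                        # new pointer at the current commit
git switch -q feature                     # make "feature" the checked-out branch
echo "feature work" >> file.txt
git commit -q -am "Work on feature"       # this commit belongs to "feature"
git log --oneline main..feature           # commits on feature but not on main
git rev-parse --abbrev-ref HEAD           # prints the checked-out branch: feature
```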
Remote Repositories
- The git remote command allows us to create, view and delete connections to other repositories.
- Remote connections are like bookmarks to other repositories.
- Other Git commands (git fetch, git push, git pull) use these bookmarks to carry out their syncing responsibilities.
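A local sketch of remotes, assuming a bare repository in a temporary directory stands in for a hosted one. The names shared.git, work, other and spare are hypothetical; origin is only the conventional name for the main remote.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q --bare shared.git                 # stand-in for a hosted repository
git init -q work
cd work
git symbolic-ref HEAD refs/heads/main
echo "hello" > README
git add README
git commit -q -m "Initial commit"

git remote add origin "$tmp/shared.git"       # create a bookmark named "origin"
git remote -v                                 # view bookmarks (fetch and push URLs)
git push -q -u origin main                    # send commits; -u records origin/main as upstream

# A collaborator clones the same repository and pushes a change.
git clone -q -b main "$tmp/shared.git" "$tmp/other"
(cd "$tmp/other" && echo "more" >> README && git commit -q -am "Update README" && git push -q)

git fetch -q origin                           # download new commits into origin/main
git pull -q --ff-only                         # integrate them into the local main
git remote add spare "$tmp/shared.git"        # a second bookmark, then delete it
git remote remove spare
```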
Undoing Changes
- git reset rolls back commits but leaves their changes in the files.
- git reset --hard rolls back commits and deletes all of their changes, along with any uncommitted changes to tracked files.
- git reset alters the history of the project.
- You should use git reset to undo local changes that have not been pushed to a remote repository.
- git revert undoes a commit by creating a new commit.
- git revert should be used to undo changes on a public branch or changes that have already been pushed remotely.
- git revert backs out only a single commit or a range of commits; the commits around them are kept.
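A sketch contrasting the two approaches in a throwaway repository, assuming a hypothetical file.txt and placeholder identity: git reset moves the branch pointer back (with or without keeping the changes), while git revert adds a new commit that undoes an earlier one.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q
echo "v1" > file.txt
git add file.txt
git commit -q -m "v1"
echo "v2" > file.txt
git commit -q -am "v2"

# git reset (default --mixed): drop the "v2" commit but keep its change in the file.
git reset -q HEAD~1
cat file.txt                              # prints v2 (now an uncommitted change)

# git reset --hard: drop the commit and discard its change as well.
git commit -q -am "v2 again"
git reset -q --hard HEAD~1
cat file.txt                              # prints v1

# git revert: add a new commit that undoes "v3"; existing history is kept.
echo "v3" > file.txt
git commit -q -am "v3"
git revert --no-edit HEAD
git log --oneline                         # newest first: Revert "v3", v3, v1
```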
Merging
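- git merge --no-ff is the best way to merge changes: it always creates a merge commit, which records where a branch was merged.
- git merge --ff-only is a good way to pull down changes from a remote: it refuses to merge unless the local branch can simply be fast-forwarded.
- Merge conflicts happen when the same part of the same file has been modified in both branches.
- Merge conflicts must be resolved manually.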
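The three merge situations can be sketched in one throwaway repository. This assumes Git 2.23 or later (for git switch); the branch names feature, incoming and other are hypothetical, and incoming merely stands in for work fetched from a remote.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q
git symbolic-ref HEAD refs/heads/main
echo "a" > a.txt
git add a.txt
git commit -q -m "Base"

# --no-ff: always create a merge commit, even though a fast-forward is possible.
git switch -q -c feature
echo "b" > b.txt
git add b.txt
git commit -q -m "Add b"
git switch -q main
git merge -q --no-ff -m "Merge feature" feature

# --ff-only: only move main forward; never create a merge commit.
git switch -q -c incoming
echo "c" > c.txt
git add c.txt
git commit -q -m "Add c"
git switch -q main
git merge -q --ff-only incoming

# Conflict: both branches change the same line of a.txt.
git switch -q -c other
echo "changed on other" > a.txt
git commit -q -am "Edit a on other"
git switch -q main
echo "changed on main" > a.txt
git commit -q -am "Edit a on main"
if git merge -q other; then echo "unexpected: no conflict"; exit 1; fi
cat a.txt                                 # contains <<<<<<<, ======= and >>>>>>> markers
echo "resolved by hand" > a.txt           # the manual resolution
git add a.txt
git commit -q --no-edit                   # conclude the merge
git log --oneline --graph
```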
Branching Models
- A branching model is a pre-agreed way of merging branches into the main branch.
- A branching model is needed when multiple contributors are making changes to a single project.
Forking Workflow
- The forking workflow allows third parties to prepare and propose changes without write access to the upstream repository.
- The main branch is never modified directly; it is only updated from the upstream main branch.
- Branch off main to a feature branch and push it to the forked repository (origin).
- Update the forked main branch using git pull upstream main, where upstream is the name of the upstream remote.
- Update your local feature branch with git pull --rebase upstream main.
- Force push the feature branch to origin to update the pull request.
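The workflow steps can be sketched with local directories standing in for hosted repositories. All names are hypothetical: maintainer is the project owner's clone, upstream.git the original project, fork.git your fork (on a hosting service the fork would be created through its interface; a bare clone plays that role here), work your clone, and feature your branch. The sketch uses git pull --ff-only upstream main, a stricter form of git pull upstream main that refuses anything other than a fast-forward, and --force-with-lease as a cautious form of force push.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# The upstream project, seeded from the maintainer's repository.
git init -q maintainer
git -C maintainer symbolic-ref HEAD refs/heads/main
echo "v1" > maintainer/README
git -C maintainer add README
git -C maintainer commit -q -m "Initial commit"
git clone -q --bare maintainer upstream.git
git clone -q --bare upstream.git fork.git     # stand-in for the server-side fork

# Your clone: origin is the fork, upstream is the original project.
git clone -q fork.git work
cd work
git remote add upstream "$tmp/upstream.git"
git switch -q -c feature                      # branch off main
echo "my change" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git push -q -u origin feature                 # the pull request is opened from this branch

# Meanwhile the upstream main branch moves on.
echo "v2" >> ../maintainer/README
git -C ../maintainer commit -q -am "Upstream change"
git -C ../maintainer push -q "$tmp/upstream.git" main

# Update main only from upstream, then sync it to the fork.
git switch -q main
git pull -q --ff-only upstream main
git push -q origin main

# Rebase the feature branch on the new upstream main and update the pull request.
git switch -q feature
git pull -q --rebase upstream main
git push -q --force-with-lease origin feature
```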
Data Science Workflow
Large Files
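- Large binary files can grow the repository size immensely and make it unusable, because every committed version remains in the history.
- git lfs is an extension that stores large files outside the Git data model.
- Use of Git LFS is discouraged in many scenarios, for example because every collaborator must install the extension and the hosting server must support it.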
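A sketch of why large binaries are a problem, assuming a Unix-like system and a hypothetical 1 MiB random file data.bin as a stand-in for a dataset: once committed, each version stays in the repository even after the file is deleted. Git LFS commands are not shown because they require the separate git-lfs extension.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q
# Commit a 1 MiB random binary file (hypothetical stand-in for a dataset).
head -c 1048576 /dev/urandom > data.bin
git add data.bin
git commit -q -m "Add data v1"
first_blob=$(git rev-parse HEAD:data.bin)     # object ID of the first version
# Replace it with a new version; random data barely compresses.
head -c 1048576 /dev/urandom > data.bin
git commit -q -am "Add data v2"
# Deleting the file does not remove the old versions from the repository.
git rm -q data.bin
git commit -q -m "Remove data"
git cat-file -e "$first_blob" && echo "first version is still stored"
git count-objects -vH                         # object storage still holds both versions
```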
Undo, Move, cherry-pick
- You can cherry-pick specific commits from one branch to another using git cherry-pick <commit-hash>; the changes introduced by that commit are applied to the current branch as a new commit.
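A sketch in a throwaway repository, assuming Git 2.23 or later (for git switch) and hypothetical branch and file names: only the wanted commit from feature is copied onto main, and the copy gets a new commit hash because its parent differs.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q
git symbolic-ref HEAD refs/heads/main
echo "base" > app.txt
git add app.txt
git commit -q -m "Base"
git switch -q -c feature
echo "unrelated" > draft.txt
git add draft.txt
git commit -q -m "Draft work"             # not wanted on main
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Important fix"          # wanted on main
fix_commit=$(git rev-parse HEAD)
git switch -q main
git cherry-pick "$fix_commit"             # apply only that commit's change to main
git log --oneline                         # newest first: Important fix, Base
```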
Interactive Rebase and Squash
- Use an interactive rebase to clean up merge requests before the merge.
- Rebased branches need to be force-pushed because rebasing rewrites their history.
- Squashing can be used to combine multiple commits into one.
- Depending on the project policy, merge requests may need to be cleaned up before they are allowed upstream.
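A sketch of cleaning up a branch before it is merged, assuming a bare repository stands in for the hosted remote and feature is a hypothetical merge-request branch. Normally git rebase -i opens an editor with the todo list; here the fix-up commit is created with git commit --fixup, and GIT_SEQUENCE_EDITOR=true accepts the list that --autosquash prepares, so the script runs unattended.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q --bare remote.git
git init -q work
cd work
git symbolic-ref HEAD refs/heads/main
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git switch -q -c feature
echo "feature" >> app.txt
git commit -q -am "Add feature"
feature_commit=$(git rev-parse HEAD)
echo "typo fix" >> app.txt
git commit -q -a --fixup="$feature_commit"    # creates a "fixup! Add feature" commit
git remote add origin "$tmp/remote.git"
git push -q -u origin feature                 # the merge request shows two commits

# Squash the fix-up into the commit it amends.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main
git log --oneline main..feature               # a single "Add feature" commit

# The rewritten branch no longer contains the pushed commits,
# so a normal push is rejected and a force push is required.
if git push -q origin feature 2>/dev/null; then echo "unexpected: push accepted"; exit 1; fi
git push -q --force-with-lease origin feature
```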
Hooks
- Git provides a set of hooks that run tasks at specific points in its workflow, for example just before or after a commit.
- Use the pre-commit hook to check that changes conform to the project's conventions before they are committed.
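A minimal pre-commit hook sketch, assuming the project's conformity rule is "no whitespace errors in staged changes" (a hypothetical rule chosen for illustration) and that no global core.hooksPath setting redirects hooks elsewhere. The hook runs git diff --cached --check, which exits non-zero when it finds problems, and a non-zero exit aborts the commit.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Abort the commit (non-zero exit) if the staged changes contain whitespace errors.
exec git diff --cached --check
EOF
chmod +x .git/hooks/pre-commit

printf 'x = 1   \n' > script.py               # trailing spaces: the hook should reject this
git add script.py
if git commit -q -m "Add script"; then echo "unexpected: commit accepted"; exit 1; fi

printf 'x = 1\n' > script.py                  # the fixed version passes the hook
git add script.py
git commit -q -m "Add script"
```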
Setting up the Command Prompt
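- Use available scripts for common shell environments, for example the git-prompt.sh script distributed with Git, which provides the __git_ps1 function for bash and zsh.
- Indicate whether changes are stashed, pending, or committed in the local working copy.
- Indicate the current branch name to aid in multi-branch workflows.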
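To show what such prompt scripts compute, here is a deliberately small, hypothetical helper (git_prompt_info is not part of Git or of git-prompt.sh). It assumes a POSIX shell and prints the current branch, a "*" when tracked files have uncommitted changes, and a "$" when a stash exists; the demo runs it in a throwaway repository.

```shell
#!/bin/sh
# Hypothetical prompt helper: "(branch)" plus "*" for uncommitted changes
# in tracked files and "$" if at least one stash entry exists.
git_prompt_info() {
    branch=$(git symbolic-ref --quiet --short HEAD 2>/dev/null) || return 0
    flags=""
    if [ -n "$(git status --porcelain --untracked-files=no 2>/dev/null)" ]; then
        flags="${flags}*"
    fi
    if git rev-parse --verify --quiet refs/stash >/dev/null; then
        flags="${flags}\$"
    fi
    printf '(%s%s)' "$branch" "${flags:+ $flags}"
}

# Demo in a throwaway repository.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q
git symbolic-ref HEAD refs/heads/main
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git_prompt_info; echo                         # (main)
echo "more" >> notes.txt
git_prompt_info; echo                         # (main *)
git stash push -q
git_prompt_info; echo                         # (main $)
```

In bash, for example, the helper could be called from the prompt with PS1='$(git_prompt_info) \$ '. The git-prompt.sh script handles many more cases (detached HEAD, upstream status, untracked files), which is why reusing an existing script is usually preferable.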
Additional Resources
- GUIs can make some aspects of working with Git easier.