2022-11-25

about git

the version control system

benefits of version control systems

having a history, database and transactions for code changes
see when certain changes have been made and in which context ("when feature x was updated")
see exactly what changes have been made. protect against accidentally losing work or adding unwanted changes
see who made what changes
revert changes
easily manage multiple variants or versions of the codebase and copy changesets between them. for example to maintain older versions, try out changes, or develop new features, while the other histories of the code remain unaffected

git has the advantages that it is decentralised (many repositories that can all share changes with each other) and that it was developed for a big and active project, the linux kernel, where it is constantly tested

short descriptions

branch: a history with possibly more, fewer or different changes from some shared point in history. the name of the initial branch is "master" by default
commit: a commit is a changeset that has become part of the history. commits have database-unique identifiers and descriptions
detached head: when you check out an old commit, for example, you leave the current branch and are not on any branch. you can go back by checking out a branch, or create a new branch from the current state
fetch: download changesets from a remote repository without modifying local history
git: a system for managing a history, database and transactions for code changes
history: a series of changesets
index: contains the file changes that the user selected to be part of the current transaction
merge: combine two histories, for example by adding new commits from a remote or local branch. when both histories changed the same file or lines, git first tries several automatic merging strategies. changes that cannot be reconciled automatically (conflicts) require manual intervention
pull: fetch and merge remote history of a branch with the current one
push: upload the current history. it will only be accepted if it extends the history on the server and does not conflict with it
remote: a configured repository identified by name. the default name for the remote a working copy was copied from is "origin"
repository: either a working copy of the database with extracted code files, or a serve-only database without extracted files (a bare repository). no matter the kind, repositories have a shared interface and any repository can share changesets with any other
submodule: a repository of another codebase integrated into the current one in a subdirectory. the parent repository tracks only a commit id, typically independent of a branch. typical git commands are possible in the directory of the submodule. use case: including versioned libraries that the codebase depends on at a specific version, with the possibility to update easily
tag: a name used to identify a specific commit
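
the push rule above ("only accepted if it extends the history on the server") can be sketched as an ancestry check. this is a toy model and not how git is implemented: the commit graph is a plain dict from commit id to parent ids, and the names is_ancestor, push_accepted and the ids a to e are made up for illustration

```python
# toy model: each commit id maps to the list of its parent commit ids.
# the ids and the graph are hypothetical
def is_ancestor(parents, ancestor, descendant):
    """return True if `ancestor` is reachable from `descendant` via parent links."""
    stack = [descendant]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def push_accepted(parents, server_tip, pushed_tip):
    """a non-forced push is modelled as accepted only if the pushed
    history contains the server history, that is, extends it."""
    return is_ancestor(parents, server_tip, pushed_tip)


# a <- b <- c is the server history. d was committed locally on top of b.
# e is a local merge of c and d
history = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["c", "d"]}

print(push_accepted(history, "c", "d"))  # d does not contain c
print(push_accepted(history, "c", "e"))  # e merges c, so it extends the server history
```

in this model, pushing d is rejected until the server history has been fetched and merged (giving e), which is the usual reason to pull before pushing.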

special files

.git/: working copy repositories have one .git directory with most of the necessary control information for git. .git/config is text configuration and can be edited manually
.gitignore: a list of path patterns. matching files are not included in commits and their changes are not listed. files that are already tracked are not affected
.gitmodules: submodule configuration. versioned as a separate file outside the .git directory to track changes to the file
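
how .gitignore patterns select files can be approximated for the simplest case. the sketch below only covers patterns without a slash, which git matches against names at any directory depth. patterns containing "/", negation with "!" and escapes are deliberately skipped, so this is not a replacement for git's own matching. the function name is_ignored and the example patterns are made up

```python
from fnmatch import fnmatchcase


def is_ignored(path, patterns):
    """rough approximation for slash-free patterns only.

    git matches such a pattern against names at any directory depth, so a
    path counts as ignored here if any of its components matches.
    not handled: patterns containing "/", negation with "!", escapes.
    """
    components = path.split("/")
    for pattern in patterns:
        pattern = pattern.rstrip()
        if not pattern or pattern.startswith("#"):
            continue  # blank lines and comments
        if "/" in pattern or pattern.startswith("!"):
            continue  # outside the subset this sketch covers
        if any(fnmatchcase(c, pattern) for c in components):
            return True
    return False


# hypothetical .gitignore content, one pattern per list entry
patterns = ["# build output", "*.o", "node_modules", ""]
```

for a real repository, "git check-ignore -v <path>" reports which pattern, if any, applies to a path.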

workflow

examples for the use of branches for development with multiple people

example branch purposes

next version

branch name: master or next

feature

changes for specific new features. once the feature works, merge its commits into the master branch. particularly useful if multiple features are developed in parallel while the main branch may still receive changes at the same time

stable

tested versions

release

tested versions that correspond to some kind of release and might still receive fixes without being fully updated to the newest program version

example branch names

master
feature{-name}
stable
stable{-version-number}
next
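
a naming convention like this can be checked mechanically, for example in a script that lists branches. the sketch below rests on two assumptions that are not stated above: "{-name}" is read as a "-<name>" suffix, and versioned stable branches are treated as release branches. the name branch_purpose is made up

```python
import re

# assumption: "{-name}" in the list above is read as a "-<name>" suffix,
# and versioned stable branches are treated as release branches
PURPOSES = [
    (re.compile(r"^(master|next)$"), "next version"),
    (re.compile(r"^feature-.+$"), "feature"),
    (re.compile(r"^stable$"), "stable"),
    (re.compile(r"^stable-\d+(\.\d+)*$"), "release"),
]


def branch_purpose(name):
    """return the purpose for a branch name, or None if it fits no convention."""
    for pattern, purpose in PURPOSES:
        if pattern.match(name):
            return purpose
    return None
```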

what should the default master branch be for?

the master branch should contain the most recent development version. the deciding question is which version of the codebase someone should receive who clones the repository without specifying a branch, since they get the master branch by default: the development version or a stable version? a developer who clones the repository to contribute to the project would probably prefer the most current development version, so that they can start developing immediately without searching for the right branch and switching. an end-user of the application would prefer "master" to be a stable version. but git repositories are mainly for versioning source code in development and are therefore primarily for developers. end-users do not need git features as much, and cloning requires a git client and is more complicated than, say, downloading and extracting an archive file. so why optimise a git repository for end-users? switching branches is easy in any case

other arguments

"master should be stable for automatic deployment"

that does not make sense, since it does not matter which branch is automatically deployed. automatic deployment is an automated process where the source and branch have to be configured in any case

"master is central and that corresponds to the existence of our unique most current stable version"

the general development state is also unique. in fact it is more likely that there are multiple stable or release branches than multiple current development branches where new features are continuously merged in

git flow

git flow is a set of tools and a specification of how to use git and branches that has gained popularity. but it is overcomplicated and not worth the effort. with extra commands, software wrappers and many forced requirements for the branching process (must use tags, must use hotfix and release branches, master is stable and branch named "develop" is added, et cetera) it introduces many sources of errors and ways to get into an unclean state. every developer has to install the utility applications. the actual benefit is not even that distinct because it is basically just feature branches

how to write commit messages

check out how the commits are done on kernel.org (where git originated), see the "browse" links
use imperative present tense, just because it makes words shorter: "add ipv6 support" instead of "added ipv6 support"
"network module: implement support for ipv6"
"network utilities: ip verifier: ipv6 parsing"
do not prefix every commit message with tags like "-feature-", "-bugfix-" or similar without a good reason. most of the time there is no benefit in having them. they add noise to commit messages: in the best case readers always have to skip the first n characters, in the worst case it becomes harder to find the right commit when a previous commit has to be chosen. if you really need tags, consider adding them to the end of the message
remember that much of the most important information is already tracked by git: the list of files that have changed in a commit. query git or use a git log format to show the files changed per commit, like "git-log" in sph-script
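
two of the points above (imperative present tense, no leading tags) can be checked roughly by a script. this is a heuristic sketch only: the tag pattern and the "ends with ed" test are assumptions chosen for illustration, the latter misfires on words like "embed" or "seed", and the name check_subject is made up

```python
import re

# leading markers such as "-feature-" or "[bugfix]" (hypothetical pattern)
LEADING_TAG = re.compile(r"^\s*(-\w+-|\[\w+\])")


def check_subject(subject):
    """return a list of warnings for a commit subject line (heuristic)."""
    warnings = []
    if LEADING_TAG.match(subject):
        warnings.append("leading tag")
    # drop a leading tag, then treat the text after the last "scope: "
    # prefix as the description
    rest = LEADING_TAG.sub("", subject, count=1)
    description = rest.rsplit(": ", 1)[-1]
    words = description.split()
    first_word = words[0] if words else ""
    # crude past-tense check, see the caveat in the text above
    if first_word.lower().endswith("ed"):
        warnings.append("past tense")
    return warnings
```

a check like this could run from a commit-msg hook, but it should warn rather than reject because of the false positives.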

other tips

if changes can be introduced from multiple remotes, use a separate branch for each remote. otherwise frequent merge conflicts are likely if changes from other servers are incorporated at the wrong times
branches can have unrelated histories. for example, you can push the contents of a repository to a new branch in a completely different repository