If you’ve wondered what GitHub is, and what all the lingo that goes along with open source development means (pull, merge, fork, etc.), here is a good guide written by Eliot Berriot for non-programmers.
https://pad.funkwhale.audio/s/rJAeAstL4#
[copy of this page stored here for posterity, please visit the link for the most up-to-date version]
Building software together with Git
There are not many resources out there that introduce Git, GitHub, versioning, forks, pull requests and other development-related jargon to a non-technical audience. This post is an attempt to help with that.
How do we collaborate on software?
Like most projects, successful software projects work best when multiple people can work on different tasks in parallel. In a typical organization, you expect accountants, managers, secretaries, sales folks, and in fact everyone to work on their own tasks, seamlessly, at the same time.
You want, as much as possible, to avoid situations where someone needs to wait for someone else before proceeding. We call those situations bottlenecks. A typical bottleneck would be having a single phone in a 100-person company: everyone has to wait to make a call, which is a waste of time.
Software development works the same way: developers, designers, translators, and pretty much everyone else want to work without worrying about each other’s tasks, especially as a project grows and attracts dozens or even hundreds of contributors.
In order to deal with that, people involved in software development usually rely on a few tools and processes I will describe below.
What is software?
What we call software, in its most common form, is just a set of text files, also known as a codebase. Those text files contain instructions that can be executed by computers. Yes, the act of programming is just about writing stuff.
Of course, as a programmer, you have to think about what you are writing, like a storywriter ;)
If you’ve ever worked on a thesis, or any long-form essay, you have faced a lot of the issues developers encounter when they need to collaborate on the same piece of software.
When you’re in the process of writing an essay, quite often, you’ll need:
- Reviewing: you want someone else to read your work, and possibly suggest changes or edit it
- Collaboration: you want someone else to work on one section of the document, while you’re working on another one
- Versioning: the ability to go back to a previous version of the document (e.g. because you deleted something by mistake)
You can achieve reviewing and collaboration by emailing copies of your working version to the other people involved, then regularly integrating their changes into your own copy via copy-pasting.
For versioning, the “redo/undo” features of your text editor can help, as can copying your document to a separate medium from time to time.
However, if you’ve worked with more than one or two people on the same document, you know this is absolutely awful to manage, and very error-prone. Did you send the latest version to your friends? Have you integrated all their suggestions? How do you go back to yesterday’s version of your work when your last copy was made last week?
Software development is exactly the same. But usually with more people involved ;)
Introducing Git
At its core, Git and associated tools like GitHub are an attempt to solve the issues I described in the previous section.
Git allows people to contribute to the same codebase in a sane and efficient way. However, to do so, it completely rethinks the way we collaborate and introduces new concepts. All of this sounds like jargon to newcomers and is frankly overwhelming, so I’ll try to demystify it a bit.
Commits and Versioning
At its core, Git provides a mechanism to version a codebase.
Each version of the codebase is basically a snapshot of the codebase, associated with the snapshot date. This gives you versioning, because you can go back in time, to any previous snapshot.
Those snapshots are named commits.
However, making a full copy of the project with each commit would require a lot of space. Git is a bit smarter than that: a convenient way to picture it is that each commit only stores the differences (called diffs) from the previous one.
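To make the word "diff" a bit more concrete, here is a minimal sketch using Python's standard `difflib` module. It only illustrates the idea of describing a change as removed and added lines; it does not show how Git stores data internally. The file contents and the "day 1" / "day 4" labels are made up for the illustration.

```python
import difflib

# Two versions of the same (made-up) 10-line file.
before = [f"line {n}" for n in range(1, 11)]
after = list(before)
after[6] = "line 7 (edited)"  # lists are 0-indexed, so line 7 is at index 6

# A unified diff lists removed lines with "-" and added lines with "+".
diff = list(
    difflib.unified_diff(before, after, fromfile="day 1", tofile="day 4", lineterm="")
)

# Keep only the lines that actually changed, skipping the "---"/"+++" headers.
changes = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

for line in changes:
    print(line)
```

The two printed lines describe the whole change: the old line 7 is removed, and the edited line 7 is added in its place.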
Let’s take a concrete example:
- Alice starts her software project: she creates a text file with 10 lines, and creates the first commit
- 3 days later, she makes a change on line 7 and creates another commit. The commit will store the fact that line 7 was edited.
- 5 days later, she deletes line 3 and creates another commit. The commit will only store the fact that a line was deleted.
All those commits create a log, or history of what happened in the project:
- Day 1: Alice added 10 lines
- Day 4: Alice edited line 7
- Day 9: Alice removed line 3
And if we want to go back to day 1, we can tell git to undo the changes from day 9 and day 4, in that order, and we’ll obtain the codebase like it was on the first day. Then we can replay the next commit, the one from day 4, to go to the next version of the project, then replay the commit from day 9 to obtain our latest version.
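The history above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Git works internally: the `history` list, the helper functions and the file contents are all hypothetical. Also, instead of undoing commits one by one, this sketch rebuilds a given version by replaying commits from the beginning, which gives the same result in this simple case.

```python
# Hypothetical model: a commit is a (day, author, description, change) record,
# where `change` is a function that turns one version of the file into the next.

def add_ten_lines(lines):
    return lines + [f"line {n}" for n in range(1, 11)]

def edit_line_7(lines):
    edited = list(lines)
    edited[6] = "line 7 (edited)"  # lists are 0-indexed, so line 7 is at index 6
    return edited

def remove_line_3(lines):
    return lines[:2] + lines[3:]

history = [
    (1, "Alice", "added 10 lines", add_ten_lines),
    (4, "Alice", "edited line 7", edit_line_7),
    (9, "Alice", "removed line 3", remove_line_3),
]

def version_on_day(history, day):
    """Rebuild the file as it was on `day` by replaying every commit made up to then."""
    lines = []
    for commit_day, _author, _description, change in history:
        if commit_day <= day:
            lines = change(lines)
    return lines

day_1 = version_on_day(history, 1)  # the original 10 lines
day_4 = version_on_day(history, 4)  # line 7 edited
day_9 = version_on_day(history, 9)  # line 3 removed as well

# The same list doubles as an audit log: who did what, and when.
for commit_day, author, description, _change in history:
    print(f"Day {commit_day}: {author} {description}")
```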
You may remember I mentioned three desirable features: versioning, collaboration and reviewing. Commits bring us versioning, and also auditability: who did what, and when, which is a nice bonus.
Branches and parallelization
On day 10, Alice decides she wants to experiment with something new, but she’s not 100% sure this will work.
To start hacking on her idea, she creates what Git calls a branch. You can think of branches as roads splitting off from one another. Eventually, two roads can join, but that’s not mandatory.
In Git, every commit happens on a branch; the default one is usually named the master branch. So if we represent the current state of the project with that in mind, this is what it could look like:
| master branch
|
* Commit from day 1: Alice added 10 lines
|
* Commit from day 4: Alice edited line 7
|
* Commit from day 9: Alice removed line 3
So, Alice starts this new branch, named experiment, from the master branch. The project now looks like this:
| master branch
|
* Commit from day 1: Alice added 10 lines
|
* Commit from day 4: Alice edited line 7
|
* Commit from day 9: Alice removed line 3
|
|\
| \
|  | experiment branch
The master branch still exists, on the left, but Alice is now working on the experiment branch, on the right.
She’s very productive, and makes a few commits on that branch:
| master branch
|
* Commit from day 1: Alice added 10 lines
|
* Commit from day 4: Alice edited line 7
|
* Commit from day 9: Alice removed line 3
|
|\
| \
|  | experiment branch
|  |
|  * Commit from day 11: Alice added 10 new lines
|  |
|  * Commit from day 13: Alice edited lines 5 to 9
Because she’s satisfied with the changes, she decides to merge the experiment branch into the master branch. This is Git’s way of applying changes from one branch to another. Remember the road analogy I used earlier? This is what the merge would look like:
|   Main road (master branch)
|
|\    Roads are splitting
| \
|  |  Secondary road (experiment branch)
|  |
|  |
|  |
| /
|/    Roads are joining
|
|   The main road remains
When the merge is done, the experiment branch is deleted, and all its commits are now present on the master branch:
| master branch
|
* Commit from day 1: Alice added 10 lines
|
* Commit from day 4: Alice edited line 7
|
* Commit from day 9: Alice removed line 3
|
* Commit from day 11: Alice added 10 new lines (from experiment branch)
|
* Commit from day 13: Alice edited lines 5 to 9 (from experiment branch)
If, for any reason, Alice wasn’t satisfied with her experiment, she could have deleted it without merging it, and the master branch would have remained unaffected.
Branches are a powerful but also hard-to-grasp concept in git. They are useful to experiment without risk, but also to enable cooperation, as we’ll see in the next section.
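Alice's experiment can be sketched with a small Python model. This is a deliberately simplified picture, assuming a repository is just a dictionary mapping branch names to lists of commits; the `create_branch` and `merge` helpers are hypothetical and much simpler than what Git really does when it merges.

```python
# Hypothetical model: a repository maps each branch name to its list of commits.
branches = {
    "master": [
        "day 1: Alice added 10 lines",
        "day 4: Alice edited line 7",
        "day 9: Alice removed line 3",
    ]
}

def create_branch(branches, new_name, source):
    """Start a new branch with the same history as `source`."""
    branches[new_name] = list(branches[source])

def merge(branches, source, target):
    """Copy onto `target` every commit from `source` that it doesn't have yet."""
    for commit in branches[source]:
        if commit not in branches[target]:
            branches[target].append(commit)

# Day 10: Alice splits off a new road.
create_branch(branches, "experiment", "master")
branches["experiment"].append("day 11: Alice added 10 new lines")
branches["experiment"].append("day 13: Alice edited lines 5 to 9")

# At this point master is untouched: it still has only three commits.
commits_on_master_before_merge = len(branches["master"])

# Alice is satisfied, so the roads join again.
merge(branches, "experiment", "master")
del branches["experiment"]  # the experiment branch is no longer needed
```

Had Alice been unhappy with the experiment, skipping the `merge` call and only deleting the branch would have left master exactly as it was.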
Repositories and collaboration
In the previous scenario, Alice was alone. But on day 14, her friend Bob wants to help her with this new project. How can he achieve that with git?
When Alice started to work on the project, she was using her local copy, what we call a repository. You can think of a repository as a workspace, belonging to someone (Alice, in this case).
Since Bob wants to start contributing, he will need his own repository. One way to do that is for Alice to push her repository to a platform like GitHub or GitLab, and for Bob to create an account there and use the fork button.
Forking essentially means “creating a copy of someone else’s repository”.
When Bob forks Alice’s repository, he ends up with an exact copy of her repository. It’s Git’s equivalent of “sending your thesis by email to a friend”.
So, Bob has a working repository, and starts adding some commits on the master branch:
| Bob’s workspace / master branch
|
| (previous commits omitted)
|
* Commit from day 11: Alice added 10 new lines (from experiment branch)
|
* Commit from day 13: Alice edited lines 5 to 9 (from experiment branch)
|
* Commit from day 14: Bob edited line 8
|
* Commit from day 15: Bob deleted line 12
|
Bob added two commits, on days 14 and 15. He’d like them to be included in Alice’s repository. One way to do that on platforms like GitHub or GitLab is to create a pull request (named merge request in GitLab, but those are the same thing).
Do you remember when Alice merged her experiment branch into her master branch in the previous section? Well, a pull request essentially asks someone to merge a branch from your repository into a branch of their repository.
So, Bob creates the pull request:
Hello Alice!
I’d like to merge the master branch from my repository into the master branch of your repository.
I’ve added one commit that fixes a typo, and one commit that improves the performance.
Let me know if you have any questions,
Bob
When Alice receives that pull request, she’ll be able to review Bob’s commits, and decide whether she is fine with those. That’s what we call a code review.
During the code review, Alice will read the changes introduced by Bob’s commits, suggest some changes, and when she’s satisfied with the result, accept the pull request.
Accepting the pull request will merge Bob’s master branch into the master branch in her repository:
| Alice’s workspace / master branch
|
| (previous commits omitted)
|
* Commit from day 11: Alice added 10 new lines (from experiment branch)
|
* Commit from day 13: Alice edited lines 5 to 9 (from experiment branch)
|
* Commit from day 14: Bob edited line 8 (from bob/master branch)
|
* Commit from day 15: Bob deleted line 12 (from bob/master branch)
|
Of course, she also could have refused the pull request, in which case her master branch would have been left untouched.
Using branches, repositories and pull requests, Alice and Bob managed to collaborate on the same piece of software. How exciting!
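The fork and pull request steps can be sketched in the same spirit. This is a toy model, not the GitHub or GitLab API: repositories are plain dictionaries, and the `fork`, `new_commits` and `accept` helpers, as well as the shape of the `pull_request` record, are assumptions made up for the illustration.

```python
import copy

# Hypothetical model: a repository maps branch names to lists of commits.
alice_repo = {
    "master": [
        "day 11: Alice added 10 new lines",
        "day 13: Alice edited lines 5 to 9",
    ]
}

def fork(repository):
    """Forking: create an independent copy of someone else's repository."""
    return copy.deepcopy(repository)

bob_repo = fork(alice_repo)
bob_repo["master"].append("day 14: Bob edited line 8")
bob_repo["master"].append("day 15: Bob deleted line 12")

# A pull request is a message asking to merge one branch into another.
pull_request = {
    "from_repo": bob_repo,
    "from_branch": "master",
    "to_repo": alice_repo,
    "to_branch": "master",
    "message": "Hello Alice! I'd like to merge my master branch into yours.",
}

def new_commits(pr):
    """The commits Alice reads during code review: those she doesn't have yet."""
    target = pr["to_repo"][pr["to_branch"]]
    return [c for c in pr["from_repo"][pr["from_branch"]] if c not in target]

def accept(pr):
    """Accepting the pull request merges the new commits into the target branch."""
    pr["to_repo"][pr["to_branch"]].extend(new_commits(pr))

to_review = new_commits(pull_request)
alice_before = list(alice_repo["master"])  # Bob's work has not touched Alice's copy
accept(pull_request)
```

Refusing the pull request would simply mean never calling `accept`, in which case Alice's master branch keeps only her own two commits.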
Bonus: Issues and releases
If you’ve read this far, things should be less scary for you. However, there are a few more things you may want to know about software development and the way we usually collaborate while working on software.
Issues
Issues are an important part of software development. You may have heard those sentences already: “Please file an issue” or “Please open a bug in our issue tracker”. But what is an issue?
Issues, also known as tickets, bug reports or feature requests, are messages posted in a project’s issue tracker. Developers, contributors and software users usually open issues to:
- Keep track of a new bug in the software
- Suggest an improvement or a new feature
- Ask a question about the software’s behaviour
Other people can usually comment on issues, discuss possible solutions and pitfalls, provide workarounds, etc.
When development is needed to address the issue or the feature request, a developer will usually create a branch, work on a fix, then submit a pull request with the changes. Once this pull request is accepted, the related issue is usually closed.
To sum it up, this is the typical life cycle of an issue:
- Bob encounters a bug in the software
- Bob opens an issue describing the bug
- Maria, who is facing the same bug, adds a comment on the issue, and describes a possible solution
- Alice decides to work on the issue
- She assigns the issue to herself, creates a branch, commits the changes that fix the issue, and opens a pull request with that branch
- The pull request is merged into the master branch
- Bob’s issue is closed
Issues are extremely useful because they constitute the memory of a project, and also give a lot of insight into future development, popular requests and common problems faced by a community.
On a daily basis, contributors working on a project tend to fix specific issues, which ensures they work on different problems and helps achieve parallelization.
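The life cycle of an issue described above can be sketched as a small Python class. This is only an illustration under assumptions: real issue trackers have their own fields and workflows, and the `Issue` class, its method names and the issue title below are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Issue:
    title: str
    opened_by: str
    status: str = "open"
    assignee: Optional[str] = None
    comments: List[Tuple[str, str]] = field(default_factory=list)

    def comment(self, author, text):
        """Anyone can add a comment to discuss the problem."""
        self.comments.append((author, text))

    def assign(self, person):
        """Record who is working on the issue."""
        self.assignee = person

    def close(self):
        """Mark the issue as resolved."""
        self.status = "closed"

# Bob encounters a bug and opens an issue (the title is made up).
issue = Issue(title="Hypothetical bug report", opened_by="Bob")

# Maria faces the same bug and describes a possible solution.
issue.comment("Maria", "I am facing the same bug, here is a possible solution.")

# Alice decides to work on it.
issue.assign("Alice")

# Once her pull request is merged into the master branch, the issue is closed.
pull_request_merged = True
if pull_request_merged:
    issue.close()
```

Because the issue keeps its comments and assignee even after being closed, it stays available as part of the project's memory.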
Releases
Releases, also known as tags, are the last missing piece of the typical software development process.
Most projects tend to follow similar cycles:
- Project maintainers/communities choose a set of issues they deem priority
- Contributors fix those issues
- Once all selected issues are fixed, a release is published
- End users update to the new release
- Back to the first step
A release is a version of a piece of software that is distributed widely and intended to improve or replace previous releases.
Usually, releases are named using a specific pattern, like version 1.2.3, version 1.2.4 and version 1.3.
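One practical benefit of such a naming pattern is that releases can be put in order automatically. Here is a minimal sketch, assuming version names made only of numbers separated by dots; real projects may follow other schemes, and the `parse_version` helper is hypothetical.

```python
def parse_version(name):
    """Turn a name like "1.2.3" into a tuple of numbers like (1, 2, 3)."""
    return tuple(int(part) for part in name.split("."))

releases = ["1.3", "1.2.3", "1.2.4"]

# Tuples compare number by number, so (1, 2, 4) comes before (1, 3).
ordered = sorted(releases, key=parse_version)
latest = ordered[-1]

print("Releases in order:", ordered)
print("Latest release:", latest)
```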
Conclusion
I hope you enjoyed this essay, and the explanation gave you a clearer view of what’s going on in software development.
As developers, we tend to forget that we’re using obscure jargon that makes us look like occult wizards. There is some kind of elitism behind that, of course, but I think it also happens because software development is really a weird, strange field, with its own problems and solutions.
It takes some effort to untangle everything and demonstrate the usefulness of all of this in a non-technical way. If you think I failed somewhere, or there is a missing piece, please let me know!
original source: https://pad.funkwhale.audio/s/rJAeAstL4#