The purpose of this lab is to demonstrate a key feature of Git: branches. They provide you with a powerful tool to manage day-to-day coding of team-developed software. But they can be useful in single-developer scenarios too.

Motivation

Software today is rarely a product of a single person: it is a team-driven activity and the team members need tools to collaborate on the software. This lab will show you what Git offers you in terms of cooperation for teams of virtually any size. We will also see how GitLab integrates with Git and what tools it offers to simplify the management part of the job.

Typically, a team effort requires that multiple people can work on a shared codebase in such way that:

  • work of each team member is isolated
  • but it is possible to combine work of several team members effortlessly (at least in cases where their work is truly independent)

Git provides these features via branches. In this lab, we will try to teach you how to use them. Unlike the previous labs, which targeted a single user, the full usefulness of Git branching becomes clear only during development inside a bigger team. We try to provide examples that demonstrate the concepts for a single-user team in as reasonable a scenario as possible.

Before you start

Ensure that you do not skip any step in the examples below; otherwise some things will stop making sense and you will not see the effects we want you to see.

Also note that some of the things mentioned here you already encountered. That is fine and intentional: in this lab they will form a coherent package of knowledge you will be using virtually every day in your future jobs.

If you are familiar with Git branching already, note that we are simplifying technical details where possible. We know that branches are really just pointers to the nodes of the acyclic graph of commits etc. but we believe that is not crucial for this text (and it is covered in NSWI154).

Create a fork of the teaching/nswi177/2021-summer/common/examples repository. If you already created the fork during the last lab, please pull the changes into your fork first, or delete the fork and re-fork.

Git branches

Git has a concept of branches that represent a series of commits. So far, your commits were linear: every commit (except the very first and the very last, obviously) had one previous commit and one next commit, and the ordering reflected the order in which they were created.

Branching in Git allows you to break this linearity: a commit can have two successors when your work diverges and each branch follows the implementation of a different feature, for example. A commit can also have two parents: that happens when you merge changes from two branches together.

A typical example of this is work in a team. Alice and Bob both work on the same project, sharing the same repository. Alice works on feature A, Bob works on feature B. They both started with the same commit (e.g., just after a release) but their work diverges: each adds functions for the feature they work on. Once they are satisfied with their work, they merge their changes, i.e. they combine their code to contain both features.

The following pictures show examples of simple branches (in MSIM) as well as a more complicated situation in a mid-size open-source project (HelenOS).

Internally, you have already worked with Git branches without knowing it. When you clone a repository, you create an exact copy of the state that was on the server. Adding new commits locally actually works on a branch that is technically different from the branch on the server. Doing the push then joins these two branches again. Because your work has never diverged so far, the merge was implicit and you used branches almost transparently.

Running example

Throughout the lab we will work on the example csv_calc in the 10 subdirectory of the examples repository. Do not forget to clone your fork this time, not the original repository. You will be making quite a lot of changes there.

This example tries to emulate work in a team: whenever we talk about a different feature (or a bug), imagine that you are working in a big team and the features/bugs are not single-line fixes but multi-day efforts for individual team members.

Look at the csv_calc.sh script and run it as described in the README file. Also read how the script works (that is in there too).

GitLab issues

Notice that the script prints an error message when the expression is invalid, for example when it is run as ./csv_calc.sh 'sum=t01+' <example.csv. However, the exit code is always 0, denoting success.

An exit code that always signals success is a bad practice that should be fixed. But it is not the only thing that needs fixing. To keep track of the unresolved issues of our project, let’s record them permanently. Open the homepage of your fork in the browser. In the sidebar, you should see a link to Issues: following the link tells you that The Issue Tracker is the place to add things that need to be improved or solved in a project. That is exactly what we need ;-).
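The exit code of the last command is available in the shell as $?. The following minimal sketch shows the difference between the current and the desired behaviour. The two functions are hypothetical stand-ins and are not taken from csv_calc.sh; only their return values matter here.

```shell
# Hypothetical stand-ins for the script before and after the fix.
broken_calc() { echo "Invalid expression" >&2; return 0; }
fixed_calc()  { echo "Invalid expression" >&2; return 1; }

# The error message is printed, yet the caller is told that all went well.
broken_calc
echo "broken version exit code: $?"

# Here the caller can react to the failure.
fixed_calc || echo "fixed version exit code: $?"
```

Running the real script followed by echo $? is a quick way to get an example of the bad behaviour for the issue description.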

Let’s create a new issue describing the problem.

Fill in the title. Think of it as an e-mail subject: it has to summarize what is wrong. Also write a description. In this case you might think that the title Bad exit code says it all. It does not! Include an example of the bad behaviour. Note the Markdown link there that brings you help for Markdown formatting: knowing about ` and ``` is a must for every programmer :-).

For further reading, look up the query on how to write a good bug report in your favourite search engine and read at least one article about it. It is worth it.

The bad exit code is not the only issue of the code. Let’s create an empty file precious.txt in our directory and run

./csv_calc.sh 'sum=t01 )); rm -f precious.txt; : $(( 0 + 2' <example.csv

The file is gone because we have injected malicious code into the expression.

This example of a security breach probably made you smile. But issues like this are real enough to not forget about them.

Thus, create another issue for this in your project.

Again: use a descriptive title, provide a meaningful description. Really. Use this as a practice for the graded task.

View a list of your issues. Each issue should now have a number next to it that we can reference later on.

Feature branches

To keep your code healthy, many projects follow a simple rule: commit as often as possible, but the code in the main branch (the name is configurable, but most of the time you will encounter the name master or main) must always be correct (in the sense that all tests are passing).

If you are working on a feature, start a new branch. Work in that branch and merge your code into master only when the feature is completed. For some projects, pre-merge action often involves code-review or load testing of the new feature.

This has the clear advantage that whoever starts a new branch can be reasonably sure of starting from healthy code.

Depending on the project, there may be also extra branches like development where merging does not require code review or a branch named production that denotes the code that will be shipped (possibly automatically) to the customer.

We recommend reading A Simple Git Branch Workflow (dev.to) if you are interested in more details on how this model works.

We will follow the above model in this lab. For each new feature (or a bugfix) we will start a new branch and merge it to the main one only after we have tested it.

Creating the branch

We will start by fixing the exit code bug. Let’s create a branch for that and switch to that branch.

In bigger teams, there are even rules for proper branch naming. For this exercise, we will use issue/N for branches that are supposed to fix the issue with number N.

To create the branch, we will use the git branch command.

git branch issue/1

This command does not do anything visible. It only marks the current (last) commit as the starting point for a new branch.

To actually switch to a new branch, we need to execute

git checkout issue/1

Right now, the switch has no visible effect - both master and issue/1 branches refer to the same state of files.
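The two steps can be tried safely in a throw-away repository. The sketch below assumes a temporary directory, a made-up identity and a placeholder file content; only git branch and git checkout correspond to the steps above.

```shell
# Throw-away repository so that the experiment cannot damage real work.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master       # make sure the first branch is called master
git config user.name "Lab Student"            # hypothetical identity, local to this repository
git config user.email "student@example.com"

echo "first version" > csv_calc.sh            # placeholder content
git add csv_calc.sh
git commit --quiet -m "Initial commit"

git branch issue/1                            # create the branch; nothing visible happens
git checkout --quiet issue/1                  # switch to it

git branch                                    # the asterisk marks the current branch
git rev-parse master issue/1                  # prints the same commit hash twice
```

Both steps can also be combined into git checkout -b issue/1, which creates the branch and switches to it at once.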

Now, write a fix for the issue.

Commit the change.

Connecting commits with issues and git commit --amend

Git is quite flexible when working with commits. If you realize that you want to change the last commit, you can git add files and then call git commit --amend. It will open your text editor with the commit message already filled-in so you can change it.

Use this feature to add fixes #1 to the message of your last commit. This will have two effects once you push this commit to GitLab. First, the issue will contain a link to the commit. Second, the #1 in the commit message will become clickable and will open the mentioned issue.

Because our commit fixed the issue, we have added the special keyword fixes to the commit message to automatically close the issue (there are plenty of issue closing patterns out there).

Note that it serves two purposes: it saves time (we do not have to switch to the browser at all) and it provides a valuable reference to the commit that actually fixed the bug. Note that the issue on GitLab is not yet marked as fixed, as we have not pushed any commits to GitLab yet.

You should not have any uncommitted changes in your project. Let’s switch back to the master branch. Check that the script (after the switch) does not contain your fix.

Note that if you have your script opened in a text editor, it should warn you about the file being changed on disk. If not, reload the file manually.

Technically, git commit --amend creates a new commit in place of the original one. That has several subtle implications, the most important is that the histories before and after the amending are different ones from Git’s perspective. This means that if you have already pushed the original commit, you should not amend it, because it will be difficult to push the new commit (because it doesn’t extend the history on the server).
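That amending really replaces the commit can be observed by comparing the commit hash before and after. A minimal sketch in a throw-away repository follows; the identity, file content and commit messages are made up for the illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master
git config user.name "Lab Student"            # hypothetical identity
git config user.email "student@example.com"

echo "fixed script" > csv_calc.sh             # placeholder content
git add csv_calc.sh
git commit --quiet -m "Return non-zero exit code on invalid expression"
before=$(git rev-parse HEAD)

# Replace the last commit with one that has an extended message.
git commit --quiet --amend -m "Return non-zero exit code on invalid expression

fixes #1"
after=$(git rev-parse HEAD)

echo "before amend: $before"
echo "after amend:  $after"
git log --oneline                             # still a single commit, under a new hash
```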

Pushing a new branch

Switch back to the issue/1 branch and push it to GitLab. If you run git push (as you were used to), Git will complain that the current branch has no upstream branch. It means (more or less) that you are pushing this branch for the first time and Git wants to know what the branch should be called on the server.

The nice thing is that Git itself suggests the command to run to get the branch pushed.

For now, ignore the link that GitLab sent you back.
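The first push of a new branch can be rehearsed against a local bare repository. In the sketch below the bare repository merely stands in for GitLab, and the paths, identity and file contents are assumptions made for the illustration; --set-upstream is the switch Git asks for.

```shell
server=$(mktemp -d)
git init --quiet --bare "$server"             # local stand-in for the GitLab project

work=$(mktemp -d)
cd "$work"
git init --quiet
git symbolic-ref HEAD refs/heads/master
git config user.name "Lab Student"            # hypothetical identity
git config user.email "student@example.com"
git remote add origin "$server"

echo "v1" > csv_calc.sh                       # placeholder content
git add csv_calc.sh
git commit --quiet -m "Initial commit"
git push --quiet --set-upstream origin master

git checkout --quiet -b issue/1
echo "v2" > csv_calc.sh
git commit --quiet -am "Fix exit code"

# First push of a new branch: name the remote and the branch explicitly.
git push --quiet --set-upstream origin issue/1

git branch -vv                                # shows which remote branch each local branch tracks
```

Once the upstream is set, a plain git push on that branch is enough.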

Open your project in the browser again. Check that your issue now contains a link to the commit that mentioned it and that, on the homepage of the project, you can select which branch to display.

Exercise

Second issue

Let’s now fix the second issue (the code-injection one). Create a new branch, resolve the issue and commit the fix.

Do not push the branch yet.

Some questions and thoughts:

  • Why do you need to switch to master first? What would the branching look like if you branched from issue/1? Why is that bad?
  • To actually fix the issue, consider using the printf command that works similarly to printf you may know from other languages (or to .format from Python). The %q directive is the one you are looking for.
  • Do not forget to include closes #2 (or similar) in the commit message.
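To get a feeling for what %q does, the following sketch quotes an untrusted string and lets the shell parse the result again. It assumes Bash, whose built-in printf supports %q; the variable names are made up, and how to use the directive inside csv_calc.sh is left to you.

```shell
# Attacker-controlled text, similar in spirit to the injected expression above.
untrusted='t01 )); rm -f precious.txt; : $(( 0 + 2'

# %q escapes every character that the shell would otherwise interpret.
quoted=$(printf '%q' "$untrusted")
echo "$quoted"

# When the shell parses the quoted form again, it sees one harmless word.
eval "set -- $quoted"
words=$#
restored=$1
echo "number of words: $words"
[ "$restored" = "$untrusted" ] && echo "round trip preserved the original text"
```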

Hot-fix: typos

Let’s assume that you have just noticed the typos in README.md (form should be from; there are two typos).

We want to fix that right away and we will do it (just this one time) directly in the master branch. This is often called a hot-fix: something you need to fix ASAP and where the usual habit of feature branch, code review, testing etc. is a hindrance instead of a help.

So, switch to the master branch (you already committed the fix to issue #2, right?), fix the typos and commit the change.

Push your changes from the master branch.

Commit graph

Open the Repository -> Graph page in your browser (from your project). It should show you your branches graphically.

You should see a new branch, issue/1, next to master; both stem from the same commit.

The graphical view is a good help if you get lost in a complicated branching model and you are not sure whether some changes should be visible or not in a specific branch.

The purpose is not to create complicated graphs, though sometimes they can get quite wild.

You can also use the --graph parameter for git log to have a graphical representation in the terminal.

Merge requests

Switch to the branch for the second issue and push it to GitLab too. You will need to use the --set-upstream switch again.

Notice that after the push, you ought to see a text informing you about opening a merge request with a link.

Open that link now in your browser.

A merge request is an advanced feature of GitLab that targets big teams. In big teams, code review is required before any code can get into the master branch. We will now use it as a way to merge our fix in GitLab.

You will notice that the merge request is not yet submitted. Title and description are pre-filled and they look similar to the form we have seen with issues. Actually, they are very similar: issues describe known problems (or feature requests), merge requests describe how the problem was fixed. As GitLab states it, merge requests are a place to propose changes you’ve made to a project and discuss those changes with others.

It is a good practice to mention which issue the merge request closes (or issues that are related).

Again, Markdown formatting can be used.

Create the merge request now. WARNING: Double-check the destination of the merge request. It has to be a branch in your repository, not in the repository you forked from.

In big teams, other developers would now comment on your code and automated checks would be run (you will set up automated checks in some of the next labs).

Even for personal repositories, merge requests still make sense: they allow the developer to quickly check that everything is okay (i.e. that all files were actually committed etc.).

Let’s merge the request now (there is a big button for that).

Keep the default and do a merge (i.e. not a rebase or a squash).

Once the merge request is merged, we should see a new commit in the master branch.

You may also look at the repository graph again to see how the commits look after the merge.

Check issues of your project now and note that the second issue should have been closed now. You can also check the details of the issue and notice how the commit is nicely connected to the issue.

Back in your local clone of the repository: do not forget to pull the latest changes from master (GitLab created the commit on the server only).

Merging on the command-line

We will now merge the first issue directly on the command line without opening a merge request. Because a merge request is always bound to a branch, you can always merge on the command line too. Note again the dual approach that is omnipresent in Linux: you can use a nice graphical UI but also a fully automatable command-line interface.

First, we need to ensure that we are on the branch we want to merge into. Usually, that would be the master branch.

The actual merge is quite simple, indeed.

git merge issue/1

And it is done. Push the master branch again and check the repository graph now. Note that the merge is actually just a commit that uses two different commits as parents (previous commits). Indeed, many of the options of git merge are similar to those of git commit.
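That a merge is a commit with two parents can be verified in a throw-away repository. The sketch below uses made-up file contents and commit messages, and adds --no-edit (an addition not used in the text above) so that Git accepts the default merge message without opening an editor.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master
git config user.name "Lab Student"            # hypothetical identity
git config user.email "student@example.com"

echo "v1" > csv_calc.sh                       # placeholder contents
echo "docs with typos" > README.md
git add csv_calc.sh README.md
git commit --quiet -m "Initial commit"

git checkout --quiet -b issue/1
echo "v2 with exit code fix" > csv_calc.sh
git commit --quiet -am "Fix exit code"

git checkout --quiet master                   # the branch we merge into
echo "docs without typos" > README.md
git commit --quiet -am "Fix typos in README"

git merge --quiet --no-edit issue/1

parents=$(git log -1 --format=%P)             # hashes of the parents of the merge commit
echo "parents: $parents"
git log --graph --oneline --all               # the same history as a text graph
```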

Merging upstream changes

We will now simulate that work in the upstream repository (i.e. the one you forked from) continues and you want to keep your repository (your fork) up-to-date.

That is a common task, by the way. You work on a new feature but you do not want to miss important updates that are happening in master. As a matter of fact, failing to keep your branch up-to-date with master can complicate merging later on. Depending on the size and activity of the project, it might make sense to merge upstream changes every week or even every day.

In some cases, you may even pull changes from different forks. If you see that someone else is working on a new feature, you may want to try it out and test how it works with your changes.

With Git, all this is possible and (maybe surprisingly) there is very little difference whether you merge your own (local) branch or changes of someone else working in a completely different fork.

To merge changes from a different repository than the default one (e.g. a different project on GitLab), we need to set up so-called remotes.

A remote is Git’s way of saying that your local clone also knows about other repositories (e.g. other forks) and can tell you whether they differ. Again, this is an overly simplified way of looking at things but it is sufficient for a first acquaintance with Git remotes. Usually you expect that the remotes share a common ancestor, i.e. the initial commits are the same across remotes.

To see your remotes, run (inside your local clone of your fork of the examples repository)

git remote show

It would probably print only origin. That is the default remote: when you do git pull or git push, it uses origin. Thus, you were using remotes even without knowing about it ;-).

Running it with -v (for verbose) prints the specific URLs of each remote. As a matter of fact, you will probably see two lines for the remote now: one for push, one for fetch (pull). You can even configure Git to pull from a different repository than the one you are pushing to. Not very useful for us at the moment, though.

Adding another remote

Adding a new remote to our repository links it with a different project so that we are able to compare changes between them (again, a simplified view of things).

git remote add upstream git@gitlab.mff.cuni.cz:teaching/nswi177/2021-summer/common/examples.git

The above command added a remote named upstream that points to the given address (i.e. the original project). Note that Git is silent in this case.

Run git remote show again. How did the output change?

Working with remotes

By adding the remote, no data were exchanged yet. You have to tell Git to do everything, nothing happens automagically. Note that if you ever encounter a different versioning system, Git will feel very low-level and perhaps even tedious to use. It is the price for its effectiveness and flexibility.

Let’s now fetch the changes from our new remote.

git fetch upstream

You should see the typical summary you get when cloning or pulling changes in Git; this time it refers to data from the upstream repository.

However, in your working tree (i.e., the directory with your project), nothing changed. That is fine, we only asked to fetch the changes, not to apply them.

Still, run git branch and git branch --all to see which branches you now have access to.
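The whole sequence of adding a remote, fetching from it and listing branches can be rehearsed with two local repositories. In the sketch below, the directory names, the identity and the file contents are made up; the second repository only stands in for the project you forked from.

```shell
# Hypothetical identity used for the throw-away commits below.
export GIT_AUTHOR_NAME="Lab Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# "Your" repository with a single commit on master.
mine=$(mktemp -d)
git -C "$mine" init --quiet
git -C "$mine" symbolic-ref HEAD refs/heads/master
echo "v1" > "$mine/csv_calc.sh"
git -C "$mine" add csv_calc.sh
git -C "$mine" commit --quiet -m "Initial commit"

# A second repository with the same history plus an extra branch.
theirs=$(mktemp -d)
git clone --quiet "$mine" "$theirs"
git -C "$theirs" checkout --quiet -b lab/10/csv-calc-tests
echo "# tests" > "$theirs/tests.bats"
git -C "$theirs" add tests.bats
git -C "$theirs" commit --quiet -m "Add tests"

# Register the remote, download its commits and list all known branches.
git -C "$mine" remote add upstream "$theirs"
git -C "$mine" remote -v
git -C "$mine" fetch --quiet upstream
git -C "$mine" branch --all
ls "$mine"                                    # tests.bats is missing: fetch did not touch the working tree
```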

Comparing branches (and merging them too)

We will now investigate how the newly added remote differs.

Let’s start with showing commits on the remote:

git log remotes/upstream/lab/10/csv-calc-tests

As you can see, git log can show commits on a certain branch only (yes, remotes/... is actually a branch name: after all, you have seen git branch --all). It also works on files (e.g. git log README.md). It is quite a powerful command indeed.

But we wanted to see how the code differs. That is actually even more important: you want to see which changes to the code were made and whether it would be possible to merge them at all.

git diff remotes/upstream/lab/10/csv-calc-tests

You ought to see a patch that displays that the newly added remote differs in one file only: automated tests were added.

They look pretty good – we want them in our project too.

Let’s merge the remote branch, then.

git merge remotes/upstream/lab/10/csv-calc-tests

Since there should be no conflicts (i.e. both branches, master and remotes/upstream/lab/10/csv-calc-tests, changed different files), the merge should complete automatically.

Check your project directory: is the tests.bats file there?

Note that you can change the merge-commit message using git commit --amend.

Resolving conflicts

Using the same approach, prepare for merge (i.e. do not run git merge yet) with upstream/lab/10/csv-calc-hotfix-fixed.

As you probably noticed, the second branch contains a typo fix. But you already fixed it (if not, fix it before merging!).

The merge will lead to a so-called conflict: two developers touched the same file and made their individual modifications. We need to resolve that manually.

That is quite common and there is no need to be afraid of it. Git is able to help you a lot – when there are changes to different parts of a file, Git is able to merge the changes without any problems. But when both branches change the same lines, it is up to you to resolve it. That is quite natural and you would be surprised how many times Git is able to merge things automatically.

Enough of theory, run the merge command now:

git merge remotes/upstream/lab/10/csv-calc-hotfix-fixed

This merge will end with an error and Git will inform you about the conflict.

Review the output from the merge command. Note how Git tries to help by telling you what can be done next.

Run also git status and investigate its output.

Now comes the tricky part of the whole workflow: you need to resolve the conflict. In our case, it is rather simple. For complex software, resolving a conflict can be a very tricky operation as you need to check several places and mentally combine the changes first. Having automated tests can help, but analytical thinking is certainly a plus.

Once you have resolved the conflict, you need to call git add (as with a normal commit, because a merge commit is still a commit, after all) to mark the conflict as resolved.

git add README.md

To finish the merge, run git commit as with any normal commit.
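The complete conflict workflow can be practised in a throw-away repository before touching the real one. The sketch below is an illustration only: the branch name hotfix, the README sentences and the identity are made up, and --no-edit is added so that Git does not open an editor for the merge message.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master
git config user.name "Lab Student"            # hypothetical identity
git config user.email "student@example.com"

printf 'Read data form stdin.\n' > README.md
git add README.md
git commit --quiet -m "Add README"

git checkout --quiet -b hotfix                # made-up branch with their version of the fix
printf 'Read data from standard input.\n' > README.md
git commit --quiet -am "Fix typo (their wording)"

git checkout --quiet master
printf 'Read data from stdin.\n' > README.md
git commit --quiet -am "Fix typo (our wording)"

# Both branches changed the same line, so the merge stops; the failure is expected.
git merge --no-edit hotfix || echo "merge stopped because of a conflict"

git status --short                            # README.md is listed as UU (both modified)
cat README.md                                 # contains the conflict markers

# Resolve by hand: here we simply keep one wording and drop the markers.
printf 'Read data from standard input.\n' > README.md
git add README.md                             # mark the conflict as resolved
git commit --quiet --no-edit                  # conclude the merge
```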

Do not forget to push the changes to your repository.

What would the graphical representation of the commits in GitLab look like now?

Try to sketch it on paper before opening the Graph page in GitLab.

Graded tasks

The graded tasks for this lab are a little different as we need to test your understanding of the system that you also use for task submission.

Please, read carefully the task description and follow the instructions closely. Many of the things cannot be tested automatically inside the GitLab pipelines because they do not have access to the GitLab API and committing the API key anywhere is not possible in a secure manner.

10/csv_calc.sh (80 points)

Important: read the whole task description first as some details are explained later on.

Copy csv_calc.sh script into your submission repository. We expect that it will contain a fix for the exit code issue and the printf '%q' fix too.

This task has several subtasks that are listed below.

Issues

Create GitLab issues for the following two problems in the code. Recall that a good issue has a good description, so write something reasonable there.

  • The script should be more user-friendly when it is incorrectly invoked: when a user runs it without any arguments or when it is executed with --help.
  • What happens if you re-create precious.txt (it was empty), append the line Mayor Humdinger,0,0,0';rm -f precious.txt;:' to the input CSV and run the command from README.md again?

To allow us to find your issues automatically, add the [task-help] string to the title of the first issue and [task-malicious] to the title of the second issue.

Note that the following one-liner will help you check that you have the right issues (replace API_KEY with your GitLab API key and PROJECT with a path to your project or the project number).

curl --silent --header "PRIVATE-TOKEN: API_KEY" "https://gitlab.mff.cuni.cz/api/v4/projects/PROJECT/issues" | jq -r '.[]|.title' | grep -e 'task-help' -e 'task-malicious'

Branches

Create separate branches for each of the issues, keep the naming issue/N. Start both branches from the same commit!

Fix both issues. Create a merge request for [task-help] and merge it. Include the string [merge-help] into its title.

Keep the branch for [task-malicious] unmerged but push it to GitLab. Ensure that we will be able to see origin/issue/XY branch when fetching your clone (XY will obviously refer to your issue [task-malicious]).

Your (preferably) last commit in [task-malicious] branch shall contain a commit message that automatically closes the issue.

Do not merge the [task-malicious] branch – we will merge it as part of the evaluation and check that it closed the issue.

Tests

Because the solution will be spread over multiple branches, some of the tests will always fail. The reason is obvious: the [task-help] branch will not contain fixes for [task-malicious] and vice versa (as mentioned above, start both branches from the same commit).

In other words: if all your tests for 10 pass in the same pipeline, you have it wrong.

For testing purposes, ensure that --help produces the following text:

csv_calc.sh EXPR - Simple CSV calculator

Reads CSV from standard input, adds column corresponding to provided EXPR.

EXPR can refer existing columns and is evaluated inside $(( )).

See project homepage for more details.

An invalid invocation shall terminate with a non-zero exit code and the error message Invalid invocation, run with --help for manual. (printed to stderr). This includes invocation without parameters. Print this message even if you have not yet implemented --help.

Malicious code in the CSV shall be completely ignored when it is not part of the evaluated column.

Reverting things…

If you manage to merge a branch you were not supposed to merge etc., do not worry. We recommend that you remove the branches via the GitLab UI (where possible), remove the script and start again with a new commit.

Please, ensure that you rename the related issues so that the query above (with curl) returns only the last issues that we are supposed to use.

Updates

  • We have unified naming to be issue/N across the whole text. If you have already used issues/N in your graded repository, it is okay, we will look for that branch too.

10/UPSTREAM.md (20 points)

You perhaps noticed that your submission repository is actually a fork of another one. (That was for technical reasons as it simplified the creation of the repository for us.)

But it means that you can merge a change made by us from it. It now contains the file 10/UPSTREAM.md with Lorem Ipsum content.

Merge this file into your repository. Do not rebase or squash, do a normal merge, please.

Do not copy the file but use Git to perform the merge.

Deadline: May 24, AoE

Solutions submitted after the deadline will not be accepted.

Note that at the time of the deadline we will download the contents of your project and start the evaluation. Anything uploaded/modified later on will not be taken into account!

Note that we will be looking only at your master branch (unless explicitly specified otherwise), do not forget to merge from other branches if you are using them.

Changelog

2021-05-12: Stress that there are two typos to be hot-fixed.

2021-05-03: Unify naming of issue branches to issue/N. Fix problems with lab/10/csv-calc-hotfix, use lab/10/csv-calc-hotfix-fixed instead.