This lab focuses on build systems: tools that streamline the process of getting from source code to a publishable artefact. This includes creating an executable binary from a set of C sources, generating HTML from Markdown sources, or creating thumbnails in different resolutions from the same photo.

We will start with a utility called Pandoc, which is not really related to the course but is useful enough to be mentioned and used as a demonstrator here.

Setup

Update your fork of the examples repository (i.e., update it with the upstream changes). Cloning the upstream repository to your local machine would work too, but treat the update as another Git exercise.

We will be using the 12 subdirectory.

Install pandoc (sudo dnf install pandoc).

Pandoc

Pandoc is a universal document converter that can convert between various formats, including HTML, Markdown, Docbook, LaTeX, Word, LibreOffice or PDF.

Basic usage

Start by running the following commands (inside the 12/pandoc directory of the examples repository).

cat example.md
pandoc example.md

As you can see, the output is the Markdown file converted to HTML, though without an HTML header.

Update: note that Markdown can be combined with HTML directly (useful if you want a more complicated HTML code: Pandoc will copy it as-is).

<p>This is an example page in Markdown.</p>
<p>Paragraphs as well as <strong>formatting</strong> are supported.</p>
<p>Inline <code>code</code> as well.</p>
<p class="alert alert-primary">
Third paragraph with <em>HTML code</em>.
</p>

If you add --standalone, it generates a full HTML page. Let’s try it (both invocations will have the same end result).

pandoc --standalone example.md >example.html
pandoc --standalone -o example.html example.md

Try opening example.html in your web browser too.

As mentioned, Pandoc can create OpenDocument too (the format used mostly in OpenOffice/LibreOffice suite).

pandoc -o example.odt example.md

Note that we have omitted --standalone here: Pandoc always produces a complete document for ODT output, so the option has no effect there. Check what the generated document looks like in LibreOffice/OpenOffice, or import it into an online office suite.

Note that you should not commit example.odt into your repository as it can be generated. That is a general rule for any file that can be generated.

Sidenote about LibreOffice

Did you know that LibreOffice can be used from command-line too? For example, we can ask LibreOffice to convert a document to PDF via the following command.

soffice --headless --convert-to pdf example.odt

The --headless prevents opening any GUI and --convert-to should be self-explanatory.

Combined with Pandoc, three commands are enough to create an HTML page and a PDF from a single source.

Pandoc templates

By default, Pandoc uses its own default template for the final HTML. But we can change this template too.

Look into template.html. It looks similar to the Jinja template you have seen earlier, but instead of {{ the placeholders are enclosed in dollar signs. So when the template is expanded (or rendered), the parts between dollars are replaced with the actual content.

Let’s try it with Pandoc.

pandoc --template template.html index.md >index.html

Check what the output looks like. Notice how $body$ and $title$ were replaced.

Running example

We will use the above mentioned files as a running example. We will use them as a starting point for a very simple website that displays information about a tournament.

Notice that for index and rules there are Markdown files to generate HTML from. Page teams.html – that is mentioned in the menu – does not have a corresponding Markdown file.

Instead, it is generated via the following command.

./bin/teams2md.sh teams.csv | cat header-teams.md - | pandoc --template template.html >teams.html

Note that items in teams.csv are separated by spaces and are intended to be read with read -r team_id team_color team_name.
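
The real bin/teams2md.sh is part of the examples repository. The following is only a minimal sketch of how such a script could read the data, assuming that each line holds an id, a color and a name (which may itself contain spaces). The function name, the sample data and the output format (a Markdown bullet list) are made up for illustration.

```shell
#!/bin/bash
set -ue

# Hypothetical stand-in for bin/teams2md.sh: reads space-separated
# records from stdin and prints one Markdown list item per team.
teams_to_md() {
    # The last variable (team_name) receives the rest of the line,
    # so names containing spaces are kept whole.
    while read -r team_id team_color team_name; do
        echo "* ${team_name} (id ${team_id}, color ${team_color})"
    done
}

# Made-up sample data, not the contents of the real teams.csv.
teams_md="$(printf '%s\n' 'alpha red Alpha team' 'bravo blue Team bravo' | teams_to_md)"
echo "$teams_md"
```

Because read assigns everything that is left on the line to the last variable, the name has to be the last column for this to work.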

Why build systems

The above steps for creating the simple tournament website serve as the main motivation for this lab.

While the above steps do not build an executable from sources (as is the typical case for software development), they represent a typical scenario.

Building software usually consists of many steps, which can include actions as diverse as

  • compiling source files to some intermediate format
  • linking the final executable
  • creating bitmap graphics in different resolutions from a single vector image
  • generating source-code documentation
  • preparing localization files with translation
  • creating a self-extracting archive
  • deploying the software on a webserver
  • publishing an artefact in a package repository

Almost all of them are simple by themselves. What is complex is their orchestration. That is, how to run them in the correct order and with the right options (parameters).

For example, before an installer can be prepared, all other files have to be prepared. Localization files often depend on precompilation of some sources but have to be prepared before the final executable is linked. And so on.

Even for small projects, the number of steps can be quite high, yet they are, in a sense, unimportant: you do not want to remember them, you want to build the whole thing!

Note that your IDE can often help you with all of this with a single click. But not everybody uses the same IDE, and you may not even have a graphical interface at all. Furthermore, you typically run the build as part of each commit. The GitLab pipelines we use for tests are a typical example: they execute without a GUI, yet we want to build the software (and test it too). Codifying this in a build script simplifies this for virtually everyone. (Note that we will discuss GitLab pipelines in the next lab.)

To illustrate the above points, let’s imagine what a shell script for building the example site would look like.

#!/bin/bash

set -ueo pipefail
set -x

mkdir -p out
pandoc --template template.html index.md >index.html
pandoc --template template.html rules.md >rules.html
...

This looks like a nice improvement: a new member of the team would not need to investigate all the tiny details and could just run a single build.sh script.

The script is nice but it overwrites all files even if there was no change. In our small example, it is no big deal (you have a fast computer, after all).

But in a bigger project where we, for example, compile thousands of files (e.g. look at source tree of Linux kernel, Firefox or LibreOffice), it matters. If an input file was not changed (e.g. we modified only rules.md) we do not need to regenerate the other files (e.g. we do not need to re-create index.html).

Let’s extend our script a little bit (look up -nt in man test).

#!/bin/bash

set -ueo pipefail
set -x

mkdir -p out
[ "index.md" -nt "index.html" ] \
    && pandoc --template template.html index.md >index.html
[ "rules.md" -nt "rules.html" ] \
    && pandoc --template template.html rules.md >rules.html
...

We can do that for every command to speed up the web generation.
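
For reference, the behaviour of the -nt operator used above can be tried out in isolation. The following sketch works in a temporary directory with made-up file names and sets the modification times explicitly, so the result does not depend on how fast the commands run.

```shell
#!/bin/bash
set -ue

# Work in a throw-away directory so that no real files are touched.
cd "$(mktemp -d)"

# Create a source and an output with explicit modification times.
touch -t 202105011000 page.md
touch -t 202105011100 page.html

# The output is newer than the source: nothing to do.
if [ page.md -nt page.html ]; then before="rebuild"; else before="up-to-date"; fi

# Editing the source makes it newer than the output.
touch -t 202105011200 page.md
if [ page.md -nt page.html ]; then after="rebuild"; else after="up-to-date"; fi

echo "before edit: $before, after edit: $after"
```

Only modification times are compared here. The content of the files plays no role, which is why touch alone is enough to trigger a rebuild.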

But.

That is a lot of work. And the time saved would probably all be wasted on rewriting our script. Not to mention that the result looks horrible. And it is rather expensive to maintain.

And often we need to build just a part of the project: e.g., regenerate only the documentation (without publishing an artefact, for example). Although extending the script along the following lines is possible, it certainly is not viable for large projects.

if [ -z "${1:-}" ]; then
    ... # build here
elif [ "${1:-}" = "clean" ]; then
    rm -f index.html rules.html teams.html
elif [ "${1:-}" = "publish" ]; then
    cp index.html rules.html teams.html /var/www/web-page/teams/
else
    ...
fi

Luckily, there is a better way.

make

Move into 12/make directory first, please. The files in this directory are virtually the same but there is one extra file: Makefile. Notice that Makefile is written with capital M to be easily distinguishable (ls in non-localized version will print files with capitals first).

This file is a control file for a build system named make that does exactly what we tried to imitate in the previous example.

It contains so-called dependencies and actions to execute when a target is out-of-date (i.e., a dependency is newer than the target). Note that these are file-based dependencies (i.e., index.html depends on index.md and on template.html too), not library dependencies (i.e., that this requires Pandoc).

Let us start with running make. Execute the following command.

make

You will see the following output (if you have executed some of the commands manually, the output may differ).

pandoc --template template.html index.md >index.html
pandoc --template template.html rules.md >rules.html

make prints the commands it executes and runs them. It has built the website for us: notice that the HTML files were generated.

Execute make again.

make: Nothing to be done for 'all'.

As you can see, make was smart enough to recognize that since no file was changed, there is no need to run anything.

Update index.md (touch index.md would work too) and run make again. Notice how index.html was rebuilt while rules.html remained untouched.

pandoc --template template.html index.md >index.html

This is called an incremental build (we build incrementally only what was needed instead of building everything from scratch).

As we mentioned above, this is not that interesting for our small example. With thousands of input files, the difference would become noticeable.

It is also possible to execute make index.html to explicitly specify that we want to rebuild only index.html. Even in this case, make will not rebuild unnecessarily.

If you wish to force a rebuild, execute make with -B. This is often called an unconditional build.

In other words, make allows us to capture the simple individual commands needed for project build (no matter whether it is a web site generation or C source code files compilation) into a coherent script. It takes care of dependencies and executes only commands that are really needed.

Makefile explained

Makefile is a control file for the build system named make. In essence, it is a domain-specific language to simplify setting up the script with the [ ".." -nt ".." ] constructs we mentioned above.

As mentioned above, it contains dependencies and actions to execute when a target is out-of-date (i.e., a dependency is newer than the target).

We will start with the following fragment:

rules.html: rules.md template.html
	pandoc --template template.html rules.md >rules.html

Important: indentation in Makefiles has to be done with tabs, so make sure your editor does not expand tabs to spaces. This is also a common issue when copying fragments from a web browser. (Usually, your editor will recognize that Makefile is a special filename and switch to a tabs-only policy by itself.) Otherwise, make fails with the error Makefile:LINE_NUMBER: *** missing separator. Stop.

The fragment has three parts.

Before the colon is the name of the target. That is usually a filename and describes what we want to build. Here it is rules.html.

The second part is after the colon till the end of the line. It lists dependencies. make looks at the dependencies and if any of them is newer than the target (or if the target does not exist at all), it means that the target is out-of-date and needs to be rebuilt.

The third part consists of the following lines, which have to be indented by a tab and contain the commands that have to be executed for the target to be built. Here, it is the call to pandoc.

Together, we can read it as a rule that describes when it is needed to build a target and how.
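
The "when" part of a rule can be approximated in shell. This is only a sketch to build intuition and not how make is actually implemented; the function name is_outdated is made up and the files are created in a temporary directory with explicit timestamps.

```shell
#!/bin/bash
set -ue

# is_outdated TARGET DEPENDENCY...
# Succeeds (exit code 0) when TARGET has to be rebuilt: it is missing
# or at least one DEPENDENCY is newer than it.
is_outdated() {
    target="$1"
    shift
    [ -e "$target" ] || return 0
    for dependency in "$@"; do
        if [ "$dependency" -nt "$target" ]; then
            return 0
        fi
    done
    return 1
}

cd "$(mktemp -d)"
touch -t 202105011000 rules.md
touch -t 202105011100 template.html

# The target does not exist yet.
if is_outdated rules.html rules.md template.html; then first="rebuild"; else first="up-to-date"; fi

# The target is newer than both dependencies.
touch -t 202105011200 rules.html
if is_outdated rules.html rules.md template.html; then second="rebuild"; else second="up-to-date"; fi

# The template changes, which affects every page that lists it as a dependency.
touch -t 202105011300 template.html
if is_outdated rules.html rules.md template.html; then third="rebuild"; else third="up-to-date"; fi

echo "$first, $second, $third"
```

The last case shows why template.html is listed as a dependency: a change in the template makes the page out-of-date even though rules.md itself was not touched.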

The rest of the Makefile is similar. There are rules for other files and also several special rules.

Special rules

The special rules are all, clean and .PHONY. They are special because they do not represent a real file like we have seen for other rules.

all is a traditional name for the very first rule in the file. Note that it lists as its dependencies all the generated files.

The first rule is also called default rule and is executed by default. As you have probably guessed, by default we want to build everything (more precisely: update everything that needs to be updated).

clean is a special rule that has no dependencies but instead has only commands that remove everything in out. It is a useful service-style rule for removing generated files (e.g. to start with fresh state, save disk space etc.). Typically, clean removes all files that are not versioned (i.e., under Git control).

As make expects a target name to be a filename, we need to tell it that all and clean are not actually filenames (i.e., we are not creating a file named all, as one might expect) via the special target .PHONY.

This weird approach is basically a design flaw of make, which was originally created as a one-shot utility and somehow survived for more than 40 years. Note that despite its age, make is still used even in new projects and is also often used as a backend. That is, you have something smarter that generates a Makefile and lets make do the actual work.

Exercise

One. On your own, extend the Makefile with a call to the generating script. Do not forget to update the all rule.

Two. Notice that there is an empty out/ subdirectory (it contains only a single .gitignore file that specifies that all files in this directory shall be ignored by Git and thus not show up in git status). Update the Makefile to generate files into this directory. The reasons are obvious:

  • The generated files will not clutter your working directory (you do not want to commit them anyway).
  • When syncing to a webserver, we can specify the whole directory to be copied (instead of specifying individual files).

Three. Add a phony target upload that will copy the generated files to a machine in Rotunda. Manually create the directory ~/WWW there. Its content will be available as http://www.ms.mff.cuni.cz/~LOGIN/.

Note that you will need to add the proper permissions for the AFS filesystem using the fs setacl command.

Four. Add generation of PDF from rules.md (using LibreOffice). Note that soffice supports --outdir parameter.

Think about the following

  • Where to place the intermediate ODT file?
  • Shall there be a special rule for the generation of ODT file or shall it be done with a single rule with two commands?

Improving the maintainability of the Makefile

The Makefile is starting to contain too much repeated code.

But make can help you with that too.

Let’s remove all the rules for generating out/*.html from *.md and replace them with:

out/%.html: %.md template.html
	pandoc --template template.html -o $@ $<

That is a pattern rule that captures the idea that HTML is generated from Markdown. Here, the percent sign represents the so-called stem, i.e., the variable (changing) part of the pattern.

In the command part, we use make variables (they start with dollar as in shell) $@ and $<. $@ is the actual target and $< is the first dependency.
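
To see what the stem and the two variables stand for, the pattern rule can be mimicked in shell. The sketch below only prints the commands that would be run; make additionally checks the timestamps and builds only the requested targets. The function name and the list of sources are illustrative.

```shell
#!/bin/bash
set -ue

# Print the command that the pattern rule "out/%.html: %.md template.html"
# would run for one Markdown source.
expand_pattern_rule() {
    first_dependency="$1"              # corresponds to $<
    stem="${first_dependency%.md}"     # the part matched by %
    target="out/${stem}.html"          # corresponds to $@
    echo "pandoc --template template.html -o ${target} ${first_dependency}"
}

commands="$(
    for source in index.md rules.md; do
        expand_pattern_rule "$source"
    done
)"
echo "$commands"
```

Note that template.html is a dependency as well but never shows up through $<, because that variable refers to the first dependency only.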

Run make clean && make to verify that even with pattern rules, the web is still generated.

Apart from pattern rules, make also understands variables (in our Makefile they act as constants: we set them once and never change them).

They can improve readability as you can group the configuration in one place and the commands in another.

PAGES = \
      out/index.html \
      out/rules.html \
      out/teams.html

all: $(PAGES) ...
...

Note that variables are expanded in $(VAR) form.

Non-portable extensions

make is a very old tool that exists in many variants built by different vendors. The features mentioned so far should work with any version of make.

The last addition will work in GNU make only (but that is the default on Linux so there should not be any problem).

We will change the Makefile as follows:

PAGES = \
      index \
      rules \
      teams

PAGES_TMP := $(addsuffix .html, $(PAGES))
PAGES_HTML := $(addprefix out/, $(PAGES_TMP))

We keep only the basename of each page and we compute the output path. Note that the computation uses := (the right-hand side is expanded immediately, at the point of the assignment) and that $(addsuffix ...) and $(addprefix ...) are function calls. Arguments are separated by commas, and the functions operate on each whitespace-separated word of $(PAGES) as if it were an array.

Note that we added PAGES_TMP only to improve readability when using this feature for the first time. Normally, you would assign PAGES_HTML directly.

PAGES_HTML := $(addprefix out/, $(addsuffix .html, $(PAGES)))

This would be especially useful if we wanted to generate a PDF for each page too, as we would not have to repeat the list of pages.
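
For readers more familiar with shell, the effect of the two functions roughly corresponds to the following loop. It is an analogy only: the shell variable names mirror the ones in the Makefile purely for illustration.

```shell
#!/bin/bash
set -ue

PAGES="index rules teams"

# Rough equivalent of $(addprefix out/, $(addsuffix .html, $(PAGES))):
# handle every whitespace-separated word of PAGES separately.
PAGES_HTML=""
for page in $PAGES; do
    # Append the next path, separated by a space unless the list is empty.
    PAGES_HTML="${PAGES_HTML:+$PAGES_HTML }out/${page}.html"
done

echo "$PAGES_HTML"
```

In the Makefile, the two function calls replace the whole loop, which is why the list of pages needs to be written only once.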

Graded tasks

The graded tasks are based on the running example from this lab.

The task consists of several parts that are partially independent. For each of these, we provide an approximate ratio of points we will assign – the point assignment is preliminary and is subject to change (take it as a guidance about the complexity of individual parts after you finish the exercises from the lab itself).

12/Makefile (100 points)

Copy the version you have so far here (including all the source files). Do not commit any files that could be generated (except directories out and tmp with their .gitignore files).

Your main task is to add support for generating task pages and a scoring page.

But a few things need to be done first.

Note that we expect that your Makefile will be structured in such a way that only a minimal set of files is regenerated when any of the input files change. For example, a change of index.md shall not trigger regeneration of rules.html; on the other hand, a change in template.html shall trigger a rebuild of all HTML pages.

Do not commit generated files.

Set-up (about 20%)

Before you tackle the rest of the assignment, the following should work.

Running make shall generate the following “simple” pages (i.e. they are generated from their Markdown sources using the provided template).

  • out/index.html
  • out/teams.html
  • out/rules.html

Page out/teams.html is generated using the teams2md.sh script from teams.csv and header-teams.md.

Note that we do not want to generate PDF (it would complicate the setup of GitLab pipeline).

Update: all information about the teams is stored inside teams.csv that uses white-space separator (see notice above).

Generation of rules.odt (about 10%)

As a first mini-task, add generation of out/rules.odt from rules.md. (We do not want to generate PDF as it would complicate the setup of the pipeline.) Add a link to the ODT file to the main menu.

Each task is stored inside a single file in tasks (see one.md and two.md).

The list of tasks has to be hardcoded in the Makefile like this (it also takes care of the correct ordering). Do not try to implement any directory listing or something similar.

TASKS = one two

Generate an ODT file for each task (using the default ODT template): from the task file tasks/XX.md, generate the file out/task-XX.odt.

Add an extra script into bin that takes a list of task ids as arguments and prepares a simple overview with links.

For the provided tasks, the produced page would look like this (in its Markdown form – feel free to generate this in HTML directly but we expect you would use the common template).

---
title: Tasks
---

* [Task one](task-one.odt)
* [Second task](task-two.odt)

Ensure that Makefile contains rules to generate out/tasks.html from the above Markdown. Generate this page by default.

Add a menu link to the tasks.html page (i.e., edit the template).

Feel free to use Python or Bash for the generation. It is up to you – we will just check that you generate from the list of tasks (i.e., for the tests, we will add a task to the TASKS variable and check that the proper ODT and list item would appear).

Update: note that task name is specified inside the Markdown file and shall not be hard-coded in the scripts.

Update: you can assume that the title would be always present but it does not have to be always on the second line. Consider using Pandoc to extract the title.

Generation of a scoring table in score.html (about 40%)

File score.csv contains scoring of the teams. It uses passwd style format where columns do not have headers and are separated with :.

Extend your Makefile so that it generates score.html. It will use the same template as the rest of the pages, the actual content would contain the following.

Please, keep the id of score-table, we will use it in the tests to compare the generated table with the expected format.

<table id="score-table">
  <thead>
    <tr>
      <th>Team</th>
      <th>Task one</th>
      <th>Second task</th>
      <th>Sum</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Alpha team</td>
      <td>20</td>
      <td>5</td>
      <td>25</td>
    </tr>
    <tr>
      <td>Team bravo</td>
      <td>18</td>
      <td>-</td>
      <td>18</td>
    </tr>
    <tr>
      <td>Zulu</td>
      <td>-</td>
      <td>8</td>
      <td>8</td>
    </tr>
  </tbody>
</table>

It is up to you how you generate this page but ensure that the following changes will trigger regeneration of the out/score.html file.

  • Change in the score.csv file.
  • Change in teams.csv file.
  • Change in any of the used tasks/*.md files.
  • Changing TASKS variable in Makefile.

Again, if you decide to use Python for the scripting part, it is fine.

Note that we expect that you will pass $(TASKS) as a parameter to the generation script. There is no need for the script to dynamically scan contents of the tasks/ directory to find a list of tasks (again, $(TASKS) in the Makefile also takes care of proper ordering).

Note for Python

If you require an external library for your Python scripts, ensure we can set up a virtual environment using requirements.txt and setup.py (though you will probably not need them at all).

Deadline: June 7, AoE

Solutions submitted after the deadline will not be accepted.

Note that at the time of the deadline we will download the contents of your project and start the evaluation. Anything uploaded/modified later on will not be taken into account!

Note that we will be looking only at your master branch (unless explicitly specified otherwise), do not forget to merge from other branches if you are using them.

Changelog

2021-05-24: Clarification about task titles.

2021-05-19: Markdown can be combined with HTML.

2021-05-17: ACL permissions on Rotunda; mention data format explicitly.