This lab will be the last one fully devoted to shell scripting. We will introduce several interesting programs that are worth knowing and then devote the remaining time to several bigger scripts.

Assorted notes

There are hundreds of little programs on a machine with Linux installed. It is not possible to remember all of them – use your favorite search engine, because it is highly probable that a program doing exactly what you want already exists :-). Many of the small programs come from either the GNU coreutils or the util-linux collection. The following programs are ones that you should know exist (information about their arguments can be found in the manual).

xargs

xargs in its simplest form reads standard input and converts it to program arguments for a user-specified program.

Assume we have the following files in a directory:

2021-03-10.txt  2021-03-16.txt  2021-03-22.txt  2021-03-28.txt
2021-03-11.txt  2021-03-17.txt  2021-03-23.txt  2021-03-29.txt
2021-03-12.txt  2021-03-18.txt  2021-03-24.txt  2021-03-30.txt
2021-03-13.txt  2021-03-19.txt  2021-03-25.txt  2021-03-31.txt
2021-03-14.txt  2021-03-20.txt  2021-03-26.txt
2021-03-15.txt  2021-03-21.txt  2021-03-27.txt

The following script removes files that are older than 15 days:

cutoff_date="$( date -d "15 days ago" '+%Y%m%d' )"
for filename in 2021-[01][0-9]-[0-3][0-9].txt; do
    date_num="$( basename "$filename" .txt | tr -d '-' )"
    if [ "$date_num" -lt "$cutoff_date" ]; then
        echo rm "$filename"
    fi
done

This means that the program rm would be called several times, always removing just one file. Note that we have echo rm there so that we do not actually remove the files but merely demonstrate the operation. The overhead of starting a new process can become a serious bottleneck for bigger scripts (think of thousands of files, for example).

It would be much better to call rm just once, giving it the whole list of files to remove (i.e., as multiple arguments).

xargs is the solution here. Let’s modify the program a little bit:

cutoff_date="$( date -d "15 days ago" '+%Y%m%d' )"
for filename in 2021-[01][0-9]-[0-3][0-9].txt; do
    date_num="$( basename "$filename" .txt | tr -d '-' )"
    if [ "$date_num" -lt "$cutoff_date" ]; then
        echo "$filename"
    fi
done | xargs echo rm

Instead of removing each file right away, we just print its name and pipe the output of the whole loop to xargs; the normal arguments of xargs specify the program to be launched (including its initial arguments).

Instead of many lines with rm ... we will see just one long line with a single invocation of rm.

Of course, tricky filenames can still cause issues, as xargs assumes that arguments are delimited by whitespace. (Note that we were safe above because the filenames were reasonable.) That can be changed with --delimiter.

If you are piping input to xargs from your program, consider delimiting the items with the zero byte (i.e., the C string terminator, \0). That is the safest option, as this character cannot appear anywhere inside any argument. And tell xargs about it via -0 or --null.
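
As a minimal sketch (the file names here are made up), printf(1) can produce such zero-delimited output:

# Names delimited by \0 survive spaces in file names intact
# (echo keeps this a dry run)
printf '%s\0' 'plain.txt' 'name with spaces.txt' | xargs --null echo rm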

Note that xargs is smart enough to realize when the command-line would be too long and splits it automatically (see manual for details).

It is also good to remember that xargs can execute the command in parallel (i.e., split the stdin into multiple chunks and call the program multiple times with different chunks) via -P. If your shell scripts are getting slow but you have plenty of CPU power, this may speed things up quite a lot for you.
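
For illustration, a sketch (assuming some *.log files exist in the current directory; echo again keeps it a dry run):

# Run at most 4 concurrent invocations, each with at most 10 file names
printf '%s\0' *.log | xargs --null -P 4 -n 10 echo gzip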

find

While ls(1) and wildcard expansion are powerful, sometimes we need to select files using more sophisticated criteria. That is where the find(1) program comes in useful. Without any arguments, it lists all files in the current directory, including files in nested directories. Do not run it on the root directory (/) unless you know what you are doing (and definitely not on the shared linux.ms.mff.cuni.cz machine).

With the -name parameter you can limit the search to files matching a given wildcard pattern. The following command finds all alpha.txt files in the current directory and in any subdirectory (regardless of depth).

find -name alpha.txt

Why would the following command for finding all *.txt files not work?

find -name *.txt

find has many options – we will not duplicate its manpage here, but let us mention those that are worth remembering.

-delete immediately deletes the found files. Very useful and very dangerous.

-exec runs a given program on every found file. You have to use {} to specify the found filename, and you have to terminate the command with ; (since ; terminates commands in the shell too, you will need to escape it).

find -name '*.md' -exec wc -l {} \;

Note that for each found file, a new invocation of wc happens. This can be changed by replacing the command terminator (\;) with +. See the difference in the invocations of the following two commands:

find -name '*.md' -exec echo {} \;
find -name '*.md' -exec echo {} +
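
To see the difference, assume the current directory contains just alpha.md and beta.md. The first command launches echo once per file, the second only once with both names as arguments, so the outputs would look roughly like this:

./alpha.md
./beta.md

versus

./alpha.md ./beta.md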

Caveats

By default, find prints one filename per line. However, a filename can even contain the newline character (!) and thus the following idiom is not 100% safe.

find -options-for-find | while read filename; do
    do_some_complicated_things_with "$filename"
done

If you want to be really safe, use -print0 and read -d$'\0', as that uses the only safe delimiter – \0. Alternatively, you can pipe the output of find -print0 to xargs --null.
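
A safe version of the loop then looks like this (a sketch; -options-for-find remains a placeholder):

find -options-for-find -print0 | while IFS= read -r -d $'\0' filename; do
    do_some_complicated_things_with "$filename"
done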

However, if you are working with your own files or the pattern is safe, the above loop is fine (just do not forget that directories are files too and can contain \n in their names as well).

Note that the shell allows you to export a function and call back to it from inside xargs.

#!/bin/bash

my_function() {
    echo ""
    echo "\$0 = $0"
    echo "\$@ =" "$@"
}
# Make the function visible to child bash processes
export -f my_function

# arg_zero becomes $0 of the inline script, arg_one becomes $1,
# and xargs appends one found filename (-n 1) after them
find . -print0 | xargs -0 -n 1 bash -c 'my_function "$@"' arg_zero arg_one

tmux

tmux is a terminal multiplexer. That means that inside one terminal it opens several terminals for you that run in parallel. It also allows you to send a session to the background so that it stays there even if you log out or your remote connection is interrupted (i.e., useful for running long scripts). In other words, tmux gives you tabs (called windows) inside your existing session that can be minimized/iconified (if borrowing terms from GUIs explains the usefulness better).

The simplest way to start a tmux session is simply:

tmux

Alternatively, we can start a session with a meaningful name:

tmux new -s <session_name>

To list all sessions run:

tmux ls

To connect (attach) to a running session, run:

tmux a -t <session_name>

And finally, in order to kill the session we use:

tmux kill-session -t <session_name>

Inside the session we are able to create multiple windows, split the screen and much more. In order to invoke a tmux command, we first need to type the tmux prefix. The default key binding is C-b.

In order to detach from the session, we can simply press (do not forget to type the prefix first!):

d  detach from session

Operations with windows:

c  create window
w  list windows
n  next window
p  previous window
f  find window
,  name window
&  kill window

Sometimes it is useful to split the screen into several terminals. These splits are called panes.

Operations with panes (splits):

%  vertical split
"  horizontal split

o  swap panes
q  show pane numbers
x  kill pane
←  switch to left pane
→  switch to right pane
↑  switch to upward pane
↓  switch to downward pane

Another feature is that you can toggle writing simultaneously to all panes. Performing the same operation multiple times may not seem very useful, but you can, for example, open several different SSH connections in advance and then interactively control all of these computers at the same time.

To toggle it, type the prefix and then write :set synchronize-panes. If you want to try this in Rotunda, please do not run too computationally intensive tasks…

As is usual with Linux tools, you can extensively modify tmux's behavior via its rc configuration. For instance, in order to navigate among the panes with Vim shortcuts, modify your .tmux.conf so that it contains

bind-key C-h run "tmux select-pane -L"
bind-key C-j run "tmux select-pane -D"
bind-key C-k run "tmux select-pane -U"
bind-key C-l run "tmux select-pane -R"

Personal tip № 0: tmux is an excellent tool for collaboration, as multiple users can attach to the same session.

Personal tip № 1: when you give a lecture, you can attach to the same tmux session from two terminals. You then push the first one to the projector while the second one stays on your laptop screen. This eliminates the need to mirror your screen. Together with pdfpc and a tiling window manager, you get a Swiss Army knife for presentations.

There is much more to say. For more details see this tmux cheatsheet or manual pages.

. or source

We have seen that program behavior can be changed using switches, and we have used getopt for that. But often it is better to load user defaults from a configuration file.

We could certainly design our own format for such a configuration file, but for a shell script, a shell script is actually the simplest approach.

We can let the user specify the configuration like this (e.g., in ~/.nswi177/example.rc) and then simply load it into our main script.

DEFAULT_OUTPUT=html

You have probably realized that adding the following to your script will not work (even if we assume that the file always exists).

bash "$HOME/.nswi177/example.rc"

This is because the rc file is run in a new process (subshell). The variable would not be visible once the nested script terminates.

Luckily, shell offers the . (or source) command to include the file without starting a nested shell.

if [ -r "$HOME/.nswi177/example.rc" ]; then
    . "$HOME/.nswi177/example.rc"
fi

This allows the user to include arbitrary shell code in example.rc, but that is generally a small price to pay (it is the user who is executing the code, after all).

Note that ~/.bashrc is source-d into your shell the same way – that is why it can be a plain shell script and why the variable values etc. are not lost.

Pro hint: reusable configuration

Soon you will acquire plenty of small aliases and functions that you use regularly, and your .bashrc will start to grow beyond maintainability.

Using . you can split it into multiple files. Many people keep their aliases and scripts in a separate repository and only include it in their local .bashrc with a single command – keeping the same aliases available across multiple machines. Some Linux distributions actually prepare an (empty) .bash_aliases or .bash_functions for you that is also sourced when you launch bash.
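
As a sketch, the relevant part of ~/.bashrc can look like this (using the conventional .bash_aliases name mentioned above):

# Load aliases kept in a separate file (possibly a cloned repository)
if [ -r "$HOME/.bash_aliases" ]; then
    . "$HOME/.bash_aliases"
fi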

Here-doc syntax

Often we need to redirect stdin from a multi-line constant. For example, generating an HTML header is possible with the following fragment, but it is definitely not very readable.

echo '<html>' >file.html
echo '<head>' >>file.html
echo '<title>Title</title>' >>file.html
...

Shell offers a so-called here-doc syntax (the marker is user-defined).

cat >file.html <<'MARKER'
<html>
<head>
<title>Title</title>
...
MARKER

Without the apostrophes, variable expansion happens even inside the here-doc block.
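
For example, the following prints Hello, USER! with the actual content of the $USER variable, because the marker is not quoted:

cat <<MARKER
Hello, $USER!
MARKER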

jq

We assume you have already seen JSON somewhere – it is the format used to define JavaScript objects (data) and also the transport format of the majority of web applications.

{
  "file_path": "01/GRADING.md",
  "branch": "master"
}

This file format is notoriously difficult to parse with standard shell tools, as JSON can be written all on one line while the standard tools use the end-of-line as a major separator.

However, there is jq, which offers a mini-language to extract data from or transform a JSON document. And it works as a classic filter (i.e., it reads stdin and prints to stdout) – a perfect match for shell scripts.

Understanding jq fully is out of scope for this course, but the following examples will give you a basic overview of the capabilities it offers.

We will use the following file as input.

{
    "description": "Alphabet",
    "list": [
        {
            "name": "alpha",
            "value": "able",
            "id": 1
        },
        {
            "name": "bravo",
            "value": "baker",
            "id": 2
        },
        {
            "name": "charlie",
            "value": "castle",
            "id": 3
        }
    ]
}

jq uses .field to access a certain field (key) and [] for accessing arrays. Internally, it uses | to pipe data within itself.

jq -r '.description'
jq -r '.list'
jq -r '.list|.[]|.name'
jq -r '.list|.[]|(.name + " => " + .value)'
jq -r '.list|.[]|(.id + ": " + .name)' # does not work
jq -r '.list|.[]|((.id|tostring) + ": " + .name)'
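
For instance, on the input file above, the last command prints:

1: alpha
2: bravo
3: charlie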

Please refer to the on-line manual for more examples and details.

Demos

We will describe the following scripts in a bit more detail to explain typical idioms you can encounter. We will also build the scripts incrementally to give you an idea of how to approach building bigger scripts.

But we provide the complete scripts as well, so that you can check that you have built them from the fragments correctly.

System information

Task description

Write a script that prints basic system information (hardware platform, kernel version, number of CPUs and RAM size). The user shall be able to choose among different output formats.

Solution.

Solution description

The core of our script is simple.

echo "Hardware platform: $( uname -m )"
echo "Kernel version: $( uname -r )"
echo "CPU count: $( nproc )"
echo "RAM size: $( sed -n 's#^MemTotal:[ ]*\([0-9]*\) kB#\1#p' </proc/meminfo )"

This output is useful for a human reader but not for machine processing. So let's add a version that prints the output as assignments to shell variables that can be used later, i.e., in the following format.

PLATFORM="x86_64"
KERNEL_VERSION="5.10.16-arch1-1"

Of course, duplicating the script so that it contains the following is not a nice solution.

if [ "$format" = "shell" ]; then
    echo "PLATFORM=$( uname -m )"
    ...
else
    echo "Hardware platform: $( uname -m )"
    ...
fi

But it is possible to convert between these two formats. Let’s convert our script like this:

if [ "$format" = "shell" ]; then
    column_no=1
else
    column_no=2
fi
(
    echo "PLATFORM:Hardware platform:$( uname -m )"
    echo "KERNEL_VERSION:Kernel version:$( uname -r )"
    echo "CPU_COUNT:CPU count:$( nproc )"
    echo "RAM_TOTAL:RAM size:$( sed -n 's#^MemTotal:[ ]*\([0-9]*\) kB#\1#p' </proc/meminfo )"
) | cut '-d:' -f $column_no,3-

Not perfect, but we are getting there. Let's hide the conversion in separate shell functions.

format_normal() {
    cut '-d:' -f 2,3
}

format_shell() {
    cut '-d:' -f 1,3 | sed 's#:\(.*\)#="\1"#'
}

Then the script would contain the following pipe.

(
    ...
    echo "RAM_TOTAL:RAM size:$( sed -n 's#^MemTotal:[ ]*\([0-9]*\) kB#\1#p' </proc/meminfo )"
) | "format_${format}"

In a sense, we have used polymorphism in our script, as the $format variable is technically a replacement for a virtual method table.

Adding JSON output is a bit more complicated but still doable. Note that we down-case the variable names for nicer output. The final sed is used to remove the trailing comma (JSON is a very strict format).

format_json() {
    local varname
    local varvalue
    echo "{"
    cut '-d:' -f 1,3 | sed 's#:# #' | while read -r varname varvalue; do
        echo -n "$varname" | tr 'A-Z' 'a-z' | sed 's#.*#  "&": #'
        echo "\"$varvalue\"",
    done | sed '$s#,$##'
    echo "}"
}
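
With the platform and kernel from the earlier example (the CPU count and RAM size below are made-up values), the output looks like this:

{
  "platform": "x86_64",
  "kernel_version": "5.10.16-arch1-1",
  "cpu_count": "8",
  "ram_total": "16256356"
}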

We could certainly use getopt to allow the user to select the output format, but we will opt for using a configuration file or an environment variable. The default format can then be specified in "$HOME/.nswi177/sysinfo.rc", or the script can be launched with env.

env FORMATTER=json ./sysinfo.sh

Many programs offer all three options: the script first loads the settings from a configuration file, optionally overrides them with an environment variable, and getopt arguments can override these.

The loading in the script then looks like this (we switched to capitals to emphasize that the variable comes from the user and thus will be exported).

if [ -r "$HOME/.nswi177/sysinfo.rc" ]; then
    . "$HOME/.nswi177/sysinfo.rc"
fi

if [ -z "${FORMATTER:-}" ]; then
    FORMATTER="${DEFAULT_FORMATTER:-normal}"
fi
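
The configuration file ~/.nswi177/sysinfo.rc then contains just an assignment such as:

DEFAULT_FORMATTER=json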

HTML gallery

Task description

Write a shell script for creating a simple HTML gallery. The script shall create thumbnails and a simple HTML overview page. The script shall not assume anything about the original file names (i.e., those coming from the camera) but shall produce a gallery that uses only reasonable filenames.

Each photo should be accompanied by a timestamp as stored in EXIF.

The script shall give the user a flexible way to specify which files to include in the gallery.

Solution.

If you want several files with EXIF information, you can download them here. But obviously, your own photos will be much better :-).

Solution description

If we want to give the user a flexible way to specify the list of images, we will use find. The first argument to the script is the path to the directory where the gallery is to be created; the rest of the arguments are passed as-is to find.

output_dir="$1"
shift

mkdir -p "$output_dir"

find "$@" -print0 | ...

Obviously, to protect ourselves from weird file names, we need to use xargs with the --null separator. But we would like to call our own function, because we need to parse the EXIF data, resize the image etc.

The following will do the trick.

make_thumb() {
    local output_dir="$1"
    local input_image="$2"

    # Show progress
    echo -n "." >&2

    # rest of the code goes here
}
export -f make_thumb

echo -n "Preparing " >&2
find "$@" -print0 | xargs --null -n 1 bash -c 'make_thumb "$@"' _ "$output_dir"
echo " done." >&2

Note that find does not sort the files, so we need to take care of that ourselves.

We will use a simple trick that actually simplifies the code and also allows us to run it in parallel. For each file, we will create a small HTML fragment with the <img /> tag; the fragments will later be joined into the final HTML.

Let's extend make_thumb in the following way. Note that we use md5sum(1) to convert the filename to a reasonable one. You have already seen convert, so its use shall not be a surprise.

identify is another ImageMagick utility that is able to print EXIF data.

make_thumb() {
    local output_dir="$1"
    local input_image="$2"
    local safe_name exif_timestamp
    safe_name="$( realpath "$input_image" | md5sum | cut '-d ' -f 1 )"
    exif_timestamp="$( identify -format '%[EXIF:DateTime]' "$input_image" )"

    # Show progress
    echo -n "." >&2

    convert "$input_image" -resize 400x300 "$output_dir/$safe_name.thumb.jpg"

    echo "$safe_name $exif_timestamp" >"$output_dir/$safe_name.csv.fragment"
    (
        echo "<div>"
        echo "<p>$exif_timestamp</p>"
        echo "<p><img src=\"$safe_name.thumb.jpg\"></p>"
        echo "</div>"
    ) >"$output_dir/$safe_name.html.fragment"
}

As you can see, we will have .html.fragment files with the HTML code and .csv.fragment files with the file name and the timestamp.

Creating the final HTML is then easy. Note how we use a here-doc for adding the HTML header and how sort can be asked to sort by a specific column.

(
    cat <<'EOF_HEADER'
<html>
<head>
<title>Gallery</title>
</head>
<body>
EOF_HEADER
    cat "$output_dir/"*.csv.fragment | sort -t ' ' -k 2,3 | while read -r safe_name _; do
        cat "$output_dir/$safe_name.html.fragment"
    done
    cat <<'EOF_FOOTER'
</body>
</html>
EOF_FOOTER
) >"$output_dir/index.html"

Finally, we remove the fragment files and we are done.
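
That cleanup is a one-liner (matching the fragment names used above):

rm -f "$output_dir/"*.csv.fragment "$output_dir/"*.html.fragment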

GitLab forum lister

Task description

Create a skeleton command-line client for the Forum that we use for solving technical issues in this course.

The initial implementation must support listing opened issues and searching in existing ones.

Solution.

Solution description

While the task may seem daunting, it is less than 100 lines of code, and you will see that we are able to build it from small blocks that are quite easy to understand.

We will start by generating a GitLab API token. GitLab offers the users a programmable API that returns JSON – we will definitely not parse HTML.

Create the access token. For this we require only the read_api scope; store the generated key inside a KEY file.

To get a taste of how the API works, the following gets the list of your SSH keys.

curl -X GET --silent --header "PRIVATE-TOKEN: $( cat KEY )" "https://gitlab.mff.cuni.cz/api/v4/user/keys"

Pipe it to jq '.[]|.title' to see their titles.
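
In other words, the whole pipeline is:

curl -X GET --silent --header "PRIVATE-TOKEN: $( cat KEY )" "https://gitlab.mff.cuni.cz/api/v4/user/keys" | jq '.[]|.title'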

This API is extremely powerful and very easy to script (there are bindings for many languages available if you do not want to parse the JSON manually).

We will start by setting a few variables to make our script more readable.

GITLAB_URL="https://gitlab.mff.cuni.cz"
GITLAB_API_URL="$GITLAB_URL/api/v4/"
GITLAB_API_KEY="$( cat KEY )"

PROJECT_PATH="teaching/nswi177/2021-summer/common/forum"

And we will add a simple wrapper around curl to simplify writing the requests.

gitlab_get() {
    local url_path="$1"
    shift
    curl -X GET --silent --header "PRIVATE-TOKEN: $GITLAB_API_KEY" "$@" "${GITLAB_API_URL}${url_path}"
}

Getting the keys then requires only the following.

gitlab_get "user/keys" | jq '.[]|.title'

Issues are under projects/PROJECT_PATH/issues. However, PROJECT_PATH needs to be URL-encoded. We will use Python for that.

urlencode() {
    python3 -c 'import sys, urllib.parse; print(urllib.parse.quote_plus(sys.argv[1]))' "$1"
}
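
For instance, the following call prints teaching%2Fnswi177%2F2021-summer%2Fcommon%2Fforum:

urlencode "$PROJECT_PATH"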

Listing the issues is thus very easy. Note that we print each issue as its ID (number) followed by the title.

get_issues() {
    gitlab_get "projects/$( urlencode "PROJECT_PATH" )/issues?state=$1" \
        | jq -r '.[]|((.iid|tostring) + " " + .title)'
}

get_issues opened

We obviously want to add links to the list and format it in a slightly nicer way.

format_list() {
    local iid title
    while read -r iid title; do
        echo "$title (#$iid)"
        echo "$GITLAB_URL/$PROJECT_PATH/-/issues/$iid"
        echo
    done
}

Hence the main part of the script would be just the following function.

list_opened_issues() {
    get_issues "opened" | format_list
}

This works quite well until we want to view the closed issues too, because then we get only a few of them. For obvious reasons, GitLab uses paging.

Re-run the first curl with --dump-header - added to see the HTTP headers too.

curl -X GET --silent --dump-header - --header "PRIVATE-TOKEN: $( cat KEY )" "https://gitlab.mff.cuni.cz/api/v4/user/keys"

Notice that it contains the following fragment.

x-next-page:
x-page: 1
x-per-page: 20
x-prev-page:
x-total: 4
x-total-pages: 1

The X-Total header says that there are four items in total. These fit onto a single page. Our issues in the Forum do not, though. Therefore, we need to update our gitlab_get to iterate through all the pages.

We will do that in two phases. The first request will be used only to get the number of items, and the following requests will iterate over the pages.

get_http_header() {
    grep -i "^$1:" | cut '-d:' -f 2 | sed -e 's/\r$//' -e 's/ //g'
}

gitlab_get_with_paging() {
    local per_page=50
    local object_count max_page page
    object_count="$( gitlab_get "$1" -d "per_page=1" --dump-header - -o /dev/null | get_http_header "X-Total" )"
    max_page="$(( object_count / per_page + 1 ))"
    (
        echo "["
        for page in $( seq 1 $max_page ); do
            [ "$page" -gt 1 ] && echo ","
            gitlab_get "$1" -d "per_page=$per_page" -d "page=$page"
        done
        echo "]"
    ) | jq 'flatten'
}

This requires a bit of explanation. Getting the object count means reading the X-Total header from the response; the extra function takes care of that and strips the white space too. The -d option adds extra parameters to the HTTP GET query. We then iterate over the pages and retrieve the objects.

We build a list of lists from these, which we then flatten with jq.

Because the paging is actually a technical detail, we will rename the functions a bit: gitlab_get_with_paging becomes gitlab_get and the original one becomes gitlab_get_one.

For searching through the issues, we will create the following function.

grep_all_issues() {
    local regex="$1"
    local iid title
    get_issues "all" | while read -r iid title; do
        $LOGGER_DEBUG "Getting notes for $iid ($title)..."
        if get_notes_bodies_for_issue "$iid" | grep -e "$regex" >/dev/null; then
            echo "$iid" "$title"
        fi
    done | format_list
}

This one shall be obvious. We go through all the issues, retrieve their bodies (shown below) and, if grep is successful, we pipe the result again through format_list. Note how we reuse format_list and get_issues for easier maintainability.

Getting the issue body is actually very simple.

get_notes_bodies_for_issue() {
    gitlab_get "projects/$( urlencode "PROJECT_PATH" )/issues/$1/notes" \
        | jq -r '.[]|.body'
}

And that is it. The main body of the script is rather simple.

case "${1:-list}" in
    list)
        list_opened_issues
        ;;
    grep)
        if [ -z "${2:-}" ]; then
            echo "Regex missing" >&2
            exit 1
        fi
        grep_all_issues "$2"
        ;;
    *)
        echo "Usage: ...">&2
        exit 2
esac

Not bad for a few lines of code, right?

Examples

More examples, this time without a detailed explanation but with our solutions.

Counting total points for your graded tasks

Use the API at projects/$( urlencode "$PROJECT_PATH" )/repository/files/$( urlencode "$FILE_PATH" )/raw?ref=master to download all GRADING.md files in your repository and compute the total sum of points.

Solution.

Sunrise and sunset times

Use the API at https://freegeoip.app/ to get your location by GeoIP and use it to get the times of sunrise/sunset from https://sunrise-sunset.org/api.

Solution.

Finding duplicates

Write a script that finds duplicates (i.e., same files) from a set of files specified using find(1) switches.

For example, the user will run find-duplicates ~/books -name '*.txt' and the script will try to find all files in ~/books with the txt extension that have the same content.

Solution.

Templating e-mail generator

Assume you have an input file with fields in the following format.

dusty@alpha.example.d3s.mff.cuni.cz:Dusty:Developer
windlifter@alpha.example.d3s.mff.cuni.cz:Windlifter:Reporter
pulaski@alpha.example.d3s.mff.cuni.cz:Pulaski:Guest
maru@alpha.example.d3s.mff.cuni.cz:Maru:Guest

And a template where {{N}} denotes a replacement with the N-th field from the file above.

From: chug@alpha.example.d3s.mff.cuni.cz
Subject: {{3}} account activation

Hello, {{2}},

I just activated your account at https://service.example.d3s.mff.cuni.cz.

Please, check that you can access it with role {{3}}.

/Chug

Write a script that expands the {{N}} macros in the template and sends e-mails to all recipients from the first file.

While it is possible to fully automate this (we will show that in one of the later labs), in this exercise we will compose the e-mail in Thunderbird and send it manually. That is fine for a low number of e-mails, and we keep full control over each e-mail (i.e., we can make extra corrections just before sending it).

Note that the following command opens the Compose window and fills-in the data.

thunderbird -compose "to=address@example.com,subject='The subject',body='Body of the e-mail',from='youraddress@example.com'"

Solution.

Graded tasks

07/web_machine_status.sh (25 points)

For this script we expect generally the same behaviour as for the previous task.

However, instead of reading the data from a file, you will be reading them from a remote API over HTTP (i.e., like we used GitLab in the examples above).

The API (actually, a set of pre-generated files, but normally such output would be taken from the database of some monitoring tool and would represent the current state of your cluster) is available here: https://d3s.mff.cuni.cz/f/teaching/nswi177/tasks/07/machines/.

Information about a specific machine is at /MACHINE_ID/, the list of services is at /MACHINE_ID/services/ (e.g., here and here).

Note that you will not need all the information from the API.

The switch -i sets the base URL (instead of the file). By default it will be https://d3s.mff.cuni.cz/f/teaching/nswi177/tasks/07/machines/, but some of the tests set it to https://d3s.mff.cuni.cz/f/teaching/nswi177/tasks/07/mini to test on an even smaller dataset (and graded tests will use a bigger one).

Use curl to fetch the data (we will depend on its behavior and protocol support during testing).

Your script does not need to handle the situation when d3s.mff.cuni.cz is not available. You can safely assume that a machine name is a reasonable one (alphanumeric, dots and dashes) and that machine names are unique.

Note that the help text was updated a little bit; it shall now read the following. We (again) forgot to update the header to reflect the assignment name, but we believe it is better to keep it like this than to change it in the middle of the assignment.

Usage: machine_state.sh [options] [machine-filter]
 -h --help            Print this help.
 -i --input URL       Read state from URL.
 -q --quiet --silent  No output, only exit code.

The machine filter is a regular expression: for tests, we will be using only the basic parts of regular expressions: matching a substring (e.g., alpha) or matching a set of characters (e.g., worker[0-9][0-9]). These should be the same across all tools and dialects.

07/file_preview.sh (25 points)

Write a shell script printing preview of files given as arguments based on their types.

For each file, the script should write one line with its name in the format --- FILE_PATH --- (as present on the command line), followed by the first 30 lines of the file preview, which is:

for a text file (text/*), the file content itself;
for a PDF file (application/pdf), the text obtained by pdftotext;
for a multimedia file (image/*, video/*), the output of exiftool;
for other files, the human-readable description of their type obtained from the file command.

Keep one empty line after each preview.

exiftool is part of the perl-Image-ExifTool package (sudo dnf install perl-Image-ExifTool).

Hint: the --mime-type option of the file command may be useful.
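
For instance, the following would print image/jpeg for the photo from the example below (-b/--brief omits the file name from the output):

file --brief --mime-type photo.jpg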

Example call assuming existence of the mentioned files:

./file_preview.sh UK-8916-version1.pdf archive.zip /etc/fstab photo.jpg

Output:

--- UK-8916-version1.pdf ---
Studijní a zkušební řád

V. ÚPLNÉ ZNĚNÍ STUDIJNÍHO A ZKUŠEBNÍHO ŘÁDU
UNIVERZITY KARLOVY

ze dne 17. prosince 2020
Akademický senát Univerzity Karlovy se podle § 9 odst. 1 písm. b) a § 17 odst. 1 písm. g) zákona č. 111/1998 Sb.,
o vysokých školách a o změně a doplnění dalších zákonů (zákon o vysokých školách), ve znění pozdějších předpisů,
usnesl na tomto Studijním a zkušebním řádu Univerzity Karlovy, jako jejím vnitřním předpisu:

Část I. - Základní ustanovení

Čl. 1
Úvodní ustanovení
Tento řád navazuje na příslušná ustanovení zákona č. 111/1998 Sb., o vysokých školách a o změně a doplnění dalších
zákonů (zákon o vysokých školách), ve znění pozdějších předpisů, a statutu Univerzity Karlovy, ve znění pozdějších
změn, (dále jen „statut“) a upravuje pravidla studia na Univerzitě Karlově (dále jen „univerzita“).
Čl. 2
Vysokoškolské vzdělání
1. Na univerzitě se uskutečňují tyto typy studijních programů:
a. bakalářský,
b. magisterský, který navazuje na bakalářský studijní program,
c. magisterský, který nenavazuje na bakalářský studijní program,
d. doktorský.
2. Profil bakalářského nebo magisterského studijního programu může být
a. profesně zaměřený s důrazem na zvládnutí praktických dovedností potřebných k výkonu povolání podložených
nezbytnými teoretickými znalostmi, nebo
b. akademicky zaměřený s důrazem na získání teoretických znalostí potřebných pro výkon povolání včetně
uplatnění v tvůrčí činnosti a poskytující rovněž prostor pro osvojení nezbytných praktických dovedností.
3. Seznam akreditovaných studijních programů včetně jejich typu, profilu, formy výuky, standardní doby studia a

--- archive.zip ---
Zip archive data, at least v2.0 to extract

--- /etc/fstab ---

#
# /etc/fstab
# Created by anaconda on Mon Mar 29 19:35:11 2021
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
UUID=0e7494b9-739d-4f7c-b44f-6c54c8633be0 /                       btrfs   subvol=root     0 0
UUID=c3ffa9a0-c289-47a3-9563-816cca2e2efa /boot                   xfs     defaults        0 0
UUID=ca676f7f-7415-46db-add2-259a5bf73264 none                    swap    defaults        0 0

--- photo.jpg ---
ExifTool Version Number         : 12.16
File Name                       : photo.jpg
Directory                       : .
File Size                       : 28 KiB
File Modification Date/Time     : 2019:05:14 12:39:51+02:00
File Access Date/Time           : 2021:04:09 00:58:22+02:00
File Inode Change Date/Time     : 2020:06:05 22:31:34+02:00
File Permissions                : rw-------
File Type                       : JPEG
File Type Extension             : jpg
MIME Type                       : image/jpeg
JFIF Version                    : 1.01
Exif Byte Order                 : Little-endian (Intel, II)
Make                            : SONY
Camera Model Name               : ILCE-6000
Orientation                     : Horizontal (normal)
X Resolution                    : 350
Y Resolution                    : 350
Resolution Unit                 : inches
Software                        : GIMP 2.10.6
Modify Date                     : 2018:10:24 13:50:50
Y Cb Cr Positioning             : Co-sited
Exposure Time                   : 1/60
F Number                        : 0
Exposure Program                : Aperture-priority AE
ISO                             : 400
Exif Version                    : 0230
Date/Time Original              : 2017:10:27 16:17:01
Create Date                     : 2017:10:27 16:17:01
Components Configuration        : Y, Cb, Cr, -

Update: if some of the input files do not exist, the script must terminate with exit code 1 but shall print previews of all existing files first. The format of the error message is not prescribed.

07/find_complementary.sh (25 points)

On the linux.ms.mff.cuni.cz machine you will find a file /srv/nswi177/arabidopsis.fasta.

This file contains a nucleic acid sequence of the thale cress (Arabidopsis thaliana). Except for the first line, it simply contains the sequence wrapped into 70-character-wide columns.

Your task is to write a shell script that finds complements of a given subsequence, including the numbers of the matching lines.

The script takes a regular expression as an argument; it represents a sequence to find in the arabidopsis.fasta file. Found sequences must be converted to their complements using the simple rule that nucleotide A is converted to T (and vice versa) while C is converted to G (and vice versa); others are left intact. As the final action, representing the desired output, the script finds the complements in the original arabidopsis.fasta file and prints them together with their line numbers.

Note that the script should order the matches alphabetically by the original (i.e., not converted) sequence and should filter out duplicates (in the original matches, not in the complements!).

As an example, the following run produces this output.

./find_complementary.sh A[TCA]GC
AAGC
3:TTCG
6:TTCG
6:TTCG
8:TTCG
13:TTCG
19:TTCG
ATGC
3:TACG
19:TACG

Note that the sequences matching the regex (A[TCA]GC) are printed in alphabetical order (AAGC and ATGC) while the complements (TTCG and TACG) are sorted by row numbers.

For a string that has no complement in the file, only the original string will be displayed.

Use uniq to get rid of multiplicities. Note that uniq works only on sorted data.

Your script shall use /srv/nswi177/arabidopsis.fasta directly. If the file does not exist, the script shall abort with the message arabidopsis.fasta file missing (obviously printed to stderr) and exit code 7.

The sequence might be split over several lines. In that case, print the row where the sequence begins (note that we will distinguish this when grading – a solution that works on single-line matches only will receive fewer points).

07/extract_snippets.sh (25 points)

Write a script that processes Markdown files and extracts code snippets into separate files.

As a simplification, you are only required to find fenced blocks (the ones with three backticks after an empty line), not the original "indented code" blocks – those you must ignore. Also, treat any HTML markup as plain text.

The files to process come as arguments to the script. A special argument of - means to process the standard input instead. The separate snippets should go into a directory named after the file, without the .md suffix; the snippet directory should be located in the same directory as the source Markdown file (see the example below). The snippets should then be named with two-digit numbers, counting from 01, with possible extensions (see below). For the special case of input from stdin, call the directory just stdin and create it in the current working directory.

A common Markdown extension is the language specification for fenced code blocks. In that case, the initial three backticks are followed (without a space) by the name of the language, for example ```python3. In such a case, put a reasonable shebang into the snippet (if not already present) and attach the most common file extension to the snippet file name. You should recognize the following languages: bash, sh, shell, python, python3.

For example, if the file ./tutorials/hello-worlds.md contains the following text:

Let's start simple and create a program to just print

```
Hello world!
```

and exit.

# Shell

```shell
echo "Hello world!"
```

# Python

```python
#!/usr/bin/env python3

print('Hello world!')
```

That is all!

After running 07/extract_snippets.sh tutorials/hello-worlds.md, the folder tutorials/hello-worlds/ would contain three files – 01, 02.sh and 03.py – with the following contents (the shebang in 02.sh could differ):

01:

Hello world!

02.sh:

#!/bin/sh
echo "Hello world!"

03.py:

#!/usr/bin/env python3

print('Hello world!')

Update: if some of the input files do not exist, the script must terminate with exit code 1 but shall extract snippets from all existing files first. The format of the error message is not prescribed.

Deadline: May 3, AoE

Solutions submitted after the deadline will not be accepted.

Note that at the time of the deadline we will download the contents of your project and start the evaluation. Anything uploaded/modified later on will not be taken into account!

Note that we will be looking only at your master branch (unless explicitly specified otherwise), do not forget to merge from other branches if you are using them.

Changelog

2021-05-03: Mention package owning the exiftool utility.

2021-04-29: Explicitly specify the help text for 07/web_machine_status.sh, clarify which regular expression dialect will be used.