Answers to the Linux Sysadmin Bash Challenge

Congratulations to Henning Phan and Christian Jenei, the winners of our thrilling Bash challenge! Their exceptional problem-solving and troubleshooting skills impressed the judges and set them apart from the competition.

Below you can find the correct answers to the questions.

1.Failing pipes

Challenge

A Linux command will typically return with exit code 0 if it’s been successful, and non-zero if there was an error (often exit code 1, unless the process is trying to tell us something specific). When using a pipe, more than one command is involved. Consider:

 grep == requirements.txt | sort | uniq

Will the command as a whole be successful if (a) grep fails, but sort and uniq run successfully, (b) sort fails, but grep and uniq run successfully, (c) uniq fails, but grep and sort run successfully, (d) all sub-commands are successful, (e) all sub-commands fail?

How can I test this? Is there any shell option to tweak this behavior?

Solution

The command runs successfully in cases (a), (b) and (d). The final command in the pipeline determines the exit code. You can see the exit status of the entire pipeline in $? and the exit status of individual commands in the array PIPESTATUS:

 echo "The pipe exits with status $?. The individual commands had these status codes: ${PIPESTATUS[*]}."

You can play around using the commands true and false, either substituting the commands in the pipe entirely, like so:

 false | true | true

or injecting them using a group command at the desired position in the pipe:

 grep == requirements.txt | { sort; false; } | uniq

If you want the pipe to fail whenever any command in it fails, you can set that using

 set -o pipefail

prior to starting the pipe. It is also possible to set the option when invoking the shell, which is often done when writing Dockerfiles (see https://github.com/hadolint/hadolint/wiki/DL4006).
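To see all of this in one place, here is a minimal sketch that uses true and false as stand-ins for the real commands. The variable names are only illustrative.

```shell
#!/usr/bin/env bash
# Default behaviour: the pipeline's status is that of the last command.
false | true | true
default_status=$?                 # 0, even though the first command failed

# PIPESTATUS is overwritten by the next command, so copy it right away.
false | true | true
statuses=("${PIPESTATUS[@]}")     # one entry per command in the pipe

# With pipefail, the pipeline reports the rightmost non-zero status.
set -o pipefail
false | true | true
pipefail_status=$?
set +o pipefail

echo "default: $default_status, per-command: ${statuses[*]}, pipefail: $pipefail_status"
```

This should print "default: 0, per-command: 1 0 0, pipefail: 1". Note that PIPESTATUS has to be saved immediately after the pipe, since even a simple assignment replaces its contents.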

2.Sorting standard error output

Challenge

Assume there’s a command named mytestcommand that writes several lines to both standard output and standard error output. If we want to sort the lines that go to the standard output, we can do it with

 mytestcommand | sort

Now, suppose we want to sort the standard error output instead. Can you do it (a) so that both stdout and stderr are sorted together? Can you do it (b) so that only stderr is sorted, but the stdout is left unaffected?

Solution

First, we may want to define mytestcommand so that we have something to play with. For example

 function mytestcommand
{
    echo heuristic
    echo jovial
    echo peaceful >&2
    echo sharp >&2
    echo friendly
    echo hardcore >&2
    echo elated >&2
}

(a) Combining stdout and stderr and sorting them together can be done with

 mytestcommand 2>&1 | sort

This redirects file descriptor 2 (stderr) to file descriptor 1 (stdout), which is connected to sort with a pipe. Both file descriptors will henceforth write to the pipe and get sorted.

(b) A naive approach would be to simply try to switch places between file descriptors 1 and 2, using something like mytestcommand 2>&1 1>&2 | sort. That would however end up sorting both stderr and stdout, as redirections are processed in the order they appear: First file descriptor 2 is connected to the pipe using 2>&1, and then file descriptor 1 is connected to the place 2 goes, which at this point is the pipe. Doing it in the other order doesn’t help either, as 1>&2 2>&1 would first remove file descriptor 1 from the pipe and have it go to the terminal, and then connect file descriptor 2 to the place 1 currently goes: both end up in the terminal, and nothing gets sorted.

As with a swap in any programming language, you need a temporary variable to hold the old value so it doesn’t get overwritten. This might be something like

 mytestcommand 3>&1 1>&2 2>&3 | sort

This first opens file descriptor 3, letting it go where 1 goes (to the pipe). Then it redirects file descriptor 1 where 2 goes (to the terminal). At this point, file descriptor 3 goes to the sort, and 1 and 2 go to the terminal. Then file descriptor 2 gets redirected where file descriptor 3 goes, which is to the sort.

While this has the desired effect on the sorting, it has affected mytestcommand in a way that it could possibly notice: it runs with an extra open file descriptor (file descriptor 3). The proper solution should also close the extra file descriptor:

 mytestcommand 3>&1 1>&2 2>&3- | sort
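One way to convince yourself that the swap through file descriptor 3 really sorts only stderr is to capture the two streams separately. The sketch below uses a shortened mytestcommand and a temporary file, both of which are assumptions made for the demonstration rather than part of the challenge.

```shell
#!/usr/bin/env bash
# Shortened stand-in for mytestcommand: two lines on each stream.
mytestcommand() {
    echo heuristic
    echo peaceful >&2
    echo friendly
    echo elated >&2
}

out_file=$(mktemp)   # receives whatever ends up on stderr after the swap

# Inside $( ... ), stdout is the capture and stderr goes to the temp file.
# After the swap, the original stderr lines travel through sort into the
# capture, while the original stdout lines bypass sort and land in the file.
sorted_err=$( { mytestcommand 3>&1 1>&2 2>&3- | sort; } 2>"$out_file" )
plain_out=$(<"$out_file")
rm -f "$out_file"

printf 'sorted stderr:\n%s\n' "$sorted_err"
printf 'untouched stdout:\n%s\n' "$plain_out"
```

The stderr lines come out as elated, peaceful (sorted), while the stdout lines keep their original order: heuristic, friendly.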

All of this was assuming that file descriptor 3 was free to play around with. If it were connected to something interesting, we lose that. To make the shell use a free file descriptor we can tell it to use a variable instead:

 mytestcommand {fd}>&1 1>&2 2>&$fd- | sort

There’s only one problem with this approach: we’ve switched stdout and stderr. If you put this in a shell script and the user redirects stdout, it would be confusing to get stderr instead. For this, we need to switch back. However, just appending another {fd}>&1 1>&2 2>&$fd- to sort won’t work, as mytestcommand writes its stdout (now redirected to stderr) directly, bypassing sort entirely. The solution is a subshell:

 ( mytestcommand {fd}>&1 1>&2 2>&$fd- | sort; ) {fd}>&1 1>&2 2>&$fd-

Interestingly, the same command using a group command { } instead of a subshell ( ) leaves file descriptor 10 open after the command has run. Maybe I’ve reached the limit of my bash expertise here, or could that possibly be a bug in bash itself?

3.Map that file

Challenge

The mapfile command is built into Bash. Its description is

Read lines from the standard input into the indexed array variable ARRAY, or from file descriptor FD if the -u option is supplied. The variable MAPFILE is the default ARRAY.

What is the difference between this pipe:

 grep == requirements.txt | mapfile; echo ${#MAPFILE[@]}

… and the following redirection?

 mapfile < <(grep == requirements.txt); echo ${#MAPFILE[@]}

Will they do the same thing? What is the advantage of using the < <() syntax? What is the advantage of using the pipe syntax? If I really want to use the pipe syntax, can we work around any potential issue with the first approach? (In a realistic scenario, echo may be replaced with some for loop over the entries in the array.)

Solution

The first one will most likely not do what you intend. In the pipeline case mapfile operates in a subshell, which cannot change the MAPFILE array of the parent shell. Once the pipeline has run, the result is discarded and the MAPFILE array contains whatever it contained before you ran the pipeline. The array is probably empty, in which case ${#MAPFILE[@]} expands to 0.

In the second example, mapfile operates in the main shell process, so the MAPFILE array will contain the lines that grep produced, and ${#MAPFILE[@]} expands to the number of lines with == in the file. The <( ) syntax runs grep asynchronously and makes its output available under a file name (typically a /dev/fd entry or a named pipe), which is used as input for mapfile with a normal file input redirection <.

The downside with the <( ) syntax is mainly error handling: there’s no straightforward way for the main shell to react to a failing grep command. With pipes, at least if set -o pipefail is active, we can detect and react to things going wrong.

To use the pipe syntax and benefit from the superior error handling of pipes (relatively speaking), you can use a subshell or group command in the pipe, to make sure that mapfile and the code using the MAPFILE array are in the same shell context:

 grep == requirements.txt | { mapfile; echo ${#MAPFILE[@]}; }
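The three variants can be compared side by side. This is a small sketch that assumes a throwaway file with made-up package lines in place of a real requirements.txt, and a shell with default options (in particular, lastpipe is not enabled).

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for requirements.txt.
req=$(mktemp)
printf '%s\n' 'requests==2.31.0' 'flask>=2.0' 'numpy==1.26.0' > "$req"

# 1. Plain pipe: mapfile fills an array in a subshell, which is then lost.
unset MAPFILE
grep == "$req" | mapfile
count_pipe=${#MAPFILE[@]}

# 2. Process substitution: mapfile runs in the current shell.
mapfile < <(grep == "$req")
count_procsub=${#MAPFILE[@]}

# 3. Pipe plus group command: the array is used where it was filled.
unset MAPFILE
count_group=$(grep == "$req" | { mapfile; echo "${#MAPFILE[@]}"; })

rm -f "$req"
echo "pipe: $count_pipe, process substitution: $count_procsub, group: $count_group"
```

With two of the three sample lines containing ==, this should print "pipe: 0, process substitution: 2, group: 2".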

4.exec redirections

Challenge

What on Earth is happening here? Why would anyone do it like that? Will it even work?

    local test_targets_file_write
    local test_targets_file_read
    declare -r test_targets_file=$(mktemp)
    exec {test_targets_file_write}>"$test_targets_file"
    exec {test_targets_file_read}<"$test_targets_file"
    rm "$test_targets_file"

    find tests -name 'test*.py' | \
        testfile2targetname     | \
        sort >&"$test_targets_file_write"
    declare -a test_targets
    mapfile -t test_targets <&"$test_targets_file_read"
    readonly test_targets

Solution

A temporary file is created with mktemp, opened for both reading and writing by the current shell, and then immediately removed, before any reading or writing is done. Then we write the output of the find pipeline to the file (which by this point is already removed), and read that output back with mapfile. It is a curious fact, and a very useful one, that a file removed in Linux is only actually removed from the disk once the last open file handle to the file has been closed. By removing the file before we use it, we can guarantee that the file gets removed, even if the computer crashes while we are using the file. This mechanism is independent of the programming language, and is the feature underlying Python’s tempfile.TemporaryFile on platforms that support it.

Why would anyone do it like that? The alternatives are mainly, as discussed in the previous section, to use a pipe and put the mapfile in a subshell together with any code that wants to use the test_targets array, which might be cumbersome, or to use the < <( ) construct (process substitution, as it’s called), in which case we miss out on error handling.

Arguably, we should close the extra file descriptors explicitly, though:

 exec {test_targets_file_write}>&-
 exec {test_targets_file_read}<&-
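The whole pattern can be tried out in miniature. The following sketch replaces the find pipeline with a few hard-coded words and uses short, made-up variable names. It assumes a Linux-like system where an unlinked file stays usable through its open descriptors.

```shell
#!/usr/bin/env bash
tmp=$(mktemp)
exec {wfd}>"$tmp"     # open for writing; bash picks a free descriptor
exec {rfd}<"$tmp"     # open the same file again for reading
rm "$tmp"             # the name is gone, the open descriptors still work

if [[ -e $tmp ]]; then still_exists=yes; else still_exists=no; fi

# Write through one descriptor, then read back through the other.
printf '%s\n' cherry apple banana | sort >&"$wfd"
mapfile -t items <&"$rfd"

# Close both descriptors explicitly once we are done.
exec {wfd}>&-
exec {rfd}<&-

echo "file still on disk: $still_exists, items: ${items[*]}"
```

This should report that the file no longer exists by name, while the array still holds apple, banana and cherry.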

5.The greater pipe

Challenge

What does this do?

 grep == requirements.txt >| sort

Is this a file redirection or a command pipe? What’s the difference between > and | and >|? When would you use one or the other?

Solution

This is a file redirection, not a command pipe: the output of grep is written to a file named sort, and the sort command is never run. Normally output redirection is done with the operator >. The bash manual says:

The general format for redirecting output is:
        [n]>word
If the redirection operator is >, and the noclobber option to the set builtin has been enabled, the redirection will fail if the file whose name results from the expansion of word exists and is a regular file. If the redirection operator is >|, or the redirection operator is > and the noclobber option to the set builtin command is not enabled, the redirection is attempted even if the file named by word exists.

The noclobber option is something you’d set in your ~/.bashrc file, and is very useful in interactive shells. It is normally enabled by experienced bash users after they overwrite a sufficiently important file by mistake.
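The difference between > and >| is easy to check with a scratch file. The sketch below uses a temporary file and illustrative variable names.

```shell
#!/usr/bin/env bash
target=$(mktemp)
echo "first" > "$target"

set -o noclobber

# With noclobber, plain > refuses to overwrite an existing regular file.
# The group command lets us silence the shell's error message.
if { echo "second" > "$target"; } 2>/dev/null; then
    plain_result=overwrote
else
    plain_result=refused
fi

# >| overrides noclobber for this one redirection.
echo "third" >| "$target"

set +o noclobber
content=$(<"$target")
rm -f "$target"

echo "plain > $plain_result; file now contains: $content"
```

The plain > is refused, and the file ends up containing "third" because >| went through.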

6. :(){ :|:;};:

Challenge

What does this command do? Think it through before you run it.

 :(){ :|:;};:

Solution

This is a fork bomb. The first thing to notice is that : is a command. It’s actually a built-in command that doesn’t do anything (successfully, rather like the command true).

The first part, :(){ :|:;} is defining a function named :, overriding the built-in command :. The second part, :, executes said function. When the function executes, it runs :|:, which means the shell forks into two subprocesses, connecting standard input of the right one to the standard output of the left one, and then each subprocess independently executes the : function again. This leads to an exponential increase in the amount of resources used, which can be enough to bring the entire system to a halt by eating all the memory.

If you wish to try it, it is wise to isolate it from the rest of the system and limit its available resources, for example using docker and/or ulimit.
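A harmless way to check the first half of the explanation, namely that : is a builtin which a function can shadow, is sketched below. The function here only prints a message and never calls itself, so nothing is forked repeatedly. It assumes bash in its default (non-POSIX) mode, where such function names are accepted.

```shell
#!/usr/bin/env bash
# Before any redefinition, ":" is the no-op builtin.
kind_before=$(type -t :)

# Each $( ... ) runs in a subshell, so the function defined inside never
# leaks into the current shell. Unlike the fork bomb, it does not recurse.
kind_after=$(
    :() { echo "I am a function now"; }
    type -t :
)
output=$(
    :() { echo "I am a function now"; }
    :
)

echo "before: $kind_before, after: $kind_after, calling it prints: $output"
```

Here type -t reports builtin before the definition and function after it, which is what allows the one-liner to hijack the name.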
