There are abundant resources online trying to scare programmers away from using shell scripts. Most of them, if anything, succeed in convincing the reader to blindly put something that resembles
set -euo pipefail
at the top of their scripts. Let’s focus on the “-e” flag. What does this do? Well, here are descriptions of this flag from the first two results on Google for “writing safe bash scripts”:
- “If a command fails, set -e will make the whole script exit, instead of just resuming on the next line” (https://sipb.mit.edu/doc/safe-shell/)
- “This tells bash that it should exit the script if any statement returns a non-true return value.” (http://www.davidpashley.com/articles/writing-robust-shell-scripts/)
Unfortunately, this is bash we are talking about and the story is never that simple.
A couple of months ago, a particular production bash script (if that doesn’t sound horrifying, hopefully it will by the end of this post) failed in the worst kind of way: silently. The script generates a list of valid users at Jane Street and pushes it out to our inbound mail servers. It looks something like this:
set -euo pipefail
...
echo "($(ldap-query-for-valid-users))" > "/tmp/all-users.sexp"
...
push-all-users-if-different
On this one particular day, a file was deployed with the contents “()”. But why didn’t set -e cause the script to exit when ldap-query-for-valid-users failed? A quick look at the bash man page answers this question. It turns out that this flag has some surprising subtleties. Here are a couple:
set -e works on “simple commands”
A script will exit early if the exit status of a simple command is nonzero. So
how is a simple command executed? In short, bash does all expansions and checks
to see if there is still a command to run. If there is a command to run, the
exit status of the simple command is the exit status of the command. If there is
not a command to run, the exit status of the simple command is the exit status
of the last command substitution performed. Here are some example commands that
all have exit status 0, and so would not cause a script running under set -e to exit:
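# echo, local and export are themselves commands, so their own exit status
# (0 when they succeed) masks the failure of the command substitution
echo "$(/bin/false)"
local foo="$(/bin/false)"   # local is only valid inside a function
export foo="$(/bin/false)"
# with no command to run, the status is that of the last command substitution, which is 0
foo="$(/bin/false)$(/bin/true)"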
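One way to convince yourself of this rule is to run each form in a fresh child shell and see whether the next line is reached. The following is only a sketch: run_case is a helper name made up for this example, and it assumes bash is on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a snippet in a fresh bash with errexit enabled and
# report whether execution continued past the snippet.
run_case() {
  bash -c "set -e; $1; echo reached" 2>/dev/null || echo "exited"
}

# echo is the command being run, so its status (0) hides the failed substitution.
masked="$(run_case 'echo "$(false)" >/dev/null')"

# A bare assignment has no command, so it takes the substitution's status.
bare="$(run_case 'foo="$(false)"')"

# Assigning first and using the value afterwards keeps the failure visible.
split="$(run_case 'out="$(false)"; echo "$out" >/dev/null')"

echo "echo with substitution: $masked"   # reached
echo "bare assignment:        $bare"     # exited
echo "assign, then echo:      $split"    # exited
```

The third case suggests a simple habit: capture the output of a command in a plain assignment on its own line, and only then pass the variable to echo, local or export.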
set -e does not get passed to subshells in command substitution (without --posix)
Here is an example consequence of this:
set -e
foo() {
  /bin/false
  echo "foo"
}
echo "$(foo)"
Running this script with bash will print “foo”, while running it with bash --posix (or sh) will not. In both cases the script exits with status 0, because the final echo succeeds either way.
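To make the difference easy to reproduce, the sketch below feeds the same script to bash and to bash --posix and records the output and exit status of each. The third variant turns set -e back on inside the command substitution and uses a bare assignment. That is one commonly used workaround, not something this post prescribes, and the variable names are made up for the example.

```shell
#!/usr/bin/env bash
# The script from the text, held in a variable so it can be run by two shells.
script='
set -e
foo() {
  false
  echo "foo"
}
echo "$(foo)"
'

default_out="$(bash -c "$script")"; default_status=$?
posix_out="$(bash --posix -c "$script")"; posix_status=$?

# Variant (an assumption of this sketch): re-enable errexit inside the
# substitution and assign on a line of its own so the failure propagates.
strict='
set -e
foo() {
  false
  echo "foo"
}
out="$(set -e; foo)"
echo "$out"
'
strict_out="$(bash -c "$strict")"; strict_status=$?

printf 'bash:         output=[%s] status=%s\n' "$default_out" "$default_status"  # [foo] 0
printf 'bash --posix: output=[%s] status=%s\n' "$posix_out" "$posix_status"      # [] 0
printf 'strict:       output=[%s] status=%s\n' "$strict_out" "$strict_status"    # [] 1
```

Only the third variant exits with a nonzero status, and it needs both pieces: set -e inside the substitution, and an assignment that is not wrapped in another command.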
Tangible takeaway
This is not to say that something like set -euo pipefail should not be used at
the top of all bash scripts, but it should not give you a false sense of
security. Like all production code, you must reason about all failure conditions
and ensure they are handled appropriately. Even if you are some kind of bash
expert who knows all these subtleties, chances are your peers do not. The
execution of shell scripts is subtle and confusing, and for production code,
there is likely a better tool for the job.
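As an illustration of handling the failure explicitly, here is a rough sketch of how the step that writes the user list could check its own result instead of relying on set -e. It is not the fix that was actually deployed. The names ldap_query_for_valid_users and write_users_file are hypothetical stand-ins, and the stub query always fails so that the error path can be seen.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real query; this stub always fails.
ldap_query_for_valid_users() {
  return 1
}

# Write the user list to "$1" only if the query succeeds and returns data.
write_users_file() {
  local target="$1" users tmp
  # Declare and assign separately: "local users=$(...)" would mask the failure.
  if ! users="$(ldap_query_for_valid_users)"; then
    echo "user query failed; leaving $target untouched" >&2
    return 1
  fi
  if [ -z "$users" ]; then
    echo "user query returned nothing; leaving $target untouched" >&2
    return 1
  fi
  # Build the file next to the target, then move it into place in one step.
  tmp="$(mktemp "${target}.XXXXXX")"
  printf '(%s)\n' "$users" > "$tmp"
  mv "$tmp" "$target"
}
```

A caller would then push only when the function succeeds, for example: write_users_file /tmp/all-users.sexp && push-all-users-if-different. Whether an empty result should count as an error is a judgment call that depends on the system; here it is assumed that an empty user list is never valid.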