How I write shell scripts
Hey there. Lately I've found myself writing more and more shell scripts for both work-related and personal tasks. Pretty quickly I got fed up with googling this and that about the shell language every time I write a script, so I thought it might be a good idea to gather all the information I've found in one place. This guide doesn't aim to be "comprehensive" or "definitive", but rather personal and opinionated; this is mostly a note to self, as are other posts in this blog, though I'm not losing hope that it might be interesting for somebody else out there :)
Assumptions
I'd start with assumptions about the environment a script is executed in:

- Use only features that exist in the original Bourne shell, with the notable exception of those that make scripting safer, e.g. `[[ ]]` comparisons, `local` variables, and some others that will come along later. This allows writing easily portable scripts.
- Assume that by default only the functions and commands built into the interpreter are available. Any other dependency should be checked using the `hash` command before it is allowed to be used.
- Many POSIX utils, though standardized, come in different flavors and versions. It's better to either use only standard features or explicitly state which flavor and version of the util your script expects.
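To make the second point concrete, here's a minimal sketch of how I'd verify external dependencies at the top of a script; the tool names are just examples:

```shell
#!/usr/bin/env sh
set -euf

# Verify every external dependency before using it; 'hash' is a
# POSIX builtin that succeeds only if the command can be found.
for dep in awk sort; do
    if ! hash "$dep" 2>/dev/null; then
        echo "missing dependency: $dep" >&2
        exit 1
    fi
done

echo "all dependencies found"
```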
Basics
Let's quickly recap the syntax of the shell language and the structure of a shell script.
Preamble
A shell script starts with a shebang denoting the desired interpreter for it:
```sh
#!/usr/bin/env sh
```

For the sake of portability, it should be set indirectly via the `/usr/bin/env` utility. The default interpreter should be `sh` unless you're using features specific to a concrete interpreter, e.g. Bash arrays.
Safe Defaults
Before any actual code a script should set safer defaults for the interpreter:
```sh
set -euf
```
This line sets the following options:

- `-e` tells the interpreter to exit the shell immediately should an error occur or a command fail.
- `-u` makes referencing an unset parameter an error that fails the command, instead of silently expanding it to an empty string.
- `-f` disables filename pattern expansion: `*`, `?` and `[]` won't have a special meaning in filenames and will be treated as regular symbols instead.

When using Bash I also add the `pipefail` option (via `set -o pipefail`), which propagates the return code of the failed command in a pipeline rather than the code of the last command.
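A quick sketch of what two of these options change in practice; the variable name is made up, and the probes run in subshells so the script itself survives:

```shell
#!/usr/bin/env sh
set -euf

# -f: pattern characters lose their special meaning, so this prints
# a literal asterisk even if the directory contains files.
echo *

# -u: referencing an unset variable is an error; probing it in a
# subshell keeps the failure contained.
if ! (echo "${not_set_anywhere}") 2>/dev/null; then
    echo "unset variable rejected"
fi
```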
Variables and Functions
Use functions to structure the code; use `local` variables whenever possible. My personal style guide for naming variables is that local variables are in lower case and globals are in upper case. Unless the script is extremely small (under 10 LOC), I use a `main` function as an entry point:
```sh
GLOBAL_OPT=1

function main {
    local param_a="$1"
    local param_b="$2"

    if [[ "${GLOBAL_OPT}" == "1" ]]; then
        ...
    fi
}

main "$@"
```
Special Variables
Shell has several special variables that are set automatically:

- `$0`: The name of the shell script (if executed) or the shell (if invoked via the `source` command)
- `$1 .. $n`: Positional parameters
- `$@`: All positional parameters
- `$#`: The number of positional parameters
- `$?`: The exit status of the most recent command
Control Structures
There are several control structures in the shell language:
For Loop
```sh
for varname in generator_or_sequence; do
    body
done
```
While Loop
```sh
while condition; do
    body
done
```
Case Conditional
```sh
case input in
    pattern1 | pattern2 | * )
        body
        ;;
    ...
esac
```
If-Else Conditional
```sh
if condition1; then
    body1
elif condition2; then
    body2
else
    body3
fi
```
I prefer to always use `;` and keep `do` and `then` on the same line as the start of the control structure.
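Putting the structures together, here's a small sketch that dispatches on each argument with `case` inside a `for` loop; the `classify` function and its option names are made up for illustration:

```shell
#!/usr/bin/env sh
set -euf

# Dispatch on each argument: flags hit the first branch, everything
# else falls through to the catch-all pattern.
classify() {
    for arg in "$@"; do
        case "$arg" in
            -v | --verbose )
                echo "flag: $arg"
                ;;
            * )
                echo "operand: $arg"
                ;;
        esac
    done
}

classify --verbose input.txt
```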
One thing about shell is that the only built-in condition in it is the exit status code: 0 means true, while anything else means false. All other conditions are made via the built-in `test` utility, which compares strings, integers, files, etc.; its exit code is then used by the shell. `test` is a huge topic by itself; I'll dive into it below.
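Since any command's exit status can serve as the condition, you can also branch on a command directly, without `test` at all. A small sketch:

```shell
#!/usr/bin/env sh
set -euf

# grep -q exits with 0 when the pattern matches and non-zero otherwise;
# the if statement consumes that exit status directly.
if printf 'hello world\n' | grep -q 'hello'; then
    echo "match"
else
    echo "no match"
fi
```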
Variable Expansion
Whenever it is necessary to expand a variable I use only the `${VAR}` syntax and avoid `$VAR` for the sake of simplicity: this way I don't need to remember two different syntax forms. The latter doesn't work anyway when you want to concatenate the contents of the variable with a string: `echo "${VAR}s"` is correct, but `echo "$VARs"` prints nothing (or fails under `set -u`), because the shell looks up a variable named `VARs`.
Other handy variable expansions are:

- `${param:-[word]}`: Substitute `[word]` when `param` is unset or empty
- `${param:=[word]}`: Same as above, but also assigns `[word]` to `param`
- `${param:?[errmsg]}`: Writes `[errmsg]` to stderr and exits with a non-zero status if `param` is unset or empty
- `${#param}`: Returns the length of `param` in characters
- `${param%[word]}`: Removes the suffix of `param` matching `[word]`
- `${param#[word]}`: Removes the prefix of `param` matching `[word]`
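A quick sketch exercising a few of these expansions; the variable names and values are arbitrary:

```shell
#!/usr/bin/env sh
set -euf

file="report.tar.gz"

echo "${file%.gz}"           # suffix removed: report.tar
echo "${file#report.}"       # prefix removed: tar.gz
echo "${#file}"              # length in characters: 13
echo "${missing:-fallback}"  # default for an unset variable: fallback
```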
Other Expansions
For command expansion I use only the `$(cmd)` form because it allows nesting commands easily. Another helpful expansion is arithmetic expansion, which looks like `$((2 + 3 * 5))`; it is very convenient, though I tend not to use its assignment capabilities and assign to the variable outside the expansion, e.g. `local var="$((2 + 3))"`.
Redirection
The `D<F`, `D>F` and `D>>F` forms redirect input from a file `F` into a descriptor `D` and output from a descriptor `D` to a file `F` (the `>>` form appends to the file instead of overwriting it). I prefer to explicitly set descriptors; by default `0` is stdin, `1` is stdout and `2` is stderr.

`D1<&D2` and `D1>&D2` both duplicate descriptor `D2` onto `D1`: the first form is spelled for input descriptors, the second for output descriptors. I prefer the latter form.
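A small sketch of explicit descriptors in action; the file names are arbitrary:

```shell
#!/usr/bin/env sh
set -euf

# Send stdout to one file and stderr to another.
( echo "to stdout"; echo "to stderr" 1>&2 ) 1>out.log 2>err.log

cat out.log   # prints: to stdout
cat err.log   # prints: to stderr

# The common "everything into one file" idiom: 2>&1 points
# descriptor 2 at wherever descriptor 1 currently goes.
echo "both streams" 1>all.log 2>&1

rm -f out.log err.log all.log
```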
Heredoc
Used mostly for help messages and for embedding code in other languages, a heredoc looks like:
```sh
cat <<EOF
Here go the contents.
EOF
```
`EOF` can be any string; it's just a delimiter denoting the end of the input. Heredoc also comes in another handy form, `<<-`, which removes leading tabs from the lines in the contents.
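I typically wrap a help message in a function like this; a sketch, with the script name made up:

```shell
#!/usr/bin/env sh
set -euf

# Keeping the help text in a function lets any part of the script
# print it, e.g. on a bad flag or on -h.
usage() {
    cat <<EOF
Usage: myscript [-v] FILE

Options:
  -v  enable verbose output
EOF
}

usage
```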
Command Groups
Commands in the shell can be grouped in several ways, allowing you to execute them sequentially (separated by `;` or a newline), asynchronously (separated by `&`), in a pipeline (separated by `|`), or conditionally based on the exit status (separated by `&&` for AND and `||` for OR conditions). Commands can also be grouped into a subshell using `()`.
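A sketch of the conditional separators and subshell grouping:

```shell
#!/usr/bin/env sh
set -euf

# && runs the next command only on success, || only on failure.
true && echo "ran on success"
false || echo "ran on failure"

# A subshell ( ) groups commands; the cd below is confined to it,
# so the caller's working directory is unchanged.
( cd / && pwd )
```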
This was most of the shell language. There are various extensions from different interpreters, but they are out of scope for this post.
Conditions
Let's move on to the `test` command. As I mentioned earlier, shell doesn't have any built-in conditions except the exit code, so all the heavy lifting is done by `test`. To look more plausible in code, `test` is aliased to `[`, so that you can write:

```sh
if [ "$var" != "something" ]; then
    ...
fi
```

instead of

```sh
if test "$var" != "something"; then
    ...
fi
```
Because the `[` form is prone to misuse, namely treating variables whose content starts with `-` as flags to `test` itself, the `[[` form was introduced later. I strongly prefer the `[[` form because it's safer and more convenient, e.g. you can use `&&` and `||` to group conditions inside `[[` instead of the `-a` and `-o` flags of the `[` form.
`test` has a lot of flags; I'll list only the ones I use the most:

- `-eq, -ne, -gt, -ge, -lt, -le`: Binary operators for integer comparison
- `=, !=, <, >`: Binary operators for string comparison
- `! expr`: Negates the expression
- `-n string`: True if the string length is nonzero
- `-z string`: True if the string length is zero
- `-e path`: True if `path` exists, regardless of its type
- `-f path`: True if `path` exists and is a regular file
- `-d path`: True if `path` exists and is a directory

The rest of `test`'s flags are in its man page.
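A few of these flags in action; a sketch that creates a throwaway file just for the demonstration (`mktemp` is not POSIX but is available pretty much everywhere):

```shell
#!/usr/bin/env sh
set -euf

tmpfile=$(mktemp)

[ -f "$tmpfile" ] && echo "tmpfile is a regular file"
[ -d /tmp ]       && echo "/tmp is a directory"
[ -z "" ]         && echo "empty string detected"
[ 3 -lt 5 ]       && echo "3 is less than 5"

rm -f "$tmpfile"
```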
External Utilities
Now it's time to talk about external utilities because without them shell scripting wouldn't be very useful.
Awk
I'll start with `awk` because I use it instead of `grep`, `sed`, `tr` and maybe some other tools I don't know about. This Swiss Army knife of a tool is covered in depth by "The AWK Programming Language", the book by its original authors Aho, Weinberger and Kernighan. I highly recommend reading the book if you'd like to master Awk.
An Awk program is a sequence of statements in the form:

```awk
pattern1 { action1; action2; ... }
pattern2 { action3; action4; ... }
...
```

Statements can also be separated by semicolons; if the pattern is omitted, the actions are applied to every line of the input. Awk splits each line into fields using the separator regexp defined in the builtin variable `FS` (whitespace by default). The values of the fields are available via the `$1`, `$2`, ... special variables; `$0` holds the whole line; `$NF` holds the last field; `NF` holds the number of fields.
A pattern can be a regular expression in the form `input ~ /regexp/` (or `!~` to negate the match), or some other expression of the language, e.g. `$2 * $3 > 5` will match lines where the 2nd field multiplied by the 3rd is greater than 5. Expressions can be grouped together using the `||`, `&&` and `!` logical operators.
There are also the special patterns `BEGIN` and `END`, whose corresponding actions execute before and after the input is processed.
Awk is a full-fledged programming language; it has if-then-else
conditionals,
for
and while
loops, variables, a lot of built-in functions, etc. You can
find detailed information about Awk functions and control structures in
its man page; I'll just show some example programs.
Emulate cat

```sh
awk '{ print }' input.txt
```

Emulate grep pattern

```sh
awk '/pattern/' input.txt
```

Emulate tr -d pattern

```sh
awk '{ gsub(/pattern/, ""); print }' input.txt
```
Count the number of requests to routes in a webapp

```awk
# Assuming that a log entry looks like "Oct 15 12:00:00 path=/some/route ..."
/path=/ {
    sub(/path=/, "", $4);
    path_counts[$4] += 1;
}
END {
    for (path in path_counts) {
        printf("%2d %s\n", path_counts[path], path);
    }
}
```

In order to bring popular results to the top we can use it together with `sort`:

```sh
awk -f program.awk access.log | sort -rn
# Given access log:
# Oct 15 12:00:00 path=/route/1 ...
# Oct 15 12:01:00 path=/route/1 ...
# Oct 15 12:02:00 path=/route/2 ...
# Oct 15 12:03:00 path=/route/3 ...
# will produce:
# 2 /route/1
# 1 /route/2
# 1 /route/3
```
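As a taste of Awk's programming-language side, here's a one-liner that sums the second field across all lines and prints the total from an `END` action; the input data is made up:

```shell
printf 'alice 2\nbob 3\ncarol 5\n' | awk '{ sum += $2 } END { print sum }'
# prints: 10
```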
Xargs
`xargs` is a wonderful little tool for parallelizing shell computations, as well as for feeding stdin to commands that only accept positional arguments, e.g. `cp` or `rm`. As always, complete information about its flags and options is in the xargs man page; I'll only present some handy examples.
Remove everything in the current directory

```sh
ls -A | xargs rm -rfv
```

Download a list of URLs in 10 parallel processes

```sh
# Flag -n controls how many arguments are passed to the command on each
# invocation.
# Flag -P controls the number of parallel invocations of the command.
cat urls.txt | xargs -n1 -P10 wget
```

Copy everything in the current directory into another one

```sh
# Flag -I denotes the replacement string; its occurrences
# in the command will be replaced with the args from stdin.
ls -A | xargs -n1 -I{} cp -rv {} some-other-dir/
```
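A runnable miniature of the `-I` mechanics, with made-up input, so you can see the substitution without touching any files:

```shell
# -I{} substitutes each input item into the command wherever {}
# appears; xargs then runs one echo per item.
printf 'a\nb\nc\n' | xargs -I{} echo "item: {}"
```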
Find
find
searches for files and directories using various filters, such as type,
name, creation time, permissions, and many others. The complete list of filters
is in find's man page; below are some examples.
Find all Haskell source files in the current directory

```sh
find . -type f -name "*.hs"
```

Find all executables in the filesystem

```sh
# Flag '-perm -mode' tests whether all of the permission bits are set
# for the entry. It is an inexact match; for an exact one use
# the '-perm mode' flag.
find / -type f -perm -o=x
```

Find all temporary files and directories belonging to the current user

```sh
find /tmp -user "$(id -u)"
```

Find and remove all backup files from the current directory

```sh
# Flag -iname performs a case-insensitive match on the name of a file.
# Flag -print0 makes find separate results with the \NUL character
# instead of a newline. It helps when a filename contains whitespace,
# which xargs would otherwise split on; pairing it with 'xargs -0'
# keeps such filenames intact.
find . -type f -iname "*.bak" -print0 | xargs -0 rm -rfv
```
Curl
Though curl
is a very powerful command, I use it mostly for exchanging data
over HTTP. Below is a list of options I find to be the most useful:
- `-#`: Replaces the default verbose progress meter with a simple progress bar
- `-s`: Makes curl completely silent
- `-v`: Makes curl verbose (useful for debugging)
- `-L`: Enables following 3xx redirects from the server
- `-X method`: Specifies the HTTP method
- `-H "X-Header-Name: Value"`: Specifies a request header
- `-F key=value` or `-F key=@path/to/somefile`: Specifies form content for a POST request
- `-d 'raw data'`: Specifies raw content for a POST request. Useful when querying services that use JSON or XML input formats.
- `-u username:password`: Specifies the authentication for a request
- `-o filepath`: Redirects output to a file
Xsltproc
`xsltproc` is a very handy tool when you need to parse HTML pages reliably. Recently I used it to parse song history from Soma.fm. It doesn't have a lot of options; the most relevant for HTML parsing is `--html`, which makes the tool more tolerant of what would be an error when parsing XML. Rather than the tool itself, it's XSLT stylesheets you'd need to master to use it well. Luckily, the idea behind them is quite simple: you match content using XPath queries and apply transformations to the selected objects. A lot of examples for XSLT are in this wiki book.
JQ
Another useful utility, this time for JSON data. It allows you to query JSON documents, aggregate, transform, and do lots of other things. It has a detailed manual as well as an online tool where you can play with JQ filters.
Column
Last but not least is this little command that prettifies your output by formatting it into columns. I usually print the output using some specific separator `<sep>` that doesn't appear in it, and then pipe it to `column -s <sep> -t`, which splits the data on the separator and re-formats it into a nice table:

```sh
# Given output.txt:
# foo:23:a
# quix:7:b
# baz:156:c
# will produce:
# foo   23   a
# quix  7    b
# baz   156  c
cat output.txt | column -s : -t
```
In The End
Wow, that turned out to be a lengthy post. I've only scratched the surface of shell programming; there are plenty of other useful tools and commands that can be used in shell scripts. Overall I think shell programming is a relatively unsafe and hard-to-maintain technique, but in some situations it saves a lot of time and effort. I believe a good software developer should, if not master, then at least get comfortable with the shell and its language.