Parsing command line options in bash
I just started a new job a few weeks ago, so I’ve been pretty negligent regarding this website. However, I think I’ve finally gotten my footing a bit better, so hopefully I can spend a little more time here. I wanted to mark my return by writing about terrible shell scripts, some techniques for writing less terrible shell scripts, and, most importantly, deciding when it’s appropriate to use these techniques. Just for clarification, when I’m talking about shells in this post, I’m talking about bash, though some of the methods mentioned may work for other shells as well.
Terrible shell scripts – a brief history
Throughout most of graduate school, I wrote a lot of really bad shell scripts. This is likely because I started coding in R. As an interpreted language, R allows for relatively interactive sessions, with the ability to run code line-by-line. The excellent RStudio IDE can also make writing R code enjoyable, if writing code is your thing. Shell scripting felt far less interactive to me, so at first I really struggled when I transitioned to it. In my early attempts, if I needed to set the value of a variable, I simply performed the assignment inside of the script, up near the top. For instance, to set the path to an input file, I would write something like:
infile=/path/to/my/input/file.txt
This is how many people start out writing shell scripts. However, I quickly grew disenchanted with this approach. For one thing, I always had to look inside of a script to remember how it worked. In addition, storing a script like this inside of a git repository was annoying, because git would track every change made to these hard-coded assignments.
I eventually progressed to using positional arguments. For instance, typing something like the following:
./script.sh in_file.txt foo bar
would assign three strings (“in_file.txt,” “foo,” and “bar”) to the variables $1, $2, and $3 respectively. However, one downside of using this approach was that I still had to look inside the script to remember how to use it, and it got pretty cumbersome if there were more than a few arguments.
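One way to soften that downside is to have the script check its own argument count and print a usage line. The following is only a minimal sketch: the function name read_positional and the usage text are made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: copies three positional arguments into named
# variables, or prints a usage line and fails if any are missing.
read_positional() {
    if [[ $# -lt 3 ]]; then
        echo "Usage: script.sh <infile> <foo> <bar>" >&2
        return 1
    fi
    infile=$1
    foo=$2
    bar=$3
}

# In a real script this call would be: read_positional "$@" || exit 1
read_positional in_file.txt foo bar
echo "infile=$infile foo=$foo bar=$bar"
```

This at least documents the expected order, but it does nothing for the "more than a few arguments" problem.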
I also experimented with reading arguments in from files. For example, the following code will read in lines from the second column of a tab-delimited file, place them in an array, and then extract the elements of the array into variables:
mapfile -t args_array < <(cut -d$'\t' -f 2 args_in.txt)
echo "Arguments read in from args_in.txt:"
printf '%s\n' "${args_array[@]}"
infile=${args_array[0]}
foo=${args_array[1]}
bar=${args_array[2]}
This method works well if there are a large number of arguments to be input. Your argument input file can consist of a table, with one column containing input labels for reference, and another column containing their values (column 2 in the example above). However, one downside is that the argument input file must always tag along with the script. In addition, you’ll still have to look inside either the argument file or the script itself to figure out what is going on.
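As a variation on the same idea, the labels in the first column can be used as lookup keys, so that values are retrieved by name rather than by position. This is a rough sketch that assumes bash 4 or later (for associative arrays); the temporary file and its contents are invented for the example.

```bash
#!/bin/bash
# Build a throwaway two-column, tab-delimited argument file (label, value).
args_file=$(mktemp)
printf 'infile\t/path/to/my/input/file.txt\n' >  "$args_file"
printf 'foo\t10\n'                            >> "$args_file"
printf 'bar\thello world\n'                   >> "$args_file"

# Read each label/value pair into an associative array keyed by label.
declare -A args
while IFS=$'\t' read -r label value; do
    [[ -n $label ]] && args[$label]=$value
done < "$args_file"
rm -f "$args_file"

echo "infile=${args[infile]}"
echo "foo=${args[foo]}"
echo "bar=${args[bar]}"
```

The argument file still has to travel with the script, so the main drawback described above remains.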
Parsing command line options
Finally, I decided that I wanted to write scripts that more closely resemble all the bash built-ins and C programs that I typically work with. For instance, below is a grep command to count the number of lines in a file that do not contain the phrase “foo,” written with short options:
grep -v -c "foo" file.txt
or the same command using the equivalent long options:
grep --invert-match --count "foo" file.txt
Notably, I never have to dive into the source code for grep to change its operation. In addition, if I don’t know how to use grep for a particular purpose, I can just type “grep --help” to get some advice.
After some research and some trial and error, I finally realized my goal… mostly. I will say that I don’t think there is currently any perfect solution for parsing command line options in shell scripts, but there are some that get us most of the way there. Confusingly, the two programs that we can use to parse options in bash have almost identical names:
- getopts
- getopt
To make matters more confusing, there is an older version of getopt, and an enhanced version (usually referred to as the GNU version). Whether getopts or getopt is preferable is a matter of eternal debate. I won’t go into much detail on it, but here are some main points to consider:
- getopts is a bash built-in, and is compatible with multiple POSIX-compliant shells
- The GNU version of getopt is available in the util-linux package, which comes installed on nearly all Linux distributions. However, it does not come installed on macOS or FreeBSD. On a Mac, it can be installed using MacPorts.
- The GNU getopt is the only one that can handle long option names – this is technically possible using getopts, but the solution is a bit hackish
- getopts can make life a bit easier, because it does the actual parsing of command options – getopt only formats supplied options to make them easier to parse.
- The most important thing is to avoid the old version of getopt – at this point there’s no reason to ever use it.
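To make the comparison a little more concrete, here is a small sketch of both tools. The first part checks which getopt is installed: the enhanced version exits with status 4 when called with -T. The second part parses short options with the getopts built-in. The names getopt_kind and parse_short are made up for this example.

```bash
#!/bin/bash
# Enhanced getopt prints nothing and exits with status 4 when given -T.
getopt -T > /dev/null 2>&1
if [[ $? -eq 4 ]]; then
    getopt_kind="enhanced"
else
    getopt_kind="old or missing"
fi
echo "getopt: $getopt_kind"

# Hypothetical function: short-option parsing with the getopts built-in.
# The leading colon in the option string turns on silent error reporting.
parse_short() {
    local OPTIND=1 opt
    num_a="" num_b="" log="false"
    while getopts ":a:b:l" opt; do
        case "$opt" in
            a )  num_a=$OPTARG ;;
            b )  num_b=$OPTARG ;;
            l )  log="true" ;;
            : )  echo "Option -$OPTARG requires an argument" >&2; return 1 ;;
            \? ) echo "Unknown option -$OPTARG" >&2; return 1 ;;
        esac
    done
}

parse_short -a 2.5 -b 4 -l
echo "num_a=$num_a num_b=$num_b log=$log"
```

Note that getopts hands back one option at a time in $opt and its argument in $OPTARG, which is what the bullet about it "doing the actual parsing" refers to.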
Example
I’ll show an example bash script, called getopt_example.sh, that parses command line arguments using getopt, and then explain what it’s doing. Note that the code is also posted here. The script takes two floating point numbers as input and prints their sum. It can optionally write its output to a file instead of to standard out, and can insert some custom text along with the summation output.
#!/bin/bash

# Set script name variable
script=$(basename "${BASH_SOURCE[0]}")

## Template getopt string below
## Short options specified with -o. Option name followed by nothing indicates
## no argument (i.e. flag), one colon indicates required argument, two colons
## indicate optional argument
##
## Long options specified with --long. Options are comma separated. Same
## options syntax as for short options
## (h/help are declared here so that getopt accepts -h and --help)
opts=$(getopt -o a:b:lt::h --long numa:,numb:,log,text::,help -n 'option-parser' -- "$@") || exit 1
eval set -- "$opts"

## Set fonts used for help function.
norm=$(tput sgr0)
bold=$(tput bold)

## help function
function help {
    echo -e \\n"help documentation for ${bold}${script}.${norm}"
    echo "
REQUIRED ARGUMENTS:
-a or --numa    Floating point number - first number to be summed
-b or --numb    Floating point number - second number to be summed

OPTIONAL ARGUMENTS
-t or --text    Optional descriptive text to include in output

FLAGS
-l or --log     If included, prints output to a file named 'sum.log', otherwise prints to stdout
-h or --help    Displays this message."

    echo -e \\n"USAGE EXAMPLE:
${bold}./$script --numa=2 --numb=3 --text='Here is the output' --log ${norm}"\\n

    exit 1;
}

## If no arguments supplied, print help message and exit
if [[ "$1" == "--" ]]; then help; fi

## Set initial values for arguments
num_a="init"
num_b="init"
text="Floating Point Sum:"
log="false"

## Parse out command-line arguments
while true; do
    case "$1" in
        -a | --numa )
            case "$2" in
                "" ) shift 2;;
                * ) num_a="$2"; shift 2;;
            esac ;;
        -b | --numb )
            case "$2" in
                "" ) shift 2;;
                * ) num_b="$2"; shift 2;;
            esac ;;
        -t | --text )
            case "$2" in
                "" ) shift 2;;
                * ) text="$2"; shift 2;;
            esac ;;
        -l | --log ) log="true"; shift ;;
        -h | --help ) help ;;
        -- ) shift; break;;
        * ) echo "Internal Error!"; break;;
    esac
done

## Check if both number a and b were supplied
if [[ "$num_a" == "init" ]] || [[ "$num_b" == "init" ]]; then
    echo "--numa and --numb arguments are both required. Type './getopt_example.sh -h' for help"
    exit 1;
fi

## Create the sum of the numbers - a little more involved for floating point numbers
if [[ $log == "true" ]]; then
    echo "$text" > sum.log
    echo "The sum of $num_a and $num_b is $(echo "$num_a + $num_b" | bc)" >> sum.log
else
    echo "$text"
    echo "The sum of $num_a and $num_b is $(echo "$num_a + $num_b" | bc)"
fi

exit 0;
Explanation
That’s a lot to take in – we’ll break it down piece by piece below.
The call to getopt
getopt performs just one function: it takes the arguments supplied after the script name and rewrites them as a single normalized, shell-quoted string of “words”, i.e. the options and (possibly) their arguments. I say “possibly” because not all options have arguments. In our case, that output string is stored in the variable $opts.
In our call to getopt, we supply the short versions of the options after the -o option. Pay attention to the colons: if there are no colons after an option letter (l in this case), the option doesn’t take an argument (i.e. it operates as a flag). One colon means that the option requires an argument, and two colons mean that it can take an optional argument. We then list the corresponding long versions of the options after --long. Everything is the same as for the short options, except that we need to place commas between the option names.
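To see what getopt hands back, it can be run on a fixed set of arguments instead of "$@". This is a small sketch that assumes the enhanced getopt is on the PATH and uses an option string like the one in the example script; the sample arguments are invented.

```bash
#!/bin/bash
# Feed getopt a hard-coded command line that mixes long and short forms.
opts=$(getopt -o a:b:lt:: --long numa:,numb:,log,text:: -n 'option-parser' \
    -- --numa=2 -b 3 -l)

# The result is one string: each option argument comes back as its own
# quoted word, and a final -- is appended to mark the end of the options.
echo "getopt output:$opts"
```

Whichever form the user typed (--numa=2 or -b 3), the output uses the same option-then-argument layout, which is what makes the later parsing loop simple.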
Here’s a good place to point out one of the idiosyncrasies of getopt: it cannot handle a space between a long option and an optional argument. Because the argument is optional, getopt has no way to tell whether the next word is the option’s value or a separate positional argument, so it only accepts a value that is attached to the option. For instance, in our call to the above script, if we wanted to customize the output header text, we would have to write --text="Output below" instead of simply --text "Output below" (for the short form, the value must follow the letter directly, as in -t"Output below"). For that reason, I recommend always using the equal sign assignment for long options (whether they have required or optional arguments), and writing the usage example in the help function (more on this below) to reflect this.
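The difference is easy to see by printing the words getopt produces for each spelling. The helper below, show_words, is a hypothetical name for this sketch, and it assumes the enhanced getopt is installed.

```bash
#!/bin/bash
# Hypothetical helper: run getopt on the given arguments and print each
# resulting word in square brackets, all on one line.
show_words() {
    local parsed
    parsed=$(getopt -o a:b:lt:: --long numa:,numb:,log,text:: \
        -n 'option-parser' -- "$@") || return 1
    eval set -- "$parsed"
    printf '[%s]' "$@"
    echo
}

show_words --text="Output below"   # value attached with "="
show_words --text "Output below"   # value separated by a space
show_words -t"Output below"        # short form, value attached directly
```

In the second call, --text should come back with an empty value, and “Output below” should end up after the “--” as a positional argument.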
The -n option gives the parser a name, which is printed along with any error messages should option parsing fail. The “--” marks the end of getopt’s own options, and "$@" expands to everything the user typed after the script name, which is what getopt actually parses. In its output, getopt places any positional (non-option) arguments after a final “--”.
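Finally, the line beginning with eval set replaces the script’s positional parameters ($1, $2, and so on) with the words in $opts. The eval is needed so that the quotes getopt adds are interpreted by the shell, which preserves whitespace inside arguments. Interestingly, you could delete everything through the line starting with eval set, and the script would still technically work for simple input such as --numa 2 --numb 3, though it would easily get confused by more intricate command line inputs (for instance, the --numa=2 form).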
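The effect of eval can be checked by counting the words with and without it. This is a minimal sketch assuming the enhanced getopt; the variable names without_eval, with_eval and text_value are only for illustration.

```bash
#!/bin/bash
opts=$(getopt -o t:: --long text:: -n 'option-parser' -- --text="Output below")

# Without eval: the shell splits $opts on whitespace and keeps getopt's
# quote characters as literal text, so the value is broken in two.
set -- $opts
without_eval=$#

# With eval: the quotes are interpreted, so the value stays in one piece.
eval set -- "$opts"
with_eval=$#
text_value=$2

echo "words without eval: $without_eval"
echo "words with eval:    $with_eval"
echo "value of --text:    $text_value"
```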
The help function
This is really the most important part of this whole exercise. You don’t have to write a help message, but in my mind having one is the whole point of going to the trouble of parsing arguments in the first place. The help function is pretty simple: its job is to print out information on each of the possible options/flags, and then exit the script. You’ll notice that below the function definition I have a single line that tests whether the first positional parameter, $1, is “--”; if it is, the script prints the help message and exits. After the eval set line, $1 will be “--” if no options are supplied, so in this case, if the user just types:
./getopt_example.sh
the script will just print the help message. A side effect is that this script can’t be run with positional arguments alone, because getopt moves them after the “--”, which leaves “--” in $1.
Initial values
Next we set the starting values for all arguments. My one piece of advice here is to set the initial value for any options requiring an argument to some standard string (such as “init”) – more on this later. You should set the initial value for any option with an optional argument to some sensible default value. Finally, the initial values for any flags should be set to some version of “false” – it could be “F” or “FALSE” – whatever works best for you.
The Parsing Loop
Now, what is going on inside of the while loop? Remember that the output of getopt, stored in $opts, was loaded into the positional parameters by the eval set line. The while loop simply walks through these parameters. On each iteration, the current option is in $1, and its argument (if present) is in $2. The case…esac statement is a way of performing conditional tests that is equivalent to the switch statement in many languages. Basically, it is a way of testing for a match to multiple possible patterns, with syntax that is easier than using a long string of if… elif… else statements. You’ll notice all the shift commands in there: for any option that has an argument, we assign the argument value to a variable, and then shift two positions through the parameter list (the option name and its argument). For any option without an argument (i.e. a flag), we only need to shift one position.
There are a few different ways to go about handling required vs. optional arguments, but I prefer to handle them the same way, to make it possible to copy and paste code within the option parsing block. For each option with an argument, I have a nested case…esac statement. If the supplied argument is anything (“*”), it is assigned to the relevant variable. If it is an empty string (“”), nothing happens and the while loop moves on.
Finally, the --log/-l and --help/-h options are flags with no arguments. In the case of --log, the value of the variable $log is set to “true,” while --help/-h calls the help function.
When “--” is encountered, it signals the end of the entered options, and the loop is broken. Any other value aside from those listed above signals that something has gone wrong, and an error message is printed.
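To watch the shifting happen, the loop can be reduced to a trace that reports what it sees at each step. This is only an illustrative sketch: trace_loop is a made-up function, and the words passed to it are typed by hand to mimic what getopt would supply.

```bash
#!/bin/bash
# Hypothetical trace function: walks a getopt-style word list and reports
# what is in $1 (and $2) on each pass before shifting.
trace_loop() {
    while true; do
        case "$1" in
            -a | --numa )
                echo "option $1 with argument '$2' -> shift 2"; shift 2 ;;
            -l | --log )
                echo "flag $1 -> shift 1"; shift ;;
            -- )
                echo "end of options, $(( $# - 1 )) positional argument(s) left"
                shift; break ;;
            * )
                echo "unexpected word '$1'" >&2; return 1 ;;
        esac
    done
}

trace_loop --numa 2.5 -l -- extra.txt
```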
Checking user input
getopt can make life easier, but it still isn’t very smart. For instance, right below the option parsing block in the code is a test to make sure that neither --numa/-a nor --numb/-b is still set to its initial value of “init.” getopt will only catch the omission of a required argument if the user uses the space-separated form and leaves the argument off, for instance:
./getopt_example.sh --numa 2.5 --numb
However, it won’t catch the problem if the user uses an equal sign with a long option but leaves the value empty:
./getopt_example.sh --numa=4 --numb=
or if the user never enters the option for number b in the first place:
./getopt_example.sh --numa=4
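These three cases can be checked against getopt directly by looking only at its exit status. The sketch below assumes the enhanced getopt and uses a cut-down option string; check_input is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: report whether getopt accepts the given arguments,
# judged purely by its exit status.
check_input() {
    if getopt -o a:b: --long numa:,numb: -n 'option-parser' -- "$@" \
        > /dev/null 2>&1; then
        echo "accepted"
    else
        echo "rejected"
    fi
}

check_input --numa 2.5 --numb    # argument left off after a space
check_input --numa=4 --numb=     # empty value after "="
check_input --numa=4             # --numb never given
```

Only the first call should be rejected; getopt treats the other two as valid input.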
Therefore, it’s up to us to make sure that all required arguments are supplied. By setting a default value of “init” for all required arguments, and by setting up our parsing block to do nothing if it encounters an empty string for a required argument, we can reliably exit the script if any required argument was not supplied with the following test:
## Check if both number a and b were supplied
if [[ "$num_a" == "init" ]] || [[ "$num_b" == "init" ]]; then
    echo "--numa and --numb arguments are both required. Type './getopt_example.sh -h' for help"
    exit 1;
fi
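The same spirit of checking can be taken one step further by testing that each supplied value actually looks like a number before handing it to bc. This goes beyond what the example script does; it is a rough sketch, and both the is_number name and the pattern (plain decimals only, no exponents) are assumptions made for illustration.

```bash
#!/bin/bash
# Hypothetical helper: succeeds only if its argument looks like a plain
# decimal number (optional sign, digits, optional fractional part).
is_number() {
    local re='^[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)$'
    [[ $1 =~ $re ]]
}

for value in 2.5 -3 .75 init abc 1e5; do
    if is_number "$value"; then
        echo "$value: ok"
    else
        echo "$value: not a number"
    fi
done
```

A check like `is_number "$num_a" || exit 1` placed after the “init” test would stop the script before bc sees anything unexpected.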
Conclusion
That’s all there is to it – you can play around with this simple script to see how different inputs affect its output; I’m sure there are some ways to break it that I haven’t thought of. Now you know everything you need to write nice, professional-looking scripts in bash. That being said, you should think about when to use this technique. There is something to be said for crappy scripts – sometimes you need to just write something, run it once or twice, and move on. The boilerplate code required to use getopt is somewhat long and tedious to modify, especially if the script has a large number of options. However, if you are writing something that you will need to run many times in the future, it probably pays to take the extra time to make it a little more professional.