Shell scripting patterns: paths, file globs and the current dir

The basis of this pattern is simple: never, ever change the current working directory in your scripts. In fact, never assume that you even have a meaningful working directory unless your command explicitly operates on the local directory (e.g. find). Consequently, you should never use relative paths in your script unless the caller handed them to you; since your script does not change the working directory, it is fine to use whatever path the caller supplied as-is.

There are numerous reasons for this. The most straightforward is that cd executed without an argument (for instance, because the variable holding the target directory happens to be empty) changes to the executing user's home directory, which is arguably where the script can do the most damage.

There is one case where your script needs to refer to a relative path, namely when your script is distributed together with other files (e.g. in a zip file) and should be runnable after being unzipped. The script must be able to find files/directories relative to wherever the script itself resides, so a path relative to the current working directory does not cut it if the user runs ./zipdir/script.sh. Fortunately, all shells provide $0, which holds the path used to invoke the script. Thus, you do this:

DATADIR=$(dirname "$0")/data
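A minimal sketch of the pattern in a complete script; the bundled file name data/defaults.conf is made up for the illustration:

```shell
#!/bin/sh
# Locate bundled files relative to the script itself, not relative to
# the caller's working directory. data/defaults.conf is hypothetical.
DATADIR=$(dirname "$0")/data

# Works the same whether invoked as ./script.sh or ./zipdir/script.sh:
echo "would read $DATADIR/defaults.conf"
```

Invoked as ./zipdir/script.sh, dirname "$0" yields ./zipdir, so $DATADIR points at ./zipdir/data no matter where the caller stood when running it.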

While you are avoiding things, also avoid shell-expanded glob expressions. They are simply too imprecise, and it is too easy to get bitten by paths with spaces in them. In 99 cases out of 100, find is better than a shell glob. find is your friend.

find /dir/with/cruft -mindepth 1 -maxdepth 1 \
     -type f -regex '.*\.sh' | frobnicate_script

There are numerous things going on here. -mindepth/-maxdepth describe how deep to look; a depth of 0 means the path supplied to find itself, so 1/1 means "look only at the immediate contents of /dir/with/cruft". -type f means "look only for regular files". -regex matches using regular expressions, which are a lot more expressive than the globs normally supplied to find via -name; note that -regex matches against the whole path, not just the file name, which is why the pattern starts with .* here. It is tempting at this point to reach for -exec, but please refrain: the -exec option creates code that is very difficult to read.
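To see the whole-path behaviour of -regex next to -name, here is a quick comparison in a throwaway directory (the file names are made up; note that -regex is a GNU/BSD extension to find, not POSIX):

```shell
# -name matches only the last path component; -regex matches the
# ENTIRE path, so the pattern needs .* to cover the leading directories.
tmp=$(mktemp -d)
touch "$tmp/keep.sh" "$tmp/skip.txt"

find "$tmp" -maxdepth 1 -type f -name '*.sh'       # finds keep.sh
find "$tmp" -maxdepth 1 -type f -regex '.*\.sh'    # same result

rm -rf "$tmp"
```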

Instead xargs is the friend of your find. If all you want to do is a simple operation on each found file, you can do something like this:

find "$rootdir" -mindepth 1 -type d -cmin -10 -print0 | xargs -0 chmod o-rwx

Or, in English: remove world permissions on all directories under $rootdir whose status changed within the last 10 minutes (-cmin matches change time; Unix filesystems generally do not record a true creation time). The find option -print0 terminates each path with a null character instead of a newline, and -0 tells xargs to split on null characters instead of whitespace. Presto: no problem with spaces (or even newlines) in paths.
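A self-contained demonstration that the null-terminated handoff survives hostile names; the directory name with a space in it is created deliberately for the demo:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/has space"            # directory name containing a space
chmod o+rx "$tmp/has space"

# Null-terminated pipeline: the space passes through intact,
# so chmod sees one directory, not two bogus arguments.
find "$tmp" -mindepth 1 -type d -print0 | xargs -0 chmod o-rwx

ls -ld "$tmp/has space"           # world bits are now stripped
rm -rf "$tmp"
```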

Discouraging users from using delimiter characters (such as space, tab, newline, comma, pipe, etc.) in file and directory names is part of the holy cause. Nevertheless, some files will have them, sometimes even for good reasons, so your script needs to be prepared for them. Quote all data that comes from outside.
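A tiny illustration of what the quoting buys you; the file name with a space is created just for the demonstration:

```shell
tmp=$(mktemp -d)
touch "$tmp/my file"              # a name with a space in it

f="$tmp/my file"
ls "$f"                           # quoted: one argument, works
# ls $f                           # unquoted: the shell splits it into
                                  # "$tmp/my" and "file" -- two bogus names

rm -rf "$tmp"
```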

As an aside, it is an annoying aspect of shell scripting that most syntax highlighters will treat e.g. "$1" as a string and give it string colour, but $1 as a variable and give it variable colour. Most unhelpful.