Chapter 30. Debugging

 

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

 Brian Kernighan

The Bash shell contains no debugger, nor even any debugging-specific commands or constructs. [1] Syntax errors or outright typos in the script generate cryptic error messages that are often of no help in debugging a non-functional script.


Example 30-1. A buggy script

   #!/bin/bash
   # ex74.sh

   # This is a buggy script.
   # Where, oh where is the error?

   a=37

   if [$a -gt 27 ]
   then
     echo $a
   fi

   exit 0

Output from script:
 ./ex74.sh: [37: command not found
What's wrong with the above script (hint: after the if)?
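
A minimal sketch of one repair, assuming the problem is the one the hint points at: [ is a command name, so it must be separated from its first argument by whitespace. The function name check_threshold is hypothetical and used only for illustration.

```shell
#!/bin/bash
# Sketch: the test from Example 30-1 with whitespace after '['.
# check_threshold is a hypothetical name, not part of the original script.

check_threshold () {
  local a=$1
  if [ "$a" -gt 27 ]      # Note the space between '[' and "$a".
  then
    echo "$a"
  fi
}

check_threshold 37        # Greater than 27, so the value is echoed.
check_threshold 5         # Not greater than 27, so nothing is echoed.
```

Without the space, the shell looks for a command literally named [37, which is what the error message reports.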


Example 30-2. Missing keyword

   1 #!/bin/bash
   2 # missing-keyword.sh: What error message will this generate?
   3 
   4 for a in 1 2 3
   5 do
   6   echo "$a"
   7 # done     # Required keyword 'done' commented out in line 7.
   8 
   9 exit 0  

Output from script:
 missing-keyword.sh: line 10: syntax error: unexpected end of file
 	
Note that the error message does not necessarily reference the line in which the error occurs, but the line where the Bash interpreter finally becomes aware of the error.

Error messages may disregard comment lines in a script when reporting the line number of a syntax error.

What if the script executes, but does not work as expected? This is the all too familiar logic error.


Example 30-3. test24, another buggy script

   #!/bin/bash

   #  This script is supposed to delete all filenames in current directory
   #+ containing embedded spaces.
   #  It doesn't work.
   #  Why not?


   badname=`ls | grep ' '`

   # Try this:
   # echo "$badname"

   rm "$badname"

   exit 0

Try to find out what's wrong with Example 30-3 by uncommenting the echo "$badname" line. Echo statements are useful for seeing whether what you expect is actually what you get.

In this particular case, rm "$badname" will not give the desired results because $badname should not be quoted. Quoting it passes rm a single argument, so it can match at most one filename. A partial fix is to remove the quotes from $badname and to reset $IFS to contain only a newline, IFS=$'\n'. However, there are simpler ways of going about it.
   # Correct methods of deleting filenames containing spaces.
   rm *\ *
   rm *" "*
   rm *' '*
   # Thank you. S.C.
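
The globbing approach can be tried safely in a scratch directory. The following is a sketch only: the directory comes from mktemp, and the file names are made up for the demonstration.

```shell
#!/bin/bash
# Sketch: delete only the files whose names contain a space.
# Works in a scratch directory so that nothing real is removed.

scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

touch "plain.txt" "has space.txt" "two  spaces here"

for f in *" "*            # The glob yields one word per filename.
do
  [ -e "$f" ] || continue # Skip the literal pattern if nothing matched.
  rm -- "$f"              # Quoted: each name is passed as a single argument.
done

remaining=$(ls)
echo "Remaining: $remaining"

cd / && rm -r "$scratch"  # Remove the scratch directory again.
```

Because the shell expands the pattern into one word per matching file, no adjustment of $IFS is needed.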

Summarizing the symptoms of a buggy script:

  1. It bombs with a "syntax error" message, or

  2. It runs, but does not work as expected (logic error), or

  3. It runs, works as expected, but has nasty side effects (logic bomb).

Tools for debugging non-working scripts include

  1. echo statements at critical points in the script to trace the variables, and otherwise give a snapshot of what is going on.

  2. using the tee filter to check processes or data flows at critical points.

  3. setting option flags -n -v -x

    sh -n scriptname checks for syntax errors without actually running the script. This is the equivalent of inserting set -n or set -o noexec into the script. Note that certain types of syntax errors can slip past this check.

    sh -v scriptname echoes each command before executing it. This is the equivalent of inserting set -v or set -o verbose in the script.

    The -n and -v flags work well together. sh -nv scriptname gives a verbose syntax check.

    sh -x scriptname echoes the result of each command, but in an abbreviated manner. This is the equivalent of inserting set -x or set -o xtrace in the script.

    Inserting set -u or set -o nounset in the script runs it, but gives an unbound variable error message and aborts the script at the first attempt to use an unset variable.

  4. Using an "assert" function to test a variable or condition at critical points in a script. (This is an idea borrowed from C.)


    Example 30-4. Testing a condition with an "assert"

       #!/bin/bash
       # assert.sh

       assert ()                 #  If condition false,
       {                         #+ exit from script with error message.
         E_PARAM_ERR=98
         E_ASSERT_FAILED=99


         if [ -z "$2" ]          # Not enough parameters passed.
         then
           return $E_PARAM_ERR   # No damage done.
         fi

         lineno=$2

         if [ ! $1 ]             #  $1 is deliberately unquoted,
         then                    #+ so that it splits into a test expression.
           echo "Assertion failed:  \"$1\""
           echo "File \"$0\", line $lineno"
           exit $E_ASSERT_FAILED
         # else
         #   return
         #   and continue executing script.
         fi
       }


       a=5
       b=4
       condition="$a -lt $b"     # Error message and exit from script.
                                 #  Try setting "condition" to something else,
                                 #+ and see what happens.

       assert "$condition" $LINENO
       # The remainder of the script executes only if the "assert" does not fail.


       # Some commands.
       # ...
       echo "This statement echoes only if the \"assert\" does not fail."
       # ...
       # Some more commands.

       exit 0

  5. trapping at exit.

    The exit command in a script triggers signal 0 (the EXIT pseudo-signal), terminating the process, that is, the script itself. [2] It is often useful to trap the exit, forcing a "printout" of variables, for example. The trap must be set before the script exits, so it is normally placed at or near the top of the script.
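
The tee filter and the option flags from items 2 and 3 of the list above can be tried out as follows. This is a sketch under stated assumptions: it invokes bash rather than sh so the behavior is that of Bash itself, it generates throwaway test scripts with mktemp, and all variable names are illustrative.

```shell
#!/bin/bash
# Sketch of tools 2 and 3 from the list above.
# Each option is tried in a child shell; all names are illustrative.

script=$(mktemp) || exit 1
snapshot=$(mktemp) || exit 1

# tee: save a copy of the data at an intermediate stage of a pipeline.
printf '%s\n' pear apple fig apple | sort | tee "$snapshot" | uniq > /dev/null
intermediate=$(cat "$snapshot")
echo "Data after sort, before uniq:"
echo "$intermediate"

# -n: syntax check only. This loop is never closed with 'done'.
printf '%s\n' 'for a in 1 2 3' 'do' '  echo "$a"' > "$script"
if bash -n "$script" 2>/dev/null
then syntax_status=ok
else syntax_status=error
fi
echo "bash -n reports: $syntax_status"

# -x: each command is echoed to stderr after expansion,
#     prefixed by the value of PS4 (by default "+ ").
printf '%s\n' 'a=37' 'echo "a = $a"' > "$script"
trace=$(bash -x "$script" 2>&1)
echo "$trace"

# -u: expanding an unset variable aborts a non-interactive script.
printf '%s\n' 'set -u' 'echo "$no_such_variable"' 'echo "not reached"' > "$script"
if bash "$script" > /dev/null 2>&1
then nounset_status=ok
else nounset_status=error
fi
echo "set -u reports: $nounset_status"

rm -f "$script" "$snapshot"
```

Running the checks in child shells keeps set -x and set -u from affecting the calling script, which is often convenient when only one section of a script is under suspicion.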

Trapping signals

trap

Specifies an action on receipt of a signal; also useful for debugging.

Note

A signal is simply a message sent to a process, either by the kernel or by another process, telling it to take some specified action (usually to terminate). For example, hitting Control-C sends a user interrupt, an INT signal, to a running program.

   trap '' 2
   # Ignore interrupt 2 (Control-C), with no action specified.

   trap 'echo "Control-C disabled."' 2
   # Message when Control-C pressed.
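
To watch a handler run without touching the keyboard, a script can send a signal to itself. The sketch below assumes USR1 as a stand-in for INT; on_signal, received, and self are hypothetical names.

```shell
#!/bin/bash
# Sketch: a handler that runs when the script receives a signal.
# USR1 stands in for INT so that the sketch can signal itself safely.
# on_signal, received and self are hypothetical names.

self=${BASHPID:-$$}     #  PID of the current shell. BASHPID (Bash 4 and later)
                        #+ is also correct inside a subshell; fall back to $$.
received=0

on_signal () {
  received=$((received + 1))
  echo "Caught signal, count = $received"
}

trap on_signal USR1

kill -s USR1 "$self"    # Send USR1 to this very shell.
kill -s USR1 "$self"

echo "Handler ran $received times."
```

The handler runs between commands: Bash notes the pending signal and executes the trap action once the current command has finished.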


Example 30-5. Trapping at exit

   #!/bin/bash
   # Hunting variables with a trap.

   trap 'echo Variable Listing --- a = $a  b = $b' EXIT
   #  EXIT is the name of the signal generated upon exit from a script.
   #
   #  The command specified by the "trap" doesn't execute until
   #+ the appropriate signal is sent.

   echo "This prints before the \"trap\" --"
   echo "even though the script sees the \"trap\" first."
   echo

   a=39

   b=36

   exit 0
   #  Note that commenting out the 'exit' command makes no difference,
   #+ since the script exits in any case after running out of commands.
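
An EXIT trap is also a convenient place for cleanup. The following sketch, with the hypothetical name workfile, does its work in a subshell so that the parent can check afterwards that the trap really ran.

```shell
#!/bin/bash
# Sketch: an EXIT trap that removes a temporary file on the way out.
# The work is done in a subshell so the parent can inspect the result.
# workfile is a hypothetical name.

workfile=$(mktemp) || exit 1

(
  trap 'rm -f "$workfile"' EXIT   # Runs however the subshell ends.
  echo "scratch data" > "$workfile"
  exit 3                          # Simulate an early, abnormal exit.
)
status=$?

if [ -e "$workfile" ]
then leftover=yes
else leftover=no
fi

echo "Subshell exit status: $status"
echo "Temp file left behind: $leftover"
```

The trap action does not replace the exit status: the subshell still reports the value passed to exit.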


Example 30-6. Cleaning up after Control-C

   #!/bin/bash
   # logon.sh: A quick 'n dirty script to check whether you are on-line yet.


   TRUE=1
   LOGFILE=/var/log/messages
   #  Note that $LOGFILE must be readable
   #+ (as root, chmod 644 /var/log/messages).
   TEMPFILE=temp.$$
   #  Create a "unique" temp file name, using process id of the script.
   KEYWORD=address
   #  At logon, the line "remote IP address xxx.xxx.xxx.xxx"
   #                      appended to /var/log/messages.
   ONLINE=22
   USER_INTERRUPT=13
   CHECK_LINES=100
   #  How many lines in log file to check.

   trap 'rm -f $TEMPFILE; exit $USER_INTERRUPT' TERM INT
   #  Cleans up the temp file if script interrupted by control-c.

   echo

   while [ $TRUE ]  #Endless loop.
   do
     tail -n $CHECK_LINES $LOGFILE > $TEMPFILE
     #  Saves last 100 lines of system log file as temp file.
     #  Necessary, since newer kernels generate many log messages at log on.
     search=`grep $KEYWORD $TEMPFILE`
     #  Checks for presence of the "IP address" phrase,
     #+ indicating a successful logon.

     if [ ! -z "$search" ] #  Quotes necessary because of possible spaces.
     then
        echo "On-line"
        rm -f $TEMPFILE    #  Clean up temp file.
        exit $ONLINE
     else
        echo -n "."        #  The -n option to echo suppresses newline,
                           #+ so you get continuous rows of dots.
     fi

     sleep 1
   done


   #  Note: if you change the KEYWORD variable to "Exit",
   #+ this script can be used while on-line
   #+ to check for an unexpected logoff.

   # Exercise: Change the script, per the above note,
   #           and prettify it.

   exit 0


   # Nick Drage suggests an alternate method:

   while true
     do ifconfig ppp0 | grep UP 1> /dev/null && echo "connected" && exit 0
     echo -n "."   # Prints dots (.....) until connected.
     sleep 2
   done

   # Problem: Hitting Control-C to terminate this process may be insufficient.
   #+         (Dots may keep on echoing.)
   # Exercise: Fix this.



   # Stephane Chazelas has yet another alternative:

   CHECK_INTERVAL=1

   while ! tail -1 "$LOGFILE" | grep -q "$KEYWORD"
   do echo -n .
      sleep $CHECK_INTERVAL
   done
   echo "On-line"

   # Exercise: Discuss the relative strengths and weaknesses
   #+         of each of these various approaches.

Note

The DEBUG argument to trap causes a specified action to execute before every simple command in a script (in current versions of Bash). This permits tracing variables, for example.


Example 30-7. Tracing a variable

   #!/bin/bash

   trap 'echo "VARIABLE-TRACE> \$variable = \"$variable\""' DEBUG
   # Echoes the value of $variable before every command.

   variable=29

   echo "Just initialized \"\$variable\" to $variable."

   let "variable *= 3"
   echo "Just multiplied \"\$variable\" by 3."

   # The "trap 'commands' DEBUG" construct would be more useful
   # in the context of a complex script,
   # where placing multiple "echo $variable" statements might be
   # clumsy and time-consuming.

   # Thanks, Stephane Chazelas for the pointer.

   exit 0
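
A DEBUG trap can also record which command is about to run. The sketch below relies on the Bash variable BASH_COMMAND; the array name trace_log is hypothetical.

```shell
#!/bin/bash
# Sketch: a DEBUG trap that logs each command before it runs.
# BASH_COMMAND holds the command about to be executed.
# trace_log is a hypothetical name.

trace_log=()

trap 'trace_log+=("$BASH_COMMAND")' DEBUG

variable=29
let "variable *= 3"

trap - DEBUG                     # Stop tracing.

printf 'traced: %s\n' "${trace_log[@]}"
echo "variable = $variable"
```

Collecting the trace in an array rather than echoing it directly keeps the debugging output separate from the script's normal output.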

Of course, the trap command has other uses aside from debugging.


Example 30-8. Running multiple processes (on an SMP box)

   #!/bin/bash
   # multiple-processes.sh: Run multiple processes on an SMP box.

   # Script written by Vernia Damiano.
   # Used with permission.

   #  Must call script with at least one integer parameter
   #+ (number of concurrent processes).
   #  All other parameters are passed through to the processes started.


   INDICE=8        # Total number of processes to start
   TEMPO=5         # Maximum sleep time per process
   E_BADARGS=65    # No arg(s) passed to script.

   if [ $# -eq 0 ] # Check for at least one argument passed to script.
   then
     echo "Usage: `basename $0` number_of_processes [passed params]"
     exit $E_BADARGS
   fi

   NUMPROC=$1              # Number of concurrent processes
   shift
   PARAMETRI=( "$@" )      # Parameters of each process

   function avvia() {
     local temp
     local index
     temp=$RANDOM
     index=$1
     shift
     let "temp %= $TEMPO"
     let "temp += 1"
     echo "Starting $index Time:$temp" "$@"
     sleep ${temp}
     echo "Ending $index"
     kill -s SIGRTMIN $$
   }

   function parti() {
     if [ $INDICE -gt 0 ] ; then
       avvia $INDICE "${PARAMETRI[@]}" &
       let "INDICE--"
     else
       trap : SIGRTMIN
     fi
   }

   trap parti SIGRTMIN

   while [ "$NUMPROC" -gt 0 ]; do
     parti;
     let "NUMPROC--"
   done

   wait
   trap - SIGRTMIN

   exit $?

   : <<SCRIPT_AUTHOR_COMMENTS
   I had the need to run a program, with specified options, on a number of
   different files, using a SMP machine. So I thought [I'd] keep running
   a specified number of processes and start a new one each time . . . one
   of these terminates.

   The "wait" instruction does not help, since it waits for a given process
   or *all* processes started in background. So I wrote [this] bash script
   that can do the job, using the "trap" instruction.
     --Vernia Damiano
   SCRIPT_AUTHOR_COMMENTS
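
For comparison, here is a sketch of a different technique from the one the script's author used: newer versions of Bash (4.3 and later) provide wait -n, which returns as soon as any one background job finishes. The names max_jobs, total, worker, and results are hypothetical, and sleep merely stands in for real work.

```shell
#!/bin/bash
# Sketch: keep at most max_jobs background jobs running, using 'wait -n'
# (Bash 4.3 or later) rather than signals. All names are hypothetical.

max_jobs=2
total=5
results=$(mktemp) || exit 1

worker () {
  sleep 1                      # Stand-in for real work.
  echo "job $1 done" >> "$results"
}

running=0
for ((i = 1; i <= total; i++))
do
  if [ "$running" -ge "$max_jobs" ]
  then
    wait -n                    # Block until any one job finishes.
    running=$((running - 1))
  fi
  worker "$i" &
  running=$((running + 1))
done

wait                           # Wait for the remaining jobs.

finished=$(wc -l < "$results")
echo "Finished jobs: $((finished))"
rm -f "$results"
```

The signal-based script above remains useful on older shells that lack wait -n.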

Note

trap '' SIGNAL (two adjacent apostrophes) causes SIGNAL to be ignored for the remainder of the script. trap SIGNAL restores the default handling of SIGNAL once more (trap - SIGNAL is the more portable spelling). This is useful to protect a critical portion of a script from an undesirable interrupt.

   trap '' 2  # Signal 2 is Control-C, now disabled.
   command
   command
   command
   trap 2     # Reenables Control-C
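
The effect of ignoring and then restoring a signal can be observed with trap -p, which lists the current trap settings. In this sketch USR1 is an assumed stand-in for signal 2, so that the script can signal itself; self, during, after, and survived are hypothetical names.

```shell
#!/bin/bash
# Sketch: ignore a signal around a critical section, then restore it.
# USR1 stands in for INT so that the sketch can signal itself safely.

self=${BASHPID:-$$}            # PID of the current shell.

trap '' USR1                   # Ignore USR1 from here on.
during=$(trap -p USR1)         # Lists the trap; its action is ''.

kill -s USR1 "$self"           # Ignored; the script keeps running.
survived=yes

trap - USR1                    # Back to the default disposition.
after=$(trap -p USR1)          # Empty: no trap is set any more.

echo "While ignored: $during"
echo "After reset:   ${after:-<none>}"
```

After the reset, USR1 would once again terminate the script, just as Control-C does once signal 2 is restored.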

Notes

[1]

Rocky Bernstein's Bash debugger partially makes up for this lack.

[2]

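By convention, signal 0 is assigned to exit. It is a pseudo-signal generated by the shell itself, and trap accepts either 0 or EXIT as its name.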