rc.local doesn’t tolerate errors.
rc.local doesn’t provide a way to intelligently recover from errors. If any command fails, it stops running. The first line, #!/bin/sh -e, causes it to be executed in a shell invoked with the -e flag. The -e flag is what makes a script (in this case, rc.local) stop running the first time a command fails within it.
You want rc.local to behave like this. If a command fails, you do not want it to continue on with whatever other startup commands might be relying on it having succeeded.
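A minimal sketch of this stop-at-first-failure behavior, run through sh -c so it doesn’t touch the real rc.local. It assumes only a POSIX sh. A set -e at the top of a script has the same effect as the -e on rc.local’s #! line, and false stands in for any command that reports failure:

```shell
#!/bin/sh
# "set -e" inside the script has the same effect as "-e" on the #! line.
# "false" stands in for any command that reports failure.
output=$(sh -c 'set -e; echo one; false; echo two')
status=$?

echo "output: $output"   # only "one": the shell stopped at false
echo "status: $status"   # nonzero, because false reported failure
```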
So if any command fails, the subsequent commands will not run. The problem here is that /script.sh did not run (not that it failed; see below), so most likely some command before it failed. But which one?
Was it /bin/chmod +x /script.sh?
No.
chmod runs fine at any time. Provided the filesystem that contains /bin was mounted, you can run /bin/chmod. And /bin is mounted before rc.local runs.
When run as root, /bin/chmod rarely fails. It will fail if the file it operates on is on a read-only filesystem, and might fail if the filesystem it’s on doesn’t support permissions. Neither is likely here.
By the way, sh -e is the only reason it would actually be a problem if chmod failed. When you run a script file by explicitly invoking its interpreter, it doesn’t matter if the file is marked executable. Only if it said /script.sh would the file’s executable bit matter. Since it says sh /script.sh, it doesn’t (unless of course /script.sh calls itself while it runs, which could fail from it not being executable, but it’s unlikely it calls itself).
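To see that the executable bit only matters for direct execution, here is a minimal sketch using a throwaway file from mktemp (a stand-in for /script.sh, assuming mktemp is available as it is on Ubuntu). The direct run should fail even as root, because the file has no execute bit set at all:

```shell
#!/bin/sh
# Create a throwaway script WITHOUT the executable bit set.
tmp=$(mktemp)
printf 'echo "hello from the script"\n' > "$tmp"
chmod -x "$tmp"

# Invoking the interpreter explicitly works: sh only needs to read the file.
via_sh=$(sh "$tmp")
via_sh_status=$?

# Running the file directly needs the executable bit, so this fails.
"$tmp" 2>/dev/null
direct_status=$?

echo "sh \$tmp -> status $via_sh_status ($via_sh)"
echo "\$tmp    -> status $direct_status"
rm -f "$tmp"
```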
So what failed?
sh /home/incero/startup_script.sh failed. Almost certainly.
We know it ran, because it downloaded /script.sh.
(Otherwise, it would be important to make sure it did run, in case somehow /bin was not in PATH: rc.local doesn’t necessarily have the same PATH as you have when you’re logged in. If /bin weren’t in rc.local’s PATH, this would require sh to be run as /bin/sh. Since it did run, /bin is in PATH, which means you can run other commands that are located in /bin without fully qualifying their names. For example, you can run just chmod rather than /bin/chmod. However, in keeping with your style in rc.local, I’ve used fully qualified names for all commands except sh whenever I am suggesting you run them.)
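If you want to check which PATH a script such as rc.local actually gets, a small diagnostic like this could be run from it. The commands checked are only examples, and when run from rc.local you would redirect the output to a log file of your choosing, since there is no terminal to print to:

```shell
#!/bin/sh
# Print this shell's PATH and where each example command resolves in it.
echo "PATH is: $PATH"
for cmd in sh chmod; do
    # "command -v" (POSIX) prints the path the shell would use, or fails.
    if location=$(command -v "$cmd"); then
        echo "$cmd -> $location"
    else
        echo "$cmd -> not found in PATH"
    fi
done
```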
We can be pretty sure /bin/chmod +x /script.sh never ran (or you would see that /script.sh was executed). And we know sh /script.sh wasn’t run either.
But it downloaded /script.sh. It succeeded! How could it fail?
Two Meanings of Success
There are two different things a person might mean when s/he says a command succeeded:
- It did what you wanted it to do.
- It reported that it succeeded.
And so it is for failure. When a person says a command failed, it could mean:
- It did not do what you wanted it to do.
- It reported that it failed.
A script run with sh -e, like rc.local, will stop running the first time a command reports that it failed. It doesn’t make a difference what the command actually did.
Unless you intend for startup_script.sh to report failure when it does what you want, this is a bug in startup_script.sh.
- Some bugs prevent a script from doing what you want it to do. They affect what programmers call its side effects.
- And some bugs prevent a script from reporting correctly whether or not it succeeded. They affect what programmers call its return value (which in this case is an exit status).
It’s most likely that startup_script.sh did everything it should, except that it reported failure.
How Success Or Failure Is Reported
A script is a list of zero or more commands. Each command has an exit status. Assuming there are no failures in actually running the script (for example, if the interpreter couldn’t read the next line of the script while running it), the exit status of a script is:
- 0 (success), if the script was blank (i.e., had no commands).
- N, if the script ended as a result of the command exit N, where N is some exit code.
- The exit code of the last command that ran in the script, otherwise.
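The three rules can be checked with a minimal sketch, using sh -c strings as tiny stand-in scripts:

```shell
#!/bin/sh
# Rule 1: a blank script (no commands) exits with 0.
sh -c ''
blank=$?

# Rule 2: "exit N" ends the script with status N; later commands never run.
sh -c 'exit 3; echo never printed'
explicit=$?

# Rule 3: otherwise, the status of the last command that ran is used.
sh -c 'true; false'
last=$?

echo "blank=$blank explicit=$explicit last=$last"   # blank=0 explicit=3 last=1
```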
Exit codes are not just for scripts: when any executable runs, it reports its own exit code. (And technically, exit codes from scripts are the exit codes returned by the shells that run them.)
For example, if a C program ends with exit(0); or return 0; in its main() function, the code 0 is given to the operating system, which provides it to the calling process (which may, for example, be the shell from which the program was run).
0 means the program succeeded. Every other number means it failed. (This way, different numbers can sometimes refer to different reasons the program failed.)
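From the shell side, the exit code of the last command is available in $?. grep is one standard utility that uses different nonzero codes for different reasons (0 when a line matched, 1 when none matched, greater than 1 on an error), so a short sketch can show all three:

```shell
#!/bin/sh
# A line matches: success.
echo "hello" | grep -q hello
match=$?

# No line matches: failure, reported as 1.
echo "hello" | grep -q goodbye
nomatch=$?

# An error (the file doesn't exist): failure, reported as a number above 1.
grep -q hello /nonexistent/file 2>/dev/null
error=$?

echo "match=$match nomatch=$nomatch error=$error"
```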
Commands Meant to Fail
Sometimes, you run a program knowing that it may fail. In these situations, you might think of its failure as success, and it is not a bug that the program reports failure. For example, you might use rm on a file you suspect no longer exists, just to make sure it’s deleted.
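A minimal sketch of that rm case, using a made-up path that should not exist. It also shows rm -f, which treats a missing file as fine and so doesn’t report failure in the first place:

```shell
#!/bin/sh
# A hypothetical path that (almost certainly) does not exist.
missing=/tmp/this-file-should-not-exist.$$

# Plain rm reports failure when there is nothing to delete...
rm "$missing" 2>/dev/null
plain=$?

# ...while rm -f ignores a missing file and reports success.
rm -f "$missing"
forced=$?

echo "rm: $plain, rm -f: $forced"
```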
Something like this is probably happening in startup_script.sh just before it stops running. The last command to run in the script is probably reporting failure (even though its “failure” might be totally fine or even necessary), which makes the script report failure.
Tests Meant to Fail
One special kind of command is a test, by which I mean a command run for its return value rather than its side effects. That is, a test is a command that is run so that its exit status can be examined (and acted upon).
For example, suppose I forget if 4 is equal to 5. Fortunately, I know shell scripting:
if [ 4 -eq 5 ]; then
echo "Yeah, they're totally the same."
fi
Here, the test [ 4 -eq 5 ] fails because it turns out 4 ≠ 5 after all. That doesn’t mean the test didn’t perform correctly; it did. Its job was to check if 4 = 5, then report success if so, and failure if not.
You see, in shell scripting, success can also mean true, and failure can also mean false.
Even though the echo statement never runs, the if block as a whole does return success.
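A minimal sketch confirming this. POSIX sh gives an if statement whose condition was false, and which has no else branch, an exit status of 0:

```shell
#!/bin/sh
if [ 4 -eq 5 ]; then
    echo "Yeah, they're totally the same."
fi
status=$?

# The test was false, so echo never ran, yet the if statement reports success.
echo "if statement status: $status"
```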
However, suppose I’d written it shorter:
[ 4 -eq 5 ] && echo "Yeah, they're totally the same."
This is common shorthand. && is a Boolean and operator. A && expression, which consists of && with statements on either side, returns false (failure) unless both sides return true (success), just like a normal and.
If someone asks you, “did Derek go to the mall and think about a butterfly?” and you know Derek didn’t go to the mall, you don’t have to bother figuring out if he thought of a butterfly.
Similarly, if the command to the left of && fails (false), the whole && expression immediately fails (false). The statement on the right side of && is never run.
Here, [ 4 -eq 5 ] runs. It “fails” (returning false). So the whole && expression fails. echo "Yeah, they're totally the same." never runs. Everything behaved as it should, but this command reports failure (even though the otherwise equivalent if conditional above reports success).
If that were the last statement in a script (and the script got to it, rather than terminating at some point before it), the whole script would report failure.
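A minimal sketch of both points: the && shorthand reports failure on its own line, and a stand-in script (an sh -c string) that ends with it reports failure as a whole:

```shell
#!/bin/sh
# The left side fails, so echo never runs and the && expression reports failure.
[ 4 -eq 5 ] && echo "Yeah, they're totally the same."
inline=$?

# As the last statement of a script, it makes the whole script report failure.
sh -c '[ 4 -eq 5 ] && echo "never printed"'
script=$?

echo "&& expression: $inline, script ending with it: $script"
```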
There are lots of tests besides this. For example, there are tests with || (“or”). However, the above example should be sufficient to explain what tests are, and to make it possible for you to effectively use documentation to determine if a particular statement/command is a test.
sh vs. sh -e, Revisited
Since the #! line (see also this question) at the top of /etc/rc.local has sh -e, the operating system runs the script as though it were invoked with the command:
sh -e /etc/rc.local
In contrast, your other scripts, such as startup_script.sh, run without the -e flag:
sh /home/incero/startup_script.sh
Consequently they keep running even when a command in them reports failure.
This is normal and good. rc.local should be invoked with sh -e, and most other scripts, including most scripts run by rc.local, should not.
Just make sure to remember the difference:
- Scripts run with sh -e exit reporting failure the first time a command they contain exits reporting failure. It’s as though the script were a single long command consisting of all the commands in the script joined with && operators.
- Scripts run with sh (without -e) continue running until they get to a command that terminates (exits out of) them, or to the very end of the script. The success or failure of every command is essentially irrelevant (unless the next command checks it). The script exits with the exit status of the last command run.
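A minimal sketch contrasting the two, with the same three stand-in commands run under sh, under sh -e, and joined with && (the && version is only a rough analogy for -e, which has its own exceptions, but it matches here):

```shell
#!/bin/sh
cmds='echo one; false; echo two'

# Without -e: every command runs; the status is that of the last one (echo two).
without_e=$(sh -c "$cmds")
without_e_status=$?

# With -e: the script stops at "false" and reports failure.
with_e=$(sh -e -c "$cmds")
with_e_status=$?

# The same commands joined with &&: same output and status as the -e run.
chained=$(sh -c 'echo one && false && echo two')
chained_status=$?

echo "sh:    status $without_e_status"
echo "sh -e: status $with_e_status"
echo "&&:    status $chained_status"
```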
Helping Your Script Understand It’s Not Such a Failure After All
How can you keep your script from thinking it failed when it didn’t?
You look at what happens just before it’s done running.
- If a command failed when it ought to have succeeded, figure out why, and fix the problem.
- If a command failed, and that was the right thing to happen, then prevent that failure status from propagating.
  - One way to keep a failure status from propagating is to run another command that succeeds. /bin/true has no side effects and reports success (just as /bin/false does nothing and also fails).
  - Another is to make sure the script is terminated by exit 0. That’s not necessarily the same thing as exit 0 being at the end of the script. For example, there might be an if-block where the script exits inside.
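Both approaches can be sketched with stand-in sh -c scripts. false and a made-up file name represent the expected failure, and true is the shell builtin counterpart of /bin/true:

```shell
#!/bin/sh
# 1. Follow the expected failure with a command that succeeds.
sh -c 'false; true'
with_true=$?

# 2. Terminate with "exit 0"; here the exit happens inside an if-block,
#    not on the last line of the script.
sh -c '
    f=/tmp/no-such-file.$$
    rm "$f" 2>/dev/null          # expected to fail: nothing to delete
    if [ ! -e "$f" ]; then
        exit 0                   # the file is absent, which is what we wanted
    fi
    echo "not reached"
'
with_exit=$?

echo "with true: $with_true, with exit 0: $with_exit"
```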
It’s best to know what causes your script to report failure, before making it report success. If it really is failing in some way (in the sense of not doing what you want it to do), you don’t really want it to report success.
A Quick Fix
If you can’t make startup_script.sh exit reporting success, you can change the command in rc.local that runs it, so that command reports success even though startup_script.sh did not.
Currently you have:
sh /home/incero/startup_script.sh
This command has the same side effects (i.e., the side effect of running startup_script.sh), but always reports success:
sh /home/incero/startup_script.sh || /bin/true
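A minimal sketch of the quick fix under sh -e, the way rc.local runs. The temporary file is a hypothetical stand-in for startup_script.sh that does its work and then reports failure, and the sketch assumes mktemp and /bin/true exist, as they do on Ubuntu:

```shell
#!/bin/sh
# Hypothetical stand-in for startup_script.sh: it does its work, then reports failure.
fake=$(mktemp)
printf 'echo "did the work"\nfalse\n' > "$fake"

# Under sh -e, the command after the failing script never runs...
plain=$(sh -e -c "sh \"$fake\"; echo next command ran")

# ...but with "|| /bin/true" the failure is absorbed and execution continues.
fixed=$(sh -e -c "sh \"$fake\" || /bin/true; echo next command ran")

rm -f "$fake"
printf '%s\n---\n%s\n' "$plain" "$fixed"
```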
Remember, it’s better to know why startup_script.sh reports failure, and fix it.
How the Quick Fix Works
This is actually an example of a || test, an or test.
Suppose you ask if I took out the trash or brushed the owl. If I took out the trash, I can truthfully say “yes,” even if I don’t remember whether or not I brushed the owl.
The command to the left of || runs. If it succeeds (true), then the right-hand side doesn’t have to run. So if startup_script.sh reports success, the true command never runs.
However, if startup_script.sh reports failure [I didn’t take out the trash], then the result of /bin/true [if I brushed the owl] matters.
/bin/true always returns success (or true, as we sometimes call it). Consequently, the whole command succeeds, and the next command in rc.local can run.
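The short-circuit behavior of || can be seen directly in a minimal sketch. It uses the shell builtins true and false in place of /bin/true and a failing script:

```shell
#!/bin/sh
# Left side succeeds (true), so the right side is never run.
skipped=$(true || echo "right side ran")

# Left side fails (false), so the right side runs and decides the result.
taken=$(false || echo "right side ran")

# The quick-fix shape: a failing command followed by "|| true" reports success.
false || true
status=$?

echo "after true:  '$skipped'"
echo "after false: '$taken'"
echo "status of 'false || true': $status"
```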
An additional note on success/failure, true/false, zero/nonzero.
Feel free to ignore this. You might want to read this if you program in more than one language, though (i.e., not just shell scripting).
It is a point of great confusion for shell scripters taking up programming languages like C, and for C programmers taking up shell scripting, to discover:
- In shell scripting:
  - A return value of 0 means success and/or true.
  - A return value of something other than 0 means failure and/or false.
- In C programming:
  - A return value of 0 means false.
  - A return value of something other than 0 means true.
  - There is no simple rule for what means success and what means failure. Sometimes 0 means success, other times it means failure, other times it means that the sum of the two numbers you just added is zero. Return values in general programming are used to signal a wide variety of different sorts of information.
  - A program’s numeric exit status should indicate success or failure in accordance with the rules for shell scripting. That is, even though 0 means false in your C program, you still make your program return 0 as its exit code if you want it to report that it succeeded (which the shell then interprets as true).