Linux2

Leveraging the power of the Linux command line

=Introduction=


This practical follows Linux1 which introduced the fundamentals of the Linux command line.

During this practical, we will learn how to combine some commands together to create scripts that perform more complex actions.

= Getting the content for this practical =
The necessary files for this practical are hosted in a version control system. To obtain them, just type the following command:

$ svn export http://source.ggy.bris.ac.uk/subversion-open/linux2/trunk linux2

This will fetch all necessary files and put them in a folder called linux2/. Ignore the cryptic syntax for now; an introduction to version control using Subversion (svn) will be given later on.

= Output redirection =
In the Linux1 practical, we discovered a few Linux commands. Some of these commands take input from the keyboard (standard input) and write output to the screen (standard output). It is possible to (a) redirect input and output and (b) link commands together to perform complex actions. The files for this section are in the example1 directory.

$ cd ../example1

Redirecting standard input and output
Let's start with a simple example. By default, the diff command outputs to the screen, for instance try:

$ diff file1 file2

This is not convenient if there is a lot of output. It is easy to redirect the output to a file so that it can be saved for later. This is done by using the sign ">":

$ diff file1 file2 > diff12.txt
$ diff file2 file3 > diff23.txt

You can then look at the respective files in a text editor or by using more or less.

Now imagine we want to put the outputs of the two diff operations into one single file. Using the syntax above with the same filename will not work, as the second call would overwrite the output of the first one. However, it is also possible to append the output of a command to a file. Note that the second call below uses a double ">>":

$ diff file1 file2 > diff.txt
$ diff file2 file3 >> diff.txt

Just remember that a single ">" will overwrite the content of a file, while a double ">>" will append to it.

Note that we could also concatenate the two initial files into one big file rather easily too...

$ cat diff12.txt diff23.txt > diff.txt
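As a minimal sketch of the difference between ">" and ">>", the following works in a scratch directory created with <tt>mktemp</tt> so that none of the practical's files are touched. The file name <tt>log.txt</tt> is purely illustrative and not part of the practical.

```shell
# Work in a throw-away directory (hypothetical file name, not part of the practical)
scratch=$(mktemp -d)

echo "first"  > "$scratch/log.txt"    # creates the file
echo "second" > "$scratch/log.txt"    # overwrites it: "first" is gone
echo "third" >> "$scratch/log.txt"    # appends after "second"

cat "$scratch/log.txt"                # shows "second" then "third"
```

Only the last two lines survive, because the second <tt>echo</tt> replaced the content written by the first.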

In the examples above, we redirected the output to a file. It is also possible to redirect the input, although this is not used as often since most commands accept a file as an argument. For instance, consider the command <tt>sort</tt>, which can be used to ... sort the lines of a file alphabetically. You could specify which file to use with a "<":

$ sort < file4

Note that the example above is a bit contrived as <tt>sort file4</tt> would work just as well. However, you will probably encounter input redirection sometimes so you might as well know how it is done. Note also that you can give the option <tt>-n</tt> to <tt>sort</tt> to make it use numerical sorting instead of alphabetical.

Both types of redirection can also be combined:

$ sort < file4 > file4-sorted.txt

The syntax above is starting to get complex, which leads nicely to the notion of a command pipeline, explained below.

Important note: besides standard input (stdin) and standard output (stdout), there is also standard error (stderr), which commands use to report problems (compiler warnings, errors, etc.). It is also possible to redirect standard error, and not necessarily to the same place as standard output. This is beyond the scope of this practical.

Pipelines
Most commands we have seen so far are fairly powerful but have a limited scope. This is intentional, as the Linux command line allows you to create a pipeline of commands to achieve complex behaviour. For instance, <tt>ls</tt> is good at listing things and <tt>more</tt> is good at displaying things, so let's pipe them together. This is done by using the pipe sign "|":

$ ls -l ~ | more
-> more takes over the window if the output spans more than one screen

This lists the content of your home directory and makes sure the output does not overflow a page. Use space to scroll down. You could also substitute <tt>less</tt> for <tt>more</tt>.

The <tt>uniq</tt> command removes adjacent duplicate lines from its input, which is why it is usually fed sorted data. Let's combine it with <tt>sort</tt> to really start to tidy up <tt>file4</tt>:

$ sort file4 | uniq > file4-sorted-and-cleaned.txt

How many times was "Scene" written in the first act of Hamlet? <tt>grep</tt> can find the matching lines and <tt>wc</tt> can count words and lines, so let's combine them:

$ grep -i scene file1 | wc -l
5

5 Scenes, correct!

For the last pipe example let's learn a new useful command. <tt>du</tt> calculates the size of the files and folders given as arguments. <tt>sort</tt> can sort things numerically and <tt>head</tt> can display the first few lines of its input, so to find the 3 biggest files or folders inside our directory, we could do:

$ du --exclude .svn --human-readable ./* | sort -nr | head -n 3
184K   ./file5
44K    ./file3
44K    ./file2

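Yes, file5 is bigger. It contains the entirety of Hamlet, actually! You could use <tt>du</tt> to find which files are clogging up your file space. Be aware that <tt>sort -n</tt> compares only the leading number and ignores the unit suffix, so this pipeline is only reliable when all sizes share the same unit; GNU <tt>sort</tt> offers the option <tt>-h</tt> to compare human-readable sizes.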
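To see how each stage of a pipeline transforms the data, here is a small sketch on made-up input. The file <tt>fruit.txt</tt> and its content are assumptions for illustration only, and the option <tt>-c</tt> of <tt>uniq</tt> (prefix each line with its number of occurrences) was not used in the text above.

```shell
# Hypothetical sample data, written to a scratch file
scratch=$(mktemp -d)
printf '%s\n' pear apple pear fig apple pear > "$scratch/fruit.txt"

# sort groups identical lines together, uniq -c collapses and counts them,
# sort -nr puts the most frequent first and head keeps only the top line
top=$(sort "$scratch/fruit.txt" | uniq -c | sort -nr | head -n 1)
echo "$top"

# Count the distinct lines
distinct=$(sort "$scratch/fruit.txt" | uniq | wc -l)
echo "$distinct"
```

Each command only does one small job; the pipe signs are what turn them into a frequency counter.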

= Automating things =
Although pipelines can be used to perform complex tasks, they are often difficult to read after a few pipes. To perform more complex tasks, it is possible to put a list of commands in a file and execute this file.

$ cd ../example2

<tt>convert</tt> is a small utility from the ImageMagick suite which allows the manipulation of images at the command line. For instance, to resize an image to 2000 pixels max and save it under a new name, you could use:

$ convert image-large.jpg -resize 2000 image-2000.jpg

Now let's say you want to scale an image to several different sizes and zip the whole lot. You could enter each command by hand every time but if you use them often, you could also put them together in a file. Have a look at the file <tt>create_thumbnails</tt>:

$ ls -l create_thumbnails
-rwxr-xr-x 1 jp jp 474 2008-02-27 11:58 create_thumbnails

The first thing to notice is that the execute flag is set on this file. If it was not, it could not be executed.

Now look at the content. It starts with the shebang, a line specifying which interpreter will be used to run the script. This is not mandatory but you are advised to include it to make sure the right shell is used. We used the bash shell here.


#!/bin/bash

Then the commands are listed one after the other, in sequential order. Note that we could put two commands on one line by separating them with a semicolon.

Nothing new here except that <tt>echo</tt> is used to print things to standard output and <tt>zip</tt> can be used to create a zip file of a folder. Now try to execute the file. We do that by typing the name of the file; the preceding <tt>./</tt> makes sure we use the one in our current directory:

$ ./create_thumbnails
Create thumbnails.
Move thumbnails.
Compress thumbnails.
updating: thumbnails/ (stored 0%)
updating: thumbnails/image-1000.jpg (deflated 0%)
updating: thumbnails/image-100.jpg (deflated 6%)
updating: thumbnails/image-10.jpg (deflated 8%)
updating: thumbnails/image-500.jpg (deflated 1%)
Clean up.
All done.

Now use the file explorer to look into <tt>thumbnails.zip</tt>.

This is a very simple script but already it shows that a simple batch file like this can perform some complex operations and make your life simpler. Let's go a bit further now.
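The whole cycle of writing a script, setting its execute flag and running it can be sketched as follows. The script name <tt>hello</tt> and its content are invented for this illustration, and the "here document" (<tt><<'EOF'</tt>) is just one convenient way, not covered in this practical, of writing a few lines into a file without opening an editor.

```shell
scratch=$(mktemp -d)

# Write a two-line script; the quoted 'EOF' stops the shell expanding anything inside
cat > "$scratch/hello" <<'EOF'
#!/bin/bash
echo "Create greeting."; echo "All done."
EOF

chmod u+x "$scratch/hello"     # set the execute flag for the owner
ls -l "$scratch/hello"         # the permissions now start with -rwx
output=$("$scratch/hello")     # run the script by giving its path
echo "$output"
```

Without the <tt>chmod</tt> step, the shell would refuse to run the file with a "Permission denied" error.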

In the <tt>images</tt> folder, there are a few images and I want a set of thumbnails for each of them. I could use the supplied script which does the following:
 * copy the first image into <tt>image-large.jpg</tt>
 * execute the <tt>create_thumbnails</tt> script from above.
 * rename the zip file appropriately
 * do the same thing for the next image...

This is done by the file <tt>create_all_thumbnails</tt>. It is very straightforward. One section requires explanation:

../create_thumbnails > /dev/null 2>&1

Here we execute the script <tt>create_thumbnails</tt> by giving its relative path. The scribble that follows means that both the standard output and standard error from <tt>create_thumbnails</tt> will be redirected to oblivion: <tt>/dev/null</tt>. So this script is not too verbose. Try removing the redirection to see the difference.
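The order of the redirections matters, which the following sketch tries to show. The variable <tt>chatty</tt> holds a made-up stand-in for a verbose script: it writes one line to standard output and one to standard error.

```shell
# A stand-in for a chatty script: one line on stdout, one on stderr (invented messages)
chatty='echo "thumbnail created"; echo "warning: low disk space" >&2'

# Discard stderr only: the normal output is still captured
out_only=$(sh -c "$chatty" 2> /dev/null)

# Send stdout to /dev/null, then point stderr at the same place: nothing is left
silent=$(sh -c "$chatty" > /dev/null 2>&1)

# Wrong order for silencing: stderr is pointed at the current stdout first,
# and only then is stdout moved to /dev/null, so the warning still gets through
wrong_order=$(sh -c "$chatty" 2>&1 > /dev/null)

echo "out_only:    $out_only"
echo "silent:      '$silent'"
echo "wrong_order: $wrong_order"
```

Redirections are processed from left to right, so <tt>2>&1</tt> copies wherever standard output points at that moment.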

You see that we really are starting to automate things now. But we could do better. A lot better. For instance, we still had to hardcode the names of the images and our current script needs to copy data, which could be an expensive operation. It is actually possible to write a script that would loop over all the pictures in the directory automatically but before we get to that, we should really look at script execution and how to control it.

=Launching, monitoring and controlling jobs=

In this section we will look at which tools exist to control how jobs are running on our Linux machine.

$ cd ../example3

This directory contains a very simple script called <tt>infinite_loop</tt>. Although the script is very simple, we have not yet covered its fundamental ingredient: the loop. Nonetheless, accept that this script will loop indefinitely. At each iteration of the loop, it will execute:

date
sleep 2

So basically, it will write the date and time to standard output with <tt>date</tt> and then wait two seconds via <tt>sleep</tt>. It is a very silly thing to do but its characteristics for this section about controlling the execution of scripts are that:
 * it will never stop
 * it will clog up the screen with output after a while.

Try it:

$ ./infinite_loop
Thu Feb 28 08:25:57 GMT 2008
Thu Feb 28 08:25:59 GMT 2008
Thu Feb 28 08:26:01 GMT 2008

... the script carries on and on ...

The script is running, it won't stop on its own. How can we stop it?

The easiest way to stop it right now is to press <tt>CTRL-C</tt>. This sends the interrupt signal SIGINT. If the script behaves well, it should just stop. Try it.

Thu Feb 28 09:54:45 GMT 2008
Thu Feb 28 09:54:47 GMT 2008
Thu Feb 28 09:54:49 GMT 2008
... hit CTRL-C and the script will stop ...
$

In the example above, the script was outputting to the screen. We already know how to redirect the output to a file, but one other trick is to run the script directly in the background so that we don't lose control of our terminal:

$ ./infinite_loop > dates.txt &
[1] 5483

Now the script is running but we cannot see it. What we could do is check that the file <tt>dates.txt</tt> is receiving data:

$ tail -f dates.txt
Thu Feb 28 09:15:15 GMT 2008
Thu Feb 28 09:15:17 GMT 2008
Thu Feb 28 09:15:19 GMT 2008
Thu Feb 28 09:15:21 GMT 2008
Thu Feb 28 09:15:23 GMT 2008
... and new entries keep appearing ...
... hit CTRL-C to stop tail...

So our script is running in the background. To list which scripts are running in the background right now, use the command <tt>jobs</tt>:

$ jobs -l
[1]+ 5483 Running                ./infinite_loop > dates.txt &

Note that <tt>jobs</tt> will only list the scripts started from a given shell window. If you try <tt>jobs</tt> in another window, it will not list our running script. There are other tools for that, which we will see later.

There are different ways to stop our running script. We could bring it back to the foreground using the command <tt>fg</tt> and then use <tt>CTRL-C</tt> to send the interrupt signal SIGINT:

$ fg 1
./infinite_loop > dates.txt

... now hit CTRL-C and the script will stop ...

$

We could also have killed it directly with the command <tt>kill</tt>:

$ ./infinite_loop > dates.txt &
[1] 5990
$ kill -9 5990
$

Several signals can be sent to a process using <tt>kill</tt>. Option 9 is for SIGKILL, the strongest, which a process cannot catch or ignore. The equivalent of CTRL-C would be to use <tt>kill -2</tt>, which sends SIGINT.
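Inside a script, the same start-and-kill sequence can be sketched as below, with <tt>sleep 60</tt> standing in for a long-running job. The special variable <tt>$!</tt> and the commands <tt>kill -0</tt> and <tt>wait</tt> were not introduced above and are used here as assumptions about what is convenient. The sketch sends the default signal SIGTERM rather than SIGINT, because background jobs started from a non-interactive script ignore SIGINT.

```shell
sleep 60 &                 # a stand-in for a long-running script
pid=$!                     # $! holds the PID of the last background job

# Signal 0 sends nothing; it only tests that the process exists
kill -0 "$pid" && echo "process $pid is running"

kill "$pid"                # no option: sends SIGTERM, a polite request to stop

# wait collects the job; for a job ended by a signal the status is 128 + signal number
status=0
wait "$pid" || status=$?
echo "exit status: $status"    # 143 = 128 + 15 (SIGTERM)
```

Reserve <tt>kill -9</tt> for processes that do not react to the gentler signals, as it gives them no chance to clean up.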

We have brought a background job back to the foreground. The opposite is also possible, but for this we first need to know how to suspend a job to get back control of the terminal. This is done by using <tt>CTRL-Z</tt>, which sends the suspend signal SIGTSTP. We then get control of the shell back and we can send the job to the background with the command <tt>bg</tt>:

$ ./infinite_loop > dates.txt
... hit CTRL-Z ...
[1]+ Stopped                 ./infinite_loop > dates.txt
$ bg 1
[1]+ ./infinite_loop > dates.txt &
$ jobs -l
[1]+ 6030 Running                 ./infinite_loop > dates.txt &
$ fg 1
./infinite_loop > dates.txt
... hit CTRL-C to stop ...

We have come a long way now. There are still a few very useful commands for job control. The first one is <tt>top</tt>, which gives you a summary of which processes are running on your machine and how many resources they consume. It is very useful when your machine grinds to a near halt and you don't know why. You can then find the PID of the CPU-greedy processes and kill them. Press Q to exit.

<tt>ps</tt> is in a way similar to <tt>top</tt> except that it only lists processes. Note that by default <tt>ps</tt> is limited to the running shell, like <tt>jobs</tt>. To see all the processes running in your name, use the option <tt>-u</tt> followed by your user name, for instance <tt>ps -u username</tt>.

Now you can start and stop jobs and also send them to the background so that you can carry on working. However, if you log out or if you close the shell window (the same thing really), your running jobs will die. This is an issue for jobs that might take days to finish. The trick is to use the command <tt>nohup</tt>. It makes sure that your script will carry on running after you log out. The script will only stop if it finishes, if the machine reboots ... or if it is killed by an admin because it clogs up the machine (this happens too!).

By default, all output is sent to a file <tt>nohup.out</tt> so use redirection to make sure it is sent somewhere appropriate instead.

Let's start our script via <tt>nohup</tt>:

$ nohup ./infinite_loop > dates.txt &
[2] 7390
$ nohup: ignoring input and redirecting stderr to stdout
$

Now, to check all is well, you could:
 * close all shell windows
 * login again
 * check if your script is still running via <tt>ps</tt>
 * kill it

Now that you can stop jobs and control their behaviour, it is time to learn how to build some more advanced shell scripts.

=Shell Scripting=

The <tt>bash</tt> shell allows the creation of complex scripts using conditionals, loops, arithmetic, etc. However, keep in mind that shell scripting should only be used where it fits. Don't program your whole model in <tt>bash</tt>: it would probably be slow, inefficient and hard to maintain. Do use shell scripting to drive your numerical models, manage their input and output, etc.

The examples for this section are in the <tt>example4</tt> directory.

$ cd ../example4

We could spend hours talking about shell scripts. Instead, as you are already aware of programming concepts, we will simply see how the main building blocks of a program can be built using the <tt>bash</tt> shell.

One thing to keep in mind is that a shell script is interpreted, not compiled, so you only find problems as you run the script. Be careful: an error halfway through a script can have real consequences, because the commands before it have already been executed.

Variables
The <tt>bash</tt> shell allows the use of variables. They are not typed as in Fortran so you just declare them and use them as you go. When recalling a variable, use a dollar sign before its name. For instance:

MYVAR=123
echo $MYVAR

One thing to be aware of at this stage is how quoting is handled. Remember that:
 * double quotes expand variable names: <tt>"$MYVAR"</tt> is equivalent to 123
 * single quotes do not expand variable names: <tt>'$MYVAR'</tt> is simply the text string $MYVAR.
 * use back ticks to run a command and substitute its output in place.
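The three quoting rules can be checked with a few lines like these. The variable names other than <tt>MYVAR</tt> are invented for the illustration, and the <tt>$(...)</tt> form shown last is an equivalent of back ticks that the text above does not mention.

```shell
MYVAR=123
double="value: $MYVAR"        # double quotes: $MYVAR is expanded
single='value: $MYVAR'        # single quotes: the text is kept literally
ticks=`echo hello`            # back ticks: replaced by the command's output
modern=$(echo hello)          # $(...) does the same and is easier to nest

echo "$double"                # value: 123
echo "$single"                # value: $MYVAR
echo "$ticks $modern"         # hello hello
```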

Look at the script <tt>var.sh</tt> and execute it. It illustrates a basic use of variables in a script.

Arithmetic
Now that we can use variables, we can do operations on them. Note that it is not recommended to use the shell to do complex calculations ... but you can do basic operations on variables. There are two main ways of manipulating variables:

 * Use the command <tt>let</tt>:

MYVAR=3
let MYVAR=MYVAR*9

 * Use "arithmetic expansion":

MYVAR=3
MYVAR=$(( MYVAR*9 ))

Both options do exactly the same thing. Note that you might need to use double quotes with <tt>let</tt> in the case of complex operations, for instance when the expression contains spaces.

The list of operators which can be used is quite large, for instance:

numerical operations: =, +, -, *, /, **, %
and their shortcuts:  +=, -=, *=, /=, %=
logical operations:   &&, ||, !

Look at the script <tt>arith.sh</tt> and execute it. It illustrates a basic use of arithmetic on variables in a script.
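A few of these operators in action, as a sketch that is independent of <tt>arith.sh</tt> (the variable names <tt>COUNT</tt>, <tt>POWER</tt> and <tt>HALF</tt> are made up). Note that <tt>bash</tt> arithmetic works on integers only.

```shell
MYVAR=3
let "MYVAR = MYVAR * 9"        # quoted so that spaces and * are left alone by the shell
echo "$MYVAR"                  # 27

COUNT=$(( MYVAR % 5 ))         # remainder of 27 / 5
POWER=$(( 2 ** 10 ))           # exponentiation
HALF=$(( 7 / 2 ))              # integer division: the fraction is dropped
COUNT=$(( COUNT + 1 ))
echo "$COUNT $POWER $HALF"     # 3 1024 3
```

For anything involving real numbers, a tool such as <tt>awk</tt> or a proper programming language is a better fit.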

Conditionals
We can use variables and perform operations on them. We have seen earlier how to perform some logical operations. <tt>bash</tt> also allows us to write conditional statements. The syntax is very simple:

if [ condition ] ; then
    do something
else
    do something else
fi

This is quite simple really. The tricky bit is to write the condition properly. <tt>bash</tt> gives us some useful operators for comparing integers:

-eq : is equal to
-ne : is not equal to
-le : is less or equal to
-lt : is less than
-ge : is greater or equal to
-gt : is greater than

Here is a simple example:

if [ 2 -eq 3 ]; then
    this will never get done
else
    this will always get done
fi

As well as these "standard" tests, <tt>bash</tt> provides some very useful tests for data management:

-a : exists
-f : exists and is a file
-d : exists and is a directory

The particularity of these tests is that they have only one operand, for instance:

if [ -d folder ]; then
    cd folder
fi

Look at the script <tt>if.sh</tt> and execute it. It illustrates a basic use of tests in a shell script.
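The following sketch combines the numerical and the file tests on a scratch directory. It does not reproduce <tt>if.sh</tt>; all the names in it are invented. Two details are easy to get wrong: the spaces just inside the square brackets are mandatory, and variables are best put in double quotes so that an empty value or a name containing spaces does not break the test.

```shell
scratch=$(mktemp -d)
touch "$scratch/data.txt"

if [ -d "$scratch" ]; then
    kind="directory"
else
    kind="something else"
fi

# Two tests can be chained with && (and), as seen in the list of operators
if [ -f "$scratch/data.txt" ] && [ ! -f "$scratch/missing.txt" ]; then
    files="data.txt only"
fi

count=$(ls "$scratch" | wc -l)
if [ "$count" -ge 1 ]; then size="not empty"; else size="empty"; fi

echo "$kind / $files / $size"
```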

Loops
<tt>bash</tt> also supports the notion of loops. It is actually rather powerful and can be used to loop over the elements of a directory, for instance. The basic syntax is:

for VAR in LIST; do
    do something
done

In the example above, the LIST contains the elements to loop over. It could be written out in the script or be the result of a command. For instance, the output from <tt>ls</tt> can be used to loop over all elements of a given directory. At each iteration, the variable VAR is given the value of the next element in the list. It is a standard variable and hence can be accessed at any time via <tt>$VAR</tt>.

It is also possible to create loops based on conditionals. For instance, a loop which stops when a condition is no longer satisfied can be created with <tt>while</tt>:

while [ condition ]; do
    do something
done

Note that it is also possible to use <tt>until</tt>.

Look at the script <tt>loop.sh</tt> and execute it. It illustrates a basic use of loops in a shell script.
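Both kinds of loop are sketched below on invented files; this is not the content of <tt>loop.sh</tt>. The <tt>for</tt> loop takes its list from a wildcard rather than from <tt>ls</tt>, a choice made here because it copes with file names containing spaces.

```shell
scratch=$(mktemp -d)
touch "$scratch/a.jpg" "$scratch/b.jpg" "$scratch/notes.txt"

# for: loop over the list produced by a wildcard
found=""
for FILE in "$scratch"/*.jpg; do
    found="$found $(basename "$FILE")"
done
echo "jpg files:$found"

# while: repeat as long as the condition holds
N=3
countdown=""
while [ "$N" -gt 0 ]; do
    countdown="$countdown$N "
    N=$(( N - 1 ))
done
echo "$countdown"
```

A loop of the first kind is what would let the thumbnail script process every picture in a directory without hardcoding their names.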

Arguments
Often we want to give some input to a script so that we don't have to rewrite it all the time. Shell scripts can accept arguments, which can then be handled inside the script. The first argument is <tt>$1</tt>, the second <tt>$2</tt>, etc. <tt>$0</tt> contains the name of the script and <tt>$#</tt> is the number of arguments, which is handy to check before doing operations on the arguments.

Look at the script <tt>simple-args.sh</tt>. It illustrates a basic use of arguments in a shell script. Run it with some arguments, for instance:

$ simple-args.sh foo bar ./simple-args.sh foo bar foobar ./simple-args.sh expects two arguments: <FILEA> <FILEB>

Of note in that simple example is that we introduced the notion of return value. If the script fails, it returns a value which is not zero so that the problem can be handled appropriately.
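Argument checking and return values can be sketched as follows. The script <tt>two-args.sh</tt> is a hypothetical example written on the spot, not the <tt>simple-args.sh</tt> supplied with the practical, although its error message is modelled on the one shown above.

```shell
scratch=$(mktemp -d)

cat > "$scratch/two-args.sh" <<'EOF'
#!/bin/bash
# Refuse to run unless exactly two arguments are given
if [ $# -ne 2 ]; then
    echo "$0 expects two arguments: <FILEA> <FILEB>" >&2
    exit 1                      # non-zero return value: failure
fi
echo "first: $1, second: $2"
EOF

# Correct call: the output is captured and the return value is 0
good=$(bash "$scratch/two-args.sh" foo bar)
echo "$good"

# Wrong call: the return value is kept in a variable so that it can be acted upon
status=0
bash "$scratch/two-args.sh" foo 2> /dev/null || status=$?
echo "with one argument the return value is $status"
```

The special variable <tt>$?</tt> always holds the return value of the last command, which is how a calling script can detect a failure.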

Functions
When you are going to perform the same operation many times in a script, it is possible to put this operation in a function and just call the function. Functions are very simple to use and the only thing to be careful about is that the function declaration must come before its use in the script. The general syntax is:

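function NAME {
    whatever the function does
}

where NAME is the name you choose for the function. A script using a function is laid out as follows:

# A simple function
function NAME {
    whatever the function does
}

# Main script
# Call my function
NAME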

Look at the script <tt>func.sh</tt> and execute it. It illustrates the use of a function inside a shell script.
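Here is a small self-contained sketch of functions taking arguments and returning a status. The function names <tt>file_ext</tt> and <tt>is_image</tt> are invented and have nothing to do with <tt>func.sh</tt>. The construct <tt>${1##*.}</tt>, which removes everything up to the last dot, is an extra that this practical does not cover.

```shell
# A simple function (the name and behaviour are illustrative only)
function file_ext {
    # Inside a function, $1 is the function's first argument, not the script's
    echo "${1##*.}"
}

function is_image {
    # The status of the last command becomes the return value of the function
    [ "$(file_ext "$1")" = "jpg" ]
}

# Main script: the functions are declared above, so they can be called here
ext=$(file_ext holiday.photo.jpg)
echo "$ext"

if is_image notes.txt; then verdict="image"; else verdict="not an image"; fi
echo "$verdict"
```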

A final example
To finish this section about scripting, have a look at the directory <tt>flickr_mossaic</tt>. Try to execute the script <tt>create_mossaic</tt> and give it an argument with the extension <tt>.html</tt>. Try to run it more than once also.

$ create_mossaic test.html

Then look at the html file in your web browser.

Although this script is a bit silly, it combines most things you will ever need to do, namely:
 * check the validity of the arguments (if)
 * loop on a number of items (while, for)
 * download material from the Internet
 * perform operations on that material
 * output some text to a file (cat, redirection)

=Environment Variables=
The variables we have seen so far lived only inside our shell scripts. It is possible to have persistent variables. They are called environment variables. Actually, there are already plenty of environment variables declared for you. Just type <tt>env</tt> and be surprised at how much is there already.

You can ignore most of them but some of them are really important. They can be re-used in shell scripts, for instance.


 * <tt>HOME</tt> contains the location of your home directory. Very useful in scripts.
 * <tt>PWD</tt> contains your current location. Very useful in scripts.
 * <tt>PATH</tt> contains all the locations that will be searched when you type a command. Can make your life easier.
 * <tt>LD_LIBRARY_PATH</tt> contains additional locations to search for shared libraries. Best left alone but might need modifying in some cases.

Most of these variables should be set in a login script. When using <tt>bash</tt>, the file <tt>.bash_profile</tt> is executed once when you log in and <tt>.bashrc</tt> is run every time you start a new shell window. Therefore you could put your configuration in <tt>.bashrc</tt>. For instance, let's assume you have a directory <tt>bin</tt> in your home folder in which you put useful scripts that you use very often. You could add this directory to your <tt>PATH</tt> so that you don't have to type the full path of the script all the time. You could put in <tt>.bashrc</tt>:

# Local bin folder containing all my useful scripts
PATH=$PATH:$HOME/bin
export PATH

So <tt>$HOME/bin</tt> is appended to <tt>PATH</tt>, with a colon inserted to separate the directories. Note also the <tt>export</tt> command, which makes sure that <tt>PATH</tt> becomes an environment variable, i.e. it is passed on to the programs and scripts started from this shell.

To test the changes, create the <tt>bin</tt> folder and put a script in it. Then modify <tt>.bashrc</tt> accordingly. Finally, either open a new shell window or type <tt>source ~/.bashrc</tt>. The script in <tt>bin</tt> should be available automatically now!
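The effect can be tried out without touching your real <tt>.bashrc</tt>. In this sketch a scratch directory stands in for <tt>$HOME</tt> and <tt>myscript</tt> is a made-up script name; <tt>command -v</tt> is one way of asking the shell where it finds a command.

```shell
scratch=$(mktemp -d)          # stands in for $HOME so that your real files are untouched
mkdir "$scratch/bin"

printf '#!/bin/bash\necho "hello from my script"\n' > "$scratch/bin/myscript"
chmod u+x "$scratch/bin/myscript"

PATH=$PATH:$scratch/bin       # append the new directory after a colon
export PATH                   # pass the change on to child processes

location=$(command -v myscript)   # where does the shell find the command?
echo "$location"
greeting=$(myscript)              # no ./ and no full path needed any more
echo "$greeting"
```

Because the directory is appended rather than prepended, a system command with the same name would still take precedence over your own script.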

= Useful commands not covered in this practical =
We could not cover everything in two practicals, but you now know enough to teach yourself a lot more about the Linux command line. Below is a list of commands that you might need at some point, so it is worth looking at what they do now ...

Text Processing
<tt>sed</tt>, the stream editor, is great for manipulating text. Want to list all the files ending in .jpg but without the extension? Try:

$ ls *.jpg | sed 's/\.jpg//g'

<tt>awk</tt> is better suited to manipulating data organised in columns.

<tt>cut</tt> and <tt>paste</tt> can also be useful.
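To give a flavour of these tools, here is a sketch on three lines of made-up, space-separated data (name, size, type). The values are invented and only loosely inspired by the <tt>du</tt> output seen earlier.

```shell
# Hypothetical three-column data: name, size in KB, type
data='file5 184 text
file3 44 text
image 2000 jpg'

# cut: keep only the first space-separated field of each line
names=$(echo "$data" | cut -d ' ' -f 1)

# awk: add up the second column and print the total at the end
total=$(echo "$data" | awk '{ sum += $2 } END { print sum }')

# sed: remove the extension, but only at the end of the name
stem=$(echo "holiday.jpg" | sed 's/\.jpg$//')

echo "$names"
echo "$total"
echo "$stem"
```

The <tt>$</tt> in the <tt>sed</tt> expression anchors the match to the end of the line, so a name such as <tt>my.jpg.backup</tt> would be left alone.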

Managing Data
Managing data and file space will consume too much of your time. Use <tt>df</tt> to know how full the hard disks are.

When you need room, you can delete stuff but you can also compress data. There are a few (!) compression utilities installed on our Linux machines:
 * <tt>tar</tt>
 * <tt>zip</tt>
 * <tt>gzip</tt>

<tt>zip</tt> is nice as the zipped files can be unzipped under Microsoft Windows very easily. It is rather easy to use to compress a folder; the option <tt>-r</tt> is needed to include the content of the folder recursively:

zip -r folder folder -> will compress folder and its content into a zip file called folder.zip
unzip folder.zip     -> will uncompress the zip file folder.zip

To compress data further, use <tt>tar</tt> and combine it with <tt>gzip</tt> (option <tt>z</tt> in the <tt>tar</tt> arguments):

tar cfvz folder.tar.gz folder -> will compress folder and its content into a file called folder.tar.gz
tar xvfz folder.tar.gz        -> will uncompress the file folder.tar.gz
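A complete round trip (archive, list, restore) can be sketched as follows on a scratch folder with invented content. The options <tt>t</tt> (list the content) and <tt>-C</tt> (change directory before acting) are real <tt>tar</tt> options that are not mentioned above.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/folder"
echo "some data" > "$scratch/folder/results.txt"

# -C makes tar change directory first, so the archive stores "folder/..."
# rather than the full path of the scratch directory
tar -czf "$scratch/folder.tar.gz" -C "$scratch" folder

# t lists the content of the archive without extracting anything
listing=$(tar -tzf "$scratch/folder.tar.gz")
echo "$listing"

# Extract into a separate directory and check that the data came back
mkdir "$scratch/restore"
tar -xzf "$scratch/folder.tar.gz" -C "$scratch/restore"
restored=$(cat "$scratch/restore/folder/results.txt")
echo "$restored"
```

Listing an archive before extracting it is a good habit, as it shows where the files will end up.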

= To go further =
The Pragmatic Programming course continues with a practical about more advanced features of the Fortran programming language: Fortran2.

Now you should have some solid foundations about Linux commands. Remember to use <tt>man</tt> and <tt>info</tt> to find help.

There is a very extensive <tt>bash</tt> user guide available at http://steve-parker.org/sh/sh.shtml and remember that for technical issues, Google is your friend...