Linux2
Leveraging the power of the Linux command line
Introduction
Roll call: Jonny, Lauren, Emma, Guy, Tim, Rita, SarahS, Jenny
This practical follows Linux1, which introduced the fundamentals of the Linux command line.
During this practical, we will learn how to combine some commands together to create scripts that perform more complex actions.
Getting the content for this practical
The necessary files for this practical are hosted in a version control system. To obtain them, just type the following command:
$ svn export http://source.ggy.bris.ac.uk/subversion-open/linux2/trunk linux2
This will fetch all the necessary files and put them in a folder called linux2/. Ignore the cryptic syntax for now; an introduction to version control using Subversion (svn) will be given later on.
Output redirection
In the Linux1 practical, we discovered a few Linux commands. Some of these commands take their input from the keyboard (standard input) and send their output to the screen (standard output). It is possible to (a) redirect input and output and (b) link commands together to perform complex actions. The files for this section are in the example1 directory.
$ cd linux2/example1
Redirecting standard input and output
Let's start with a simple example. By default, the diff command writes its output to the screen. For instance, try:
$ diff file1 file2
This is not convenient if there is a lot of output. It is easy to redirect the output to a file so that it can be saved for later. This is done using the ">" sign:
$ diff file1 file2 > diff12.txt
$ diff file2 file3 > diff23.txt
You can then look at the respective files in a text editor or by using more or less.
Now imagine we want to put the outputs of the two diff operations into one single file. Using the syntax above with the same filename will not work, as the second call would overwrite the first one. However, it is also possible to append the output of a command to a file. Note that the second call below uses a double ">>":
$ diff file1 file2 > diff.txt
$ diff file2 file3 >> diff.txt
Just remember that a single ">" will overwrite the content of a file, a double ">>" will append.
Note that we could also have concatenated the two files created earlier into one big file rather easily:
$ cat diff12.txt diff23.txt > diff.txt
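To see the difference between ">" and ">>" without touching the practical's files, here is a minimal sketch that works in a scratch directory. The three small input files are hypothetical stand-ins for file1, file2 and file3, created on the spot with printf; the sketch assumes bash and the standard diff, cat, cmp and wc commands.

```shell
#!/bin/bash
# Work in a throwaway directory so the practical files are left alone.
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-ins for file1, file2 and file3.
printf 'apple\nbanana\n' > file1
printf 'apple\ncherry\n' > file2
printf 'apple\ndate\n'   > file3

# ">" truncates the target first: only the second diff survives.
diff file1 file2 > overwritten.txt
diff file2 file3 > overwritten.txt

# ">>" appends: both diffs end up in the file.
diff file1 file2 >  diff.txt
diff file2 file3 >> diff.txt

# Same result as concatenating two separate output files with cat.
diff file1 file2 > diff12.txt
diff file2 file3 > diff23.txt
cat diff12.txt diff23.txt > combined.txt

wc -l overwritten.txt diff.txt combined.txt
```

Each diff here is four lines long, so overwritten.txt holds only the second diff (four lines) while diff.txt and combined.txt both hold eight identical lines.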
In the examples above, we redirected the output to a file. It is also possible to redirect the input, although this is not used as often because most commands accept a file as an argument. For instance, consider the command sort, which sorts the lines of a file alphabetically. You can tell it which file to read by using a "<":
$ sort < file4
Note that the example above is a bit contrived, as "sort file4" would work just as well. However, you will probably encounter input redirection from time to time, so you might as well know how it is done. Note also that you can pass the -n option to sort to make it sort numerically instead of alphabetically.
Both types of redirection can also be combined:
$ sort < file4 > file4-sorted.txt
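The following sketch shows input redirection together with the -n option mentioned above. It uses a hypothetical numbers.txt instead of file4, and it sets LC_ALL=C so that the alphabetical order does not depend on your locale settings.

```shell
#!/bin/bash
export LC_ALL=C               # predictable, byte-by-byte comparison
cd "$(mktemp -d)" || exit 1

# Hypothetical input file with one number per line.
printf '10\n9\n100\n' > numbers.txt

# "<" feeds the file to sort's standard input.
alphabetical=$(sort < numbers.txt)    # compares characters: 10, 100, 9
numerical=$(sort -n < numbers.txt)    # compares values: 9, 10, 100

# Input and output redirection combined in a single command.
sort -n < numbers.txt > numbers-sorted.txt

echo "$alphabetical"
echo "$numerical"
```

Alphabetical sorting compares character by character, which is why "100" lands between "10" and "9"; -n compares the actual values.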
The command above is starting to look complex, and it leads nicely to the notion of command pipelines, which are explained below.
Important note: besides standard input (stdin) and standard output (stdout), there is also standard error (stderr), which commands use to report problems (compiler warnings, errors, etc.). Standard error can be redirected too, and not necessarily to the same place as standard output, but this is beyond the scope of this practical.
Pipelines
Most commands we have seen so far are fairly powerful but have a limited scope. This is intentional: the Linux command line lets you chain commands into a pipeline, where the output of one command becomes the input of the next, to achieve complex behaviour. For instance, ls is good at listing things and more is good at displaying things, so let's pipe them together. This is done using the pipe sign "|":
$ ls -l ~ | more
This lists the content of your home directory. If the output spans more than one screen, more takes over the window so that nothing scrolls past. Use the space bar to scroll down and q to quit. You could also use less instead of more.
The uniq command removes duplicate lines from its input, but only when the duplicates are next to each other, which is why we sort the file first. Let's combine it with sort to really start tidying up file4.
$ sort file4 | uniq > file4-sorted-and-cleaned.txt
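The sketch below, on a hypothetical lines.txt, makes the adjacency rule visible by comparing uniq on its own with sort | uniq. It also shows sort -u, which sorts and drops duplicates in a single step.

```shell
#!/bin/bash
export LC_ALL=C
cd "$(mktemp -d)" || exit 1

# Hypothetical input: "to be" appears three times, but not all together.
printf 'to be\nor not\nto be\nto be\n' > lines.txt

# Only the adjacent repeat is removed: to be / or not / to be
uniq_only=$(uniq lines.txt)

# Sorting groups identical lines, so uniq can drop them all: or not / to be
sorted_then_uniq=$(sort lines.txt | uniq)

# sort -u does both steps at once.
sort_u=$(sort -u lines.txt)

printf '%s\n---\n' "$uniq_only" "$sorted_then_uniq" "$sort_u"
```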
How many times was "Scene" written in the first act of Hamlet? grep can find the lines that contain it (-i makes the search case-insensitive) and wc can count words and lines (-l counts lines only), so let's combine them:
$ grep -i scene file1 | wc -l
5
5 Scenes, correct!
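Strictly speaking, grep | wc -l counts matching lines, not occurrences: a line that mentions "scene" twice is counted once. The sketch below uses a hypothetical act.txt to compare the pipeline with grep's own -c option (count matching lines) and with -o, which prints each match on its own line. Note that -o is available in GNU and BSD grep but is not part of POSIX.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Hypothetical text; the last line mentions "scene" twice.
cat > act.txt <<'EOF'
SCENE I. A platform.
Some dialogue.
SCENE II. A room.
The scene changes, and the Scene ends.
EOF

# Lines containing "scene" in any case, counted by wc.
lines_piped=$(grep -i scene act.txt | wc -l)

# grep can count matching lines by itself.
lines_counted=$(grep -ic scene act.txt)

# -o prints every match on its own line, so wc -l counts occurrences.
occurrences=$(grep -io scene act.txt | wc -l)

echo "lines: $lines_piped (grep -c: $lines_counted), occurrences: $occurrences"
```

Here three lines match, but "scene" appears four times.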
For the last pipe example, let's learn a new useful command: du, which calculates the size of the files and folders given as arguments. sort -h sorts human-readable sizes such as 44K or 184K (it understands the K, M and G suffixes, which plain -n does not), -r reverses the order so that the biggest come first, and head displays only the first lines. So to find the 3 biggest files or folders inside our directory, we could do:
$ du --exclude .svn --human-readable ./* | sort -hr | head -n 3
184K	./file5
44K	./file3
44K	./file2
Yes, file5 is the biggest: it actually contains the whole of Hamlet! You could use du to find which files are clogging up your disk space.
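Sorting du's human-readable output needs some care: sort -n only reads the leading number, so "2.0M" is treated as 2 and ranks below "184K", whereas GNU sort's -h (--human-numeric-sort) takes the suffix into account. The sketch below feeds hypothetical du-style lines (size, tab, name), not real measurements, to both variants.

```shell
#!/bin/bash
# Force the C locale so the comparison rules are the same everywhere.
export LC_ALL=C

# Hypothetical du-style output: size, a tab, then the name.
listing=$(printf '44K\t./file2\n2.0M\t./bigdir\n184K\t./file5\n')

# -n only reads the leading number, so 2.0M counts as 2 and sinks.
by_number=$(printf '%s\n' "$listing" | sort -nr | head -n 1)

# -h understands the K/M/G suffixes, so 2.0M correctly comes first.
by_size=$(printf '%s\n' "$listing" | sort -hr | head -n 1)

echo "sort -nr top entry: $by_number"
echo "sort -hr top entry: $by_size"
```

An alternative that avoids suffixes altogether is to let du report plain kilobytes with -k and sort those with -n.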
Automating things
Although pipelines can be used to perform complex tasks, they quickly become difficult to read after a few pipes. To perform more complex tasks, you can put a list of commands in a file and execute that file.
$ cd ../example2
convert is a small utility from the ImageMagick suite which lets you manipulate images from the command line. For instance, to resize an image to a width of 2000 pixels (the height is scaled to keep the aspect ratio) and save it under a new name, you could use:
$ convert image-large.jpg -resize 2000 image-2000.jpg
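A geometry given as a single number sets the width only, so for a portrait image the height ends up larger than that number. Writing the geometry as WIDTHxHEIGHT instead fits the image inside that box while keeping its aspect ratio, so neither side exceeds the limit. The sketch below checks this on a hypothetical 300x400 white canvas generated on the fly; it assumes ImageMagick's convert and identify are installed.

```shell
#!/bin/bash
# Requires ImageMagick's convert and identify on the PATH.
cd "$(mktemp -d)" || exit 1

# Hypothetical test image: a plain white 300x400 (portrait) canvas.
convert -size 300x400 xc:white portrait.jpg

# Width only: the result is 200 pixels wide and the height is scaled
# to keep the aspect ratio, so it ends up taller than 200 pixels.
convert portrait.jpg -resize 200 width-200.jpg

# Width x height: the image is scaled to fit inside a 200x200 box.
convert portrait.jpg -resize 200x200 max-200.jpg

identify -format '%f: %wx%h\n' width-200.jpg max-200.jpg
```

For a 300x400 input, the 200x200 box gives a 150x200 image.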
Now let's say you want to scale an image to five different sizes and zip the whole lot. You could type each command by hand, but if you do this often, you could instead put the commands together in a file. Have a look at the file create_thumbnails:
$ ls -l create_thumbnails
-rwxr-xr-x 1 jp jp 474 2008-02-27 11:58 create_thumbnails
The first thing to notice is that the execute flag (the x in -rwxr-xr-x) is set on this file. If it were not, we could not run the file directly. Now look at its content. It starts with the shebang, a line specifying which interpreter should run the script. This is not mandatory, but you are advised to include it to make sure the right shell is used. We used the bash shell here.
#!/bin/bash
Then the commands are listed. There is nothing new here, except that echo is used to print things to standard output and zip is used to create a zip file of a folder. Now try to execute the file. We do that by typing its name; the preceding ./ makes sure we use the one in the current directory:
$ ./create_thumbnails
Create thumbnails.
Move thumbnails.
Compress thumbnails.
updating: thumbnails/ (stored 0%)
updating: thumbnails/image-1000.jpg (deflated 0%)
updating: thumbnails/image-100.jpg (deflated 6%)
updating: thumbnails/image-10.jpg (deflated 8%)
updating: thumbnails/image-500.jpg (deflated 1%)
Clean up.
All done.
Now use the file explorer to look into thumbnails.zip.
This is a very simple script, but it already shows that a batch file like this can perform fairly complex operations and make your life easier. Let's go a bit further now.
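Writing a script of your own follows the same pattern as create_thumbnails: a shebang on the first line and the execute flag set. The sketch below writes a hypothetical two-line script called hello_script in a scratch directory. It shows that ./hello_script is refused until chmod +x sets the execute flag, and then runs it. It assumes bash, and it will still fail if your temporary directory is mounted with the noexec option.

```shell
#!/bin/bash
# Scratch directory so nothing in your practical folders is touched.
cd "$(mktemp -d)" || exit 1

# A hypothetical two-line script; the first line is the shebang.
cat > hello_script <<'EOF'
#!/bin/bash
echo "Hello from a script."
EOF

# Without the execute flag the shell refuses to run it
# ("Permission denied", exit status 126).
./hello_script
before=$?

# Set the execute flag, as on create_thumbnails, and try again.
chmod +x hello_script
after=$(./hello_script)

ls -l hello_script
echo "$after"
```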
In the images folder, there are a few images and I want a set of thumbnails for each of them. I could use the supplied script which does the following:
- copy the first image into image-large.jpg
- execute the create_thumbnails script from above.
- rename the zip file appropriately
- do the same thing for the next image...
This is done by the file create_all_thumbnails. It is very straightforward. One section requires explanation:
../create_thumbnails > /dev/null 2>&1
Here we execute the script create_thumbnails by giving its relative path. The redirections that follow send everything it prints to oblivion: > /dev/null sends standard output to /dev/null, a special file that discards whatever is written to it, and 2>&1 then sends standard error (stream 2) to the same place as standard output (stream 1). So this script is not too verbose. Try removing > /dev/null 2>&1 to see the difference.
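The usual way to silence a command completely is > /dev/null 2>&1, and the order of the two redirections matters because the shell processes them from left to right: 2>&1 points standard error at wherever standard output points at that moment. The sketch below uses a hypothetical function, noisy, that writes one line to each stream, and captures what escapes each form. In bash, &> /dev/null is a shorthand for > /dev/null 2>&1.

```shell
#!/bin/bash
# Hypothetical stand-in for create_thumbnails: one line on each stream.
noisy() {
    echo "progress message"       # standard output (stream 1)
    echo "warning message" >&2    # standard error (stream 2)
}

# Send stdout to /dev/null, then point stderr at the same place: silent.
quiet=$(noisy > /dev/null 2>&1)

# Reversed order: stderr is pointed at the old stdout (here, the capture)
# before stdout is sent to /dev/null, so the warning still gets through.
leaky=$(noisy 2>&1 > /dev/null)

echo "quiet: '$quiet'"
echo "leaky: '$leaky'"
```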
You can see that we really are starting to automate things now. But we could do better. A lot better. For instance, we still had to hardcode the names of the images, and our current script needs to copy data, which could be an expensive operation. It is actually possible to write a script that loops over all the pictures in the directory automatically, but before we get to that, we should really look at script execution and how to control it.
Launching, monitoring and controlling jobs
In this section, we will look at how to control the jobs running on our Linux machine.
$ cd ../example3
Background, bg, jobs, fg, nohup, top, kill
Shell Scripting
Variables
Conditionals
For loops
Functions
Arithmetic
Putting everything together in one script.
Environment Variables
SHELL PWD PATH (LD_LIBRARY_PATH)
Text Processing
sed, awk.
Managing Data?
du, df, file. Use of symbolic links. tar, zip, gzip, bzip2