Revision as of 09:15, 26 August 2009
startingC: Learning the C Programming Language
Introduction
Welcome to 'StartingC', a tutorial aimed at getting you up and running with the C programming language. The tutorial is split into sections in each of which there will be some explanatory text (maybe even some diagrams!) but most importantly some working example programs that you can easily download and run. To get these, just cut and paste the text below onto your command line:
svn co http://source.ggy.bris.ac.uk/subversion-open/startingC/trunk ./startingC
We will not be assuming any previous programming experience, just an enquiring mind and the rudiments of using the command line on a Linux-based computer. If you have any doubts about the latter, take a look at the [Linux1] tutorial, which is also part of the 'pragmatic programming' set.
A Quintessential First Program
OK, now that we have the example code, let's get cracking and run our first C program. First of all, move into the example directory:
cd startingC/examples/example1
We'll use a Makefile for each example, so as to make the build process painless (hopefully!). All we need do is run make (see the [tutorial about make] if you'd like to look into this further):
make
Now, we can run the classic program:
./hello.exe
and you should get the friendly response:
hello, world!
Bingo! We've just surmounted, in some ways, our biggest step--running our first C program. Programming is like playing with Meccano or Lego. Remember how much fun it was to assemble all those building blocks into something new and fascinating? We've just built our first model, and the rest of the toy box awaits, so let's get stuck in!
Types & Operations
Buoyed with confidence from our first example, let's march fearlessly onwards into the realm of variable types and basic operations. To do this, move up and over to the directory example2 and type make to build the example programs:
cd ../example2
make
Take a look inside types.c and after the start of the main function, you'll see a block of variable declarations:
char   nucleotide;  /* A, C, G or T for our DNA */
int    numPlanets;  /* eight in our solar system - poor old Pluto! */
float  length;      /* e.g. 1.8288m, for a 6' snooker table */
double accum;       /* an accumulator */
C, like many languages (e.g. Fortran), requires that variables be declared to be of a certain type before they can be used, and here we see examples of four intrinsic types provided by the language. It's a very good habit to comment all your variable declarations, and here the comments pretty much explain what the various types are. double is a double precision floating point number, typically with twice the storage space of a float. The extra space makes a double a good choice for an accumulator where you want to minimise rounding errors and avoid under- and overflow as best as possible. (The Fortran programmers amongst us will note, with a wince, the absence of an intrinsic type for complex numbers in traditional C. Those reeling from this revelation will be comforted by the knowledge that the C99 standard added complex types via complex.h, and that C++ contains a complex class.)
Various types can be given further qualifiers, such as short, long, signed and unsigned:
short int     mini;      /* typically two bytes */
long int      maxi;      /* typically eight bytes */
signed char   cSigned;   /* one byte, values in the range [-128:127] */
unsigned char cUsigned;  /* values in the range [0:255] */
The const keyword is also very useful for, well, declaring constants. An invaluable built-in operator when pondering the amount of memory assigned to a variable is sizeof(), which gives the size of a type or variable in bytes.
In addition to single entities of various types, we can also declare arrays of the self-same intrinsics. The syntax for this is along the lines of:
char cStr[20];     /* a character array/string of 20 chars */
int  iMat2d[3][3]; /* a 2-dimensional matrix of integers - 3x3 */
You'll see a good deal more of accessing the various elements of an array in later examples, but for now be satisfied with the knowledge that array indices start at 0 in C (yes, that's right Fortraners, that's zero, not 1) and that the syntax for array access is, e.g.:
cStr[0] = 'h'; /* first element set to ascii char code for 'h' */
Enumerated types can be a useful way to map (a list of) symbolic names to integer values.
Now that you've read it through, run the program and satisfy yourself that it all works as you expect it to. To run the program, type:
./types.exe
Shifting our attention to operations.c, let's consider some basic operations that C supports. This is the start of the doing things part.
The first block of code here gives an arithmetic example--how to calculate the volume of an oblate spheroid that happens to be close to all our hearts, our shared home Earth:
val = (4.0/3.0) * pi * pow(equi_rad,2) * pol_rad;
I won't dwell on this as I'm confident that the syntax is self-explanatory, save to mention that the function pow comes from the built-in library of math functions.
Next up, you'll see the decrement and increment operators:
--numPlanets;
++numPlanets;
also self-explanatory.
C provides the logic operators, == (is equal), != (not equal), && (AND) and || (OR); as well as the relationals, > (greater than), < (less than), >= (greater than or equal) and <= (less than or equal).
An operation that you will become keenly aware of--especially working in scientific computing--is the ability to temporarily convert a variable from one type to another on-the-fly. This is known as casting. Two examples of this are:
(short int) pi
(float) 42
where, in the first, we convert pi into a (short) integer and, in the second, convert 42 into a floating point number. Note that the cast does not affect the original variable in any way, i.e. the value given to the variable called pi is not changed through using the cast.
One last class of operations for now are the bitwise operators. These give you very low-level control over the bytes associated with variables, should you need that. For example, we can perform a bitwise AND on the two bytes 01001000 and 10111000, yielding 00001000 when all the bit pairs are considered in turn according to the criteria:
A (input) | B (input) | A AND B (output)
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1
To run the second program, type:
./operations.exe
Now, it's very important that you muck around with these example programs as much as possible! Ideally, so much so that you break them! We never learn as much as when we make a mess of things, and since these are just toy programs, you may as well go for it! If you get in a pickle, you can get the original programs back with a quick waft of the Subversion wand:
svn revert *
Exercises
types.c
- Declare a character array sufficient to record the state of a game of noughts and crosses, populate it and print it to the screen.
- How many bytes are used to store a long double?
- You can give an initial value to a character array when you declare it (e.g. char cStr[20] = "xxxxxxxxxxxxxxxxxxxx";). What happens if we leave '\0' out of character assignments in this case?
operations.c
- 29.2% of the Earth's surface is land. How much is this in square kilometers?
- Logic is perilous. Can you think of a time when we say "or", but really mean logical AND?
- What happens if you cast the character '9' to an int?
Conditionals & Loops
OK, we have types and operators under our belts. This C malarky isn't too bad, eh? Let's take a look at some stalwarts of the procedural family of languages--conditionals and loops. As we will start all our sections, move up and over to the example3 directory and build the program(s) therein:
cd ../example3
make
Looking inside flow.c, our first block shows how we can make many way decisions using if tests and the else catch-all:
if (temperature < 0.0) {
    /* each %f needs a matching argument after the format string */
    printf("Water would normally freeze at:\t%f\tdegrees C\n", temperature);
}
else if (temperature > 100.0) {
    printf("Water would normally boil at:\t%f\tdegrees C\n", temperature);
}
else {
    printf("The temperature must be in the range [0.0,100.0]\n");
}
This is all very nice and self-explanatory. Typically you would use the above for a decision point that could follow one of 3 or fewer branches. If you have more than 3 branches, the switch statement is likely to be more concise and easier for you and your fellow developers to read:
switch (iCount) {
case 0:
    printf("case 0: nada, zip, nowt.\n");
    break;
case 1:
    printf("case 1: uno, sole, unitary.\n");
    break;
...
default: /* a default catches any value that we did not anticipate */
    printf("default: mucho, many, lashings.\n");
    break;
}
The default case is much like our else catch-all in the box above and is important to include, as otherwise a value that we did not consider would match none of the cases and the switch would silently do nothing. You will also notice the break statements in all the cases. These are essential rather than merely defensive: without a break, execution carries straight on into the code for the next case (this is what is usually meant by a 'fall-through' bug). A value of 4, say, would then run the code for case 4 and also the code for the default.
Moving on. The for is an oft used tool on the work bench:
for (ii=0; ii<iMax; ii++) {
    if (ii == 3) {
        printf("Surprise!\n");
        continue; /* jumps to the start of the next iteration */
    }
    printf("Yup, I'm in a for loop, whizz-oh. Counter is:\t%d\n", ii);
}
It's tidy, succinct and gets the job done. Note that we've nested an if statement inside our loop. The continue statement is a useful way to skip the rest of an iteration, if it's superfluous.
Sometimes, however, we don't know ahead of time how many iterations of a loop will be required. We can't use a for loop in this case and the while loop steps into the breach for us. For example:
while (ii > threshold) {
    printf("%d\t> threshold, continuing..\n", ii);
    ii = rand(); /* get next random number */
    printf("next random value:\t%d\n", ii);
}
In this case we keep testing to see if ii is greater than the threshold. If it is, then we go around the loop one more time, acquiring a new value for ii along the way. We loop back to the top, re-test against the threshold and so on. The loop will only terminate when ii is less than or equal to the threshold, i.e. when the while test fails, so watch out for those infinite loops!
To run the example program, type:
./flow.exe
Exercises
- Can you nest an if within another if and what would be the point? Indeed can you have an if within a loop, within an if..?
- What happens if you remove break statements from the switch construct?
- Can you write a for loop that counts down rather than up? What about in steps of 2, or 3?
- Can you increment more than one variable in a for loop?
- Can you have multiple test conditions in a loop?
- What's the simplest infinite loop you can write? Do you know how to abort a program?!
- Can you sabotage the counting in a for loop? Is there a way to protect against such a bug?
The C Preprocessor
Up until now, we've been studiously ignoring the lines beginning with # at the start of our programs. The time has come, however, to look these statements square in the eyes!..
cd ../example4
make
So far we've glanced upon constructs such as:
#include <stdio.h>
Lines starting with a # form instructions to the C preprocessor. We can think of the preprocessor as a form of cut & paste. In our example, the preprocessor will replace our #include line with the contents of the system header file, stdio.h. Why are we doing this? Well, we wish to use some of the standard input/output library functions, such as printf(), in our program and the header file contains the function prototypes. The compiler needs these prototypes to make sure that we are calling the functions correctly and thus to produce a working executable or compile-time error--whichever is appropriate.
We'll look at header files in more detail when we come to write our own functions.
We can do a good deal more than just including header files, however. For one, we can use a #define statement to set global constants. Take a look inside macros.c and notice how we have specified the size of our character array, called cStr. Outside of the main program we have:
#define MAXSTR 25
Inside the main program, we then make use of our new symbol in our variable declarations block:
char cStr[MAXSTR]; /* a character array with size set globally */
We can arrange to loop over the contents of that array using:
for (ii=0; ii<MAXSTR; ii++) {
    cStr[ii] = 'c';
}
To run the program, type:
./macros.exe
as per usual.
Arranging conditional compilation is perhaps the most useful aspect of the preprocessor. Further down in the main program we have the conditional code block:
#ifdef DEBUG
    printf("DEBUG is ON\n");
    printf("I'm going to print out a lot more information\n");
    printf("Boy-oh-boy am I going to have a lot to say!\n");
#endif
which we can activate through the use of an appropriate compiler flag. In order to do this, uncomment the line:
#CFLAGS=-DDEBUG
in the Makefile, retype make, and re-run.
The preprocessor gives us yet more possibilities, with constructs such as:
#if SYSTEM == WIN32
#include <win.h>
#elif SYSTEM == LINUX
#include <linux.h>
#else
#include <default.h>
#endif
However, a word of caution: it is wise not to overuse the preprocessor. For example:
- It may be better to use a const variable declaration, rather than a global #define.
- Conditional compilation can be useful, but if you can use run-time switches in your code instead, you will not have to keep re-compiling your programs when you want to vary a parameter, say.
If you're keen, you can see a good use of the preprocessor for setting function names in mixed Fortran-C programming.
Note that we now have 3 distinct stages en route to producing an executable program:
- The preprocessor step: cut & paste.
- Compilation: taking source code and creating object code.
- Linkage: Linking object files and possibly libraries together to give an executable.
Exercises
- Vary the size of the character array. Note that you'll have to re-compile your program each time.
- Invent a new block of conditionally-compiled code and make the appropriate changes to the Makefile to bring it into effect.
- Experiment with the additional #ifdef and #ifndef preprocessor statements.
Functions
So, onto functions. What are these and why do we use them?
Well, we can think of a function, in some ways, as a black box--we feed in inputs and it returns outputs. An example would be a trigonometric function, such as the sine function. If we input π/2 radians, we'll get 1 as the output; input π and we'll get 0 back. We're not limited to just mathematical functions in C, however. We can write pretty much any function we like! We'll see many examples cropping up from here onwards.
OK, so much for a function's general form. Motivation-wise, if you need to do something more than once in a program, you should write a function to do it. That way, you just call your function whenever you need to perform that task. That strategy will give us concise programs as we don't need to duplicate any lines of code. Another benefit is that duplicate lines of code are a bug waiting to happen! Why? Well, there is a good chance that if you modify one of those lines of code, you'll forget to change the other. We're humans, after all. We err. Now, those lines are no longer identical and so will no longer do the same thing--tada, your bug.
Now repeat after me: never duplicate any code; write a function.
Another reason for writing a function, even if you don't call it more than once, is that breaking down your program into functional units will make it much easier to read and understand. This should be your #1 design criterion for any piece of code that you write.
OK, with the preamble out the way, let's take a look at an example:
cd ../example5
make
Inside funcs1.c, you'll see that we compute the volumes for all the planets in the solar system, rather than just for Earth. Accordingly, we bundle the volume calculation into a function of its own:
double volume(double equitorial_rad, double polar_rad)
{
    /* local variables */
    const double pi = 3.14159265;
    double retval;

    /* the calcs */
    retval = (4.0/3.0) * pi * pow(equitorial_rad,2) * polar_rad;

    /* functions typically return a value */
    return retval;
}
and call it a number of times as we cycle through the planets:
for (ii=0; ii<NumPlanets; ii++) {
    val = volume(equi_rad[ii], pol_rad[ii]);
    printf("the volume of the planet is:\t%f\tkms cubed\n", val);
}
Note also the presence of a function prototype near the top of the file:
double volume(double equitorial_rad, double polar_rad);
We'll need one of those for each function that we write. To run the program, type:
./funcs1.exe
The eagle-eyed amongst you will have noticed that the const variable pi is no longer needed in the main program unit. Also, the variable val is declared inside the main program unit and the function. Both of these things allude to something called scope, and in particular that variables declared inside a function are only known to that function. This rule also applies to the main function. Thus, if we want to pass values between functions, we must use arguments and return values. (Deliberately ignoring global values, as they are typically considered to be bad news.)
While we're on the topic of C function arguments: they are what's known as passed-by-value. Beware, this contrasts with passed-by-reference, as used in Fortran, for example. What's the significance of this? Well, in C a copy of the value of the argument is passed into a function. That means that you can do anything you like to its value inside the function, but it will all be forgotten upon exiting. Pass-by-reference means that the actual memory address of the argument is passed into the routine and so any changes to its value will stick. We'll look at this topic more closely when we consider memory addresses and pointers later on.
Exercises
- Modify funcs1.c so that the equatorial radius argument is zeroed inside the function. Write a second loop to investigate the consequences of that inside the main program.
- Write an additional function to calculate the surface area of a planet and print the results of applying that function too. (See [1] for the formula.)
A light-hearted interlude. Functions can call themselves. Neat! It's called recursion and can be both elegant and powerful. Neater! The classic example is the Fibonacci series, and who am I to buck the trend? Well, it does lend itself to beautiful shapes:
Take a look in fibonacci.c and try running it (fibonacci.exe) using various function inputs.
In truth, more interesting examples of recursion crop up when we consider more advanced data structures, such as binary trees, so we'll save some of the good stuff until then.
Pointers and Allocatable Memory
OK, now we're talking. Now we're getting to the marrow of the language. Once you're comfortable with this material, the world will be your oyster! So, without further ado, let's wade in:
cd ../example6
make
Looking inside pointers.c, we don't have to wait long to see something new. In the variable declaration block we see:
int  iNum;         /* just a plain old integer */
int* iAddr = NULL; /* a 'pointer' to an integer - intriguing! */
iNum we're happy with, just a common-or-garden integer. iAddr is a new species, however, and is a pointer to an integer. Said another way, the value of iAddr is a memory address at which an integer is stored; in a moment we'll make that the address of iNum. We can draw an analogy between memory addresses and pigeon holes, where each pigeon hole is labelled with a unique (integer) number--its address. Diagrams can often be helpful. Here's one where b is our plain old integer, given a value of 17 and a is used to store the memory address of b, i.e. a is a pointer to b:
Now, we can explore the consequences of this relationship in our program. Let's give iNum a value, and set iAddr to point to iNum. We use the & symbol to get the address of a variable:
iNum  = 3;     /* first, we'll set the value of iNum */
iAddr = &iNum; /* now we set iAddr to the 'address' of iNum */
If we print iNum, we'll obviously see that it has a value of 3. We can also follow the pointer iAddr, and see what value is stored in the memory address that it's pointing to. This is known as dereferencing and has the symbol *. Since this is the value of iNum, it will, of course, yield 3:
printf("*iAdrr (dereference) has the value:\t%d\n", *iAddr);
If we change the value of iNum, we'll see a corresponding change in *iAddr. Also, if we assign *iAddr to a new value, we'll see the value of iNum follow suit, since they are one and the same:
*iAddr = 17;
printf("set *iAdrr to:\t\t\t\t%d\n", *iAddr);
printf("and sure enough iNum has the value:\t%d\n", iNum);
We've seen arrays before, but until now, we haven't witnessed their special relationship with pointers: An array is a contiguous chunk of memory which we can reference through the address of its first element. Further, we can access subsequent elements of an array through the use of pointer arithmetic. What does this all mean? Well, it's perhaps best explained with an example:
iAddr = &data[0];
for (ii=0; ii<MAXCOUNT; ii++) {
    printf("data[ii] or equivalently *(iAddr+ii) is:\t%d\n", *(iAddr+ii));
}
Here, we set iAddr to point to the first element of the array called data. We then loop through all the elements of the array printing values; first through the use of the familiar square bracket syntax ([]); and secondly via a pointer and increments upon the base address.
The last chunk of code in the program illustrates the use of dynamic memory allocation. Up until now, we've specified the size of all our variables at compile-time. Sometimes, however, we don't know how much space we'll need to store something ahead of time (perhaps we want to read a file which changes in length from one day to another). The ability to allocate memory on the fly will help us tremendously in this situation. Another situation where dynamic memory allocation will help is when we have too much data to store it all in main memory at the same time. In this situation, we can allocate some space, fill it with some of our data, work on it, free the space and then move on to work on another chunk of data.
You'll recall that arrays are chunks of contiguous memory referenced through the address of the first element. If we declare a pointer to the variable type we're interested in and then invoke a command to pair it with a chunk of memory, we'll be in business, right? Enter the malloc function:
iAddr = (int *)malloc(sizeof(int)*MAXCOUNT);
On one line, we've rather neatly requested a chunk of contiguous memory with malloc. We've requested the number of bytes required to store an integer, multiplied by the number of elements we desire in our new array. The malloc function returns a general purpose pointer to void (void *), which we've quickly cast to be a pointer to an integer so that the RHS matches the LHS in our assignment to trusty old iAddr. Bingo, we have our new array; arranged on-the-fly, primed and ready to go!
What's even better is that we can access it like any other array:
for (ii=0; ii<MAXCOUNT; ii++) {
    printf("data[ii] is:\t%d\tiAddr[ii] is:\t%d\n", data[ii], iAddr[ii]);
}
Here's a diagram illustrating the situation:
We should always clean up after ourselves and free any memory that we've allocated:
if (iAddr != NULL) {
    free(iAddr);
}
making sure that we don't try to free any memory that we have not allocated, as that will cause our program to crash.
Pointers are very versatile and can be used to construct more advanced data structures such as binary trees and linked lists.
Exercises
- Allocate an array of doubles. Set all the elements to 0.0 and then set every third element to 1.0 in a loop.
- Allocate a 2-dimensional array of integers. To do this, you'll need a pointer-to-a-pointer, e.g. int **my2dArray. You'll have to allocate an array of integer pointers (sizeof(int *) will be handy) and set my2dArray to point to the first element of that. Finally you should loop over all the elements of your array of int pointers and set them to point to freshly allocated blocks of memory to hold the ints themselves. Once you've done that, you deserve a chocolate bar! Be careful about how you free up all that memory, or you'll leave some chunks stranded (and create what's called a memory leak in the process).
- What happens when you mix up your types when setting pointers to other variables?
Header Files
Header files crop up when programs get larger and we want to split our code over multiple files. They provide a way for the compiler to check that all the pieces are consistent before ploughing on and creating an executable.
In this example, we've re-worked our program to calculate the volume of the planets in the solar system, splitting the various bits of functionality over several files:
cd ../example7
make
Inside the directory you'll find the files main.c, calcs.c and io.c, together with the header files calcs.h and io.h. Take a few minutes to peruse each in turn. You'll see that we moved the function prototypes out to the header files and used the #include directive to include those prototypes wherever they are needed. For example, the prototype for the arithmetic function volume is needed to compile both the function itself, written in calcs.c, and also the main function, where the volume is 'called'. (Note, to 'call' a function is to request that it is executed with a specific list of arguments.)
A quick diagram will again be useful:
In addition to introducing header files, we've begun to use file i/o as well. The program is now reading the equatorial and polar radii from a text file called, appropriately enough, radii.dat. I'll move quickly over the gory details (a little cumbersome using only standard library functions) save to say that we grab each line of the file in turn:
fgets(line,MAXSTR,fp); /* get line as character string */
then 'tokenise' the line, by splitting it around the tab character it contains, e.g.:
cPtr = strtok(line," \t\n"); /* get 1st column */
and lastly, for the number columns, we must convert the ascii character sequences into an actual numerical value which we can use in our calculations, e.g.:
equitorial_radii[ii] = atof(cPtr); /* convert chars to double */
We also introduce a new function to handle the printing to screen:
int pretty_print(const double volume)
All of these changes result in a far more concise main program, which essentially only contains:
/*
 * Load the data
 */
load("radii.dat", equi_rad, pol_rad, iNumPlanets);

/*
 * loop over planets, calling function to calculate volume
 * each time
 */
for (ii=0; ii<iNumPlanets; ii++) {
    val = volume(equi_rad[ii], pol_rad[ii]);
    pretty_print(val);
}
This program is small, so the increases in readability we've gained may not make a huge impact. However, this strategy applied to larger programs will pay real dividends!
Exercises
- Add other functions, such as the calculation of surface area, to this program.
- Add the names of the planets to the data file and read and store those too.
Structures
cd ../example8
make
A tidy way to do the planets thing. (Can go further when we meet classes in C++.) name, radii, results etc, all in the struct.
watch out for padding
The Command Line and more I/O
cd ../example9
make
Further Reading
- The bible is The C Programming Language by Kernighan & Ritchie. I've never used anything else.
- O'Reilly rarely produce a dud.