Difference between revisions of "StartingC"
(44 intermediate revisions by the same user not shown) | |||
Line 3: | Line 3: | ||
=Introduction= | =Introduction= | ||
+ | |||
+ | Welcome to 'StartingC', a tutorial aimed at getting you up and running with the C programming language. The tutorial is split into sections in each of which there will be some explanatory text (maybe even some diagrams!) but most importantly some working example programs that you can easily download and run. To get these, just cut and paste the text below onto your command line: | ||
<pre> | <pre> | ||
− | svn co | + | svn co https://svn.ggy.bris.ac.uk/subversion-open/startingC/trunk ./startingC |
</pre> | </pre> | ||
+ | |||
+ | We will not be assuming any previous programming experience, just an enquiring mind and the rudiments of using the command line on a Linux-based computer. If you have any doubts about the latter, take a look at the [[Linux1]] tutorial, which is also part of the 'pragmatic programming' set. | ||
=A Quintessential First Program= | =A Quintessential First Program= | ||
− | OK, now that we have the example code, let's get cracking and run our first C program. First of all, move into the | + | OK, now that we have the example code, let's get cracking and run our first C program. First of all, move into the example1 directory: |
<pre> | <pre> | ||
Line 34: | Line 38: | ||
</pre> | </pre> | ||
− | Bingo! We've just surmounted | + | Bingo! We've just surmounted, in some ways, our biggest step--running our first C program. Programming is like playing with [http://en.wikipedia.org/wiki/Mechano Meccano] or [http://en.wikipedia.org/wiki/Lego Lego]. Remember how much fun it was to assemble all those building blocks into something new and fascinating? We've just built our first model, and the rest of the toy box awaits, so let's get stuck in! |
=Types & Operations= | =Types & Operations= | ||
Line 45: | Line 49: | ||
</pre> | </pre> | ||
− | Take a look inside '''types.c''' and after the start of the main function, you'll see a block of '''variable declarations''': | + | Take a look inside '''types.c''' (it's best to run your text editor in the background, so that you can type make etc. when needed) and after the start of the main function, you'll see a block of '''variable declarations''': |
− | < | + | <source lang="c"> |
char nucleotide; /* A, C, G or T for our DNA */ | char nucleotide; /* A, C, G or T for our DNA */ | ||
int numPlanets; /* eight in our solar system - poor old Pluto! */ | int numPlanets; /* eight in our solar system - poor old Pluto! */ | ||
float length; /* e.g. 1.8288m, for a 6' snooker table */ | float length; /* e.g. 1.8288m, for a 6' snooker table */ | ||
double accum; /* an accumulator */ | double accum; /* an accumulator */ | ||
− | </ | + | </source> |
C, like many languages (e.g. Fortran), requires that variables be declared to be of a certain type before they can be used, and here we see examples of four '''intrinsic types''' provided by the language. It's a very good habit to comment all your variable declarations, and here the comments pretty much explain what the various types are. '''double''' is a double precision floating point number, typically occupying twice the storage space of a '''float'''. The extra space makes a double a good choice for an accumulator where you want to minimise rounding errors and avoid under- and overflow as best as possible. (The Fortran programmers amongst us will note, with a wince, the absence of an intrinsic type for complex numbers in classic C. Those reeling from this revelation will be comforted by the knowledge that C99 added complex types via '''complex.h''' and that C++ contains a complex class.) | C, like many languages (e.g. Fortran), requires that variables be declared to be of a certain type before they can be used, and here we see examples of four '''intrinsic types''' provided by the language. It's a very good habit to comment all your variable declarations, and here the comments pretty much explain what the various types are. '''double''' is a double precision floating point number, typically occupying twice the storage space of a '''float'''. The extra space makes a double a good choice for an accumulator where you want to minimise rounding errors and avoid under- and overflow as best as possible. (The Fortran programmers amongst us will note, with a wince, the absence of an intrinsic type for complex numbers in classic C. Those reeling from this revelation will be comforted by the knowledge that C99 added complex types via '''complex.h''' and that C++ contains a complex class.) | ||
Line 58: | Line 62: | ||
Various types can be given further qualifiers, such as '''short''', '''long''', '''signed''' and '''unsigned''': | Various types can be given further qualifiers, such as '''short''', '''long''', '''signed''' and '''unsigned''': | ||
− | < | + | <source lang="c"> |
short int mini; /* typically two bytes */ | short int mini; /* typically two bytes */ | ||
long int maxi; /* typically eight bytes */ | long int maxi; /* typically eight bytes */ | ||
signed char cSigned; /* one byte, values in the range [-128:127] */ | signed char cSigned; /* one byte, values in the range [-128:127] */ | ||
unsigned char cUsigned; /* values in the range [0:255] */ | unsigned char cUsigned; /* values in the range [0:255] */ | ||
− | </ | + | </source> |
The '''const''' keyword is also very useful for, well, declaring constants. An invaluable built-in operator when pondering the amount of memory assigned to a variable is '''sizeof()'''. | The '''const''' keyword is also very useful for, well, declaring constants. An invaluable built-in operator when pondering the amount of memory assigned to a variable is '''sizeof()'''. | ||
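As a rough illustration of '''const''' and '''sizeof''', here is a minimal sketch. The function name report_sizes is made up for this example and does not appear in types.c; the sizes printed depend on your compiler and machine, so no particular output is promised.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of types.c): prints the size in bytes of a
   few intrinsic types and returns the number of elements in a local array. */
size_t report_sizes(void)
{
    const int numPlanets = 8;   /* const: the compiler rejects later assignments */
    char cStr[20];              /* a 20-character array, used only to be measured */

    printf("char:\t%zu\n", sizeof(char));
    printf("int:\t%zu\n", sizeof(int));
    printf("float:\t%zu\n", sizeof(float));
    printf("double:\t%zu\n", sizeof(double));
    printf("numPlanets (%d) occupies:\t%zu\n", numPlanets, sizeof numPlanets);

    /* total bytes divided by bytes per element gives the element count */
    return sizeof(cStr) / sizeof(cStr[0]);
}
```

Calling report_sizes() from main prints one line per type. The only size the language guarantees here is that sizeof(char) is 1.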
Line 69: | Line 73: | ||
In addition to single entities of various types, we can also declare arrays of the self-same intrinsics. The syntax for this is along the lines of: | In addition to single entities of various types, we can also declare arrays of the self-same intrinsics. The syntax for this is along the lines of: | ||
− | < | + | <source lang="c"> |
char cStr[20]; /* a character array/string of 20 chars */ | char cStr[20]; /* a character array/string of 20 chars */ | ||
int iMat2d[3][3]; /* a 2-dimensional matrix of integers - 3x3 */ | int iMat2d[3][3]; /* a 2-dimensional matrix of integers - 3x3 */ | ||
− | </ | + | </source> |
You'll see a good deal more of accessing the various elements of an array in later examples, but for now be satisfied with the knowledge that array indices start at '''0''' in C (yes, that's right Fortraners, that's '''zero''', not 1) and that the syntax for array access is, e.g.: | You'll see a good deal more of accessing the various elements of an array in later examples, but for now be satisfied with the knowledge that array indices start at '''0''' in C (yes, that's right Fortraners, that's '''zero''', not 1) and that the syntax for array access is, e.g.: | ||
− | < | + | <source lang="c"> |
cStr[0] = 'h'; /* first element set to ascii char code for 'h' */ | cStr[0] = 'h'; /* first element set to ascii char code for 'h' */ | ||
− | </ | + | </source> |
Enumerated types can be a useful way to map (a list of) symbolic names to integer values. | Enumerated types can be a useful way to map (a list of) symbolic names to integer values. | ||
Line 94: | Line 98: | ||
The first block of code here gives an '''arithmetic''' example--how to calculate the volume of an ''oblate spheroid'' that happens to be close to all our hearts, our shared home Earth: | The first block of code here gives an '''arithmetic''' example--how to calculate the volume of an ''oblate spheroid'' that happens to be close to all our hearts, our shared home Earth: | ||
− | < | + | <source lang="c"> |
val = (4.0/3.0) * pi * pow(equi_rad,2) * pol_rad; | val = (4.0/3.0) * pi * pow(equi_rad,2) * pol_rad; | ||
− | </ | + | </source> |
I won't dwell on this as I'm confident that the syntax is self-explanatory, save to mention that the function '''pow''' comes from the built-in library of math functions. | I won't dwell on this as I'm confident that the syntax is self-explanatory, save to mention that the function '''pow''' comes from the built-in library of math functions. | ||
Line 102: | Line 106: | ||
Next up, you'll see the '''decrement''' and '''increment''' operators: | Next up, you'll see the '''decrement''' and '''increment''' operators: | ||
− | < | + | <source lang="c"> |
--numPlanets; | --numPlanets; | ||
++numPlanets; | ++numPlanets; | ||
− | </ | + | </source> |
also self-explanatory. | also self-explanatory. | ||
Line 113: | Line 117: | ||
An operation that you will become keenly aware of--especially working in scientific computing--is the ability to temporarily convert a variable from one type to another on-the-fly. This is known as '''casting'''. Two examples of this are: | An operation that you will become keenly aware of--especially working in scientific computing--is the ability to temporarily convert a variable from one type to another on-the-fly. This is known as '''casting'''. Two examples of this are: | ||
− | < | + | <source lang="c"> |
(short int) pi | (short int) pi | ||
(float) 42 | (float) 42 | ||
− | </ | + | </source> |
where, in the first, we convert pi into a (short) integer and, in the second, convert 42 into a floating point number. '''Note that the cast does not affect the original variable in any way.''' i.e. the value given to the variable called pi is not changed through using the cast. | where, in the first, we convert pi into a (short) integer and, in the second, convert 42 into a floating point number. '''Note that the cast does not affect the original variable in any way.''' i.e. the value given to the variable called pi is not changed through using the cast. | ||
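To make the effect of a cast concrete, here is a small sketch. The helper names whole_part and ratio are invented for illustration and are not taken from operations.c.

```c
#include <stdio.h>

/* Hypothetical helper: returns the integer part of x via a cast. */
int whole_part(double x)
{
    int whole = (int) x;   /* truncates toward zero; x itself is untouched */

    printf("x is still %f, (int) x is %d\n", x, whole);
    return whole;
}

/* Hypothetical helper: division of two ints after casting one of them. */
double ratio(int numerator, int denominator)
{
    /* casting one operand forces floating point division */
    return (double) numerator / denominator;
}
```

Without the cast in ratio, 1/2 would be integer division and give 0. With it, ratio(1, 2) gives 0.5.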
Line 154: | Line 158: | ||
types.c | types.c | ||
− | * declare a character array sufficient to record the state of a game of naughts and crosses, populate | + | * Declare a character array sufficient to record the state of a game of noughts and crosses, populate it and print it to the screen. |
* How many bytes are used to store a long double? | * How many bytes are used to store a long double? | ||
* You can give an initial value to a character array when you declare it (e.g. char cStr[20] = "xxxxxxxxxxxxxxxxxxxx";). What happens if we leave '\0' out of character assignments in this case? | * You can give an initial value to a character array when you declare it (e.g. char cStr[20] = "xxxxxxxxxxxxxxxxxxxx";). What happens if we leave '\0' out of character assignments in this case? | ||
operations.c | operations.c | ||
− | * 29.2% of the Earth's surface is land. How much is this in square kilometers? | + | * 29.2% of the Earth's surface is land. How much is this in square kilometers? C has an arccos function (acos()) and the web has the [http://en.wikipedia.org/wiki/Sphereoid#Surface_area formula for the surface area of an oblate spheroid]. |
* Logic is perilous. Can you think of a time when we say "or", but really mean logical AND? | * Logic is perilous. Can you think of a time when we say "or", but really mean logical AND? | ||
* What happens if you cast the character '9' to an int? | * What happens if you cast the character '9' to an int? | ||
Line 174: | Line 178: | ||
Looking inside '''flow.c''', our first block shows how we can make multi-way decisions using '''if''' tests and the '''else''' catch-all: | Looking inside '''flow.c''', our first block shows how we can make multi-way decisions using '''if''' tests and the '''else''' catch-all: | ||
− | < | + | <source lang="c"> |
if ( temperature < 0.0 ) { | if ( temperature < 0.0 ) { | ||
− | printf("Water would normally freeze at:\t%f\tdegrees C\n"); | + | printf("Water would normally freeze at:\t%f\tdegrees C\n", temperature); |
} | } | ||
else if (temperature > 100.0 ) { | else if (temperature > 100.0 ) { | ||
− | printf("Water would normally boil at:\t%f\tdegrees C\n"); | + | printf("Water would normally boil at:\t%f\tdegrees C\n", temperature); |
} | } | ||
else { | else { | ||
printf("The temperature must be in the range [0.0,100.0]\n"); | printf("The temperature must be in the range [0.0,100.0]\n"); | ||
} | } | ||
− | </ | + | </source> |
This is all very nice and self-explanatory. Typically you would use the above for a decision point that could follow one of three or fewer branches. If you have more than three branches, the '''switch''' statement is likely to be more concise and easier for you and your fellow developers to read: | This is all very nice and self-explanatory. Typically you would use the above for a decision point that could follow one of three or fewer branches. If you have more than three branches, the '''switch''' statement is likely to be more concise and easier for you and your fellow developers to read: | ||
− | < | + | <source lang="c"> |
switch (iCount) { | switch (iCount) { | ||
case 0: | case 0: | ||
Line 201: | Line 205: | ||
break; | break; | ||
} | } | ||
− | </ | + | </source> |
− | The '''default''' case is much like our '''else''' catch-all in the box above and is important to include as otherwise you will be vulnerable to a 'fall-through' bug. This is when none of the cases trigger because we did not consider the actual value passed to switch(). You will also notice the '''break''' statements in all the cases. Adding these is also a defensive maneouvre, since we could accidentally trigger two cases. Case 4 and the default, say. | + | The '''default''' case is much like our '''else''' catch-all in the box above and is important to include, as otherwise a value that we did not anticipate will match none of the cases and the switch will silently do nothing. You will also notice the '''break''' statements in all the cases. These guard against a 'fall-through' bug: without a break, execution carries on into the code for the next case, so we could accidentally run two cases, case 4 and the default, say. A '''caveat''', however, is that the expression in the parentheses of '''switch(expr)''' must be '''integer''' valued. |
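The roles of '''break''' and '''default''' can be sketched as follows. The function describe_count and its return strings are invented for this illustration and are not taken from flow.c.

```c
/* Hypothetical example, not taken from flow.c. */
const char *describe_count(int iCount)
{
    const char *text;

    switch (iCount) {
    case 0:
        text = "none";
        break;      /* without this break, execution would run on into case 1 */
    case 1:
        text = "one";
        break;
    case 2:         /* deliberate grouping: 2 and 3 share one body */
    case 3:
        text = "a few";
        break;
    default:        /* catches every value not listed above */
        text = "many";
        break;
    }
    return text;
}
```

Leaving out a break is occasionally done on purpose, as for cases 2 and 3 here, but it is worth a comment whenever you do so.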
---- | ---- | ||
Line 209: | Line 213: | ||
Moving on. The '''for''' loop is an oft-used tool on the work bench: | Moving on. The '''for''' loop is an oft-used tool on the work bench: | ||
− | < | + | <source lang="c"> |
for(ii=0; ii<iMax; ii++) { | for(ii=0; ii<iMax; ii++) { | ||
if (ii == 3) { | if (ii == 3) { | ||
Line 217: | Line 221: | ||
printf("Yup, I'm in a for loop, whizz-oh. Counter is:\t%d\n", ii); | printf("Yup, I'm in a for loop, whizz-oh. Counter is:\t%d\n", ii); | ||
} | } | ||
− | </ | + | </source> |
It's tidy, succinct and gets the job done. Note that we've '''nested''' an if statement inside our loop. The '''continue''' statement is a useful way to skip the rest of an iteration, if it's superfluous. | It's tidy, succinct and gets the job done. Note that we've '''nested''' an if statement inside our loop. The '''continue''' statement is a useful way to skip the rest of an iteration, if it's superfluous. | ||
Line 223: | Line 227: | ||
Sometimes, however, we don't know ahead of time how many iterations of a loop will be required. We can't use a for loop in this case and the '''while''' loop steps into the breach for us. For example: | Sometimes, however, we don't know ahead of time how many iterations of a loop will be required. We can't use a for loop in this case and the '''while''' loop steps into the breach for us. For example: | ||
− | < | + | <source lang="c"> |
while (ii > threshold) { | while (ii > threshold) { | ||
printf("%d\t> threshold (%d), continuing..\n",ii,threshold); | printf("%d\t> threshold (%d), continuing..\n",ii,threshold); | ||
Line 229: | Line 233: | ||
printf("next random value:\t%d\n", ii); | printf("next random value:\t%d\n", ii); | ||
} | } | ||
− | </ | + | </source> |
In this case we keep testing to see if ii is greater than the threshold. If it is, then we go around the loop one more time, acquiring a new value for ii along the way. We loop back to the top, re-test against the threshold and so on. The loop will only terminate when ii is less than or equal to the threshold, i.e. when the while test fails, so watch out for those infinite loops! | In this case we keep testing to see if ii is greater than the threshold. If it is, then we go around the loop one more time, acquiring a new value for ii along the way. We loop back to the top, re-test against the threshold and so on. The loop will only terminate when ii is less than or equal to the threshold, i.e. when the while test fails, so watch out for those infinite loops! | ||
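Here is a deterministic sketch of the same idea, using halving rather than random numbers so that the behaviour is easy to follow. The function count_halvings is hypothetical and assumes that threshold is not negative; with a negative threshold the value would stick at 0 and the loop would never end.

```c
#include <stdio.h>

/* Hypothetical helper: halves value until it is no longer above threshold,
   returning how many times the loop body ran. Assumes threshold >= 0. */
int count_halvings(int value, int threshold)
{
    int steps = 0;

    while (value > threshold) {
        value = value / 2;   /* the tested variable must change, or we never exit */
        steps++;
        printf("after step %d the value is %d\n", steps, value);
    }
    return steps;
}
```

Note that the body does not run at all if the test fails the first time, for example when value equals threshold.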
Line 262: | Line 266: | ||
So far we've glanced upon constructs such as: | So far we've glanced upon constructs such as: | ||
− | < | + | <source lang="c"> |
#include <stdio.h> | #include <stdio.h> | ||
− | </ | + | </source> |
Lines starting with a '''#''' form instructions to the C preprocessor. We can think of the preprocessor as a form of '''cut & paste'''. In our example, the preprocessor will replace our '''#include''' line with the contents of the system '''header''' file, '''stdio.h'''. Why are we doing this? Well, we wish to use some of the standard input/output library functions, such as '''printf()''', in our program and the header file contains the '''function prototypes'''. The compiler needs these prototypes to make sure that we are calling the functions correctly and thus to produce a working executable or compile-time error--whichever is appropriate. | Lines starting with a '''#''' form instructions to the C preprocessor. We can think of the preprocessor as a form of '''cut & paste'''. In our example, the preprocessor will replace our '''#include''' line with the contents of the system '''header''' file, '''stdio.h'''. Why are we doing this? Well, we wish to use some of the standard input/output library functions, such as '''printf()''', in our program and the header file contains the '''function prototypes'''. The compiler needs these prototypes to make sure that we are calling the functions correctly and thus to produce a working executable or compile-time error--whichever is appropriate. | ||
Line 272: | Line 276: | ||
We can do a good deal more than just including header files, however. For one, we can use a '''#define''' statement to set global constants. Take a look inside '''macros.c''' and notice how we have specified the size of our character array, called '''cStr'''. Outside of the main program we have: | We can do a good deal more than just including header files, however. For one, we can use a '''#define''' statement to set global constants. Take a look inside '''macros.c''' and notice how we have specified the size of our character array, called '''cStr'''. Outside of the main program we have: | ||
− | < | + | <source lang="c"> |
#define MAXSTR 25 | #define MAXSTR 25 | ||
− | </ | + | </source> |
Inside the main program, we then make use of our new symbol in our variable declarations block: | Inside the main program, we then make use of our new symbol in our variable declarations block: | ||
− | < | + | <source lang="c"> |
char cStr[MAXSTR]; /* a character array with size set globally */ | char cStr[MAXSTR]; /* a character array with size set globally */ | ||
− | </ | + | </source> |
We can arrange to loop over the contents of that array using: | We can arrange to loop over the contents of that array using: | ||
− | < | + | <source lang="c"> |
for(ii=0;ii<MAXSTR;ii++) { | for(ii=0;ii<MAXSTR;ii++) { | ||
cStr[ii] = 'c'; | cStr[ii] = 'c'; | ||
} | } | ||
− | </ | + | </source> |
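One point to keep in mind when filling a character array in this way: if the array is later to be treated as a string (printed with %s, say), the final element must hold the terminating '\0'. The sketch below assumes exactly that use; the function fill_and_measure is invented for this illustration and macros.c may handle the array differently.

```c
#include <stdio.h>
#include <string.h>

#define MAXSTR 25   /* same symbolic constant as in the text above */

/* Hypothetical helper: fills a buffer of MAXSTR chars with 'c', reserving
   the final element for the terminating '\0', and returns the string length. */
size_t fill_and_measure(void)
{
    char cStr[MAXSTR];
    int ii;

    for (ii = 0; ii < MAXSTR - 1; ii++) {
        cStr[ii] = 'c';
    }
    cStr[MAXSTR - 1] = '\0';   /* needed before treating the array as a string */

    printf("%s\n", cStr);
    return strlen(cStr);
}
```

Because the size appears only as MAXSTR, changing the #define is enough to resize the array and the loop together.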
To run the program, type: | To run the program, type: | ||
Line 300: | Line 304: | ||
Arranging '''conditional compilation''' is perhaps the most useful aspect of the preprocessor. Further down in the main program we have the conditional code block: | Arranging '''conditional compilation''' is perhaps the most useful aspect of the preprocessor. Further down in the main program we have the conditional code block: | ||
− | < | + | <source lang="c"> |
#ifdef DEBUG | #ifdef DEBUG | ||
printf("DEBUG is ON\n"); | printf("DEBUG is ON\n"); | ||
Line 306: | Line 310: | ||
printf("Boy-oh-boy am I going to have a lot to say!\n"); | printf("Boy-oh-boy am I going to have a lot to say!\n"); | ||
#endif | #endif | ||
− | </ | + | </source> |
which we can activate through the use of an appropriate compiler flag. In order to do this, uncomment the line: | which we can activate through the use of an appropriate compiler flag. In order to do this, uncomment the line: | ||
Line 318: | Line 322: | ||
The preprocessor gives us yet more possibilities, with constructs such as: | The preprocessor gives us yet more possibilities, with constructs such as: | ||
− | < | + | <source lang="c"> |
#if SYSTEM == WIN32 | #if SYSTEM == WIN32 | ||
#include <win.h> | #include <win.h> | ||
Line 326: | Line 330: | ||
#include <default.h> | #include <default.h> | ||
#endif | #endif | ||
− | </ | + | </source> |
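Two small sketches of conditional compilation follow. Both function names are hypothetical. The first shows code that changes according to whether the file was compiled with -DDEBUG. The second illustrates a point that is worth knowing when writing '''#if''' comparisons: inside an #if expression, any identifier that is not a defined macro is treated as 0.

```c
#include <stdio.h>

/* Hypothetical helper: reports whether the file was compiled with -DDEBUG. */
int debug_enabled(void)
{
#ifdef DEBUG
    printf("DEBUG is ON\n");
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: neither symbol below is defined anywhere, so the
   preprocessor evaluates the test as 0 == 0, which is true. */
int undefined_symbols_compare_equal(void)
{
#if NOT_DEFINED_A == NOT_DEFINED_B
    return 1;
#else
    return 0;
#endif
}
```

The consequence is that a comparison between two symbols only selects what you intend if both have been given distinct values with #define or on the compiler command line.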
'''However, a word of caution:''' It is wise not to overuse the preprocessor. For example: | '''However, a word of caution:''' It is wise not to overuse the preprocessor. For example: | ||
Line 349: | Line 353: | ||
* Experiment with the additional '''#ifdef''' and '''#ifndef''' preprocessor statements. | * Experiment with the additional '''#ifdef''' and '''#ifndef''' preprocessor statements. | ||
− | =Functions | + | =Functions= |
So, onto functions. What are these and why do we use them? | So, onto functions. What are these and why do we use them? | ||
Line 370: | Line 374: | ||
Inside '''funcs1.c''', you'll see that we compute the volumes for all the planets in the solar system, rather than just for Earth. Accordingly, we bundle the volume calculation into a function of its own: | Inside '''funcs1.c''', you'll see that we compute the volumes for all the planets in the solar system, rather than just for Earth. Accordingly, we bundle the volume calculation into a function of its own: | ||
− | < | + | <source lang="c"> |
double volume(double equitorial_rad, double polar_rad) | double volume(double equitorial_rad, double polar_rad) | ||
{ | { | ||
Line 383: | Line 387: | ||
return retval; | return retval; | ||
} | } | ||
− | </ | + | </source> |
and call it a number of times as we cycle through the planets: | and call it a number of times as we cycle through the planets: | ||
− | < | + | <source lang="c"> |
for(ii=0; ii<NumPlanets; ii++) { | for(ii=0; ii<NumPlanets; ii++) { | ||
val = volume(equi_rad[ii], pol_rad[ii]); | val = volume(equi_rad[ii], pol_rad[ii]); | ||
printf("the volume of the planet is:\t%f\tkms cubed\n", val); | printf("the volume of the planet is:\t%f\tkms cubed\n", val); | ||
} | } | ||
− | </ | + | </source> |
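Putting the pieces together, a complete function of this kind might look like the sketch below. It is not a copy of funcs1.c: the names spheroid_volume and print_volumes are invented, and the body simply reuses the oblate spheroid formula from the operations example, (4/3) * pi * a * a * c.

```c
#include <stdio.h>

/* prototype: tells the compiler the argument and return types up front */
double spheroid_volume(double equatorial_rad, double polar_rad);

/* definition: V = (4/3) * pi * a^2 * c for an oblate spheroid */
double spheroid_volume(double equatorial_rad, double polar_rad)
{
    const double pi = 3.14159265358979323846;

    return (4.0 / 3.0) * pi * equatorial_rad * equatorial_rad * polar_rad;
}

/* Hypothetical driver: prints the volume for each pair of radii. */
void print_volumes(const double equi_rad[], const double pol_rad[], int count)
{
    int ii;

    for (ii = 0; ii < count; ii++) {
        printf("the volume of the planet is:\t%f\tkms cubed\n",
               spheroid_volume(equi_rad[ii], pol_rad[ii]));
    }
}
```

With both radii set to 1.0 the function reduces to the volume of a unit sphere, which makes a handy sanity check.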
Note also the presence of a '''function prototype''' near the top of the file: | Note also the presence of a '''function prototype''' near the top of the file: | ||
− | < | + | <source lang="c"> |
double volume(double equitorial_rad, double polar_rad); | double volume(double equitorial_rad, double polar_rad); | ||
− | </ | + | </source> |
We'll need one of those for each function that we write. To run the program, type: | We'll need one of those for each function that we write. To run the program, type: | ||
Line 410: | Line 414: | ||
While we're on the topic of C function arguments: they are what's known as '''passed-by-value'''. Beware, this contrasts with '''passed-by-reference''', as used in '''Fortran''', for example. What's the significance of this? Well, in C a '''copy''' of the value of the argument is passed into a function. That means that you can do anything you like to its value inside the function, but it will all be forgotten upon exiting. Pass-by-reference means that the actual memory address of the argument is passed into the routine and so any changes to its value will stick. We'll look at this topic more closely when we consider memory addresses and pointers later on. | While we're on the topic of C function arguments: they are what's known as '''passed-by-value'''. Beware, this contrasts with '''passed-by-reference''', as used in '''Fortran''', for example. What's the significance of this? Well, in C a '''copy''' of the value of the argument is passed into a function. That means that you can do anything you like to its value inside the function, but it will all be forgotten upon exiting. Pass-by-reference means that the actual memory address of the argument is passed into the routine and so any changes to its value will stick. We'll look at this topic more closely when we consider memory addresses and pointers later on. | ||
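A minimal sketch of pass-by-value in action is given below. Both function names are made up for this example and do not come from funcs1.c.

```c
#include <stdio.h>

/* Hypothetical helper: changes only its private copy of n. */
void try_to_reset(int n)
{
    n = 0;
    printf("inside the function n is:\t%d\n", n);
}

/* Hypothetical driver: returns the caller's value after the call. */
int value_after_call(int start)
{
    int count = start;

    try_to_reset(count);   /* a copy of count is passed, not count itself */
    printf("back in the caller count is:\t%d\n", count);
    return count;
}
```

Whatever value is passed to value_after_call comes back unchanged, because try_to_reset only ever sees a copy.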
− | ''' | + | '''Exercises''' |
* Modify '''funcs1.c''' so that the equitorial radius argument is zeroed inside the function. Write a second loop to investigate the consequences of that inside the main program. | * Modify '''funcs1.c''' so that the equitorial radius argument is zeroed inside the function. Write a second loop to investigate the consequences of that inside the main program. | ||
Line 421: | Line 425: | ||
[[Image:Fibonacci_spiral_34.jpg|300px|thumbnail|none|The beautiful Fibonacci spiral]] | [[Image:Fibonacci_spiral_34.jpg|300px|thumbnail|none|The beautiful Fibonacci spiral]] | ||
− | = | + | Take a look in '''fibonacci.c''' and try running it ('''fibonacci.exe''') using various function inputs. |
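For reference, a recursive function for the Fibonacci numbers can be sketched in a few lines. This is an illustration only: the name fib and the convention that fib(0) is 0 are assumptions, and fibonacci.c in the repository may be written differently.

```c
/* Hypothetical sketch; fibonacci.c in the repository may differ. */
unsigned long fib(unsigned int n)
{
    if (n < 2) {
        return n;                     /* base cases: fib(0) = 0, fib(1) = 1 */
    }
    return fib(n - 1) + fib(n - 2);   /* recursive case: the function calls itself */
}
```

The base cases are what stop the recursion. This direct form recomputes the same values many times over, so it becomes slow for large n.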
+ | |||
+ | In truth, more interesting examples of recursion crop up when we consider more advanced data structures, such as binary trees, so we'll save some of the good stuff until then. | ||
+ | |||
+ | =Pointers and Allocatable Memory= | ||
+ | |||
+ | OK, now we're talking. Now we're getting to the marrow of the language. Once you're comfortable with this material, the world will be your oyster! So, without further ado, let's wade in: | ||
+ | |||
+ | <pre> | ||
+ | cd ../example6 | ||
+ | make | ||
+ | </pre> | ||
+ | |||
+ | (Don't worry about the compiler warnings--this program deliberately mis-assigns a pointer variable.) | ||
+ | |||
+ | Looking inside '''pointers.c''', we don't have to wait long to see something new. In the variable declaration block we see: | ||
+ | |||
+ | <source lang="c"> | ||
+ | int iNum; /* just a plain old integer */ | ||
+ | int* iAddr = NULL; /* a 'pointer' to an integer - intriguing! */ | ||
+ | </source> | ||
+ | |||
+ | '''iNum''' we're happy with, just a common-or-garden integer. '''iAddr''' is a new species, however, and is a '''pointer''' to an integer. Said another way, the value of iAddr is a memory address at which an integer, such as iNum, is stored (we initialise it to NULL, meaning that it points at nothing yet). We can draw an analogy between memory addresses and pigeon holes, where each pigeon hole is labelled with a unique (integer) number--its address. Diagrams can often be helpful. Here's one where '''b''' is our plain old integer, given a value of 17 and '''a''' is used to store the memory address of b, i.e. '''a is a pointer to b''': | ||
+ | |||
+ | [[Image:Pointer.jpg|300px|thumbnail|none|A schematic for a pointer in C]] | ||
+ | |||
+ | Now, we can explore the consequences of this relationship in our program. Let's give iNum a value, and set iAddr to point to iNum. We use the '''&''' symbol to '''get the address''' of a variable: | ||
+ | |||
+ | <source lang="c"> | ||
+ | iNum = 3; /* first, we'll set the value of iNum */ | ||
+ | iAddr = &iNum; /* now we set iAddr to the 'address' of iNum */ | ||
+ | </source> | ||
− | address, dereference | + | If we print iNum, we'll obviously see that it has a value of 3. We can also ''follow'' the pointer iAddr, and see what value is stored in the memory address that it's pointing to. This is known as '''dereferencing''' and has the symbol '''*'''. Since this is the value of iNum, it will, of course, yield 3: |
− | address | + | |
− | + | <source lang="c"> | |
− | binary trees and linked lists - | + | printf("*iAdrr (dereference) has the value:\t%d\n", *iAddr); |
+ | </source> | ||
+ | |||
+ | If we change the value of iNum, we'll see a corresponding change in *iAddr. Also, if we assign *iAddr to a new value, we'll see the value of iNum follow suit, since they are one and the same: | ||
+ | |||
+ | <source lang="c"> | ||
+ | *iAddr = 17; | ||
+ | printf("set *iAdrr to:\t\t\t\t%d\n", *iAddr); | ||
+ | printf("and sure enough iNum has the value:\t%d\n", iNum); | ||
+ | </source> | ||
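Writing through a pointer is also how a function can change a variable that belongs to its caller, which plain pass-by-value does not allow. A common illustration is a swap function. The sketch below is not part of pointers.c and the name swap_ints is invented.

```c
/* Hypothetical helper: exchanges the two integers that a and b point to. */
void swap_ints(int *a, int *b)
{
    int tmp = *a;   /* read the value stored at address a */

    *a = *b;        /* write through the pointers: the caller's variables change */
    *b = tmp;
}
```

A caller would pass addresses, e.g. swap_ints(&x, &y). The pointers themselves are still passed by value, but the addresses they hold lead back to the caller's variables.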
+ | |||
+ | We've seen arrays before, but until now, we haven't witnessed their special relationship with pointers: An array is a contiguous chunk of memory which we can reference through the address of its first element. Further, we can access subsequent elements of an array through the use of '''pointer arithmetic'''. What does this all mean? Well, it's perhaps best explained with an example: | ||
+ | |||
+ | <source lang="c"> | ||
+ | iAddr = &data[0]; | ||
+ | for (ii=0; ii<MAXCOUNT; ii++) { | ||
+ | printf("data[ii] or equivalently *(iAddr+ii) is:\t%d\n", *(iAddr+ii)); | ||
+ | } | ||
+ | </source> | ||
+ | |||
+ | Here, we set iAddr to point to the first element of the array called '''data'''. We then loop through all the elements of the array, printing each value via the pointer and an offset from the base address; *(iAddr+ii) reaches the same element as the familiar square bracket syntax, data[ii]. | ||
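To underline that the two forms of access are equivalent, here is a sketch that sums an array both ways. The function names are hypothetical and the code is not taken from pointers.c.

```c
#include <stddef.h>

/* Hypothetical helper: walks the array by advancing a pointer. */
int sum_by_pointer(const int *data, size_t count)
{
    const int *p;
    int total = 0;

    /* p++ moves on by one int, not by one byte */
    for (p = data; p < data + count; p++) {
        total += *p;
    }
    return total;
}

/* Hypothetical helper: the same sum using square bracket indexing. */
int sum_by_index(const int data[], size_t count)
{
    size_t ii;
    int total = 0;

    for (ii = 0; ii < count; ii++) {
        total += data[ii];   /* data[ii] means the same as *(data + ii) */
    }
    return total;
}
```

Both functions give the same answer for the same array, since the compiler treats the two notations identically.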
+ | |||
+ | The last chunk of code in the program illustrates the use of '''dynamic memory allocation'''. Up until now, we've specified the size of all our variables at compile-time. Sometimes, however, we don't know how much space we'll need to store something ahead of time (perhaps we want to read a file which changes in length from one day to another). The ability to allocate memory on the fly will help us tremendously in this situation. Another situation where dynamic memory allocation will help is when we have too much data to store it all in main memory at the same time. In this situation, we can allocate some space, fill it with some of our data, work on it, free the space and then move on to work on another chunk of data. | ||
+ | |||
+ | You'll recall that arrays are chunks of contiguous memory referenced through the address of the first element. If we declare a pointer to the variable type we're interested in and then invoke a command to pair it with a chunk of memory, we'll be in business, right? Enter the '''malloc''' function: | ||
+ | |||
+ | <source lang="c"> | ||
+ | iAddr = (int *)malloc(sizeof(int)*MAXCOUNT); | ||
+ | /* After requesting space it is good practice to check | ||
+ | whether the request was successful */ | ||
+ | if (iAddr == NULL) { | ||
+ | fprintf(stderr,"ERROR: could not allocate memory\n"); | ||
+ | exit(EXIT_FAILURE); | ||
+ | } | ||
+ | </source> | ||
+ | |||
+ | On one line, we've rather neatly requested a chunk of contiguous memory with malloc. We've requested the number of bytes required to store an integer, multiplied by the number of elements we desire in our new array. The malloc function returns a general purpose '''pointer to void (void *)''', which we've quickly cast to be a pointer to an integer so that the RHS matches the LHS in our assignment to trusty old iAddr. Bingo, we have our new array; arranged on-the-fly, primed and ready to go! | ||
+ | |||
+ | What's even better is that we can access it like any other array: | ||
+ | |||
+ | <source lang="c"> | ||
+ | for (ii=0; ii<MAXCOUNT; ii++) { | ||
+ | printf("data[ii] is:\t%d\tiAddr[ii] is:\t%d\n", data[ii], iAddr[ii]); | ||
+ | } | ||
+ | </source> | ||
+ | |||
+ | Here's a diagram illustrating the situation: | ||
+ | |||
+ | [[Image:c_array.jpg|300px|thumbnail|none|A schematic showing a dynamically allocated array.]] | ||
+ | |||
+ | We should always clean up after ourselves and '''free''' any memory that we've allocated: | ||
+ | |||
+ | <source lang="c"> | ||
+ | if(iAddr != NULL) { | ||
+ | free(iAddr); | ||
+ | } | ||
+ | </source> | ||
+ | |||
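+ | taking care never to pass free a pointer that did not come from malloc, nor to free the same block twice, as either mistake can cause our program to crash. (Calling free on a NULL pointer is harmless; it simply does nothing.) | ||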
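The whole allocate, check, use and free cycle can be sketched with a small helper that hands a freshly allocated array back to its caller. The function make_squares is invented for this illustration and is not part of pointers.c. Note that in C the result of malloc can be assigned to any object pointer type without a cast.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a freshly allocated array holding
   0, 1, 4, 9, ... or NULL if the allocation fails.
   The caller owns the memory and must free it. */
int *make_squares(size_t count)
{
    size_t ii;
    int *squares = malloc(count * sizeof *squares);   /* no cast needed in C */

    if (squares == NULL) {
        return NULL;   /* let the caller decide how to handle the failure */
    }
    for (ii = 0; ii < count; ii++) {
        squares[ii] = (int) (ii * ii);
    }
    return squares;
}
```

A caller would check the returned pointer against NULL, use the array and then call free on it exactly once. Setting the pointer to NULL afterwards is a cheap guard against freeing it a second time.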
+ | |||
+ | Pointers are very versatile and can be used to construct more advanced data structures such as binary trees and linked lists. | ||
+ | |||
+ | ---- | ||
+ | |||
+ | '''Exercises''' | ||
+ | |||
+ | * Allocate an array of doubles. Set all the elements to 0.0 and then set every third element to 1.0 in a loop. | ||
+ | * Allocate a 2-dimensional array of integers. To do this, you'll need a '''pointer-to-a-pointer''', e.g. int **my2dArray. You'll have to allocate an array of integer pointers (sizeof(int *) will be handy) and set my2dArray to point to the first element of that. Finally you should loop over all the elements of your array of int pointers and set them to point to freshly allocated blocks of memory to hold the ints themselves. Once you've done that, you deserve a chocolate bar! Be careful about how you free up all that memory, or you'll leave some chunks stranded (and create what's called a '''memory leak''' in the process). | ||
+ | * What happens when you mix up your types when setting pointers to other variables? | ||
+ | |||
+ | [[Image:c_2darray.jpg|300px|thumbnail|none|A 2d integer array.]] | ||
+ | |||
+ | =Header Files= | ||
+ | |||
+ | Header files crop up when programs get larger and we want to split our code over multiple files. They provide a way for the compiler to check that all the pieces are consistent before ploughing on and creating an executable. | ||
+ | |||
+ | In this example, we've re-worked our program to calculate the volume of the planets in the solar system, splitting the various bits of functionality over several files: | ||
+ | |||
+ | <pre> | ||
+ | cd ../example7 | ||
+ | make | ||
+ | </pre> | ||
+ | |||
+ | Inside the directory you'll find the files, '''main.c''', '''calcs.c''' and '''io.c''', together with the header files, '''calcs.h''' and '''io.h'''. Take a few minutes to peruse each in turn. You'll see that we moved the function prototypes out to the header files and used the '''#include''' directive to include those prototypes wherever they are needed. For example the prototype for the arithmetic function '''volume''' is needed to compile both the function itself, written in calcs.c, and also the main function, where volume is 'called'. (Note, to 'call' a function is to request that it is executed with a specific list of arguments.) | ||
+ | |||
+ | A quick diagram will again be useful: | ||
+ | |||
+ | [[Image:Headers.jpg|300px|thumbnail|none|A schematic showing the relation between our header and source files in this example.]] | ||
+ | |||
+ | In addition to introducing header files, we've begun to use file i/o as well. The program is now reading the equatorial and polar radii from a text file called, appropriately enough, '''radii.dat'''. I'll move quickly over the gory details (a little cumbersome using only intrinsic functions) save to say that we grab each line of the file in turn: | ||
+ | |||
+ | <source lang="c"> | ||
+ | fgets(line,MAXSTR,fp); /* get line as character string */ | ||
+ | </source> | ||
+ | |||
+ | then 'tokenise' the line, by splitting it at the whitespace (spaces, tabs or the newline) it contains, e.g.: | ||
+ | |||
+ | <source lang="c"> | ||
+ | cPtr = strtok(line," \t\n"); /* get 1st column */ | ||
+ | </source> | ||
+ | |||
+ | and lastly, for the number columns, we must convert the ascii character sequences into an actual numerical value which we can use in our calculations, e.g.: | ||
+ | |||
+ | <source lang="c"> | ||
+ | equitorial_radii[ii] = atof(cPtr); /* convert chars to double */ | ||
+ | </source> | ||
+ | |||
+ | We also introduce a new function to handle the printing to screen: | ||
+ | |||
+ | <source lang="c"> | ||
+ | int pretty_print(const double volume) | ||
+ | </source> | ||
+ | |||
+ | All of these changes result in a far more concise main program, which essentially only contains: | ||
+ | |||
+ | <source lang="c"> | ||
+ | /* | ||
+ | * Load the data | ||
+ | */ | ||
+ | load("radii.dat", equi_rad, pol_rad, iNumPlanets); | ||
+ | |||
+ | /* | ||
+ | * loop over planets, calling function to calculate volume | ||
+ | * each time | ||
+ | */ | ||
+ | for(ii=0; ii<iNumPlanets; ii++) { | ||
+ | val = volume(equi_rad[ii], pol_rad[ii]); | ||
+ | pretty_print(val); | ||
+ | } | ||
+ | </source> | ||
+ | |||
+ | This program is small, so the increases in readability we've gained may not make a huge impact. However, this strategy applied to larger programs will pay real dividends! | ||
+ | |||
+ | ---- | ||
+ | |||
+ | '''Exercises''' | ||
+ | * Add other functions, such as the calculation of surface area, to this program. | ||
+ | * Add the names of the planets to the data file and read and store those too. | ||
=Structures= | =Structures= | ||
− | + | <pre> | |
+ | cd ../example8 | ||
+ | make | ||
+ | </pre> | ||
+ | |||
+ | Example8 is another re-working of our trusty planets program. This time we've availed ourselves of the ability to collect together related variables into a single package, known as a '''structure'''. This seemingly small manoeuvre has very far reaching benefits. Before we get into those, let's peruse the structure itself. We've declared it inside a new header file, '''main.h''': | ||
+ | |||
+ | <source lang="c"> | ||
+ | typedef struct { | ||
+ | char name[MAXSTR]; /* planet's name */ | ||
+ | double equitorial_radius; /* center to equator */ | ||
+ | double polar_radius; /* center to pole */ | ||
+ | double volume; /* we can store computed volume here */ | ||
+ | } planet; | ||
+ | </source> | ||
+ | |||
+ | As you can see, we've grouped together all the things we'd like to know about the planet--its name, various radii, volume etc.--and created a new datatype called '''planet'''. Neat eh? Not only neat, but we've also taken a subtle yet hugely important step in the way we think about our programs. We've moved from thinking about the functions we need to perform to the way in which our variables relate to each other. This is a cornerstone of '''object oriented programming''', something which we'll hear a good deal about when looking at C++. | ||
+ | |||
+ | Now that we have our new datatype in place, we can declare an array of such things, just as easily as if we were declaring an array of intrinsic datatypes (such as integers and what not): | ||
+ | |||
+ | <source lang="c"> | ||
+ | planet planets[iNumPlanets]; /* our master array */ | ||
+ | </source> | ||
+ | |||
+ | The big payoff arrives, however, when we no longer have to painstakingly pass in all the myriad variables required by some function, but instead we can just pass in an instance of our new structure, or even a whole array of structs in one go: | ||
+ | |||
+ | <source lang="c"> | ||
+ | load("planets.dat", planets, iNumPlanets); | ||
+ | </source> | ||
+ | |||
+ | Once inside the function, we can access whichever parts of the structure we happen to be interested in, for example: | ||
+ | |||
+ | <source lang="c"> | ||
+ | strcpy(planets[ii].name,cPtr); /* copy string to data struct */ | ||
+ | </source> | ||
+ | |||
+ | where the dot, '''.''', syntax indicates that we're accessing the member of an instance of a struct. | ||
+ | |||
+ | In our re-worked volume calculating function, we see the arrow syntax, '''->'''. This is a structure member access combined with a dereference, since the function was passed not an instance of the struct, but a pointer to one: | ||
+ | |||
+ | <source lang="c"> | ||
+ | int volume(planet *plnt) | ||
+ | { | ||
+ | ... | ||
+ | plnt->volume = (4.0/3.0) * pi * pow(plnt->equitorial_radius,2) * plnt->polar_radius; | ||
+ | ... | ||
+ | } | ||
+ | </source> | ||
+ | |||
+ | Why did we pass the pointer? Well, we wanted to store the result of the calculation in the 'volume' element and, in order to make that assignment stick, we needed to pass the address of the struct (C's stand-in for pass-by-reference), rather than a copy of it (pass-by-value). | ||
+ | |||
+ | ---- | ||
− | + | '''Exercises''' | |
+ | * Add more functionality to the program--surface areas, orbits, journey times in a rocket etc. You'll need to add more 'fields' to the structure to accommodate the extra inputs and outputs for any new calculations. | ||
+ | * Think up some new structures and add to your program. Stars & galaxies perhaps? Too abstract? How about plants and animals? Mountains, lakes or rivers? | ||
− | = | + | =Command Line Parsing= |
+ | |||
+ | The icing-on-the-cake for our planets program will be to add in '''command line parsing'''. With this feature the program can read a number of arguments passed to it and act accordingly. This can be a really useful feature if we want to maximise our runtime flexibility and hence minimise the number of times we need to recompile our program. To take a peek, let's: | ||
+ | |||
+ | <pre> | ||
+ | cd ../example9 | ||
+ | make | ||
+ | </pre> | ||
+ | |||
+ | Looking inside '''main.c''', the first difference we note is: | ||
+ | |||
+ | <source lang="c"> | ||
+ | int main(int argc, char** argv) | ||
+ | </source> | ||
+ | |||
+ | where previously the arguments to the main function were listed as 'void'. '''argc''' is a count of the number of command line arguments passed to the program (including the program name itself), which we can use as a check upon proper usage of the program: | ||
+ | |||
+ | <source lang="c"> | ||
+ | if(argc != 3) { | ||
+ | die("usage: command.exe <data file name> <num planets>"); | ||
+ | } | ||
+ | </source> | ||
+ | |||
+ | '''argv''' is a vector of strings (i.e. an array of pointers, each pointing to a character array) which comprise the arguments themselves. The first argument we desire is a filename, and so that is easy enough to process. The second argument that we would like is the number of planets to process and so we must convert the string to a numerical value (integer in this case) as we have done previously when reading the values in the data file. Once we have that number, we can allocate an array of our planet structures accordingly: | ||
+ | |||
+ | <source lang="c"> | ||
+ | planets = (planet *)malloc(sizeof(planet)*iNumPlanets); | ||
+ | if (planets == NULL) { | ||
+ | die("could not allocate planet storage space!"); | ||
+ | } | ||
+ | </source> | ||
+ | |||
+ | This all makes our program more robust and flexible. | ||
+ | |||
+ | ---- | ||
+ | |||
+ | '''Exercises''' | ||
+ | * Run the program with data for another planetary system, such as [http://en.wikipedia.org/wiki/Gliese_581 Gliese 581]. | ||
+ | * Change the number of command line arguments, perhaps adding an argument to trigger output in a concise or verbose mode? | ||
+ | |||
+ | =Going Further= | ||
+ | |||
+ | Well, if you have read through the tutorial and made a stab at the exercises, you should have pretty much all that you need to go forth and conquer in the land of C! The books listed below will help plug any gaps in your knowledge that you find. Also, you may like to move on to C++, the enhanced, object oriented, all-singing, all-dancing big brother to C. If so, take a look at the [CtoC++] tutorial in the pragmatic programming family. | ||
+ | |||
+ | ---- | ||
− | + | '''Further Reading''' | |
− | The bible is | + | * The bible is '''The C Programming Language''' by Kernighan & Ritchie--often shortened to just '''K&R'''. Take a look at the [[A_Good_Read|'A Good Read?']] page for more details. |
Latest revision as of 14:43, 4 October 2013
startingC: Learning the C Programming Language
Introduction
Welcome to 'StartingC', a tutorial aimed at getting you up and running with the C programming language. The tutorial is split into sections in each of which there will be some explanatory text (maybe even some diagrams!) but most importantly some working example programs that you can easily download and run. To get these, just cut and paste the text below onto your command line:
svn co https://svn.ggy.bris.ac.uk/subversion-open/startingC/trunk ./startingC
We will not be assuming any previous programming experience, just an enquiring mind and the rudiments of using the command line on a Linux-based computer. If you have any doubts about the latter, take a look at the Linux1 tutorial, which is also part of the 'pragmatic programming' set.
A Quintessential First Program
OK, now that we have the example code, let's get cracking and run our first C program. First of all, move into the example1 directory:
cd startingC/examples/example1
We'll make use of a Makefile for each example, so as to make the build process painless (hopefully!). All we need do is run make (see the [make tutorial about make] if you're interested in this further):
make
Now, we can run the classic program:
./hello.exe
and you should get the friendly response:
hello, world!
Bingo! We've just surmounted, in some ways, our biggest step--running our first C program. Programming is like playing with Meccano or Lego. Remember how much fun it was to assemble all those building blocks into something new and fascinating? We've just built our first model, and the rest of the toy box awaits, so let's get stuck in!
Types & Operations
Buoyed with confidence from our first example, let's march fearlessly onwards into the realm of variable types and basic operations. To do this, move up and over to the directory example2 and type make to build the example programs:
cd ../example2
make
Take a look inside types.c (it's best to run your text editor in the background, so that you can type make etc. when needed) and after the start of the main function, you'll see a block of variable declarations:
char nucleotide; /* A, C, G or T for our DNA */
int numPlanets; /* eight in our solar system - poor old Pluto! */
float length; /* e.g. 1.8288m, for a 6' snooker table */
double accum; /* an accumulator */
C, like many languages (e.g. Fortran), requires that variables be declared to be of a certain type before they can be used, and here we see examples of four intrinsic types provided by the language. It's a very good habit to comment all your variable declarations, and here the comments pretty much explain what the various types are. double is a double precision--typically twice the storage space of a float--floating point number. The extra space makes a double a good choice for an accumulator where you want to minimise rounding errors and avoid under- and overflow as best as possible. (The Fortran programmers amongst us will note, with a wince, the absence of an intrinsic type for complex numbers. Those reeling from this revelation will be comforted by the knowledge that C++ contains a complex class.)
Various types can be given further qualifiers, such as short, long, signed and unsigned:
short int mini; /* typically two bytes */
long int maxi; /* typically eight bytes */
signed char cSigned; /* one byte, values in the range [-128:127] */
unsigned char cUsigned; /* values in the range [0:255] */
The const keyword is also very useful for, well, declaring constants. An invaluable built-in operator when pondering the amount of memory assigned to a variable is sizeof().
In addition to single entities of various types, we can also declare arrays of the self-same intrinsics. The syntax for this is along the lines of:
char cStr[20]; /* a character array/string of 20 chars */
int iMat2d[3][3]; /* a 2-dimensional matrix of integers - 3x3 */
You'll see a good deal more of accessing the various elements of an array in later examples, but for now be satisfied with the knowledge that array indices start at 0 in C (yes, that's right Fortraners, that's zero, not 1) and that the syntax for array access is, e.g.:
cStr[0] = 'h'; /* first element set to ascii char code for 'h' */
Enumerated types can be a useful way to map (a list of) symbolic names to integer values.
Now that you've read it through, run the program and satisfy yourself that it all works as you expect it to. To run the program, type:
./types.exe
Shifting our attention to operations.c, let's consider some basic operations that C supports. This is the start of the doing things part.
The first block of code here gives an arithmetic example--how to calculate the volume of an oblate spheroid that happens to be close to all our hearts, our shared home Earth:
val = (4.0/3.0) * pi * pow(equi_rad,2) * pol_rad;
I won't dwell on this as I'm confident that the syntax is self-explanatory, save to mention that the function pow comes from the built-in library of math functions.
Next up, you'll see the decrement and increment operators:
--numPlanets;
++numPlanets;
also self-explanatory.
C provides the logic operators, == (is equal), != (not equal), && (AND) and || (OR); as well as the relationals, > (greater than), < (less than), >= (greater than or equal) and <= (less than or equal).
An operation that you will become keenly aware of--especially working in scientific computing--is the ability to temporarily convert a variable from one type to another on-the-fly. This is known as casting. Two examples of this are:
(short int) pi
(float) 42
where, in the first, we convert pi into a (short) integer and, in the second, convert 42 into a floating point number. Note that the cast does not affect the original variable in any way, i.e. the value given to the variable called pi is not changed through using the cast.
One last class of operations for now are the bitwise operators. These give you very low-level control over the bytes associated with variables, should you need that. For example, we can perform a bitwise AND on the two bytes 01001000 and 10111000, yielding 00001000 when all the bit pairs are considered in turn according to the criteria:
INPUT A | INPUT B | OUTPUT (A AND B)
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1
To run the second program, type:
./operations.exe
Now, it's very important that you muck around with these example programs as much as possible! Ideally, so much so that you break them! We never learn as much as when we make a mess of things, and since these are just toy programs, you may as well go for it! If you get in a pickle, you can get the original programs back with a quick waft of the Subversion wand:
svn revert *
Exercises
types.c
- Declare a character array sufficient to record the state of a game of noughts and crosses, populate it and print it to the screen.
- How many bytes are used to store a long double?
- You can give an initial value to a character array when you declare it (e.g. char cStr[20] = "xxxxxxxxxxxxxxxxxxxx";). What happens if we leave '\0' out of character assignments in this case?
operations.c
- 29.2% of the Earth's surface is land. How much is this in square kilometers? C has an arccos function (acos()) and the web has the [formula for the surface area of an oblate spheroid]
- Logic is perilous. Can you think of a time when we say "or", but really mean logical AND?
- What happens if you cast the character '9' to an int?
Conditionals & Loops
OK, we have types and operators under our belts. This C malarkey isn't too bad, eh? Let's take a look at some stalwarts of the procedural family of languages--conditionals and loops. As we will start all our sections, move up and over to the example3 directory and build the program(s) therein:
cd ../example3
make
Looking inside flow.c, our first block shows how we can make multi-way decisions using if tests and the else catch-all:
if ( temperature < 0.0 ) {
printf("Water would normally freeze at:\t%f\tdegrees C\n", temperature);
}
else if (temperature > 100.0 ) {
printf("Water would normally boil at:\t%f\tdegrees C\n", temperature);
}
else {
printf("The temperature must be in the range [0.0,100.0]\n");
}
This is all very nice and self-explanatory. Typically you would use the above for a decision point that could follow one of 3 or fewer branches. If you have more than 3 branches, the switch statement is likely to be more concise and easier for you and your fellow developers to read:
switch (iCount) {
case 0:
printf("case 0: nada, zip, nowt.\n");
break;
case 1:
printf("case 1: uno, sole, unitary.\n");
break;
...
default: /* a default catches any value not matched by a case */
printf("default: mucho, many, lashings.\n");
break;
}
The default case is much like our else catch-all in the box above and is important to include, as otherwise the switch will silently do nothing when the value passed to it matches none of the cases. You will also notice the break statements in all the cases. These matter a great deal: without a break, execution 'falls through' into the code for the next case, and so we could accidentally trigger two cases. Case 4 and the default, say. A caveat, however, is that the expression in the parentheses of switch(expr) must be integer valued.
Moving on. The for loop is an oft-used tool on the work bench:
for(ii=0; ii<iMax; ii++) {
if (ii == 3) {
printf("Surprise!\n");
continue; /* jumps to the start of the next iteration */
}
printf("Yup, I'm in a for loop, whizz-oh. Counter is:\t%d\n", ii);
}
It's tidy, succinct and gets the job done. Note that we've nested an if statement inside our loop. The continue statement is a useful way to skip the rest of an iteration, if it's superfluous.
Sometimes, however, we don't know ahead of time how many iterations of a loop will be required. We can't use a for loop in this case and the while loop steps into the breach for us. For example:
while (ii > threshold) {
printf("%d\t> %d (the threshold), continuing..\n",ii,threshold);
ii = rand(); /* get next random number */
printf("next random value:\t%d\n", ii);
}
In this case we keep testing to see if ii is greater than the threshold. If it is, then we go around the loop one more time, acquiring a new value for ii along the way. We loop back to the top, re-test against the threshold and so on. The loop will only terminate when ii is no longer greater than the threshold, i.e. when the while test fails, so watch out for those infinite loops!
To run the example program, type:
./flow.exe
Exercises
- Can you nest an if within another if and what would be the point? Indeed can you have an if within a loop, within an if..?
- What happens if you remove break statements from the switch construct?
- Can you write a for loop that counts down rather than up? What about in steps of 2, or 3?
- Can you increment more than one variable in a for loop?
- Can you have multiple test conditions in a loop?
- What's the simplest infinite loop you can write? Do you know how to abort a program?!
- Can you sabotage the counting in a for loop? Is there a way to protect against such a bug?
The C Preprocessor
Up until now, we've been studiously ignoring the lines beginning with # at the start of our programs. The time has come, however, to look these statements square in the eyes!..
cd ../example4
make
So far we've glanced upon constructs such as:
#include <stdio.h>
Lines starting with a # form instructions to the C preprocessor. We can think of the preprocessor as a form of cut & paste. In our example, the preprocessor will replace our #include line with the contents of the system header file, stdio.h. Why are we doing this? Well, we wish to use some of the standard input/output library functions, such as printf() in our program and the header file contains the function prototypes. The compiler needs these prototypes to make sure that we are calling the functions correctly and thus to produce a working executable or compile-time error--whichever is appropriate.
We'll look at header files in more detail when we come to write our own functions.
We can do a good deal more than just including header files, however. For one, we can use a #define statement to set global constants. Take a look inside macros.c and notice how we have specified the size of our character array, called cStr. Outside of the main program we have:
#define MAXSTR 25
Inside the main program, we then make use of our new symbol in our variable declarations block:
char cStr[MAXSTR]; /* a character array with size set globally */
We can arrange to loop over the contents of that array using:
for(ii=0;ii<MAXSTR;ii++) {
cStr[ii] = 'c';
}
To run the program, type:
./macros.exe
as per usual.
Arranging conditional compilation is perhaps the most useful aspect of the preprocessor. Further down in the main program we have the conditional code block:
#ifdef DEBUG
printf("DEBUG is ON\n");
printf("I'm going to print out a lot more information\n");
printf("Boy-oh-boy am I going to have a lot to say!\n");
#endif
which we can activate through the use of an appropriate compiler flag. In order to do this, uncomment the line:
#CFLAGS=-DDEBUG
in the Makefile, retype make, and re-run.
The preprocessor gives us yet more possibilities, with constructs such as:
#if SYSTEM == WIN32
#include <win.h>
#elif SYSTEM == LINUX
#include <linux.h>
#else
#include <default.h>
#endif
However, a word of caution: it is wise not to overuse the preprocessor. For example:
- It may be better to use a const variable declaration, rather than a global #define.
- Conditional compilation can be useful, but if you can use run-time switches in your code instead, you will not have to keep re-compiling your programs when you want to vary a parameter, say.
If you're keen, you can see a good use of the preprocessor for setting function names in mixed Fortran-C programming.
Note that we now have 3 distinct stages en route to producing an executable program:
- The preprocessor step: cut & paste.
- Compilation: taking source code and creating object code.
- Linkage: Linking object files and possibly libraries together to give an executable.
Exercises
- Vary the size of the character array. Note that you'll have to re-compile your program each time.
- Invent a new block of conditionally-compiled code and make the appropriate changes to the Makefile to bring it into effect.
- Experiment with the additional #ifdef and #ifndef preprocessor statements.
Functions
So, onto functions. What are these and why do we use them?
Well, we can think of a function, in some ways, as a black box--we feed in inputs and it returns outputs. An example would be a trigonometric function, such as the sine function. If we input π/2 radians, we'll get 1 as the output; input π and we'll get 0 back. We're not limited to just mathematical functions in C, however. We can write pretty much any function we like! We'll see many examples cropping up from here onwards.
OK, so much for a function's general form. Motivation-wise, if you need to do something more than once in a program, you should write a function to do it. That way, you just call your function whenever you need to perform that task. That strategy will give us concise programs as we don't need to duplicate any lines of code. Another benefit is that duplicated lines of code are a bug waiting to happen! Why? Well, there is a good chance that if you modify one of those lines of code, you'll forget to change the other. We're humans, after all. We err. Now, those lines are no longer identical and so will no longer do the same thing--tada, your bug.
Now repeat after me, never duplicate any code---write a function.
Another reason for writing a function, even if you don't call it more than once, is that breaking down your program into functional units will make it much easier to read and understand. This should be your #1 design criterion for any piece of code that you write.
OK, with the preamble out the way, let's take a look at an example:
cd ../example5
make
Inside funcs1.c, you'll see that we compute the volumes for all the planets in the solar system, rather than just for Earth. Accordingly, we bundle the volume calculation into a function of its own:
double volume(double equitorial_rad, double polar_rad)
{
/* local variables */
const double pi = 3.14159265;
double retval;
/* the calcs */
retval = (4.0/3.0) * pi * pow(equitorial_rad,2) * polar_rad;
/* functions typically return a value */
return retval;
}
and call it a number of times as we cycle through the planets:
for(ii=0; ii<NumPlanets; ii++) {
val = volume(equi_rad[ii], pol_rad[ii]);
printf("the volume of the planet is:\t%f\tkms cubed\n", val);
}
Note also the presence of a function prototype near the top of the file:
double volume(double equitorial_rad, double polar_rad);
We'll need one of those for each function that we write. To run the program, type:
./funcs1.exe
The eagle-eyed amongst you will have noticed that the const variable pi is no longer needed in the main program unit. Also, the variable val is declared inside the main program unit and the function. Both of these things allude to something called scope, and in particular that variables declared inside a function are only known to that function. This rule also applies to the main function. Thus, if we want to pass values between functions, we must use arguments and return values. (Deliberately ignoring global values, as they are typically considered to be bad news.)
While we're on the topic of C function arguments: they are what's known as passed-by-value. Beware, this contrasts with passed-by-reference, as used in Fortran, for example. What's the significance of this? Well, in C a copy of the value of the argument is passed into a function. That means that you can do anything you like to its value inside the function, but it will all be forgotten upon exiting. Pass-by-reference means that the actual memory address of the argument is passed into the routine and so any changes to its value will stick. We'll look at this topic more closely when we consider memory addresses and pointers later on.
Exercises
- Modify funcs1.c so that the equatorial radius argument is zeroed inside the function. Write a second loop to investigate the consequences of that inside the main program.
- Write an additional function to calculate the surface area of a planet and print the results of applying that function too. (See [1] for the formula.)
A light hearted interlude. Functions can call themselves. Neat! It's called recursion and can be both elegant and powerful. Neater! The classic example is the Fibonacci series, and who am I to buck the trend? Well, it does lend itself to beautiful shapes:
Take a look in fibonacci.c and try running it (fibonacci.exe) using various function inputs.
In truth, more interesting examples of recursion crop up when we consider more advanced data structures, such as binary trees, so we'll save some of the good stuff until then.
Pointers and Allocatable Memory
OK, now we're talking. Now we're getting to the marrow of the language. Once you're comfortable with this material, the world will be your oyster! So, without further ado, let's wade in:
cd ../example6
make
(Don't worry about the compiler warnings--this program deliberately mis-assigns a pointer variable.)
Looking inside pointers.c, we don't have to wait long to see something new. In the variable declaration block we see:
int iNum; /* just a plain old integer */
int* iAddr = NULL; /* a 'pointer' to an integer - intriguing! */
iNum we're happy with, just a common-or-garden integer. iAddr is a new species, however, and is a pointer to an integer. Said another way, the value of iAddr is a memory address, and shortly we will make it the memory address of iNum. We can draw an analogy between memory addresses and pigeon holes, where each pigeon hole is labelled with a unique (integer) number--its address. Diagrams can often be helpful. Here's one where b is our plain old integer, given a value of 17 and a is used to store the memory address of b, i.e. a is a pointer to b:
Now, we can explore the consequences of this relationship in our program. Let's give iNum a value, and set iAddr to point to iNum. We use the & symbol to get the address of a variable:
iNum = 3; /* first, we'll set the value of iNum */
iAddr = &iNum; /* now we set iAddr to the 'address' of iNum */
If we print iNum, we'll obviously see that it has a value of 3. We can also follow the pointer iAddr, and see what value is stored in the memory address that it's pointing to. This is known as dereferencing and uses the symbol *. Since this is the value of iNum, it will, of course, yield 3:
printf("*iAddr (dereference) has the value:\t%d\n", *iAddr);
If we change the value of iNum, we'll see a corresponding change in *iAddr. Also, if we assign *iAddr to a new value, we'll see the value of iNum follow suit, since they are one and the same:
*iAddr = 17;
printf("set *iAddr to:\t\t\t\t%d\n", *iAddr);
printf("and sure enough iNum has the value:\t%d\n", iNum);
We've seen arrays before, but until now, we haven't witnessed their special relationship with pointers: An array is a contiguous chunk of memory which we can reference through the address of its first element. Further, we can access subsequent elements of an array through the use of pointer arithmetic. What does this all mean? Well, it's perhaps best explained with an example:
iAddr = &data[0];
for (ii=0; ii<MAXCOUNT; ii++) {
printf("data[ii] or equivalently *(iAddr+ii) is:\t%d\n", *(iAddr+ii));
}
Here, we set iAddr to point to the first element of the array called data. We then loop through all the elements of the array printing values; first through the use of the familiar square bracket syntax ([]); and secondly via a pointer and increments upon the base address.
The last chunk of code in the program illustrates the use of dynamic memory allocation. Up until now, we've specified the size of all our variables at compile-time. Sometimes, however, we don't know how much space we'll need to store something ahead of time (perhaps we want to read a file which changes in length from one day to another). The ability to allocate memory on the fly will help us tremendously in this situation. Another situation where dynamic memory allocation will help is when we have too much data to store it all in main memory at the same time. In this situation, we can allocate some space, fill it with some of our data, work on it, free the space and then move on to work on another chunk of data.
You'll recall that arrays are chunks of contiguous memory referenced through the address of the first element. If we declare a pointer to the variable type we're interested in and then invoke a command to pair it with a chunk of memory, we'll be in business, right? Enter the malloc function:
iAddr = (int *)malloc(sizeof(int)*MAXCOUNT);
/* After requesting space it is good practice to check
whether the request was successful */
if (iAddr == NULL) {
fprintf(stderr,"ERROR: could not allocate memory\n");
exit(EXIT_FAILURE);
}
On one line, we've rather neatly requested a chunk of contiguous memory with malloc. We've requested the number of bytes required to store an integer, multiplied by the number of elements we desire in our new array. The malloc function returns a general purpose pointer to void (void *), which we've quickly cast to be a pointer to an integer so that the RHS matches the LHS in our assignment to trusty old iAddr (C would convert a void * for us automatically, but the cast makes the intent explicit). Bingo, we have our new array; arranged on-the-fly, primed and ready to go!
What's even better is that we can access it like any other array:
for (ii=0; ii<MAXCOUNT; ii++) {
printf("data[ii] is:\t%d\tiAddr[ii] is:\t%d\n", data[ii], iAddr[ii]);
}
Here's a diagram illustrating the situation:
We should always clean up after ourselves and free any memory that we've allocated:
if(iAddr != NULL) {
free(iAddr);
}
making sure that we only ever free memory that we have allocated: handing free an address that did not come from malloc, or freeing the same block twice, is an error that can cause our program to crash.
Pointers are very versatile and can be used to construct more advanced data structures such as binary trees and linked lists.
Exercises
- Allocate an array of doubles. Set all the elements to 0.0 and then set every third element to 1.0 in a loop.
- Allocate a 2-dimensional array of integers. To do this, you'll need a pointer-to-a-pointer, e.g. int **my2dArray. You'll have to allocate an array of integer pointers (sizeof(int *) will be handy) and set my2dArray to point to the first element of that. Finally you should loop over all the elements of your array of int pointers and set them to point to freshly allocated blocks of memory to hold the ints themselves. Once you've done that, you deserve a chocolate bar! Be careful about how you free up all that memory, or you'll leave some chunks stranded (and create what's called a memory leak in the process).
- What happens when you mix up your types when setting pointers to other variables?
Header Files
Header files crop up when programs get larger and we want to split our code over multiple files. They provide a way for the compiler to check that all the pieces are consistent before ploughing on and creating an executable.
In this example, we've re-worked our program to calculate the volume of the planets in the solar system, splitting the various bits of functionality over several files:
cd ../example7
make
Inside the directory you'll find the files, main.c, calcs.c and io.c, together with the header files, calcs.h and io.h. Take a few minutes to peruse each in turn. You'll see that we moved the function prototypes out to the header files and used the #include directive to include those prototypes wherever they are needed. For example the prototype for the arithmetic function volume is needed to compile both the function itself, written in calcs.c, and also the main function, where the volume is 'called'. (Note, to 'call' a function is to request that it is executed with a specific list of arguments.)
A quick diagram will again be useful:
In addition to introducing header files, we've begun to use file i/o as well. The program is now reading the equatorial and polar radii from a text file called, appropriately enough, radii.dat. I'll move quickly over the gory details (a little cumbersome using only intrinsic functions) save to say that we grab each line of the file in turn:
fgets(line,MAXSTR,fp); /* get line as character string */
then 'tokenise' the line, by splitting it around the tab character it contains, e.g.:
cPtr = strtok(line," \t\n"); /* get 1st column */
and lastly, for the number columns, we must convert the ASCII character sequences into an actual numerical value which we can use in our calculations, e.g.:
equitorial_radii[ii] = atof(cPtr); /* convert chars to double */
We also introduce a new function to handle the printing to screen:
int pretty_print(const double volume)
All of these changes result in a far more concise main program, which essentially only contains:
/*
* Load the data
*/
load("radii.dat", equi_rad, pol_rad, iNumPlanets);
/*
* loop over planets, calling function to calculate volume
* each time
*/
for(ii=0; ii<iNumPlanets; ii++) {
val = volume(equi_rad[ii], pol_rad[ii]);
pretty_print(val);
}
This program is small, so the increases in readability we've gained may not make a huge impact. However, this strategy applied to larger programs will pay real dividends!
Exercises
- Add other functions, such as the calculation of surface area, to this program.
- Add the names of the planets to the data file and read and store those too.
Structures
cd ../example8
make
Example8 is another re-working of our trusty planets program. This time we've availed ourselves of the ability to collect together related variables into a single package, known as a structure. This seemingly small manoeuvre has very far reaching benefits. Before we get into those, let's peruse the structure itself. We've declared it inside a new header file, main.h:
typedef struct {
char name[MAXSTR]; /* planet's name */
double equitorial_radius; /* center to equator */
double polar_radius; /* center to pole */
double volume; /* we can store computed volume here */
} planet;
As you can see, we've grouped together all the things we'd like to know about the planet--its name, various radii, volume etc.--and created a new datatype called planet. Neat eh? Not only neat, but we've also taken a subtle yet hugely important step in the way we think about our programs. We've moved from thinking about the functions we need to perform to the way in which our variables relate to each other. This is a cornerstone of object-oriented programming, something which we'll hear a good deal about when looking at C++.
Now that we have our new datatype in place, we can declare an array of such things, just as easily as if we were declaring an array of intrinsic datatypes (such as integers and what not):
planet planets[iNumPlanets]; /* our master array */
The big payoff arrives, however, when we no longer have to painstakingly pass in all the myriad variables required by some function, but instead we can just pass in an instance of our new structure, or even a whole array of structs in one go:
load("planets.dat", planets, iNumPlanets);
Once inside the function, we can access whichever parts of the structure we happen to be interested in, for example:
strcpy(planets[ii].name,cPtr); /* copy string to data struct */
where the dot, ., syntax indicates that we're accessing the member of an instance of a struct.
In our re-worked volume calculating function, we see the arrow syntax, ->. This is a structure access including a dereference, since the function was passed not an instance of the struct, but a pointer:
int volume(planet *plnt)
{
...
plnt->volume = (4.0/3.0) * pi * pow(plnt->equitorial_radius,2) * plnt->polar_radius;
...
}
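Why did we pass the pointer? Well, we wanted to store the result of the calculation in the 'volume' element and, in order to make that assignment stick, we needed to use pass-by-reference, rather than pass-by-value. Had we passed the struct itself, the function would have been working on a private copy, and its result would have been thrown away on return.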
Exercises
- Add more functionality to the program--surface areas, orbits, journey times in a rocket etc. You'll need to add more 'fields' to the structure to accommodate the extra inputs and outputs for any new calculations.
- Think up some new structures and add to your program. Stars & galaxies perhaps? Too abstract? How about plants and animals? Mountains, lakes or rivers?
Command Line Parsing
The icing-on-the-cake for our planets program will be to add in command line parsing. With this feature the program can read a number of arguments passed to it and act accordingly. This can be a really useful feature if we want to maximise our runtime flexibility and hence minimise the number of times we need to recompile our program. To take a peek, let's
cd ../example9
make
Looking inside main.c, the first difference we note is:
int main(int argc, char** argv)
where previously the arguments to the main function were listed as 'void'. argc is a count of the number of command line arguments passed to the program (including the program name itself), which we can use as a check upon proper usage of the program:
if(argc != 3) {
die("usage: command.exe <data file name> <num planets>");
}
argv is a vector of strings (strictly speaking, an array of pointers to characters, one pointer per argument) which comprise the arguments themselves. The first argument we desire is a filename, and so that is easy enough to process. The second argument that we would like is the number of planets to process and so we must convert the string to a numerical value (integer in this case) as we have done previously when reading the values in the data file. Once we have that number, we can allocate an array of our planet structures accordingly:
planets = (planet *)malloc(sizeof(planet)*iNumPlanets);
if (planets == NULL) {
die("could not allocate planet storage space!");
}
This all makes our program more robust and flexible.
Exercises
- Run the program with data for another planetary system, such as [Gliese_581].
- Change the number of command line arguments, perhaps adding an argument to trigger output in a concise or verbose mode?
Going Further
Well, if you have read through the tutorial and made a stab at the exercises, you should have pretty much all that you need to go forth and conquer in the land of C! The books listed below will help plug any gaps in your knowledge that you find. Also, you may like to move on to C++, the enhanced, object-oriented, all-singing, all-dancing big brother to C. If so, take a look at the [CtoC++] tutorial in the pragmatic programming family.
Further Reading
- The bible is The C Programming Language by Kernighan & Ritchie--often shortened to just K&R. Take a look at the 'A Good Read?' page for more details.