Difference between revisions of "Profiling"
Line 28: | Line 28: | ||
</pre> | </pre> | ||
− | =An Example= | + | =An Simple Example= |
<pre> | <pre> | ||
Line 41: | Line 41: | ||
[[Image:callgrind.JPG|center|Profile information displayed graphically in kcachegrind]] | [[Image:callgrind.JPG|center|Profile information displayed graphically in kcachegrind]] | ||
+ | |||
+ | =An Situation Involoving Loops= | ||
+ | |||
+ | Let's look another example, this time involving the way we loop over the contents of a 2-dimensional array--a common task. Skip over a directory and compile the example programs: | ||
+ | |||
+ | <pre> | ||
+ | cd ../example2 | ||
+ | make | ||
+ | </pre> | ||
+ | |||
+ | In this directory, we have two almost identical programs. Compare the contents of '''loopingA.f90''' and '''loopingB.f90'''. You can see that the only way in which they differ is the order in which the cycle through the contents of the arrays. Let's compare how long it takes these two programs to run. We can do this using the Linux '''time''' command: | ||
+ | |||
+ | <pre> | ||
+ | time ./loopingA.exe | ||
+ | </pre> | ||
+ | |||
+ | and now, | ||
+ | |||
+ | <pre> | ||
+ | time ./loopingB.exe | ||
+ | </pre> | ||
+ | |||
+ | Dude! '''loopingA.exe''' takes more than twice the time of '''loopingB.exe'''. What's the reason. Well, Fortran stores it's 2-dimensional arrays in '''column-major order'''. Your 2-dimension array is actually stored in the computer memory as a 1-dimension array, where the cells in a given column are next to each other. That means is that our outer loop should be over the columns, and our inner loop over the rows. Otherwise we would end up hopping all around the memory, which takes extra time, and explains why loopingA takes longer. |
Revision as of 11:56, 12 December 2008
Can I speed up my program?
Introduction
How fast does each part of my program take? Perhaps I could speed it up? As with debugging, we could add print statements, use a stopwatch, compile and re-run etc. etc. But that would be tedius and there are tools around which do the job way better.
Compiler Options
You probably want to make sure that you've added all the go-faster flags that are available before you embark on any profiling. Activating optimisations for speed can make your progam run a lot faster!
Using Valgrind
Valgrind is an excellent open-source tool for debugging and profiling.
Compile your program as normal, adding in any optimisation flags that you desire and also -g. Then run it through the callgrind tool, embedded in valgrind:
valgrind --tool=callgrind your-program [program options]
When your program has run you find a file called callgrind.out.xxxx in the current directory, where xxxx is replaced with a number (the process ID of the command you have just executed).
You can inspect the contents of this newly created file using a graphical display too call kcachegrind:
kcachegrind callgrind.out.xxxx
An Simple Example
svn co http://source.ggy.bris.ac.uk/subversion-open/profiling ./profiling cd profiling/examples/example1 make valgrind --tool=callgrind ./div.exe >& div.out kcachegrind callgrind.out.xxxx
We can see from the graphical display given by kcachegrind that our inefficient division routine takes far more of the runtime that our efficient routine. Using this kind of information, we can focus our re-engineering efforts on the slower parts of our program.
An Situation Involoving Loops
Let's look another example, this time involving the way we loop over the contents of a 2-dimensional array--a common task. Skip over a directory and compile the example programs:
cd ../example2 make
In this directory, we have two almost identical programs. Compare the contents of loopingA.f90 and loopingB.f90. You can see that the only way in which they differ is the order in which the cycle through the contents of the arrays. Let's compare how long it takes these two programs to run. We can do this using the Linux time command:
time ./loopingA.exe
and now,
time ./loopingB.exe
Dude! loopingA.exe takes more than twice the time of loopingB.exe. What's the reason. Well, Fortran stores it's 2-dimensional arrays in column-major order. Your 2-dimension array is actually stored in the computer memory as a 1-dimension array, where the cells in a given column are next to each other. That means is that our outer loop should be over the columns, and our inner loop over the rows. Otherwise we would end up hopping all around the memory, which takes extra time, and explains why loopingA takes longer.