Difference between revisions of "Polyglot"
The decoration and name-mangling will be familiar to you from the last example.  Note the use of the '''define''' macro in '''call_fort.c''':
<source lang="c">
#define FORT_SUB fort_sub_
</source>
The code comments in the last example touched upon the distinction between the '''pass-by-value''' default behaviour of C and the '''pass-by-reference''' adopted by Fortran.  Thus, in order to dovetail the two languages, we must prevail upon our C code to use pass-by-reference.  We can see this in the function prototype in '''call_fort.c''':
<source lang="c">
extern void FORT_SUB(const int*, const int*, float[]);
</source>
Here we are declaring that we will use '''pointers-to''' variables, i.e. addresses or '''references''', as the arguments to the routine, denoted by the '''*'''s (an array parameter such as '''float[]''' is likewise passed as a pointer to its first element).  The routine, of course, is written in Fortran, and so is expecting these references.  By default, C would have taken a '''copy''' of the '''value''' of a variable and passed that.  This is where the term '''pass-by-value''' comes from.  (Note that a side effect of pass-by-value is that any changes are only local to a routine and so do not affect the value of the variable used in the calling routine.)  Note that the '''const''' modifiers match the '''intent(in)''' attributes in '''sub.f90'''.  This is an example of defensive programming.  Normally, when we pass references, we run the risk that the called routine could modify the values of the variables in a lasting way.  This may not be what you would like, and so it would be '''unsafe'''.  However, by using '''const''' and '''intent(in)''', we are saying from both sides that some values are not to be changed and so we protect ourselves.
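For comparison, here is the same pass-by-reference idea seen from Python.  This is a minimal sketch, not part of the workshop examples: it assumes a Unix-like machine and uses Python's standard '''ctypes''' module (not otherwise covered here) to call the C library routine '''strtol'''.  Its first argument is declared '''const char*''', the moral equivalent of '''intent(in)''', while its second, '''char** endptr''', is a reference that '''strtol''' writes through, much as '''fort_sub''' writes into '''arg_out'''.  The helper name '''parse_leading_int''' is hypothetical.

```python
import ctypes
import ctypes.util

# Assumption: a Unix-like system. find_library("c") returns e.g. "libc.so.6" on
# glibc Linux; CDLL(None) falls back to the symbols already loaded in the process.
_libc_name = ctypes.util.find_library("c")
libc = ctypes.CDLL(_libc_name) if _libc_name else ctypes.CDLL(None)

# long strtol(const char* nptr, char** endptr, int base);
libc.strtol.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_char_p), ctypes.c_int]
libc.strtol.restype = ctypes.c_long


def parse_leading_int(text):
    """Hypothetical helper: return (value, rest) using strtol's endptr reference."""
    end = ctypes.c_char_p()                            # C will write an address in here
    value = libc.strtol(text, ctypes.byref(end), 10)   # pass a reference to 'end'
    return value, end.value                            # 'end' now points past the digits


value, rest = parse_leading_int(b"42 is the answer")
print(value, rest)
```

Only '''end''' is changed by the call; '''text''' is treated as read-only, just as '''const''' promises.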
In general with mixed language programming, we need to be careful about matching the variable types we pass between the languages.  For example, C typically uses 4 bytes to store a '''float'''.  However, a Fortran '''real''' may occupy 4 or 8 bytes, depending upon the compiler and the options used to invoke it (e.g. gfortran's '''-fdefault-real-8''' promotes default reals to 8 bytes).  We need to be careful to ensure a match, otherwise we could have all sorts of errors in store related to truncation or uninitialised memory space.  Thus in '''sub.f90''', we have limited our variables to 4 bytes (for gfortran and most other compilers, '''kind=4''' means 4 bytes, although the standard does not guarantee this):
<source lang="fortran">
  integer(kind=4),intent(in)                     :: arg_in
  integer(kind=4),intent(in)                     :: array_size
  real(kind=4),dimension(array_size),intent(out) :: arg_out
</source>
An interesting point of difference to note between Fortran and C is that arrays are indexed differently.  Firstly, Fortran gives the first element of an array the index 1, by default, whereas C gives it the index 0.  Compare the two loops in the source code files.  First for the calling C program:
<source lang="c">
  /* initialise array to ones */
  for(ii=0;ii<MAXSIZE;++ii) {
    from_fort[ii] = 1.0;
  }
</source>
and second for the Fortran subroutine:
<source lang="fortran">
  ! just set output to be equal to input
  do ii=1,array_size
     arg_out(ii) = real(arg_in)
  end do
</source>
Things diverge further for multi-dimensional arrays.  This is because arrays in C are stored in 'row-major' order and arrays in Fortran in 'column-major' order.  Thus for a default 2-dimensional array, the cell (3,7) in C (counting from 0) would map to (8,4) in Fortran (counting from 1), provided the Fortran array is declared with its dimensions transposed.  Oh the joys of mixed languages!  Thankfully, since program units are typically self-contained and we would pass arrays in their entirety (actually just the memory address of the first cell), this difference in behaviour does not cause many headaches.
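The index mapping can be checked with a few lines of arithmetic.  This sketch is an illustration rather than workshop code, and its function names are hypothetical; it assumes a C array declared '''a[N][M]''' whose Fortran view is declared with the dimensions transposed, '''a(M,N)''', and confirms that C's '''a[3][7]''' and Fortran's '''a(8,4)''' sit at the same offset from the first cell.

```python
# Sketch: relate C (row-major, 0-based) and Fortran (column-major, 1-based) indexing.
# Assumption: C declares a[N][M] and Fortran declares the same memory as a(M,N).

def c_offset(i, j, ncols):
    """Element offset of a[i][j] in a C array declared a[nrows][ncols]."""
    return i * ncols + j


def fortran_offset(p, q, nrows):
    """Element offset of a(p,q) in a Fortran array declared a(nrows, ncols)."""
    return (p - 1) + (q - 1) * nrows


def c_to_fortran(i, j):
    """Fortran indices (p, q) of the cell that C calls a[i][j]."""
    return j + 1, i + 1


N, M = 5, 10   # C: double a[5][10];   Fortran: double precision :: a(10,5)
p, q = c_to_fortran(3, 7)
print((p, q))                                         # (8, 4)
print(c_offset(3, 7, M) == fortran_offset(p, q, M))   # True: same memory cell
```

Swapping the indices and adding one to each is all it takes, which is why passing just the address of the first cell works.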

= Linker Strife =

If you have compiled your main program using a Fortran compiler and your C code with a C compiler, and would like to link the whole lot using a C compiler, you will get '''undefined reference to "main"''' errors.  One way around this is to create a C '''main''' which immediately calls the Fortran main program, whose symbol (with gfortran, as reported by '''nm''') is '''MAIN__''':

<source lang="c">
/* main.c */

/* The Fortran main program, as named by gfortran (check with nm). */
void MAIN__(void);

int main(int argc, char* argv[])
{
    MAIN__();   /* hand control straight to the Fortran program */
    return 0;
}
</source>
When linking with the C compiler, you will typically also need to add the Fortran runtime library by hand, e.g. '''-lgfortran''' for gfortran.
= Fortran 2003 Improves Matters =

It's important that we take care to match the types of the variables that we're using when exchanging data between C and Fortran.  We did this explicitly above, which was useful for emphasis, but is quite likely to prove rather unportable from machine to machine.  One project which largely addressed this issue of portability is the '''cfortran header project''': [http://www-zeus.desy.de/~burow/cfortran/cfortran.html].

However, support for the Fortran 2003 standard is growing among compilers (e.g. gfortran-4.3 and newer) and with it, we receive a good deal of aid built into the language itself.  For example, the intrinsic module named '''ISO_C_BINDING''' contains the following parameterisations of intrinsic Fortran types:

<pre>
Type      Named constant        C type or types
INTEGER   C_INT                 int
          C_SHORT               short int
          C_LONG                long int
          C_LONG_LONG           long long int
          C_SIGNED_CHAR         signed char, unsigned char
          C_SIZE_T              size_t
 ...
REAL      C_FLOAT               float
          C_DOUBLE              double
          C_LONG_DOUBLE         long double
COMPLEX   C_FLOAT_COMPLEX       float _Complex
          C_DOUBLE_COMPLEX      double _Complex
          C_LONG_DOUBLE_COMPLEX long double _Complex
LOGICAL   C_BOOL                _Bool
CHARACTER C_CHAR                char
</pre>
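To see what some of these named constants correspond to on a particular machine, one option (an illustration only, not part of the workshop) is Python's standard '''ctypes''' module, whose types mirror the same C types.  The lookup table below is an assumption of this sketch rather than anything defined by '''ISO_C_BINDING''' itself, and the complex kinds are left out because '''ctypes''' has no complex type.  The sizes reported are those of the C compiler Python was built with, which shows why hard-wiring '''kind=4''', as we did earlier, is fragile.

```python
import ctypes

# Hypothetical lookup table for this sketch: ISO_C_BINDING named constant ->
# the ctypes type for the same C type (complex kinds omitted).
ISO_C_TO_CTYPES = {
    "C_INT": ctypes.c_int,
    "C_SHORT": ctypes.c_short,
    "C_LONG": ctypes.c_long,
    "C_LONG_LONG": ctypes.c_longlong,
    "C_SIGNED_CHAR": ctypes.c_byte,      # signed char
    "C_SIZE_T": ctypes.c_size_t,
    "C_FLOAT": ctypes.c_float,
    "C_DOUBLE": ctypes.c_double,
    "C_LONG_DOUBLE": ctypes.c_longdouble,
    "C_BOOL": ctypes.c_bool,
    "C_CHAR": ctypes.c_char,
}


def c_type_sizes():
    """Return {named constant: size in bytes} for the C types on this machine."""
    return {name: ctypes.sizeof(ctype) for name, ctype in ISO_C_TO_CTYPES.items()}


for name, size in sorted(c_type_sizes().items()):
    print("%-14s %2d bytes" % (name, size))
```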

Using some appropriately specified types, we can re-write example1 in a much more transparent and portable manner:
<pre>
cd ../example3
make
./call_c.exe
</pre>

Looking at '''call_c.f90''' we can see:

<source lang="fortran">
  ! make use of this intrinsic module
  use ISO_C_BINDING
</source>

near the top of the file.  We then see our variable declarations, which make use of the newly available parameterisations:

<source lang="fortran">
  integer(C_INT),parameter                :: array_size = 5  ! parameter gets init
  character(C_CHAR),dimension(array_size) :: str = 'fffff'   ! a Fortran string
  real(C_DOUBLE),dimension(array_size)    :: data = 1.1      ! Fortran array
  integer(C_INT),allocatable,dimension(:) :: data2           ! can init to null
...
</source>

We see no change with regard to the C code.

= Interpreting Fortran derived types as C structures =

In example3 we saw the portable exchange of intrinsic Fortran types.  This is all well and good, but perhaps you have tens of variables which you would like to pass as arguments to a C function.  The above approach would become cumbersome.  A smart move would be to bundle up related variables into their own user-derived types and then pass them through to C.

Mapping user-derived types from Fortran to structures in C and vice versa used to be a somewhat fraught affair.  Matters have been significantly improved with the advent of Fortran 2003.  However, we will see that full support is still not available and limits remain upon what can be done.
<pre>
cd ../example4
make
./call_c.exe
</pre>
Looking in '''call_c.f90''' we see that this time we have declared a user-derived type with the '''BIND(C)''' attribute:

<source lang="fortran">
  type, BIND(C) :: mystuff
     integer(C_INT)                           :: f_dataLen
     character(C_CHAR), dimension(array_size) :: f_data1
     real(C_DOUBLE),    dimension(array_size) :: f_data2
  end type mystuff

  type(mystuff) :: someStuff
</source>

This we can happily pass to C:

<source lang="fortran">
call cfunc(someStuff)
</source>

which is expecting a '''struct''' of the form:

<source lang="c">
#define DATALEN 5

/* using pointers here will result in a seg fault */
struct gubbins
{
  int    c_dataLen;
  char   c_charData[DATALEN];
  double c_realData[DATALEN];
};
</source>
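To see why both sides must agree on the fixed array lengths, it can help to look at the memory layout the C compiler chooses.  The following sketch (an illustration, not part of the workshop code) uses Python's '''ctypes''' to mirror '''struct gubbins'''; the class and function names are hypothetical.  On a typical x86-64 Linux machine '''c_realData''' starts at byte 16 rather than byte 9, because the compiler pads the struct so that the doubles are 8-byte aligned.  '''BIND(C)''' asks the Fortran compiler to lay out '''mystuff''' in the same way, so a mismatch in '''DATALEN''' shifts every field that follows.

```python
import ctypes

DATALEN = 5  # must agree with array_size in call_c.f90 and DATALEN in the C code


class Gubbins(ctypes.Structure):
    """ctypes mirror of the C 'struct gubbins' (and so of the BIND(C) type mystuff)."""
    _fields_ = [
        ("c_dataLen", ctypes.c_int),
        ("c_charData", ctypes.c_char * DATALEN),
        ("c_realData", ctypes.c_double * DATALEN),
    ]


def describe_layout(struct_type):
    """Return [(field name, byte offset, byte size)] as laid out by the C ABI."""
    return [(name, getattr(struct_type, name).offset, getattr(struct_type, name).size)
            for name, _ in struct_type._fields_]


for name, offset, size in describe_layout(Gubbins):
    print("%-11s offset %2d  size %2d" % (name, offset, size))
print("total size: %d bytes" % ctypes.sizeof(Gubbins))
```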

All is fine for scalar variables, such as '''c_dataLen''', but you will notice that we couldn't use a C variable length array, or pointers, in the struct.  In turn, this has forced us to use C arrays of a hardwired length, which is rather unsatisfactory.  If the arrays are specified with a mismatching length, you will get an instant bug, which we have tried to protect against with:

<source lang="c">
  /* Provide at least _some_ protection against subtle bugs */
  if (mystuff->c_dataLen != DATALEN) {
    fprintf(stderr, "Error: Assumed DATALEN does not match value in struct\n");
    exit(EXIT_FAILURE);
  }
</source>

Passing user-derived types to C and vice versa is still a very useful feature, and if using scalar variables suits your code, then you are in a good position.  However, the foregoing does illustrate that while Fortran 2003 significantly improves matters, it does not provide a complete solution to C-Fortran interoperability.

=Calling Fortran from Python using F2PY=

Another popular pairing is to use Python to create the interface, utilising its strong support for text processing, GUI building and graphics, and to use a traditional programming language, such as Fortran or C/C++, to rapidly perform any number crunching.

In the examples below, we'll take a quick peek at the support that '''f2py''' (provided as part of the '''numpy''' python package) gives those who wish to call Fortran routines from their Python scripts.

<pre>
cd ../example5
module add languages/python-2.7.2.0
</pre>

Useful information on f2py can be found at the following URLs (indeed all of the examples below are derived from the linked documentation):

* http://cens.ioc.ee/projects/f2py2e/
* http://cens.ioc.ee/projects/f2py2e/usersguide/

==A Quick Test==

OK, let's just prove that we can quickly and simply call a Fortran subroutine from Python.

First, run '''f2py''' (we'll request the use of the '''gfortran''' compiler):

<pre>
f2py --fcompiler=gnu95 -c -m hello hello.f
</pre>

You'll see that this results in the creation of a shared object file, '''hello.so'''.  We can test our labours using the following python script:

<source lang="python">
#!/usr/bin/env python

import hello
print(hello.__doc__)       # docstring that f2py generated for the module
print(hello.foo.__doc__)   # signature of the wrapped Fortran subroutine
hello.foo(4)               # call the Fortran subroutine foo with a=4
</source>

where '''hello.so''' is the subject of the line '''import hello'''.  This is the output when we run the script:

<pre>
./test.py
This module 'hello' is auto-generated with f2py (version:2).
Functions:
  foo(a)
.
foo - Function signature:
  foo(a)
Required arguments:
  a : input int

 Hello from Fortran!
 a= 4
</pre>

==Fortran90 Constructs==

OK, that's great.  However, some will rightly ask whether f2py fully supports the Fortran90 dialect.  In an attempt to answer that, consider the following examples:

# '''moddata''':  This example introduces Fortran90 modules.
# '''allocarr''':  This example adds in the use of a Fortran90 allocatable array.
# '''type''':  This example demonstrates the use of a Fortran90 user derived type.
# '''pointer''':  This shows that F2PY does not support Fortran90 pointers (although the Fortran code itself is legal).

Take some time to review the Fortran90 source code and the corresponding python test scripts.

Here's our first example.  '''moddata.f90''' includes a Fortran90 module.  First we compile the source code and create an interface, all in a single invocation of f2py:

<pre>
f2py --fcompiler=gnu95 -c -m moddata moddata.f90
</pre>

Running the test script, '''moddata.py''', should yield:

<pre>
./moddata.py
i - 'i'-scalar
x - 'i'-array(4)
a - 'f'-array(2,3)
foo - Function signature:
  foo()


 i=           5
 x=[           1           2           0           0 ]
 a=[
 [  1.00000000     ,   2.0000000     ,   3.0000000     ]
 [   4.0000000     ,   5.0000000     ,   6.0000000     ]
 ]
</pre>

Our second example, '''allocarr.f90''', uses an allocatable array and is prepared by typing:

<pre>
f2py --fcompiler=gnu95 -c -m allocarr allocarr.f90
</pre>

The test script, '''allocarr.py''', should yield:

<pre>
./allocarr.py
b - 'f'-array(-1,-1), not allocated
foo - Function signature:
  foo()


 b is not allocated
 b=[
  1.00000000       2.0000000       3.0000000     
   4.0000000       5.0000000       6.0000000     
 ]
 b=[
  1.00000000       2.0000000       3.0000000     
   4.0000000       5.0000000       6.0000000     
   7.0000000       8.0000000       9.0000000     
 ]
 b is not allocated
</pre>
Our third example, '''type.f90''', shows the use of a user derived type:
<pre>
f2py --fcompiler=gnu95 -c -m type type.f90
</pre>
The corresponding test, '''type.py''', pithily outputs:

<pre>
./type.py
    1   2.0000000000000000        3.0000000000000000                4           5           6
</pre>

Lastly, '''pointer.f90''' attempts to use a Fortran90 pointer.  Although the source code will compile, f2py does not support the use of pointers and so the test script produces an error:

<pre>
./pointer.py
c - 'f'-array(-1,-1), not allocated
bar - Function signature:
  bar()


 c is not associated
Traceback (most recent call last):
  File "./pointer.py", line 10, in <module>
    pointer.mod.c = [[1,2,3],[4,5,6]]         # allocate/initialize c
SystemError: error return without exception set
</pre>
= Calling C from Python using SWIG =
'''NB You will need SWIG installed on your machine for this example. SWIG is not installed on bluecrystal phases 1 & 2, but is installed on phase 3.'''

SWIG stands for the Simplified Wrapper and Interface Generator.

This example covers calling C and C++ from Python.  However, SWIG can also be used to connect C and C++ code to other scripting languages, such as Perl, PHP, Tcl and Ruby.

See:
* http://www.swig.org/tutorial.html

First of all, let's move to the directory of examples:
<pre>
cd ../exampleX
ls
example.c  example.i  header.h  Makefile  pair.h  pair.i  test.py
</pre>

You will notice that in addition to the source code files, interface files (with the file extension '''.i''') must also be written.  This is in contrast to f2py, which created the interface files automatically.  I have created a makefile to simplify the creation of the shared objects.  Just type:
<pre>
make
</pre>

Then, in order to test the C and C++ wrapping, run:
<pre>
./test.py
===wrapped C code===
example.fact(5) is: 120
example.my_mod(7,3) is: 1
example.get_time() is: Fri Aug 23 12:16:18 2013

===wrapped C++ code===
initialise a with: pair.pairii(3,4)
a.first is: 3
a.second is: 4
reset a.second = 16
a.second is: 16
initialise b with: pair.pairdi(3.5,8)
b.first is: 3.5
b.second is: 8
</pre>
=Calling C/C++ and Fortran from Matlab using MEX Files=

* http://www.mathworks.co.uk/help/matlab/create-mex-files.html

=Calling C/C++ and Fortran from R=

* http://www.biostat.jhsph.edu/~rpeng/docs/interface.pdf
* http://users.stat.umn.edu/~geyer/rc/

=Appendices=
==Making a Daisy Chain: Calling Fortran from C from Fortran==

<pre>
cd ../exampleQ
</pre>

Here we pass a regular derived type ''through'' some C code and back out into some Fortran.  We can see that the derived type emerges unscathed, since the C code treats it as an opaque parcel, through the use of a pointer to void.  In Fortran:

<source lang="fortran">
  type station
     real                  :: frequency                   ! MHz
     character(len=maxstr) :: name                        ! str
     character(len=maxstr) :: broadcaster                 ! str
  end type station

  type(station),pointer,dimension(:)   :: stations    ! array of stations

  ...

  call cfunc( ... ,stations)
</source>

and in the C:
<source lang="c">
  void cfunc_( ..., void* stations)
</source>
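The same opaque-parcel trick can be mimicked in Python with '''ctypes''', which may help to make the idea concrete.  This is a sketch under stated assumptions, not the workshop's code: '''MAXSTR''' is a hypothetical stand-in for the Fortran parameter '''maxstr''' (its real value is not shown here), the string fields are treated as plain C character arrays (Fortran actually pads with blanks), and '''pass_through''' is a hypothetical stand-in for the C layer, which only ever holds a '''void*''' and never looks inside.

```python
import ctypes

MAXSTR = 32  # hypothetical stand-in for the Fortran parameter maxstr


class Station(ctypes.Structure):
    """Mirror of the Fortran 'station' type (assumed fixed-length strings)."""
    _fields_ = [
        ("frequency", ctypes.c_float),             # MHz
        ("name", ctypes.c_char * MAXSTR),
        ("broadcaster", ctypes.c_char * MAXSTR),
    ]


def pass_through(parcel):
    """Hypothetical C layer: it receives an opaque void* and simply hands it back."""
    assert isinstance(parcel, ctypes.c_void_p)
    return parcel


stations = (Station * 2)(Station(93.5, b"Radio A", b"BBC"),
                         Station(104.1, b"Radio B", b"Independent"))

# Hand over the address of the first element as a void*, as cfunc_ receives it ...
opaque = ctypes.cast(stations, ctypes.c_void_p)
returned = pass_through(opaque)

# ... and reinterpret it on the way back out as an array of stations.
back = ctypes.cast(returned, ctypes.POINTER(Station))
print(back[0].name, back[0].frequency)
```

Because the middle layer never dereferences the pointer, it does not need to know the layout of '''station''' at all; only the two ends do.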
Latest revision as of 11:22, 13 September 2013
Mixed Language Programming: Mix up a quick and useful cocktail today!
= Introduction =
One of the key underlying themes in this series of pragmatic programming workshops is getting things done, with a minimum of fuss and wasted effort, and this workshop on mixed language programming is no exception.
When we sit down to a keyboard, we do so with a particular goal in mind. We have a task, idea or experiment and the computer is our workbench. The languages we write programs with, along with the text editors, compilers, debuggers etc. are our tools. Now some tools are better than others (e.g. Subversion is an improvement upon CVS). Some we just may have a preference for (e.g. emacs vs. vi for editing files). None are perfect, however, and all have their pros and cons.
What's all this got to do with mixed language programming? you ask. Well, imagine a scenario:
You sit down to your workbench. You have your goal and you ask yourself whether somebody else has written some code which will do at least part of what you want. Perhaps they've bundled it up into a nice open-source library that you can use? That way you'll save truck-loads of time. A couple of web searches later, and bingo! You've found an ideal library for the job. There's only one snag. You like programming in Fortran, and the library is written in C. Scuppered! Well, perhaps not..
What are your options? You could use something not so suitable because it happens to be written in Fortran. Hmm, that doesn't sound so good. You could translate the library from C to Fortran. Hmm, that sounds like tedious and time-consuming work. Plus you'd need to understand the C, and perhaps a direct translation can't be made anyhow? This is looking like a dead-end too. It would be far better to leave the library as it is and to call the routines from your favoured language. Is that possible? Sure it is! Read on and find out how..
Another scenario is that C or Fortran do number crunching well, but aren't best suited to text processing or creating GUIs. For scripting languages, like Python for example, the reverse is true. Wouldn't it be nice if we could combine their strengths in a single project? Well, you can!
In this workshop, we'll look at ways in which we can mix languages and in the process create a useful end product which plays to the strengths of its components and gets you to where you want to be, without any laborious re-writes. In the first two examples we'll look at calling C code from Fortran and then calling Fortran from C. To get the code for the examples, log into your preferred Linux machine and cut and paste the following into your terminal:
<pre>
svn export https://svn.ggy.bris.ac.uk/subversion-open/polyglot/trunk ./polyglot
</pre>
= Making a Fortran wrapped C parcel =
In the first example we'll call a C function from a Fortran program:
<pre>
cd examples/example1
</pre>
To compile the example, type:
<pre>
make
</pre>
and to run it, type:
<pre>
./call_c.exe
</pre>
Tada! We've mixed our languages into a single executable and it works! Cool. Very cool. OK, so much for the magic, let's take a look inside the files. Open up call_c.f90. In this simple program, we have a character string that we pass to the C function (called cfunc), which in turn modifies the string, and we can print the result. For those familiar with Fortran90, the main program is trivial enough. So, let's turn to the C code in func.c. Again this is a simple function and the syntax will be familiar to those who know some C. An interesting detail, however, is the name of the function. Note that we have called it cfunc_, i.e. with a trailing underscore. Why on earth have we done that? Well, we are anticipating the 'decoration' or name mangling that the Fortran compiler will apply to the name cfunc when it is called from Fortran.
We can look inside the object files using a utility called nm. First let's look inside call_c.o. On my machine I get:
> nm call_c.o
00000000 T MAIN__
         U _gfortran_set_std
         U _gfortran_st_write
         U _gfortran_st_write_done
         U _gfortran_transfer_character
         U cfunc_
These are the symbols created by the compiler in the object file. Now let's look inside func.o:
> nm func.o
00000000 T cfunc_
Note the two cfunc_ entries? They match! If we had not pre-decorated the name of the function in func.c, the symbols would not have matched, and we would have got a link-time error (an undefined reference to cfunc_). However, we were smart and got ourselves a working executable. Hurrah!
A word of caution, however. Different compilers 'mangle' subroutine and function names differently. Depending upon the mix of compilers, we can't always rely on a single trailing underscore. You will need to use nm to determine the exact decoration your Fortran compiler expects, and design your code so that any changes are easily accommodated. One way to do this is to include a #define preprocessor macro in your C code, such as:
#define CFUNC cfunc_
...
void CFUNC(int* string_size, char* string)
In that way, a change of decoration can be made easily and consistently across the whole file or library.
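A slightly more general version of the same idea uses token pasting. With it, the decoration rule lives in one macro rather than in one #define per routine. This is a sketch, and it makes some assumptions:

- The Fortran compiler appends a single trailing underscore, which is gfortran's default.
- FORTRAN_NAME is a hypothetical helper name.
- The body of cfunc is invented for illustration; the real func.c may modify the string differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: apply the Fortran compiler's name decoration.
   Assumes one trailing underscore; edit this one line if nm shows otherwise. */
#define FORTRAN_NAME(name) name##_

/* Expands to: void cfunc_(int* string_size, char* string)
   Fortran passes arguments by reference, hence the pointers. */
void FORTRAN_NAME(cfunc)(int* string_size, char* string)
{
    assert(string_size != NULL && string != NULL);
    /* Illustrative body: upper-case the string in place. Fortran strings
       are not NUL-terminated, so we rely on the explicit length argument. */
    for (int i = 0; i < *string_size; ++i) {
        string[i] = (char)toupper((unsigned char)string[i]);
    }
}
```

Changing the decoration then means editing only the FORTRAN_NAME line, for example to drop the underscore for a compiler that does not add one.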
Turning the Parcel Inside-Out
OK, let's turn our minds to working the other way round, and call a Fortran subroutine from a C program:
cd ../example2
We can compile and run in a similar way:
make
./call_fort.exe
If you look inside the two key files, sub.f90 and call_fort.c, you can see that we've still kept the program fairly simple, but we've introduced a few more variable types (integers, floating-point numbers and arrays) and syntactic refinements (use of preprocessor macros and 'const' and 'intent(in)' declarations).
The decoration and name-mangling will be familiar to you from the last example. Note the use of the define macro in call_fort.c:
#define FORT_SUB fort_sub_
The code comments in the last example touched upon the distinction between the pass-by-value default behaviour of C and the pass-by-reference adopted by Fortran. Thus, in order to dovetail the two languages, we must prevail upon our C code to use pass-by-reference. We can see this in the function prototype in call_fort.c:
extern void FORT_SUB(const int*, const int*, float[]);
Here we are declaring that the arguments to the routine are pointers to variables, i.e. addresses or references, denoted by the *s. The routine, of course, is written in Fortran, and so is expecting these references. By default, C would have taken a copy of the value of a variable and passed that; this is where the term pass-by-value comes from. (Note that a side effect of pass-by-value is that any changes are local to the called routine, and so have no effect on the variable in the calling routine.)

Note that the const modifiers match the intent(in) attributes in sub.f90. This is an example of defensive programming. Normally, when we pass references, we run the risk that the called routine could modify the values of the variables in a lasting way. This may not be what we want, and so would be unsafe. However, by using const and intent(in), we are saying from both sides that some values are not to be changed, and so we protect ourselves.
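The difference between the two conventions can be seen in plain C. In this sketch the function names are hypothetical:

- set_by_value only changes its local copy.
- set_by_reference changes the caller's variable, just as a Fortran subroutine would.
- read_only takes a const pointer, so the compiler rejects any attempt to write through it, much like intent(in).

```c
#include <assert.h>
#include <stdio.h>

/* Pass-by-value: x is a private copy, so the caller never sees the change. */
void set_by_value(int x)
{
    x = 42;
    (void)x;
}

/* Pass-by-reference (Fortran's convention): writes through the pointer
   are visible to the caller. */
void set_by_reference(int* x)
{
    assert(x != NULL);
    *x = 42;
}

/* const plays the role of intent(in): uncommenting the assignment
   below would be a compile-time error. */
int read_only(const int* x)
{
    /* *x = 0; */
    return *x + 1;
}
```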
In general, with mixed language programming, we need to be careful about matching the variable types we pass between the languages. For example, C typically uses 4 bytes to store a float. However, a Fortran real may occupy 4 or 8 bytes, depending upon the compiler, its options and the type of processor in the machine (32 or 64 bit). We need to be careful to ensure a match; otherwise we could have all sorts of errors in store, related to truncation or uninitialised memory. Thus in sub.f90, we have limited our variables to 4 bytes:
  integer(kind=4),intent(in)                     :: arg_in
  integer(kind=4),intent(in)                     :: array_size
  real(kind=4),dimension(array_size),intent(out) :: arg_out
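The 4-byte assumption can also be checked on the C side at compile time, so that a mismatch stops the build rather than corrupting data at run time. Here is a sketch using C11's static_assert. It assumes that kind=4 means 4 bytes, which holds for gfortran but is not guaranteed by the Fortran standard, since kind numbers are compiler-specific. The prototype is the one from call_fort.c:

```c
#include <assert.h>

#define FORT_SUB fort_sub_

/* Fail the build if C's types do not match the 4-byte Fortran kinds
   used in sub.f90 (assumes kind=4 means 4 bytes, as with gfortran). */
static_assert(sizeof(int) == 4, "int must match integer(kind=4)");
static_assert(sizeof(float) == 4, "float must match real(kind=4)");

/* Same prototype as in call_fort.c. */
extern void FORT_SUB(const int*, const int*, float[]);
```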
An interesting point of difference between Fortran and C is that arrays are indexed differently. Firstly, by default Fortran gives the first element of an array the index 1, whereas C gives it the index 0. Compare the two loops in the source code files. First, for the calling C program:
  /* initialise array to ones */
  for(ii=0;ii<MAXSIZE;++ii) {
    from_fort[ii] = 1.0;
  }
and second for the Fortran subroutine:
  ! just set output to be equal to input
  do ii=1,array_size
     arg_out(ii) = real(arg_in)
  end do
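The index shift can be made explicit with a pure C stand-in for the Fortran subroutine, which is also handy for testing call_fort.c without a Fortran compiler. This is a hypothetical re-implementation of what sub.f90's loop does, not its real source. It keeps Fortran's 1-based counter and converts it to C's 0-based index:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C stand-in for sub.f90's fort_sub (not the real source). */
void fort_sub_(const int* arg_in, const int* array_size, float arg_out[])
{
    assert(arg_in != NULL && array_size != NULL && arg_out != NULL);
    /* Fortran's arg_out(ii), ii = 1..array_size, is arg_out[ii-1] in C. */
    for (int ii = 1; ii <= *array_size; ++ii) {
        arg_out[ii - 1] = (float)*arg_in;
    }
}
```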
Things diverge further for multi-dimensional arrays. This is because C stores arrays in 'row-major' order and Fortran stores them in 'column-major' order. Thus, for a 2-dimensional array, the cell (3,7) in C, i.e. a[3][7], would map to (8,4) in Fortran: the indices swap places and each gains 1. Oh, the joys of mixed languages! Thankfully, program units are typically self-contained and we pass arrays in their entirety (actually just the memory address of the first cell). So this difference in behaviour does not cause many headaches.
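The mapping can be checked by computing where each element sits in memory. In this sketch, the C array a[NROWS][NCOLS] is viewed from Fortran as a(NCOLS,NROWS), i.e. with the dimensions reversed. NROWS, NCOLS and the two function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shape: C declares a[NROWS][NCOLS];
   Fortran sees the same memory as a(NCOLS,NROWS). */
#define NROWS 5
#define NCOLS 10

/* Offset of a[i][j] in a row-major, 0-based C array. */
size_t c_offset(size_t i, size_t j)
{
    assert(i < NROWS && j < NCOLS);
    return i * NCOLS + j;
}

/* Offset of a(p,q) in a column-major, 1-based Fortran array a(NCOLS,NROWS). */
size_t fortran_offset(size_t p, size_t q)
{
    assert(p >= 1 && p <= NCOLS && q >= 1 && q <= NROWS);
    return (p - 1) + (q - 1) * NCOLS;
}
```

Both c_offset(3, 7) and fortran_offset(8, 4) land on the same element: in general, C's a[i][j] is Fortran's a(j+1,i+1).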
Linker Strife
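Suppose you have compiled your main program with a Fortran compiler and your C code with a C compiler, and would like to link the whole lot using the C compiler. You may then get undefined reference to 'main' errors, because the Fortran main program is compiled to the symbol MAIN__, as the nm output above shows. You will also need to name the Fortran runtime library on the link line yourself, e.g. -lgfortran for gfortran. One way around the missing main is to create a C main which immediately calls the Fortran program via MAIN__: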
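/* main.c */
/* The Fortran main program, under the name the compiler gives it (see nm). */
void MAIN__(void);

int main(void)
{
    /* Note: this may bypass start-up work the Fortran compiler's own main
       would do (e.g. recording the command-line arguments). */
    MAIN__();
    return 0;
}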
Fortran 2003 Improves Matters
It's important that we take care to match the types of the variables that we exchange between C and Fortran. We did this explicitly above, which was useful for emphasis, but is quite likely to prove rather unportable from machine to machine. One project which largely addressed this issue of portability is the cfortran header project: [1].
However, support for the Fortran 2003 standard is growing among compilers (e.g. gfortran 4.3 and newer), and with it we receive a good deal of help built into the language itself. For example, the intrinsic module ISO_C_BINDING contains the following named constants, which parameterise the intrinsic Fortran types:
Type      Named constant      C type or types
INTEGER   C_INT               int
          C_SHORT             short int
          C_LONG              long int
          C_LONG_LONG         long long int
          C_SIGNED_CHAR       signed char, unsigned char
          C_SIZE_T            size_t
 ...
REAL      C_FLOAT             float
          C_DOUBLE            double
          C_LONG_DOUBLE       long double
COMPLEX   C_FLOAT_COMPLEX     float _Complex
          C_DOUBLE_COMPLEX    double _Complex
          C_LONG_DOUBLE_COMPLEX long double _Complex
LOGICAL   C_BOOL              _Bool
CHARACTER C_CHAR              char
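Read from the C side, the table tells you which C type to use for each argument. As a sketch, here is a hypothetical C routine (sum_values is an invented name, not part of the examples). Each argument type is annotated with the ISO_C_BINDING kind a Fortran caller would declare for it. Because this is a function returning a value rather than a void 'subroutine', a Fortran caller would also need a matching BIND(C) interface block:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C routine: each argument's C type is paired with the
   ISO_C_BINDING kind a Fortran caller would use for it. */
double sum_values(const int* n,        /* integer(C_INT)  */
                  const double x[],    /* real(C_DOUBLE)  */
                  const bool* negate)  /* logical(C_BOOL) */
{
    assert(n != NULL && x != NULL && negate != NULL);
    double total = 0.0;
    for (int i = 0; i < *n; ++i) {
        total += x[i];
    }
    return *negate ? -total : total;
}
```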
Using some appropriately specified types, we can re-write example1 in a much more transparent and portable manner:
cd ../example3
make
./call_c.exe
Looking at call_c.f90 we can see:
  ! make use of this intrinsic module
  use ISO_C_BINDING
near the top of the file. We then see our variable declarations, which make use of the newly available parameterisations:
  integer(C_INT),parameter                :: array_size = 5  ! parameter gets init
  character(C_CHAR),dimension(array_size) :: str = 'fffff'   ! a Fortran string
  real(C_DOUBLE),dimension(array_size)    :: data = 1.1      ! Fortran array
  integer(C_INT),allocatable,dimension(:) :: data2           ! can init to null
...
We see no change with regard to the C code.
Interpreting Fortran derived types as C structures
In example3 we saw the portable exchange of intrinsic Fortran types. This is all well and good, but perhaps you have tens of variables which you would like to pass as arguments to a C function. The above approach would become cumbersome. A smart move would be to bundle up related variables into their own user-derived types, and then pass those through to C.
Mapping user-derived types from Fortran to structures in C, and vice versa, used to be a somewhat fraught affair. Matters have been significantly improved with the advent of Fortran 2003. However, we will see that support is still not complete, and limits remain upon what can be done.
cd ../example4
make
./call_c.exe
Looking in call_c.f90, we see that this time we have declared a user-derived type with the BIND(C) attribute:
  type, BIND(C) :: mystuff
     integer(C_INT)                           :: f_dataLen
     character(C_CHAR), dimension(array_size) :: f_data1
     real(C_DOUBLE),    dimension(array_size) :: f_data2
  end type mystuff
  type(mystuff) :: someStuff
This we can happily pass to C:
call cfunc(someStuff)
which is expecting a struct of the form:
#define DATALEN 5
/* using pointers here will result in a seg fault */
struct gubbins
{
  int    c_dataLen;
  char   c_charData[DATALEN];
  double c_realData[DATALEN];
};
All is fine for scalar variables, such as c_dataLen, but notice that we couldn't use a C variable-length array, or pointers, in the struct. In turn, this has forced us to use C arrays of a hard-wired length, which is rather unsatisfactory. If the array lengths on the two sides do not match, the memory layouts no longer agree and you have an instant bug. We have tried to protect against this with:
  /* Provide at least _some_ protection against subtle bugs */
  if (mystuff->c_dataLen != DATALEN) {
    fprintf(stderr, "Error: Assumed DATALEN does not match value in struct\n");
    exit(EXIT_FAILURE);
  }
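If the same check is needed in several places, it can be moved into a function that reports the problem and lets the caller decide what to do, rather than exiting. This is a sketch: check_gubbins is a hypothetical name, and the struct is the one shown above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DATALEN 5

/* Must mirror the Fortran BIND(C) type mystuff: same members, same order. */
struct gubbins
{
    int    c_dataLen;
    char   c_charData[DATALEN];
    double c_realData[DATALEN];
};

/* Hypothetical helper: returns 0 if the Fortran-side length matches the
   C-side DATALEN, or 1 (after printing a message) if it does not. */
int check_gubbins(const struct gubbins* mystuff)
{
    assert(mystuff != NULL);
    if (mystuff->c_dataLen != DATALEN) {
        fprintf(stderr, "Error: Assumed DATALEN does not match value in struct\n");
        return 1;
    }
    return 0;
}
```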
Passing user-derived types to C and vice versa is still a very useful feature, and if using scalar variables suits your code, then you are in a good position. However, the foregoing does illustrate that while Fortran2003 significantly improves matters, it does not provide a complete solution to C-Fortran interoperability.
Calling Fortran from Python using F2PY
Another popular pairing is to use Python to create the interface, utilising its strong support for text processing, GUI building and graphics, and to use a traditional programming language, such as Fortran or C/C++, to perform any number crunching rapidly.
In the examples below, we'll take a quick peek at the support that f2py (provided as part of the NumPy Python package) gives those who wish to call Fortran routines from their Python scripts.
cd ../example5
module add languages/python-2.7.2.0
Useful information on f2py can be found at the following URLs (indeed all of the examples below are derived from the linked documentation):
A Quick Test
OK, let's just prove that we can quickly and simply call a Fortran subroutine from Python.
First, run f2py (we'll request the use of the gfortran compiler):
f2py --fcompiler=gnu95 -c -m hello hello.f
You'll see that the result is a shared object file, hello.so. We can test our labours using the following Python script:
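#!/usr/bin/env python
# import the f2py-generated extension module (hello.so)
import hello
print(hello.__doc__)      # module docstring generated by f2py
print(hello.foo.__doc__)  # signature of the wrapped Fortran subroutine
hello.foo(4)              # call the Fortran subroutine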
where the line import hello loads the module from hello.so. This is the output when we run the script:
./test.py
This module 'hello' is auto-generated with f2py (version:2).
Functions:
  foo(a)
.
foo - Function signature:
  foo(a)
Required arguments:
  a : input int
Hello from Fortran!
a= 4
Fortran90 Constructs
OK, that's great. However, some will rightly ask whether f2py fully supports the Fortran90 dialect. In an attempt to answer that, consider the following examples:
- moddata: This example introduces Fortran90 modules.
- allocarr: This example adds in the use of a Fortran90 allocatable array.
- type: This example demonstrates the use of a Fortran90 user derived type.
- pointer: This shows that F2PY does not support Fortran90 pointers (although the Fortran code itself is legal).
Take some time to review the Fortran90 source code and the corresponding python test scripts.
Here's our first example. moddata.f90 includes a Fortran90 module. First we compile the source code and create an interface, all in a single invocation of f2py:
f2py --fcompiler=gnu95 -c -m moddata moddata.f90
Running the test script, moddata.py, should yield:
./moddata.py
i - 'i'-scalar
x - 'i'-array(4)
a - 'f'-array(2,3)
foo - Function signature:
  foo()
i= 5
x=[ 1 2 0 0 ]
a=[
[ 1.00000000 , 2.0000000 , 3.0000000 ]
[ 4.0000000 , 5.0000000 , 6.0000000 ]
]
Our second example, allocarr.f90, uses an allocatable array and is prepared by typing:
f2py --fcompiler=gnu95 -c -m allocarr allocarr.f90
The test script, allocarr.py, should yield:
./allocarr.py
b - 'f'-array(-1,-1), not allocated
foo - Function signature:
  foo()
b is not allocated
b=[
1.00000000 2.0000000 3.0000000
4.0000000 5.0000000 6.0000000
]
b=[
1.00000000 2.0000000 3.0000000
4.0000000 5.0000000 6.0000000
7.0000000 8.0000000 9.0000000
]
b is not allocated
Our third example, type.f90, shows the use of a user derived type:
f2py --fcompiler=gnu95 -c -m type type.f90
The corresponding test, type.py, pithily outputs:
./type.py
    1   2.0000000000000000        3.0000000000000000                4           5           6
Lastly, pointer.f90 attempts to use a Fortran90 pointer. Although the source code will compile, f2py does not support the use of pointers, and so the test script produces an error:
./pointer.py
c - 'f'-array(-1,-1), not allocated
bar - Function signature:
  bar()
 c is not associated
Traceback (most recent call last):
  File "./pointer.py", line 10, in <module>
    pointer.mod.c = [[1,2,3],[4,5,6]]         # allocate/initialize c
SystemError: error return without exception set
Calling C from Python using SWIG
NB You will need SWIG installed on your machine for this example. SWIG is not installed on bluecrystal phases 1 & 2, but is installed on phase 3.
SWIG stands for the Simplified Wrapper and Interface Generator.
This example covers calling C and C++ from Python. However, SWIG can also be used to connect C and C++ code to other scripting languages, such as Perl, PHP, Tcl and Ruby.
See:
First of all, let's move to the directory of examples:
cd ../exampleX
ls
example.c  example.i  header.h  Makefile  pair.h  pair.i  test.py
You will notice that, in addition to the source code files, interface files (with the extension .i) must also be written. This is in contrast to f2py, which created the interface automatically. I have created a makefile to simplify the creation of the shared objects. Just type:
make
Then, in order to test the C and C++ wrapping, run:
./test.py
===wrapped C code===
example.fact(5) is: 120
example.my_mod(7,3) is: 1
example.get_time() is: Fri Aug 23 12:16:18 2013
===wrapped C++ code===
initialise a with: pair.pairii(3,4)
a.first is: 3
a.second is: 4
reset a.second = 16
a.second is: 16
initialise b with: pair.pairdi(3.5,8)
b.first is: 3.5
b.second is: 8
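The C functions behind example.fact, example.my_mod and example.get_time are not listed above. Here is a sketch of what such functions might look like, assuming example.c follows the familiar SWIG tutorial. The function names come from the output; the bodies are reconstructions for illustration, not copies of the file in the repository:

```c
#include <assert.h>
#include <time.h>

/* Recursive factorial, e.g. fact(5) == 120. */
int fact(int n)
{
    assert(n >= 0);
    return (n <= 1) ? 1 : n * fact(n - 1);
}

/* Remainder of x / y, e.g. my_mod(7, 3) == 1. */
int my_mod(int x, int y)
{
    assert(y != 0);  /* x % 0 is undefined behaviour in C */
    return x % y;
}

/* Current local time as a string, e.g. "Fri Aug 23 12:16:18 2013\n".
   ctime() returns a pointer to a static buffer, so copy it if you keep it. */
char* get_time(void)
{
    time_t now = time(NULL);
    return ctime(&now);
}
```

SWIG then only needs declarations of these functions in example.i to generate the Python wrappers.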
Calling C/C++ and Fortran from Matlab using MEX Files
Calling C/C++ and Fortran from R
Appendices
Making a Daisy Chain: Calling Fortran from C from Fortran
cd ../exampleQ
Here we pass a regular derived type through some C code and back out into some Fortran. We can see that the derived type emerges unscathed, since the C code treats it as an opaque parcel, through the use of a pointer to void. In Fortran:
  type station
     real                  :: frequency                   ! MHz
     character(len=maxstr) :: name                        ! str
     character(len=maxstr) :: broadcaster                 ! str
  end type station
  type(station),pointer,dimension(:)   :: stations    ! array of stations
  ...
  call cfunc( ... ,stations)
and in the C:
void cfunc_( ..., void* stations)
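In a daisy chain, the C routine hands the same opaque pointer on to another Fortran routine without ever looking inside it. Below is a sketch of that hand-off, with a function pointer standing in for the Fortran callee. The names parcel_handler, forward_parcel and record_parcel are hypothetical. A real program would instead call the Fortran routine directly, by its decorated name:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for a Fortran routine that takes the parcel back. */
typedef void (*parcel_handler)(void* parcel);

/* C never dereferences 'stations': it only holds and forwards the address,
   so the layout of the Fortran derived type does not matter here. */
void forward_parcel(void* stations, parcel_handler handler)
{
    assert(handler != NULL);
    handler(stations);
}

/* Stand-in for the Fortran receiver, so the sketch runs without Fortran. */
void* last_parcel = NULL;

void record_parcel(void* parcel)
{
    last_parcel = parcel;
}
```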