Andrew Price, University of Southampton, ([mailto:a.r.price@soton.ac.uk a.r.price@soton.ac.uk])

GENIE Toolbox Tutorial
The GENIE Toolbox has been designed to accommodate a wide range of computational resources on the Grid. Interfaces are provided for resources managed by the Globus Toolkit (v2.4), Condor (native, SSH, CondorWS) and Microsoft Compute Cluster. GENIE models can be submitted to systems running various operating systems including Linux/UNIX, Windows and Mac OS X. In such a heterogeneous environment it cannot be assumed that a compiler can be configured and used on a remote compute node. It is therefore necessary to build the GENIE model offline for every target system you intend to use before using the Toolbox to submit work to the computational Grid.

The GENIE Toolbox supports release rel-2-1-0 of the GENIE code as tagged in the CVS repository. This tutorial demonstrates how to execute and manage this release of the GENIE framework on appropriate resources. We strongly recommend that you provide your own build(s) for production studies as described in the next section.

Preparing GENIE Model Archive
The GENIE toolbox provides a management and coordination layer for model binaries and their output data. The system does not, at the present time, interface to the SVN code repository and does not provide a compilation environment. It is therefore assumed that the user will provide a file archive (tar.gz or zip format) containing the model binary to be studied and any static input data files that this binary requires. To prepare such an archive:


 * Export the version of the code that you wish to study from the GENIE SVN repository
 * We strongly recommend tagging the version of the code to be studied and checking out against this tag
 * Compile and build the model binary for each target platform you intend to use on the Grid
 * At present, this is most easily achieved by invoking the genie_example.job script with the required changes to makefile.arc and using the appropriate config file for the study
 * Instructions for building GENIE on UNIX/Linux and Win32 platforms are available in the genie-main module directory
 * [win32] Copy the netcdf.dll file to the directory containing genie.exe
 * Archive the directory hierarchy containing the genie.exe binary and the static data input files
 * The GENIE Toolbox will assume that linux / unix archives are in the tar.gz format and that win32 archives are zip files.
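
The archive-format convention in the last step can be checked from the Matlab prompt before a job is prepared. The following is a minimal sketch, not a Toolbox function: the variable names are illustrative, and the OS labels `'win32'` and `'linux'` are assumed to be the ones that appear in the resource structure later in this tutorial.

```matlab
% Hypothetical check (not part of the GENIE Toolbox): does the archive
% extension match the format the Toolbox assumes for the target OS?
expectedExt = struct('win32', '.zip', 'linux', '.tar.gz');

archive  = 'genie_ig_go_sl_archive.tar.gz';  % one of the reference builds
targetOS = 'linux';                          % assumed OS label

ext = expectedExt.(targetOS);

% Compare the tail of the file name with the expected extension
matches = numel(archive) >= numel(ext) && ...
          strcmp(archive(end-numel(ext)+1:end), ext);

if ~matches
    error('Archive %s should end in %s for target OS %s.', archive, ext, targetOS);
end
```

Labels for other platforms (for example Mac OS X) are not given in this tutorial, so they are left out of the lookup rather than guessed.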

Some reference builds for release rel-2-1-0 are available:


 * genie_ig_fi_fi_archive.zip
 * genie_ig_fi_fi_archive.tar.gz
 * genie_eb_go_gs_archive.zip
 * genie_eb_go_gs_archive.tar.gz
 * genie_ig_go_sl_archive.zip
 * genie_ig_go_sl_archive.tar.gz

Examples
For a standard Linux build the following steps would create a suitable archive:

 * Checkout / Export the GENIE code from CVS using a release tag if applicable
 * <tt>> cvs export -r rel-2-1-0 core</tt>
 * Make any changes to <tt>makefile.arc</tt> and/or <tt>genie_example.job</tt> as appropriate for your local environment
 * Comment out the execution of the <tt>genie.exe</tt> binary from the <tt>genie_example.job</tt> script
 * <tt>echo 'STARTING EXPERIMENT:'</tt>
 * <tt>date</tt>
 * <tt># time ./genie.exe || ABORT EXECUTE</tt>
 * <tt>echo 'ENDING EXPERIMENT:'</tt>
 * Run the <tt>genie_example.job</tt> script using a config file if appropriate for your build
 * <tt>> cd genie-main</tt>
 * <tt>> ./genie_example.job -f configs/genie_ig_go_sl.config</tt>
 * Create an archive containing the output directory structure, the GENIE binary and the input data files. E.g.
 * <tt>> cd ..</tt>
 * <tt>> tar zcvf genie_ig_go_sl_runtime.tar.gz genie_output genie-main/data/input genie-main/inputdata genie-igcm3/data/input genie-goldstein/data/input genie-slabseaice/data/input genie-fixedchem/data/input genie-fixedicesheet/data/input</tt>

The input data directories contain many files that are only applicable to a particular build of the model. To keep the archive size to a minimum, we recommend adding only the files that your build requires.

Job Preparation
To execute a GENIE model using the toolbox three descriptive data structures must be created in the Matlab workspace. These variables provide a comprehensive description of the specific GENIE model configuration to execute, a local runtime environment in which model instances can be prepared for execution and a computational resource on which the simulation will be performed.

Configuration
Create a description of a specific instance of the GENIE model. At the Matlab command prompt enter:


 * <tt>>> configuration = genie_ig_go_sl_gaalbedofluxcorr1_config</tt>

<ol> configuration =

genie_main: [1x1 struct]
genie_igcm3: [1x1 struct]
genie_goldstein: [1x1 struct]
genie_fixedchem: [1x1 struct]
genie_fixedicesheet: [1x1 struct]
genie_slabseaice: [1x1 struct]
</ol>

This executes the <tt>genie_ig_go_sl_gaalbedofluxcorr1_config</tt> function, which is a direct port of the <tt>genie_ig_go_sl_gaalbedofluxcorr1.config</tt> file from the GENIE CVS repository. The function loads a complete set of parameters for the GENIE-2 model comprising the IGCM atmosphere, the GOLDSTEIN ocean, slab sea-ice, fixed chemistry and fixed ice sheet modules. The <tt>genie_main</tt> field contains the parameters controlling the execution of the whole model. For the purposes of the demonstration we will reduce the total number of timesteps in the configuration so that the simulation lasts for a single month.


 * <tt>>> configuration.genie_main.Parameter.GENIE_CONTROL_NML.koverall_total = 720;</tt>
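
Matlab silently creates a new field when an assignment uses a misspelt name, so a parameter override can appear to succeed while leaving the real parameter untouched. The sketch below guards the override with `isfield`. It uses a stand-in structure containing only the one field named in this tutorial, with a placeholder value, so that it runs without the Toolbox; with a real configuration the first assignment would be replaced by the call to the config function.

```matlab
% Stand-in for the structure returned by the config function. Only the
% field used in this tutorial is reproduced; 0 is a placeholder value.
configuration.genie_main.Parameter.GENIE_CONTROL_NML.koverall_total = 0;

nml  = configuration.genie_main.Parameter.GENIE_CONTROL_NML;
name = 'koverall_total';

% Confirm the parameter exists before overriding it, so that a typo in
% the name raises an error instead of creating a stray field.
if isfield(nml, name)
    configuration.genie_main.Parameter.GENIE_CONTROL_NML.(name) = 720;
else
    error('Parameter %s not found in GENIE_CONTROL_NML.', name);
end
```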

Runtime
A local runtime data structure is created to provide details about the locations of the model binary and a directory in which new model invocations can be managed. For the purposes of this demonstration we will initially execute the model on the local machine. The local runtime needs to provide the appropriate binary for the OS on which Matlab is running. The runtime for the demonstration binary is specified as follows:


 * Windows (Win32)
 * <tt>>> runtime.RuntimeArchive=fullfile('../demo/runtime','genie_ig_go_sl_gaalbedofluxcorr1_archive.zip');</tt>
 * <tt>>> runtime.RuntimeArchiveTool=fullfile('../demo/runtime','unzip.exe');</tt>
 * <tt>>> runtime.LocalRunDir='..\demo\runtime'</tt>
 * <tt>>> runtime.EXPID='genie_ig_go_sl_gaalbedofluxcorr1';</tt>

<ol> runtime =

RuntimeArchive: '..\demo\runtime\genie_ig_go_sl_gaalbedofluxcorr1_archive.zip'
RuntimeArchiveTool: '..\demo\runtime\unzip.exe'
LocalRunDir: '..\demo\runtime'
EXPID: 'genie_ig_go_sl_gaalbedofluxcorr1'
</ol>


 * Linux / UNIX / Mac OSX
 * <tt>>> runtime.RuntimeArchive=fullfile('../demo/runtime','genie_ig_go_sl_gaalbedofluxcorr1_archive.tar.gz');</tt>
 * <tt>>> runtime.LocalRunDir='../demo/runtime'</tt>
 * <tt>>> runtime.EXPID='genie_ig_go_sl_gaalbedofluxcorr1';</tt>

<ol> runtime =

RuntimeArchive: '../demo/runtime/genie_ig_go_sl_gaalbedofluxcorr1_archive.tar.gz'
LocalRunDir: '../demo/runtime'
EXPID: 'genie_ig_go_sl_gaalbedofluxcorr1'
</ol>
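
The two platform-specific listings above can be folded into one script that selects the archive according to the OS Matlab is running on. This is a sketch using only the field names and demonstration paths from this tutorial; the variables `demoDir` and `expid` are introduced here for convenience and are not Toolbox names.

```matlab
% Build the runtime structure for whichever OS Matlab is running on.
demoDir = fullfile('..', 'demo', 'runtime');      % demonstration directory
expid   = 'genie_ig_go_sl_gaalbedofluxcorr1';     % experiment identifier

runtime = struct();
if ispc
    % Windows: zip archive plus the tool used to unpack it
    runtime.RuntimeArchive     = fullfile(demoDir, [expid '_archive.zip']);
    runtime.RuntimeArchiveTool = fullfile(demoDir, 'unzip.exe');
else
    % Linux / UNIX / Mac OS X: tar.gz archive, no separate tool specified
    runtime.RuntimeArchive = fullfile(demoDir, [expid '_archive.tar.gz']);
end
runtime.LocalRunDir = demoDir;
runtime.EXPID       = expid;
```

Because `fullfile` inserts the separator of the local platform, the same script yields the backslash paths shown in the Windows output and the forward-slash paths shown in the Linux output.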

Resource
The final data structure describes the computational resource on which the model will run. For this demonstration the model will be executed on the local machine. A utility script is provided for configuring the resource data structure:


 * Windows (Win32)
 * <tt>>> resource = createResource</tt>
 * [[image:resourcetype.png|Type of resource]]
 * Select 'local'
 * [[image:resourceos.png|Operating System of the resource]]
 * Select the operating system of the machine you are running Matlab on
 * <tt>Please provide a short meaningful name for the resource:</tt>
 * Type: local machine
 * <tt>Please specify the maximum number of jobs that may be submitted to this resource [10]: >></tt>
 * Type: 1
 * <tt>Upload this resource to the database? Y/N [N]:</tt>
 * Select N

<ol> resource =

type: 'local'
name: 'local machine'
MaxJobs: 1
broker: 'fork'
RemoteTargetOS: 'win32'
RemoteFileSep: '\'
</ol>


 * Linux / UNIX / Mac OSX
 * <tt>>> resource = createResource</tt>
 * [[image:resourcetype.png|Type of resource]]
 * Select 'local'
 * [[image:resourceos.png|Operating System of the resource]]
 * Select the operating system of the machine you are running Matlab on
 * <tt>Please provide a short meaningful name for the resource:</tt>
 * Type: local machine
 * <tt>Please specify the maximum number of jobs that may be submitted to this resource [10]: >></tt>
 * Type: 1
 * <tt>Upload this resource to the database? Y/N [N]:</tt>
 * Select N

<ol> resource =

type: 'local'
name: 'local machine'
MaxJobs: 1
broker: 'fork'
RemoteTargetOS: 'linux'
RemoteFileSep: '/'
</ol>
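
Before submitting a job it can be useful to confirm that a resource structure is complete and internally consistent. The sketch below is illustrative only: it builds a stand-in structure with the fields shown in the Linux output above rather than calling `createResource`, and the rule that `'win32'` implies a backslash separator is inferred from the two example outputs, not from Toolbox documentation.

```matlab
% Stand-in for the structure returned by createResource on a Linux machine
resource = struct('type', 'local', 'name', 'local machine', 'MaxJobs', 1, ...
                  'broker', 'fork', 'RemoteTargetOS', 'linux', ...
                  'RemoteFileSep', '/');

% Check that every field shown in the tutorial output is present
required = {'type', 'name', 'MaxJobs', 'broker', 'RemoteTargetOS', 'RemoteFileSep'};
missing  = required(~isfield(resource, required));
if ~isempty(missing)
    error('Resource is missing field(s): %s', strjoin(missing, ', '));
end

% The file separator should agree with the target OS (inferred rule)
if strcmp(resource.RemoteTargetOS, 'win32')
    expectedSep = '\';
else
    expectedSep = '/';
end
sepOK = strcmp(resource.RemoteFileSep, expectedSep);
```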

Restarts
To restart a model from previous output a further data structure is required. The <tt>restart</tt> structure simply specifies the locations of any additional files required to initialise the model. The files may be specified as locations in the local file system or with unique identifiers from the GENIE database.

Example
The files required to restart an instance of the <tt>genie_ig_go_sl_gaalbedofluxcorr1</tt> model after one month of simulation are:
 * igcmlandsurf_restart_2000_01_30.nc
 * igcmoceansurf_restart_2000_01_30.nc
 * igcm_rs_2000_01.nc
 * igcm_rg_2000_01.nc
 * goldstein_restart_2000_01_30.nc
 * slabseaice_restart_2000_01_30.nc

If the files reside in the local filesystem they are specified in the workspace as follows:
 * <tt>restart{1}.localRestartFile='./igcmlandsurf_restart_2000_01_30.nc';</tt>
 * <tt>restart{2}.localRestartFile='./igcmoceansurf_restart_2000_01_30.nc';</tt>
 * <tt>restart{3}.localRestartFile='./igcm_rs_2000_01.nc';</tt>
 * <tt>restart{4}.localRestartFile='./igcm_rg_2000_01.nc';</tt>
 * <tt>restart{5}.localRestartFile='./goldstein_restart_2000_01_30.nc';</tt>
 * <tt>restart{6}.localRestartFile='./slabseaice_restart_2000_01_30.nc';</tt>

If the files reside in the database they are specified in the workspace as follows:
 * <tt>restart{1}.standard.ID='igcmlandsurf_restart_2000_01_30_nc_...';</tt>
 * <tt>restart{2}.standard.ID='igcmoceansurf_restart_2000_01_30_nc_...';</tt>
 * <tt>restart{3}.standard.ID='igcm_rs_2000_01_nc_...';</tt>
 * <tt>restart{4}.standard.ID='igcm_rg_2000_01_nc_...';</tt>
 * <tt>restart{5}.standard.ID='goldstein_restart_2000_01_30_nc_...';</tt>
 * <tt>restart{6}.standard.ID='slabseaice_restart_2000_01_30_nc_...';</tt>

If the local names of the files were obtained as part of the database query, they can be supplied alongside the identifiers. Providing them saves the system from performing a further query to look them up.
 * <tt>restart{1}.standard.ID='igcmlandsurf_restart_2000_01_30_nc_...';</tt>
 * <tt>restart{1}.standard.localName='igcmlandsurf_restart_2000_01_30.nc';</tt>
 * <tt>restart{2}.standard.ID='igcmoceansurf_restart_2000_01_30_nc_...';</tt>
 * <tt>restart{2}.standard.localName='igcmoceansurf_restart_2000_01_30.nc';</tt>
 * <tt>restart{3}.standard.ID='igcm_rs_2000_01_nc_...';</tt>
 * <tt>restart{3}.standard.localName='igcm_rs_2000_01.nc';</tt>
 * <tt>restart{4}.standard.ID='igcm_rg_2000_01_nc_...';</tt>
 * <tt>restart{4}.standard.localName='igcm_rg_2000_01.nc';</tt>
 * <tt>restart{5}.standard.ID='goldstein_restart_2000_01_30_nc_...';</tt>
 * <tt>restart{5}.standard.localName='goldstein_restart_2000_01_30.nc';</tt>
 * <tt>restart{6}.standard.ID='slabseaice_restart_2000_01_30_nc_...';</tt>
 * <tt>restart{6}.standard.localName='slabseaice_restart_2000_01_30.nc';</tt>
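
For restart files held in the local file system, the six assignments shown earlier in this section can be generated from a list of file names. This is a minimal sketch: `files` and `restartDir` are illustrative variable names, the files are assumed to sit in the current directory, and the existence check only issues a warning.

```matlab
% Restart file names from the example in this section
files = {'igcmlandsurf_restart_2000_01_30.nc', ...
         'igcmoceansurf_restart_2000_01_30.nc', ...
         'igcm_rs_2000_01.nc', ...
         'igcm_rg_2000_01.nc', ...
         'goldstein_restart_2000_01_30.nc', ...
         'slabseaice_restart_2000_01_30.nc'};
restartDir = '.';   % assumed location of the restart files

% One cell per file, each holding a struct with a localRestartFile field
restart = cell(1, numel(files));
for k = 1:numel(files)
    restart{k} = struct('localRestartFile', fullfile(restartDir, files{k}));
    % exist(...,'file') returns 2 when the path names an existing file
    if exist(restart{k}.localRestartFile, 'file') ~= 2
        warning('Restart file not found: %s', restart{k}.localRestartFile);
    end
end
```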

Job Submission: Local Machine
The three data structures are now defined and the GENIE model can be executed. This is achieved through a single call to the gc_jobsubmit function:


 * <tt>>> [handle, retrieve]=gc_jobsubmit(configuration, runtime, resource)</tt>

<ol>
*******************************************************
 Welcome to GENIE, initialisation starting
*******************************************************
=======================================================
 Initialisation of GENIE main module complete
=======================================================
fixedicesheet: Opening orog file ../../genie-fixedicesheet/data/input/orog_grid_std_t21.nc

...

=======================================================
 Initialising GOLDSTEIN module shutdown
=======================================================
GOLD : weighted r.m.s. model-data error    1.44769543374438
GOLD : volm transport weighted temperatures j=26    and opsia
-1.053364926767422E-003 1.074417049012253E-003  2.107798060746225E-004
max poleward heat flux  7.257640610796259E-004
overturning extrema in Sv
ominp,omaxp,omina,omaxa,avn
-0.22954E+02   0.25604E+02   -0.79754E+01    0.20454E+01    0.15206E+00
=======================================================
 GOLDSTEIN module shutdown complete
=======================================================
*******************************************************
 Shutdown complete, au revoir
*******************************************************

handle = 192.168.0.1@@C:\demo\runtime\20060814T170527_950129

retrieve =

runtime: {'run_condor_win32.bat' 'genie_ig_go_sl_runtime.zip'  'unzip.exe'}
handle: '192.168.0.1@@C:\demo\runtime\20060814T170527_950129'
LocalRunDirUniq: 'C:\demo\runtime\20060814T170527_950129\'
resource: [1x1 struct]
configuration: {'fort.8' 'fort.7'  'goin_GOLD'  'fort.14'  'fort.13'  'fort.12'}
</ol>

The model should execute on the local machine and display the stdout in the Matlab command window.
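
The handle returned in the example output appears to combine an address and a unique run directory separated by `@@`. That reading is inferred from this single example (the second part matches `LocalRunDirUniq` apart from the trailing separator) and is not a documented format. Under that assumption, the two parts can be separated as follows; `host` and `runDir` are illustrative names.

```matlab
% Handle string copied from the example output above
handle = '192.168.0.1@@C:\demo\runtime\20060814T170527_950129';

% Locate the '@@' separator (assumed format: <address>@@<run directory>)
idx = strfind(handle, '@@');
if isempty(idx)
    error('Unexpected handle format: %s', handle);
end

host   = handle(1:idx(1)-1);     % text before the separator
runDir = handle(idx(1)+2:end);   % text after the separator
```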

Job Submission: Remote Globus System
The execution of the same model instance on a remote resource can be achieved by specifying a new resource data structure. The easiest way to create a new resource structure is to run the createResource function which enables a user to specify any supported resource that is available to them. Since users of the GENIE toolbox should have access to the UK National Grid Service we now demonstrate how to submit the above compute job to the Oxford compute node of the NGS.

To use a computational resource that provides a Globus Toolkit v2.4 interface (specifically GRAM and GridFTP), a user must instantiate an X.509 proxy certificate. This is achieved by invoking the gd_createproxy function.


 * <tt>>> gd_createproxy</tt>

The client will open a dialog window and request the password for your certificate.


 * [[image:gd_createproxy.png|gd_createproxy]]

Enter your password and press Create.


 * [[image:proxypassword.png|Enter password for your e-Science certificate]]

Upon successful creation of the proxy certificate click <tt>OK</tt>. Click <tt>Cancel</tt> on the dialog window once the proxy has been created and press a key in the paused Matlab session.

In order to use the compute node of the NGS you will need to know the location of your home directory on the head node. This can be found using the gc_findMyHomeDir function in the client:


 * <tt>>> myHomeDir = gc_findMyHomeDir('grid-compute.oesc.ox.ac.uk')</tt>

<ol> myHomeDir =

/home/ngs0000 </ol>


 * <tt>>> runtime.RuntimeArchive=fullfile('../demo/runtime','genie_ig_go_sl_runtime.tar.gz');</tt>
 * <tt>>> runtime.LocalRunDir='..\demo\runtime'</tt>

<ol> runtime =

RuntimeArchive: '..\demo\runtime\genie_ig_go_sl_runtime.tar.gz'
LocalRunDir: '..\demo\runtime'
</ol>