GENIE:GENIEToolboxFAQ

Andrew Price, University of Southampton ([mailto:a.r.price@soton.ac.uk a.r.price@soton.ac.uk])

GENIE Toolbox: Frequently Asked Questions

 * How do I convert a config file for use in the GENIE Toolbox?
 * How do I create a model archiving function for the GENIE Toolbox?
 * How do I use Condor via SSH?

How do I convert a config file for use in the GENIE Toolbox?
The GENIE Toolbox has been designed to manage GENIE on multiple heterogeneous platforms including Windows. A consequence of this is that genie_example.job cannot be used to configure and execute the simulations - it is too complicated to get a bash script to run reliably on remote Windows platforms. The GENIE Toolbox therefore performs many of the same activities defined in genie_example.job. There are issues with this duplication of functionality and efforts are under way to rigorously define the individual activities so that alternate tooling can be safely employed - but for now we need to explain how config files can be ported to the GENIE Toolbox environment.

To configure a model with the GENIE Toolbox we follow a config file mechanism similar to that of genie_example.job. The default parameter set is loaded into Matlab and then corrections/amendments are applied to specify the particular configuration of the model to run. A configuration function is written that has the same name as the corresponding config file:


 * genie_{model}.config --> genie_{model}_config.m
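The naming rule can be expressed as a small shell helper. This is only an illustrative sketch: the function name <tt>config_function_file</tt> is hypothetical and is not part of the GENIE Toolbox.

```shell
# Hypothetical helper (not part of the GENIE Toolbox): derive the name of
# the Matlab config function file from the name of a GENIE config file.
config_function_file() {
    # basename strips leading directories and the trailing .config suffix
    printf '%s_config.m\n' "$(basename "$1" .config)"
}

config_function_file configs/genie_eb_go_gs_ml.config
```

For the example used in this FAQ the call prints <tt>genie_eb_go_gs_ml_config.m</tt>.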

To convert a config file from the CVS/SVN repository to a GENIE Toolbox equivalent the following steps are necessary. We present an example of porting the genie_eb_go_gs_ml.config file from the source repository to the GENIE Toolbox equivalent genie_eb_go_gs_ml_config.m:

<ol> <li>Navigate to the GENIEToolbox/configs directory in your installation of GENIELab</li>
<li>Create a copy of the GENIE Toolbox config template with a name corresponding to the model instance. <ul> <li><tt>$ cp genie_model_config.m.template genie_eb_go_gs_ml_config.m</tt></li> </ul> </li>
<li>Edit the new config script and replace all instances of the string <tt>model</tt> with the model descriptor for this config file. <ul> <li>Search and replace: <tt>model</tt> --> <tt>eb_go_gs_ml</tt></li> </ul></li>
<li>Remove any module metadata that is not required for the particular configuration of the model. For genie_eb_go_gs_ml we just need configuration information for the main, embm, goldstein, land and seaice modules.
% Load the default parameters (specify files using relative paths)
metadata.genie_main         = metadata_main('../..','.');
metadata.genie_embm         = metadata_embm('../..','.');
metadata.genie_goldstein    = metadata_goldstein('../..','.');
metadata.genie_land         = metadata_land('../..','.');
metadata.genie_seaice       = metadata_seaice('../..','.');
</li>
<li>Copy the parameter override entries from the bash config file to the new Matlab config function in the marked section of the template.
% Add the adapted entries from the corresponding config file here
% =========================================================================

% =========================================================================
For the genie_eb_go_gs_ml_config.m example:
% Add the adapted entries from the corresponding config file here
% =========================================================================
# C-GOLDSTEIN DEFAULT INTEGRATION OF 5000 YEARS WITH GENIE-land
# Fixed present-day vegetation

EXPID=genie_eb_go_gs_ml

ma_flag_ebatmos=.TRUE.
ma_flag_goldsteinocean=.TRUE.
ma_flag_goldsteinseaice=.TRUE.
ma_flag_land=.TRUE.
ma_flag_igcmatmos=.FALSE.
ma_flag_fixedocean=.FALSE.
ma_flag_fixedseaice=.FALSE.
ma_flag_fixedicesheet=.FALSE.
ma_flag_fixedchem=.FALSE.

# DP flags are important for global water and energy
# conservation tests
GENIEDP=TRUE
IGCMATMOSDP=TRUE

GENIENXOPTS='$(DEFINE)GENIENX=36'
GENIENYOPTS='$(DEFINE)GENIENY=36'

# this is to only write ocean-grid data in genie-main
ma_write_flag_atm=.false.
ma_write_flag_sic=.false.

# this is to write genie-main data every 50000 timesteps=100 years
ma_dt_write=50000

# this is to control embm output periods
#   npstp='health check' from 1000=10 years to 50000=500 years
#   iwstp='restarts' from 50000=500 years to 10000=100 years
#   itstp='time series' from 100=1 year to 5000=50 years
#   ianav='an average' stays 50000=500 years
ea_3=50000
ea_4=10000
ea_5=5000
ea_6=50000

# this is to control goldstein output periods
#   npstp='health check' from 1000=10 years to 50000=500 years
#   iwstp='restarts' from 50000=500 years to 10000=100 years
#   itstp='time series' from 100=1 year to 5000=50 years
#   ianav='an average' stays 50000=500 years
go_3=50000
go_4=10000
go_5=5000
go_6=50000

# this is to control seaice output periods
#   npstp='health check' from 1000=10 years to 50000=500 years
#   iwstp='restarts' from 50000=500 years to 10000=100 years
#   itstp='time series' from 100=1 year to 5000=50 years
#   ianav='an average' stays 50000=500 years
gs_3=50000
gs_4=10000
gs_5=5000
gs_6=50000

# this is to turn graphics off
ma_lgraphics=.false.

# this is to change the model run length
#   720=1 month of igcm (timestep=1 hour) to 2500000=5000 years of c-goldstein
ma_koverall_total=2500000

# this changes the relative atmos/ocean/seaice calling frequency
ma_ksic_loop=5
ma_kocn_loop=5
ma_klnd_loop=5

#########################################
# GENIE-land config changes
#########################################
ml_idiag_land=5000
ml_irest_land=50000
ml_iacc_land=5000
ml_c_restart=$CODEDIR/genie-land/data/input/land_rs_embm_36x36.nc
ml_c_runoff_fl=$CODEDIR/genie-land/data/input/runoff_mask_gold36x36.nc
% =========================================================================
</li>
<li>Move the <tt>EXPID</tt> entry to the bottom of the script and replace the template entry</li>
<li>Parse the new section of script <ul>
<li>Change the comment lines back into comments by prepending a '<tt>%</tt>' to these lines
% # C-GOLDSTEIN DEFAULT INTEGRATION OF 5000 YEARS WITH GENIE-land
% # Fixed present-day vegetation
</li>
<li>Remove or comment out the compiler directives from the script as these are applied at build time and are not part of the GENIE Toolbox configuration
% GENIEDP=TRUE
% IGCMATMOSDP=TRUE
% GENIENXOPTS='$(DEFINE)GENIENX=36'
% GENIENYOPTS='$(DEFINE)GENIENY=36'
</li>
<li>For each parameter assignment, assign the parameter to the appropriate field of the <tt>metadata</tt> structure
metadata.genie_main.Parameter.GENIE_CONTROL_NML.flag_ebatmos=.TRUE.
metadata.genie_main.Parameter.GENIE_CONTROL_NML.flag_goldsteinocean=.TRUE.
metadata.genie_main.Parameter.GENIE_CONTROL_NML.flag_goldsteinseaice=.TRUE.
</li>
<li>Change the Fortran boolean constants <tt>.TRUE.</tt> and <tt>.FALSE.</tt> to the Matlab strings <tt>'true'</tt> and <tt>'false'</tt>
metadata.genie_main.Parameter.GENIE_CONTROL_NML.flag_ebatmos='true'
metadata.genie_main.Parameter.GENIE_CONTROL_NML.flag_goldsteinocean='true'
metadata.genie_main.Parameter.GENIE_CONTROL_NML.flag_goldsteinseaice='true'
</li>
<li>Surround any path definitions with single quotes as these need to be treated as strings in Matlab
metadata.genie_land.Parameter.GENIELAND_CONTROL.c_restart='$CODEDIR/genie-land/data/input/land_rs_embm_36x36.nc'
metadata.genie_land.Parameter.GENIELAND_CONTROL.c_runoff_fl='$CODEDIR/genie-land/data/input/runoff_mask_gold36x36.nc'
</li>
<li>Change any instances of <tt>$CODEDIR</tt> to the value of <tt>runtime_root</tt> that is provided in the first argument to the <tt>metadata_module</tt> function calls at the top of the script. For the default builds of GENIE this entry should be <tt>../..</tt> as the code directory lies two directories up the hierarchy from the executable binary
metadata.genie_land.Parameter.GENIELAND_CONTROL.c_restart='../../genie-land/data/input/land_rs_embm_36x36.nc'
metadata.genie_land.Parameter.GENIELAND_CONTROL.c_runoff_fl='../../genie-land/data/input/runoff_mask_gold36x36.nc'
</li>
<li>Add a '<tt>;</tt>' line terminator to every assignment to avoid echoing output to screen
metadata.genie_land.Parameter.GENIELAND_CONTROL.c_restart='../../genie-land/data/input/land_rs_embm_36x36.nc';
metadata.genie_land.Parameter.GENIELAND_CONTROL.c_runoff_fl='../../genie-land/data/input/runoff_mask_gold36x36.nc';
</li> </ul> </li>
<li>If the configuration of the model involves modules that are initiated from <tt>goin_</tt> files rather than namelists then the ordering of the fields in the metadata structure is critically important. We strongly recommend passing the data for these modules through the '<tt>goin_dictionary</tt>' to ensure correct ordering.
Add the following lines as appropriate for any module of the goin variety
dict=goin_dictionary;
[members indices]=ismember(fieldnames(metadata.genie_embm.Parameter), dict);
list=dict(sort(indices));
metadata.genie_embm.Parameter=orderfields(metadata.genie_embm.Parameter,list);
</li>
<li>The <tt>genie_eb_go_gs_ml_config.m</tt> looks like this
function [metadata, EXPID] = genie_eb_go_gs_ml_config
% Loads the configuration metadata for the genie_eb_go_gs_ml eb_go_gs_ml
%
% SYNTAX
%   [metadata, EXPID] = genie_eb_go_gs_ml_config
%
% DESCRIPTION
%   Loads the default GENIE metadata data to set up the genie_eb_go_gs_ml
%   eb_go_gs_ml. This function is a direct conversion of the file
%   configs/genie_eb_go_gs_ml.config
%
% See also genie_eb_go_gs_ml_archive

% Copyright 2007 GENIE Project, University of Southampton
% Andrew Price, $Date: $
% $Revision: $
% GENIE Toolbox for Matlab

% Load the default parameters (specify files using relative paths)
metadata.genie_main         = metadata_main('../..','.');
metadata.genie_embm         = metadata_embm('../..','.');
metadata.genie_goldstein    = metadata_goldstein('../..','.');
metadata.genie_land         = metadata_land('../..','.');
metadata.genie_seaice       = metadata_seaice('../..','.');

% Add the adapted entries from the corresponding config file here
% =========================================================================
% # C-GOLDSTEIN DEFAULT INTEGRATION OF 5000 YEARS WITH GENIE-land
% # Fixed present-day vegetation

metadata.genie_main.Parameter.GENIE_CONTROL_NML.flag_ebatmos='true';
metadata.genie_main.Parameter.GENIE_CONTROL_NML.flag_goldsteinocean='true';
metadata.genie_main.Parameter.GENIE_CONTROL_NML.flag_goldsteinseaice='true';
metadata.genie_main.Parameter.GENIE_CONTROL_NML.flag_land='true';
metadata.genie_main.Parameter.GENIE_CONTROL_NML.flag_igcmatmos='false';
metadata.genie_main.Parameter.GENIE_CONTROL_NML.flag_fixedocean='false';
metadata.genie_main.Parameter.GENIE_CONTROL_NML.flag_fixedseaice='false';
metadata.genie_main.Parameter.GENIE_CONTROL_NML.flag_fixedicesheet='false';
metadata.genie_main.Parameter.GENIE_CONTROL_NML.flag_fixedchem='false';

% # DP flags are important for global water and energy
% # conservation tests
% GENIEDP=TRUE
% IGCMATMOSDP=TRUE
%
% GENIENXOPTS='$(DEFINE)GENIENX=36'
% GENIENYOPTS='$(DEFINE)GENIENY=36'

% # this is to only write ocean-grid data in genie-main
metadata.genie_main.Parameter.GENIE_CONTROL_NML.write_flag_atm='false';
metadata.genie_main.Parameter.GENIE_CONTROL_NML.write_flag_sic='false';

% # this is to write genie-main data every 50000 timesteps=100 years
metadata.genie_main.Parameter.GENIE_CONTROL_NML.dt_write=50000;

% # this is to control embm output periods
% #  npstp='health check' from 1000=10 years to 50000=500 years
% #  iwstp='restarts' from 50000=500 years to 10000=100 years
% #  itstp='time series' from 100=1 year to 5000=50 years
% #  ianav='an average' stays 50000=500 years
metadata.genie_embm.Parameter.npstp_embm=50000;
metadata.genie_embm.Parameter.iwstp=10000;
metadata.genie_embm.Parameter.itstp=5000;
metadata.genie_embm.Parameter.ianav=50000;

% # this is to control goldstein output periods
% #  npstp='health check' from 1000=10 years to 50000=500 years
% #  iwstp='restarts' from 50000=500 years to 10000=100 years
% #  itstp='time series' from 100=1 year to 5000=50 years
% #  ianav='an average' stays 50000=500 years
metadata.genie_goldstein.Parameter.npstp_goldstein=50000;
metadata.genie_goldstein.Parameter.iwstp=10000;
metadata.genie_goldstein.Parameter.itstp=5000;
metadata.genie_goldstein.Parameter.ianav=50000;

% # this is to control seaice output periods
% #  npstp='health check' from 1000=10 years to 50000=500 years
% #  iwstp='restarts' from 50000=500 years to 10000=100 years
% #  itstp='time series' from 100=1 year to 5000=50 years
% #  ianav='an average' stays 50000=500 years
metadata.genie_seaice.Parameter.npstp_seaice=50000;
metadata.genie_seaice.Parameter.iwstp=10000;
metadata.genie_seaice.Parameter.itstp=5000;
metadata.genie_seaice.Parameter.ianav=50000;

% # this is to turn graphics off
metadata.genie_main.Parameter.GENIE_CONTROL_NML.lgraphics='false';

% # this is to change the model run length
% #  720=1 month of igcm (timestep=1 hour) to 2500000=5000 years of c-goldstein
metadata.genie_main.Parameter.GENIE_CONTROL_NML.koverall_total=2500000;

% # this changes the relative atmos/ocean/seaice calling frequency
metadata.genie_main.Parameter.GENIE_CONTROL_NML.ksic_loop=5;
metadata.genie_main.Parameter.GENIE_CONTROL_NML.kocn_loop=5;
metadata.genie_main.Parameter.GENIE_CONTROL_NML.klnd_loop=5;

% #########################################
% # GENIE-land config changes
% #########################################
metadata.genie_land.Parameter.GENIELAND_CONTROL.idiag_land=5000;
metadata.genie_land.Parameter.GENIELAND_CONTROL.irest_land=50000;
metadata.genie_land.Parameter.GENIELAND_CONTROL.iacc_land=5000;
metadata.genie_land.Parameter.GENIELAND_CONTROL.c_restart='../../genie-land/data/input/land_rs_embm_36x36.nc';
metadata.genie_land.Parameter.GENIELAND_CONTROL.c_runoff_fl='../../genie-land/data/input/runoff_mask_gold36x36.nc';
% =========================================================================

% Sort the fields in the structures according to the goin dictionary
dict=goin_dictionary;
[members indices]=ismember(fieldnames(metadata.genie_embm.Parameter), dict);
list=dict(sort(indices));
metadata.genie_embm.Parameter=orderfields(metadata.genie_embm.Parameter,list);
[members indices]=ismember(fieldnames(metadata.genie_goldstein.Parameter), dict);
list=dict(sort(indices));
metadata.genie_goldstein.Parameter=orderfields(metadata.genie_goldstein.Parameter,list);
[members indices]=ismember(fieldnames(metadata.genie_seaice.Parameter), dict);
list=dict(sort(indices));
metadata.genie_seaice.Parameter=orderfields(metadata.genie_seaice.Parameter,list);

% Specify the experiment identifier
EXPID = 'genie_eb_go_gs_ml';

</li> </ol>
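The mechanical parts of the conversion (commenting, boolean constants, quoting paths, line terminators) can be sketched as a <tt>sed</tt> filter. This is an illustration under stated assumptions, not a tool shipped with the GENIE Toolbox: the function name <tt>convert_config</tt> is hypothetical, the input is assumed to hold one assignment per line, <tt>$CODEDIR</tt> is assumed to map to the default <tt>../..</tt>, and only the <tt>ma_</tt> and <tt>ml_</tt> prefixes shown in the example are mapped. Entries for goin-style modules (<tt>ea_</tt>, <tt>go_</tt>, <tt>gs_</tt>) and <tt>EXPID</tt> are passed through unchanged and still need to be edited by hand.

```shell
# Hypothetical helper (not part of the GENIE Toolbox): apply the mechanical
# rewriting rules described above to config lines read from standard input.
# Rules, in order: comment lines and compiler directives are commented out
# with '%'; ma_ and ml_ entries are mapped to metadata fields; Fortran
# booleans become Matlab strings; $CODEDIR paths are quoted and rewritten
# to ../.. (assumed default); converted assignments get a ';' terminator.
convert_config() {
    sed -E \
        -e 's/^#/% #/' -e 't' \
        -e 's/^(GENIEDP|IGCMATMOSDP|GENIENXOPTS|GENIENYOPTS)=/% &/' -e 't' \
        -e 's/^ma_([A-Za-z0-9_]+)=/metadata.genie_main.Parameter.GENIE_CONTROL_NML.\1=/' \
        -e 's/^ml_([A-Za-z0-9_]+)=/metadata.genie_land.Parameter.GENIELAND_CONTROL.\1=/' \
        -e "s/=\.[Tt][Rr][Uu][Ee]\.\$/='true'/" \
        -e "s/=\.[Ff][Aa][Ll][Ss][Ee]\.\$/='false'/" \
        -e "s|=\\\$CODEDIR/(.*)\$|='../../\1'|" \
        -e '/^metadata\./s/$/;/'
}

printf '%s\n' 'ma_flag_land=.TRUE.' | convert_config
```

The final line prints <tt>metadata.genie_main.Parameter.GENIE_CONTROL_NML.flag_land='true';</tt>. The output should always be reviewed against the manual steps before it is pasted into the config function.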

How do I create a model archiving function for the GENIE Toolbox?
A template is provided to aid construction of a function to archive GENIE models and their runtime files. A genie_model_config.m function should exist before creating this archive function because we want to tie the EXPID parameter to the contents of the code archive that we create. To create an archival script:
<ol> <li>Navigate to the GENIEToolbox/configs directory in your installation of GENIELab</li>
<li>Create a copy of the GENIE Toolbox archive template with a name corresponding to the model instance. <ul> <li><tt>$ cp genie_model_archive.m.template genie_eb_go_gs_ml_archive.m</tt></li> </ul> </li>
<li>Edit the new archive script and replace all instances of the string <tt>model</tt> with the model descriptor for this archive file. <ul> <li>Search and replace: <tt>model</tt> --> <tt>eb_go_gs_ml</tt></li> </ul></li>
<li>Check that the function obtains the <tt>EXPID</tt> value from the appropriate model configuration function
% Get the experiment identifier from genie_eb_go_gs_ml.config
[md, EXPID] = genie_eb_go_gs_ml_config;
</li>
<li>Update the <tt>FILELIST</tt> cell array with the locations of the files that need to be included in the archive. The files are specified relative to the CODEDIR which contains genie-main etc. The file definitions should be single quoted strings with the relative paths to the files. The <tt>*</tt> wildcard can be used to specify multiple files. Please consult module documentation to find out which data files are likely to be required.
% Files to be included in the archive
FILELIST = { ['genie_output/' EXPID '/genie.exe'], ...
             'genie-main/inputdata/zjmap.dat', ...
             'genie-main/data/input/main_restart_0.nc', ...
             'genie-embm/data/input/tauy_u.interp', ...
             'genie-embm/data/input/ta_ncep.silo', ...
             'genie-embm/data/input/worbe2.k1', ...
             'genie-embm/data/input/taux_v.interp', ...
             'genie-embm/data/input/tauy_v.interp', ...
             'genie-embm/data/input/uncep.silo', ...
             'genie-embm/data/input/qa_ncep.silo', ...
             'genie-embm/data/input/taux_u.interp', ...
             'genie-embm/data/input/vncep.silo', ...
             'genie-goldstein/data/input/tauy_u.interp', ...
             'genie-goldstein/data/input/worbe2.k1', ...
             'genie-goldstein/data/input/taux_v.interp', ...
             'genie-goldstein/data/input/saliann.silo', ...
             'genie-goldstein/data/input/worbe2.psiles', ...
             'genie-goldstein/data/input/tauy_v.interp', ...
             'genie-goldstein/data/input/worbe2.paths', ...
             'genie-goldstein/data/input/taux_u.interp', ...
             'genie-goldstein/data/input/tempann.silo', ...
             'genie-land/data/input/land_rs_embm_36x36.nc', ...
             'genie-land/data/input/runoff_mask_gold36x36.nc', ...
             'genie-seaice/data/input/worbe2.k1' };
</li>
<li>If you are unsure which files to archive, a reasonable fallback is to include all content from the appropriate module data/input directories
% Files to be included in the archive
FILELIST = { ['genie_output/' EXPID '/genie.exe'], ...
             'genie-main/inputdata/*', ...
             'genie-main/data/input/*', ...
             'genie-embm/data/input/*', ...
             'genie-goldstein/data/input/*', ...
             'genie-land/data/input/*', ...
             'genie-seaice/data/input/*' };
</li> </ol>
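Before building an archive it can help to confirm that every <tt>FILELIST</tt> entry actually resolves to a file below the CODEDIR. The following shell sketch does that from the command line. The function name <tt>check_filelist</tt> is hypothetical and not part of the GENIE Toolbox, and the sketch assumes a POSIX shell and patterns that contain no whitespace.

```shell
# Hypothetical helper (not part of the GENIE Toolbox): check that every
# FILELIST pattern matches at least one existing file below CODEDIR.
# Usage: check_filelist CODEDIR PATTERN...
# Returns 0 if all patterns match, 1 otherwise (unmatched patterns are
# reported on standard error).
check_filelist() {
    codedir=$1
    shift
    status=0
    for pattern in "$@"; do
        found=no
        # $pattern is left unquoted so that the shell expands any '*' wildcard
        for path in "$codedir"/$pattern; do
            if [ -e "$path" ]; then
                found=yes
                break
            fi
        done
        if [ "$found" = no ]; then
            echo "no match: $pattern" >&2
            status=1
        fi
    done
    return $status
}
```

A possible invocation from the directory that contains genie-main would be <tt>check_filelist . 'genie-main/data/input/*' 'genie-seaice/data/input/worbe2.k1'</tt>.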

How do I use Condor via SSH?
If it is not possible or not desirable to install and run the GENIELab software on a valid submission node of your Condor pool, an SSH tunnel can be used to submit jobs from GENIELab to a remote Condor submit node. This functionality is provided for convenience - we would recommend Condor-G as a preferable long-term strategy.

Pre-requisites

 * It is assumed that you can access an account on the Condor submission node using SSH. If the Condor submit node is a Windows box you might want to look at WinSSHD from Bitvise: http://www.bitvise.com/
 * If the GENIELab software is installed on a Windows system then the software will expect to find a CygWin distribution with the OpenSSH software installed
 * Check that the CygWin executables ssh.exe, scp.exe and cygpath.exe are available in the <tt>cygwin\bin</tt> directory
 * If not, then OpenSSH can be installed using the CygWin installer and selecting <tt>OpenSSH</tt> from the <tt>Net</tt> category
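The presence of the required executables can be checked from a shell before configuring anything else. This is a minimal sketch: the function name <tt>require_tools</tt> is hypothetical, and it assumes the tools are expected on the shell's <tt>PATH</tt> (as they are inside a CygWin shell) rather than checking the <tt>cygwin\bin</tt> directory directly.

```shell
# Hypothetical helper: report which of the named tools cannot be found on
# the PATH. Returns 0 if all are present, 1 otherwise.
require_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            status=1
        fi
    done
    return $status
}
```

In a CygWin shell the check would be run as <tt>require_tools ssh scp cygpath</tt>; on a Linux client <tt>cygpath</tt> can be left out.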

Configuring SSH credentials
The GENIE Toolbox has been designed with the assumption that the user is authenticated on remote systems by their X.509 eScience certificate. Access to a Condor submit node via SSH relies on the authentication mechanisms of SSH and as such will by default prompt for the user's password. As job submission is a non-interactive activity in the GENIE Toolbox we need to ensure that the client presents suitable credentials automatically. To achieve this the user must create a cryptographic key pair on the GENIELab client side and install the public key in the SSH configuration of their account on the Condor submission node. If installed correctly, the SSH tools on the GENIELab client side will be authenticated automatically using the credentials in the user's account on the submit node.

<ol> <li>On the host where the GENIELab software is installed open a Linux or CygWin shell</li>
<li>Create a public/private RSA key pair</li>
<ul> <tt>$ ssh-keygen -t rsa</tt>
Generating public/private rsa key pair.
Enter file in which to save the key (/home/user/.ssh/id_rsa): </ul>
<li>Accept the default location by pressing <tt>Enter</tt></li>
<ul> Enter passphrase (empty for no passphrase): </ul>
<li>Accept the empty passphrase as we do not want to be prompted for a password for every SSH interaction. Press <tt>Enter</tt></li>
<ul> Enter same passphrase again: </ul>
<li>Press <tt>Enter</tt> again to confirm the empty passphrase. This will save your new private/public key pair in the <tt>.ssh</tt> directory of your home folder</li>
<ul> Your identification has been saved in /home/user/.ssh/id_rsa.
Your public key has been saved in /home/user/.ssh/id_rsa.pub.
The key fingerprint is:
00:01:02:03:04:05:06:07:08:09:0a:0b:0c:0d:0e:0f user@hostname </ul>
<li>We now need to copy the public key to the user's account on the Condor submit node</li>
<ul> <tt>$ scp ~/.ssh/id_rsa.pub user@condorhost.domain:/home/user/.ssh/mykey.pub</tt> </ul>
<li>If this is the first time you have connected to the host you will need to store its public key in the list of known hosts</li>
<ul> The authenticity of host 'condorhost.domain (192.168.0.1)' can't be established.
RSA key fingerprint is f0:f1:f2:f3:f4:f5:f6:f7:f8:f9:fa:fb:fc:fd:fe:ff.
Are you sure you want to continue connecting (yes/no)? </ul>
<li>Type <tt>yes</tt> to accept the server's public key</li>
<ul> Warning: Permanently added 'condorhost.domain,192.168.0.1' (RSA) to the list of known hosts.
user@condorhost.domain's password: </ul>
<li>Enter the password for the user account on the Condor submit node</li>
<ul> id_rsa.pub                                                                                               100%  400     0.4KB/s   00:00 </ul>
<li>It is now easiest to log directly into the user account on the Condor submit node in order to install the GENIELab client public key. On the Condor submit node we need to add the contents of the public key file to the <tt>authorized_keys</tt> file. E.g. on a Linux/UNIX machine</li>
<ul> <tt>$ cd ~/.ssh</tt>
<tt>$ cat authorized_keys mykey.pub > temp</tt>
<tt>$ mv temp authorized_keys</tt> </ul>
<li>It should now be possible to use SSH to access the Condor submission node without being prompted for a password. On the GENIELab host check that you can invoke a command using SSH</li>
<ul> <tt>$ ssh user@condorhost.domain /bin/date</tt>
Thu Jun 28 17:44:51 BST 2007 </ul>
<li>If you are prompted for a password then the configuration is not correct. Double check that the <tt>authorized_keys</tt> file is spelt with a '<tt>z</tt>' - this has caught me out before</li> </ol>
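The final check can also be run so that it never falls back to a password prompt, which is closer to how a non-interactive job submission behaves. The sketch below relies on the OpenSSH client options <tt>BatchMode</tt> and <tt>ConnectTimeout</tt>; the function name <tt>check_key_login</tt> and the 10 second timeout are illustrative choices, not part of the GENIE Toolbox.

```shell
# Hypothetical helper: run /bin/date on the submit node without ever
# prompting. With BatchMode=yes the OpenSSH client fails instead of asking
# for a password, so a non-zero exit status means key-based login is not
# working yet.
# Usage: check_key_login user@condorhost.domain
check_key_login() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" /bin/date
}
```

If <tt>check_key_login user@condorhost.domain</tt> prints the remote date and returns zero, the key pair is installed correctly. Otherwise revisit the <tt>authorized_keys</tt> step.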