GENIE:GENIEToolboxExamples

Andrew Price, University of Southampton, ([mailto:a.r.price@soton.ac.uk a.r.price@soton.ac.uk])

Parametric study
To evaluate and manage a number of concurrent GENIE simulations it is straight-forward to script a simple study. A Matlab script can be written that performs a set of job submissions, polls the job handles and retrieves the output upon completion.

By default, compute jobs managed by the GENIE Toolbox are configured in uniquely named directories. This is fine for single jobs or if the output is archived in the database with accomanying metadata. The data can be identified easily in these circumstances. If a user simply wishes to evaluate an ensemble of runs and process the data on the local system, it is useful to be able to label the directories of each run directory with a meaningful identifier.

LocalRunDirUniq
The field LocalRunDirUniq can be added to the runtime data structure to define a specific directory in which a compute job is managed. If an ensemble of runs is to be executed and processed locally this is a useful field to specify. For example, without the LocalRunDirUniq field a compute job is prepeared in a unique directory with the directory specified by LocalRunDir:

.  ..  20070614T142500_445096

By specifying the LocalRunDirUniq field of runtime you can control the directory in which the job is created and where the output data will be returned to:

.  ..  MyEnsembleMember

Scripted ensemble study
A simple script to perform an ensemble of model runs would need to perform the following actions


 * Load the model configuration
 * Specify the model runtime
 * Load the resource description metadata
 * Define the parameter range over which the model will be evaluated
 * Submit the compute jobs that make up the ensemble
 * The job handles and the accompanying retrieval data structures are key pieces of information required to obtain the results once the jobs complete. It is worth saving this information to disk if the jobs are likely to take some time.
 * Poll the job handles until all simulations are complete
 * Retrieve the output for each job


 * The simulation output should now reside in labelled directories within the folder specified in the runtime.LocalRunDir field in the script.


 * If you were to look at the directory contents at this point you would find:


 * We will quickly plot the maximum atlantic overturning circulation (MOC) at the end of each of these simulations. This information is obtained from the ascii .opsit file of the goldstein ocean output.


 * A full example script can be viewed here. The output of this particular study is a graph of maximum Atlantic overturning circulation strength as a function of the EMBM fresh water flux scaling factor.


 * [[image:SclFWFSweep.png|Example parametric study of genie_eb_go_gs]]

Tuning studies
Climate models rely heavily on parameterisations of physical processes that occur on comparatively small time and spatial scales. A key concern in climate modelling is therefore to find appropriate values for these parameters so that a reasonable climatology is simulated. The process of investigating the model parameter space and finding optimal points within that space is referred to as tuning. However, as with many design problems, the nonlinear response of a model to its parameters and the often conflicting tuning objectives make this a difficult problem to solve.

The general problem of optimising a set of model parameters in order to improve a number of possibly conflicting design objectives is typically approached in one of two ways. One can create a single objective measure of design quality by computing a weighted sum of the individual objectives and seek to find the set of variables that minimise or maximise this measure. Many sophisticated algorithms can be applied to a single objective problem but the weighting factors can be critical in the performance of the optimisation. Alternatively, multi-objective methods can be employed to seek a Pareto set of non-dominated solutions; designs that are superior when all objective measures are considered but that may be inferior when a subset of those objectives is considered. Such a solution set can inform the user of competition in the design goals and allows domain expertise to be applied to select the most appropriate parameter sets for further study.

We present below examples of GENIE tuning studies that can be performed using the GENIE Toolbox in conjunction with optimisation tools available from the Matlab Optimisation Toolbox and from the OptionsMatlab and OptionsNSGA2 packages that ship with GENIELab. More detailed documentation, examples and tuturials about these optimisation packages are available:


 * Matlab Optimisation Toolbox
 * OptionsMatlab

All of these tools rely on the user wrapping their GENIE model as a tuneable function in the Matlab environment.

Wrapping GENIE as a tuneable function
The optimisation tools that are available in GENIELab can be applied in many science and engineering problem domains and essentially treat the underlying problem as a black box. From the optimiser's point of view the problem is presented as an objective function and (optionally) a constraint function. These functions accept as input a vector of parameter values and calculate one or more objective measures of design quality and (optionally) evaluate one or more problem constraints. For GENIE models the objective function calculation typically involves instantiating a model with the provided parameter values, performing a simulation until an equilibrium state is achieved and then evaluating an RMS error of model fields compared to observational data. This section describes how to write such a wrapping for your model.

The simplest tuneable function will accept a vector of parameters as input, execute a GENIE model using those parameters, process the results and return a single objective function value. To create such a function one would normally edit a new script

We follow some naming conventions from OptionsMatlab. The function is declared to accept an input vector VARS</tt> and return the objective function value in EVAL</tt>.

As with the ensemble example above we need to define the three data structures that specify how to manage a GENIE simulation. The function will need to load a resource description describing the platform to target for the model run.

The local runtime environment is then configured.

The default configuration metadata for the model is loaded. A dedicated config function for the tuning exercise may have been written that specifies how the simulation should be performed to obtain the error function. E.g. The total number of timesteps and frequency of output might be defined in a separate config file.

Once the default metadata has been loaded it is necessary to override those defaults with the parameter values that have been provided as input to the function in the vector >VARS</tt>. Each tuneable parameter is assigned to the appropriate field of the configuration metadata structure.

Once the unique instance of the model has been defined the compute job is submitted for evaluation.

The function must then poll the job until the results are available.

% Poll the compute job

Once the job is complete it just remains to process the results and return an objective function value. For the purposes of this example we assume that a post-processing binary has been configured to execute after genie.exe</tt> which writes a single objecitve function value to the file results.dat</tt>.

Note that if the job fails we need to return a value to the optimiser. In this case we return a large number that is much greater than any value we are likely to get from the real objective function calculation. In OptionsMatlab it is possible to define OBJ_BAD_PT</tt>; any objective function evaluations above this value are automatically disregarded as bad points and do not subsequently impact the optimisation.

With a Matlab function defined in this way it is possible to exploit tools in the Matlab Optimisation toolbox to search for the input vector VARS</tt> that minimises (or maximises) the objective function value EVAL</tt>. Please see Matlab Optimisation Toolbox: fminsearch for an example of using fminsearch on a 2D example problem. The objective function calculation is an expensive computational task requiring at least 30 minutes of CPU time per evaluation. The optimisation algorithms available in the Matlab Optimisation Toolbox typically require large numbers of sequential evaluations (especially in higher dimensional problems) of the objective function and can prove prohibitively expensive for the study of GENIE problems. We therefore turn to the OptionsMatlab package which provides a suite of design search and optimisation algorithms and many of which are designed to perform multiple objective function evaluations in parallel. To benefit from the OptionsMatlab package there are two changes required to the scripting of the objective function:


 * Many of the algorithms in the Options package perform optimally if the input parameters are mapped to a common range. I.e. Options performs better if all the input parameters to the problem are defined on the range [0..1] or [-1..1] or similar.
 * In order to evaluate multiple objective funciton calculations concurrently the calculation must be split into a submission script and a polling/processing script. This enables Options to submit a batch of jobs and then monitor the array of handles that are returned.

Parameter Mapping
Many of the algorithms in the Options package perform sub-optimally if the input parameters are defined over ranges that are separated by many orders of magnitude. It is recommended that problems presented to the OptionsMatlab optimiser are specified on a common range for all input variables. To help map between Options variables defined on a common range and GENIE input parameters that specify values for physical properties we provide a mapping definition using op_createParameterMapping</tt>.

For a particular problem we recommend that a user creates a copy of the op_createParameterMapping</tt> script and edits the fields as appropriate for their problem. The script creates a simple data stucture specifies the bounds on the variables in the Options domain, the equivalent bounds on the variables in the problem (GENIE) domain and the method of mapping values in the intermediate ranges from one domain to the other. The variable bounds in the Options domain are specified in the fields options_LVARS</tt> and options_UVARS</tt>.

In this example we specify that Options will operate in the range [ -1.0, 1.0 ]</tt> for all variables. The bounds in the problem domain are simlarly defined in the fields problem_LVARS</tt> and problem_UVARS</tt>.

The mapping functions then assume that these bounds represent the same points in parameter space. For values in the ranges between the lower and upper bounds a mapping from one domain to the other is specified in the field scaling</tt>.

Two modes of mapping are supported at present; <tt>linear</tt> and <tt>log10</tt>. These specify how values are mapped from the problem domain to the Options domain. If we specify a variable value in the problem domain as $$x_p$$ and in the Options domain as $$x_o$$ with the superscripts $$L$$ and $$U$$ specifying respecitve lower and upper bounds then the mappings are defined:

'<tt>linear</tt>'  $$\frac{x_o-x_o^L} {x_o^U-x_o^L} = \frac{x_p-x_p^L} {x_p^U-x_p^L}\,\!$$ '<tt>log10</tt>'   $$\frac{x_o-x_o^L} {x_o^U-x_o^L} = \frac{\log_{10} x_p - \log_{10}x_p^L} {\log_{10} x_p^U - \log_{10} x_p^L}\,\!$$

Utility functions are provided to map parameter vectors from the Options domain to the problem domain and vice versa; op_OPTIONS2GENIE, op_GENIE2OPTIONS. These are wrappings of the function that actually performs the mapping op_ParameterMapping. In the Matlab workspace the default parameter mapping can be loaded

A point in the parameter space can be mapped between the domains by providing the parameter vector in one domain and <tt>map</tt> structure. E.g. To map from the Options domain to the GENIE domain:

The mid point in the parameter space is returned in the GENIE domain. Note that the mid-point in the <tt>log10</tt> scaled variable is 1.e3 which is midway between 1.e2 and 1.e4 in log10 space:

Similarly, to map from the GENIE domain to the Options domain:

The mid point in the parameter space is returned in the Options domain:

Wrapping GENIE for concurrent evaluation
The objective function definition above provides the means to evaluate a single point in parameter space. The function is passed a vector of parameter values and an objective function value is returned upon completion of a GENIE simulation. Invoking the function causes the workspace to be tied up polling the status of the GENIE compute job until the simulation finishes. This is fine for iterative direct search methods but there are many optimisation algorithms that can exploit concurrent evaluations of the objective function. In order to perform concurrent evaluation we need to split the objective function calculation into a job submission activity, a polling activity and a post-processing activity. This allows the optimiser to submit multiple concurrent jobs, monitor those jobs and then recover the results once available.

The OptionsMatlab package provides many optimisation algorithms that benefit from concurrent evaluation of expensive objective function calculations. The OptionsMatlab documentation discusses how a user should write their own objective and constraint functions and how these can be performed in parallel:


 * How do I write my own objective and constraint functions?
 * Can OptionsMatlab calculate function evaluations in parallel?

The following examples demonstrate how to write an objective function for concurrent evaluation using the OptionsMatlab job control script optjobparallel2.m.

The function must conform to the signature expected by <tt>optjobparallel2</tt>. The argument <tt>PARAMS</tt> is optional and can be used to provide constant values that are used throughout an optimisation. The <tt>U_CONS</tt> and <tt>L_CONS</tt> arguments are provided for constrained optimisation problems if the objective function calculation requires this information. For the GENIE problems presented here these fields are not used. The final argument <tt>USERDATA</tt> provides the means for data specific to a user's problem to be fed through to the objective function evaluation. For GENIE problems we pass the parameter mapping data structure into the objective function using this final argument.

This example present a wrapping of the <tt>genie_eb_go_gs_itfclsd_08l_optjobparallel2</tt> model for management using <tt>optjobparallel2</tt>. The return from this submission function is a data structure that will be passed to the accompanying parsing function <tt>genie_eb_go_gs_itfclsd_08l_optjobparallel2_parse2</tt> to monitor and post-process the compute job.

This example function performs some simple checks on the inputs to catch errors

The first action of the script is to map the input parameters passed in from the optimiser into the GENIE domain so that the model can be configured correctly. The parameter mapping <tt>map</tt> structure is passed into the script in the field <tt>USERDATA.GENIE.map</tt>. If a <tt>map</tt> does not exist then the input parameters are just passed through unchanged to configure the model.

The function then performs specifies the three key data structures for job submission. A <tt>resource</tt> definition is loaded and configures as appropriate

The local <tt>runtime</tt> environment is specified with details of the location of the model binary and a local directory in which jobs can be managed

The default model configuration metadata is loaded

The specific tuning problem is defined by assigning the parameter values from the optimiser to the appropriate fields of the configuration metadata. In this case we define a 2D objective function where the two parameters of the <tt>VARS</tt> vector are assigned to the Atmospheric temperature diffusion amplitude and the CO2 concentration scaling factor. It is important that the parameter mapping and the OptionsMatlab input data structure all agree about the meaning of the values in the <tt>VARS</tt> vector

Once the model is configured for the particular point in parameter space to be evluated the compute job is submitted with a call to <tt>gc_jobsubmit</tt>

The function returns a data structure <tt>retrieveStruct</tt> containing the job handle and job retrieval data structure. You can return whatever information you like in this data structure as it will be passed through to the parsing function directly. You just need to ensure that the information is sufficient to poll and post-process the job

A second function is then required to poll and process the compute job. The optjobparallel2.m OptionsMatlab job management script expects the processing script to have the same name as the submission function but appended with _parse2. For this example we therefore need to write a function named genie_eb_go_gs_itfclsd_08l_optjobparallel2_parse2.

The return value <tt>EVAL</tt> of the parsing script is initialised as an empty vector. <tt>optjobparallel2</tt> operates by regularly invoking the <tt>parse2</tt> script and will assume the job is still running if it receives an empty value, returning to the job until a numerical value is returned

The compute job is polled from the information passed through from the submission script. For this particular problem the results are returned in the file <tt>errfn_ea_go_gs.err</tt> and the status script therefore checks for the existance of this file for successful completion of the compute job

If the job successfully completed we retrieve and load the result from the post-processing in <tt>errfn_ea_go_gs.err</tt> that contains the objective function value. In this case the single composite objective function value is output as the fifth value in this file

In the event of a problem in processing the try, catch blocks are used to ensure the optimisation is not interrupted. A bad value is returned in the event of an error so that the calculation as a whole is not terminated. The value of 10001 is returned in the event of a problem and we configure the optimiser to set the <tt>OBJ_BAD_PT</tt> to a value below this, e.g. 10000. After processing, the local and remote directories that managed the execution of the model are removed. The function gc_jobremove is used to clean up job directories on remote compute resources.

If the job handle is detected as failed the function returns a bad point in a similar fashion to a processing failure

If the script did not find the job handle to have a status value of 3 or 4 then the script returns with an empty value and the optimiser will assume the job is still running.

Wrapping optjobparallel2 functions as a single tuneable function
It can prove useful to wrap the submission and processing scripts to create a single objective function calculation as discussed above. The function signature accepts as input a vector of parameter values <tt>VARS</tt> from the optimiser and returns, after simulation, an objective function evlaution <tt>EVAL</tt>.

Error checking is always a good idea

Load the parameter map

Invoke the job submission function

And then simply sit in a loop invoking the processing script until a result is obtained

OptionsMatlab: Design of Experiment
We present a few simple examples of how to use OptionsMatlab for the study and tuning of GENIE-1. As a first step we will discuss the study of GENIE-1 in a small 2D sub-space of its parameters. We use the objective function defined above where the optimiser provides a vector of two variables that are assigned to the <tt>AtmDiffAmpT</tt> and <tt>SclCO2</tt> fields of the model configuration:

The optimisation problem is presented to Options as a data structure that describes the tuneable variables, the objective function, optjob routine and optimisation method. Please see the OptionsMatlab documentation for full details.

The parameter <tt>map</tt> for this problem provides a linear mapping in the two dimensions between the Options and GENIE domains

OptionsMatlab provides Design of Experiment routines that can be used to efficiently sample points across a multi-dimensional parameter space. In this case we will provide a set of user-defined points across a full-factorial of the 2D design space. To instruct OptionsMatlab to evaluate a set of user-defined points the <tt>input</tt> structure should define <tt>OMETHD=2.8</tt> with <tt>MC_TYPE=7</tt>

We specify the points at which to evaluate the objective function in the <tt>DOE_TRACE</tt> structure. The field <tt>DOE_TRACE.NCALLS</tt> specifies the number of points to be evaluated in the DoE. The field <tt>NITERS</tt> specifies the total number of function calls that OptionsMatlab should expect to evaluate - this is set to <tt>DOE_TRACE.NCALLS+1</tt> because one extra check evaluation is performed by Options after the user-supplied points are evaluated.

The <tt>optjobparallel2</tt> job management script will submit jobs in batches of maximum size specified by the <tt>MAXJOBS</tt> field. This value should be tuned to the target resource that you specify in the objective function submission script. In this case we set the value high so that all 121 jobs are submitted in one go.

To evaluate the DoE the OptionsMatlab function is invoked with the <tt>input</tt> structure.

Once the evaluations are performed we record the data for posterity

In the 2DParamSweep example directory the data generated by the OptionsMatlab invocation can be loaded and plotted


 * [[image:2DDoE.png|2D Design of Experiment]]

Matlab Optimisation Toolbox: fminsearch
The Matlab optimisation toolbox function fminsearch will perform a multidimensional unconstrained nonlinear minimization (Nelder-Mead) search of a target function. To apply this algorithm to the objective function described above simply select a starting point:

Configure the optimiser's options as desired. In this case we specify a maximum number of evaluations to be performed on the expensive underlying code:

It simply remains to invoke fminsearch by passing in the handle to the function <tt>@genie_eb_go_gs_itfclsd_08l_optjobparallel2_fcn</tt> (note the <tt>@</tt> sign), the starting point and the optimiser <tt>options</tt>.

The Matlab optimisation toolbox function <tt>fminsearch</tt> will perform a Nelder-Meade search on the problem starting from the point [0.5, 0.5], performing a maximum of 50 function evaluations. The objective function has been modified in order to record the evaluation history in the file <tt>evalhist.mat</tt> as the optimiser progresses (Matlab does not provide the search history by default). The progress of the optimisation can be plotted on the contour plot of the factorial design data (with a blue circle indicating the start point, black dots for the function evaluations and a red circle for the end point):


 * [[image:2DDoE_fminsearch.png|fminsearch results over 2D problem]]

OptionsMatlab: Response Surface Modelling

 * [[image:2DDoE_KrigSearch.png|Optimum point from search of Krig RSM]]

4D Optimisation

 * [[image:4DParamSweep.png|4D Optimisation Tile Plot]]


 * [[image:4DParamSweepSubPlot.png|2D Tile]]