GENIE:GENIEToolboxExamples

Andrew Price, University of Southampton (a.r.price@soton.ac.uk)

Parametric study
To evaluate and manage a number of concurrent GENIE simulations it is straightforward to script a simple study. A Matlab script can be written that performs a set of job submissions, polls the job handles and retrieves the output upon completion.

By default, compute jobs managed by the GENIE Toolbox are configured in uniquely named directories. This is fine for single jobs, or if the output is archived in the database with accompanying metadata, since the data can be identified easily in these circumstances. If a user simply wishes to evaluate an ensemble of runs and process the data on the local system, it is useful to be able to label each run directory with a meaningful identifier.

LocalRunDirUniq
The field LocalRunDirUniq can be added to the runtime data structure to define a specific directory in which a compute job is managed. This is a useful field to specify if an ensemble of runs is to be executed and processed locally. For example, without the LocalRunDirUniq field a compute job is prepared in a uniquely named directory within the directory specified by LocalRunDir:


 >> runtime.RuntimeArchive = './genie_eb_go_gs_archive.tar.gz';
 >> runtime.LocalRunDir    = './output/';
 ...
 >> [handle, retrieve] = gc_jobsubmit(metadata, runtime, resource)
 ...
 >> dir(runtime.LocalRunDir)

 .  ..  20070614T142500_445096

By specifying the LocalRunDirUniq field of runtime you can control the directory in which the job is created and to which the output data will be returned:


 >> runtime.RuntimeArchive  = './genie_eb_go_gs_archive.tar.gz';
 >> runtime.LocalRunDir     = './output/';
 >> runtime.LocalRunDirUniq = './output/MyEnsembleMember';
 ...
 >> [handle, retrieve] = gc_jobsubmit(metadata, runtime, resource)
 ...
 >> dir(runtime.LocalRunDir)

 .  ..  MyEnsembleMember

Scripted ensemble study
A simple script to perform an ensemble of model runs would need to perform the following actions:

1. Load the model configuration:

   [configuration, runtime.EXPID] = genie_eb_go_gs_config;

2. Specify the model runtime:

   runtime.RuntimeArchive = '/path/to/genie_eb_go_gs_archive.tar.gz';
   runtime.LocalRunDir    = '/path/to/output';

3. Load the resource description metadata:

   resource = createResource('NGSOxford');

4. Define the parameter range over which the model will be evaluated:

   NJobs  = 21;
   SclFWF = linspace(0.0, 2.0, NJobs);

5. Submit the compute jobs that make up the ensemble:

   for index=1:NJobs

     % Specify a unique numbered directory for the model run
     runtime.LocalRunDirUniq = ['/path/to/output/', num2str(index,'%02d')];

     % Update the SclFWF parameter
     configuration.genie_embm.Parameter.SclFWF = SclFWF(index);

     % Submit the compute job
     [handle, retrieve] = gc_jobsubmit(configuration, runtime, resource);

     % Record the handle and retrieval structure for use in the later steps
     JobDetail{index}.handle   = handle;
     JobDetail{index}.retrieve = retrieve;

   end

6. The job handles and the accompanying retrieval data structures are key pieces of information required to obtain the results once the jobs complete. It is worth saving this information to disk if the jobs are likely to take some time (see the sketch after this list).

7. Poll the job handles until all simulations are complete:

   % Vector recording run statuses
   running = ones(1,NJobs);

   while sum(running)

     % Loop through the jobs
     for index=1:NJobs

       % Poll the compute job
       status = gc_jobstatus(JobDetail{index}.handle);

       % If complete, record the change in status
       if status >= 3
         running(index) = 0;
       end

     end

     % Wait a short time before polling again
     pause(30)

   end

8. Retrieve the output for each job:

   for index=1:NJobs

     % Retrieve the output
     [success, resultsFiles] = gc_jobretrieve(JobDetail{index}.retrieve);

   end

9. The simulation output should now reside in labelled directories within the folder specified in the runtime.LocalRunDir field in the script. If you were to look at the directory contents at this point you would find:

   >> dir(runtime.LocalRunDir)

   .  ..  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15  16  17  18  19  20  21

10. We will quickly plot the maximum Atlantic overturning circulation (MOC) at the end of each of these simulations. This information is obtained from the ASCII .opsit file of the goldstein ocean output:

   for index=1:NJobs
     load(['./output/' num2str(index,'%02d') '/spn.opsit']);
     moc(index) = spn(end,5);
   end
   plot(SclFWF, moc, 'o-')
   xlabel('FWF Scaling factor')
   ylabel('Atlantic MOC (Sv)')
   title('Example parametric study of genie\_eb\_go\_gs')

11. A full example script can be viewed here. The output of this particular study is a graph of maximum Atlantic overturning circulation strength as a function of the EMBM fresh water flux scaling factor.
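As noted in step 6, it is worth saving the job information to disk if the jobs will take some time, so that polling and retrieval can be run in a later Matlab session. A minimal sketch (the .mat file name is arbitrary):

 % Save the job handles, retrieval structures and parameter values to disk
 save('EnsembleJobDetail.mat', 'JobDetail', 'SclFWF', 'NJobs');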

Tuning studies
Climate models rely heavily on parameterisations of physical processes that occur on comparatively small time and spatial scales. A key concern in climate modelling is therefore to find appropriate values for these parameters so that a reasonable climatology is simulated. The process of investigating the model parameter space and finding optimal points within that space is referred to as tuning. However, as with many design problems, the nonlinear response of a model to its parameters and the often conflicting tuning objectives make this a difficult problem to solve.

The general problem of optimising a set of model parameters in order to improve a number of possibly conflicting design objectives is typically approached in one of two ways. One can create a single objective measure of design quality by computing a weighted sum of the individual objectives and seek to find the set of variables that minimise or maximise this measure. Many sophisticated algorithms can be applied to a single objective problem but the weighting factors can be critical to the performance of the optimisation. Alternatively, multi-objective methods can be employed to seek a Pareto set of non-dominated solutions: designs that are superior when all objective measures are considered but that may be inferior when a subset of those objectives is considered. Such a solution set can inform the user of competition in the design goals and allow domain expertise to be applied to select the most appropriate parameter sets for further study.

We present below examples of GENIE tuning studies that can be performed using the GENIE Toolbox in conjunction with optimisation tools available from the Matlab Optimisation Toolbox and from the OptionsMatlab and OptionsNSGA2 packages that ship with GENIELab. More detailed documentation, examples and tutorials about these optimisation packages are available:


 * Matlab Optimisation Toolbox
 * OptionsMatlab

All of these tools rely on the user wrapping their GENIE model as a tuneable function in the Matlab environment.

Wrapping GENIE as a tuneable function
The optimisation tools that are available in GENIELab can be applied in many science and engineering problem domains and essentially treat the underlying problem as a black box. From the optimiser's point of view the problem is presented as an objective function and (optionally) a constraint function. These functions accept as input a vector of parameter values and calculate one or more objective measures of design quality and (optionally) evaluate one or more problem constraints. For GENIE models the objective function calculation typically involves instantiating a model with the provided parameter values, performing a simulation until an equilibrium state is achieved and then evaluating an RMS error of model fields compared to observational data. This section describes how to write such a wrapping for your model.

The simplest tuneable function will accept a vector of parameters as input, execute a GENIE model using those parameters, process the results and return a single objective function value. To create such a function one would normally edit a new script:


 >> edit op_genie_eb_go_gs_objfun

We follow some naming conventions from OptionsMatlab. The function is declared to accept an input vector VARS and return the objective function value in EVAL.
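A declaration following these conventions might look like the following sketch; the function body is assembled over the remaining steps of this section:

 function EVAL = op_genie_eb_go_gs_objfun(VARS)
 % OP_GENIE_EB_GO_GS_OBJFUN  Evaluate a GENIE simulation for the parameter
 % values in the vector VARS and return a scalar objective measure in EVAL.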



As with the ensemble example above we need to define the three data structures that specify how to manage a GENIE simulation. The function will need to load a resource description for the platform on which the model run will be performed.
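For example, reusing the resource description from the ensemble study above (substitute the resource appropriate to your own setup):

 % Load the resource description for the target compute platform
 resource = createResource('NGSOxford');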



The local runtime environment is then configured.
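A sketch of this step with placeholder paths; labelling the run directory, as in the ensemble example, makes it easy to locate the output when the job completes:

 % Configure the local runtime environment for this evaluation
 runtime.RuntimeArchive  = '/path/to/genie_eb_go_gs_archive.tar.gz';
 runtime.LocalRunDir     = '/path/to/output';
 runtime.LocalRunDirUniq = '/path/to/output/current_evaluation';   % illustrative label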



The default configuration metadata for the model is loaded. A dedicated config function for the tuning exercise may have been written that specifies how the simulation should be performed to obtain the error function; e.g. the total number of timesteps and the frequency of output might be defined in a separate config file.
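Assuming the standard configuration function from the ensemble example is used (a dedicated tuning configuration function could be substituted here):

 % Load the default configuration metadata for the model
 [configuration, runtime.EXPID] = genie_eb_go_gs_config;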



Once the default metadata has been loaded it is necessary to override those defaults with the parameter values that have been provided as input to the function in the vector VARS. Each tuneable parameter is assigned to the appropriate field of the configuration metadata structure.
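For example, if the EMBM fresh water flux scaling factor were one of the tuned parameters; the mapping from elements of VARS to parameters is whatever you choose for your study:

 % Override the default parameter values with those supplied by the optimiser
 configuration.genie_embm.Parameter.SclFWF = VARS(1);
 % ...assign the remaining elements of VARS to the other tuneable parameters...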



Once the unique instance of the model has been defined the compute job is submitted for evaluation.
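The submission call is the same as in the ensemble example:

 % Submit the compute job for evaluation
 [handle, retrieve] = gc_jobsubmit(configuration, runtime, resource);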



The function must then poll the job until the results are available.
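A minimal polling loop, following the pattern of the ensemble script (the interpretation of status values of 3 or more as complete is taken from that example):

 % Poll the compute job until it reports completion
 status = gc_jobstatus(handle);
 while status < 3
     % Wait a short time before polling again
     pause(30)
     status = gc_jobstatus(handle);
 end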



Once the job is complete it just remains to process the results and return an objective function value. For the purposes of this example we assume that a post-processing binary has been configured to execute after genie.exe which writes a single objective function value to the file results.dat.
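A sketch of this final step under the assumptions stated above; the path to results.dat and the failure value of 1e10 are illustrative:

 % Retrieve the output files from the compute resource
 [success, resultsFiles] = gc_jobretrieve(retrieve);

 if success
     % Read the single objective value written by the post-processing binary
     EVAL = load(fullfile(runtime.LocalRunDirUniq, 'results.dat'));
 else
     % The job failed: return a large value so that the optimiser
     % treats this point in parameter space as a bad design
     EVAL = 1e10;
 end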



Note that if the job fails we need to return a value to the optimiser. In this case we return a large number that is much greater than any value we are likely to get from the real objective function calculation. In OptionsMatlab it is possible to define OBJ_BAD_PT; any objective function evaluations above this value are automatically disregarded as bad points and do not subsequently impact the optimisation.
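For illustration only, and assuming OBJ_BAD_PT is set as a field of the OptionsMatlab input structure (check the OptionsMatlab documentation for where this option is actually specified), a threshold below the failure value returned above would be chosen:

 % Assumed usage: OBJ_BAD_PT on the OptionsMatlab input structure
 optim.OBJ_BAD_PT = 1e9;   % evaluations above this are discarded as bad points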

Wrapping GENIE for concurrent evaluation
The objective function definition above provides the means to evaluate a single point in parameter space. The function is passed a vector of parameter values and an objective function value is returned upon completion of a GENIE simulation. Invoking the function causes the workspace to be tied up polling the status of the GENIE compute job until the simulation finishes. This is fine for iterative direct search methods but there are many optimisation algorithms that can exploit concurrent evaluations of the objective function. In order to perform concurrent evaluation we need to split the objective function calculation into a job submission activity, a polling activity and a post-processing activity. This allows the optimiser to submit multiple concurrent jobs, monitor those jobs and then recover the results once available.
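As an illustration of this split, the sketch below uses only the GENIE Toolbox calls introduced earlier; the function names, the op_genie_setup helper and the division into three activities are hypothetical, and the signatures expected by optjobparallel2.m should be taken from the OptionsMatlab documentation:

 % --- Submission activity: build the model instance and submit the job ---
 function JobDetail = op_genie_submit(VARS)
 % op_genie_setup is a hypothetical helper that assembles the configuration,
 % runtime and resource structures exactly as in the objective function above
 [configuration, runtime, resource] = op_genie_setup(VARS);
 [JobDetail.handle, JobDetail.retrieve] = gc_jobsubmit(configuration, runtime, resource);
 JobDetail.rundir = runtime.LocalRunDirUniq;

 % --- Polling activity: report whether the job has finished ---
 function done = op_genie_poll(JobDetail)
 done = (gc_jobstatus(JobDetail.handle) >= 3);

 % --- Post-processing activity: recover the output and evaluate the objective ---
 function EVAL = op_genie_postprocess(JobDetail)
 [success, resultsFiles] = gc_jobretrieve(JobDetail.retrieve);
 if success
     EVAL = load(fullfile(JobDetail.rundir, 'results.dat'));
 else
     EVAL = 1e10;   % bad point
 end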

The OptionsMatlab package provides many optimisation algorithms that benefit from concurrent evaluation of expensive objective function calculations. The OptionsMatlab documentation discusses how a user should write their own objective and constraint functions and how these evaluations can be performed in parallel:


 * How do I write my own objective and constraint functions?
 * Can OptionsMatlab calculate function evaluations in parallel?

The following examples demonstrate how to write an objective function for concurrent evaluation using the OptionsMatlab job control script optjobparallel2.m.