GENIE Biogem Tutorial

Andy Ridgwell (andy@seao2.org)

Exercise 1. Climate-only
Expected learning outcomes:
 * 1) 	Set up and run genie_eb_go_gs_ac_bg
 * 2) 	Configure data saving (time-slice and time-series operation)
 * 3) 	Explore basic biogem data output (about the physical climate system)

Instructions

 * 1) 	First, download the configuration set biogem_exp_1.tar.gz from:
 * 2) [http://www.seao2.org/genie_workshop/biogem_exp_1.tar.gz]
 * 3) to $HOME/genie_input
 * 4) and unpack ($ tar zxf biogem_exp_1.tar.gz)
 * 5) 	Using a text editor, explore the contents of the tracer selection files: gem_config_atm.par, gem_config_ocn.par, gem_config_sed.par
 * 6) Note that the only tracers selected are temperature and salinity (humidity in the case of the atmospheric tracers).
 * 7) Also note that in biogem_config.par, no biological option is selected (or rather, the ‘null’ option NONE is selected, with the corresponding biological configuration file biogem_bio_NONE_config.par containing no parameter information).
 * 8) Get the necessary genie config file genie_eb_go_gs_ac_bg_itfclsd_08l.config and put it in genie-main/configs/
 * 9) 	Get the rungenie script. (Just in case that one doesn't work, here's a backup: rungenie.) Put it at the same level as your genie and genie_input directories (i.e. probably in your home directory). Make it executable:
 * 10) chmod u+x rungenie
 * 11) 	Run the model for 1 year (refer to the [rungenie.README] file for the description of the options passed to the rungenie configuration script), i.e.:
 * 12) $ ./rungenie genie_eb_go_gs_ac_bg_itfclsd_08l n 0 1 biogem_exp_1
 * 13) 	Now browse the results directory: genie_output/genie_eb_go_gs_ac_bg_itfclsd_08l.biogem_exp_1/biogem
 * 14) The cupboard is (almost) bare. A re-start file has been saved for biogem (biogem), but if you attempt to open either of the netCDF output files you will get an error message. No data has been saved, for two reasons:
 * 15) No year has been set for a results time-slice in the file biogem_save_timeslice.dat (in $HOME/genie_input/biogem_exp_1). Actually, you were warned of this fact in the run-time output as biogem was initialized:
 * 16) WARNING -> Originating location in code [module,subroutine]: biogem_data,sub_init_data_save -> ERROR MESSAGE: No time-slice dates listed in file biogem_save_timeslice.dat fall within the model start and end years -> ERROR ACTION:  CONTINUING
 * 17) No data categories for time-slice saving have been selected in biogem_config.par.
 * 18) 	Re-run the model, but with a time-slice mid-point of 0.5 years set in biogem_save_timeslice.dat. Any earlier or later than this will not work, because the integration interval is set as 1.0 years by default in biogem_config.par (near the bottom of the I/O - TIME-SERIES section) and the run is only 1 year long. Also select (as t) ocean 'physical' properties as a time-slice data save in biogem_config.par. Explore the contents of fields_biogem_3d.nc. Note that much of the physical grid information is as dull as ditchwater (and also note that ocn_sal and ocn_temp fields are erroneously created, but with no data saved to them – a bug to be fixed :o) ). Also note that nothing is being saved yet in the 2-D netCDF file: fields_biogem_2d.nc.
 * 19) 	The important ‘physics’ is saved by selecting miscellaneous properties as a time-slice data save. De-select ocean 'physical' properties and select miscellaneous properties instead and re-run.
 * 20) 	A second type of data that can be saved is ‘time-series’ data: i.e., time series of global or surface-averaged properties. The time-series configuration file, biogem_save_sig.dat, is already populated for you. This file contains the mid-point years at which a time-series data point is to be saved. Again the data is time-integrated, with the integration interval specified in biogem_config.par (near the bottom of the I/O - TIME-SERIES section). Try selecting miscellaneous properties time-series data saving and re-run the model. The results directory now contains time-series files in ASCII (plain text) format. There is a header describing the information and units of each column.
 * 21) 	Do any other playing with the options you like.
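
Why only a mid-point of 0.5 years works for a 1-year run can be checked with a few lines of arithmetic. The sketch below is an illustration only: it assumes the integration window is centred on the mid-point and must lie wholly inside the run, and slice_fits is a made-up helper name, not part of biogem.

```python
def slice_fits(mid_point, interval, run_start, run_length):
    """Return True if the integration window centred on mid_point
    lies entirely between the start and end of the run.

    Assumption: the window spans mid_point +/- interval / 2.
    """
    window_start = mid_point - interval / 2.0
    window_end = mid_point + interval / 2.0
    run_end = run_start + run_length
    return window_start >= run_start and window_end <= run_end


# 1-year run starting at year 0, default 1.0-year integration interval
for mid_point in (0.25, 0.5, 0.75):
    print(mid_point, slice_fits(mid_point, 1.0, 0, 1))
```

Shortening the integration interval in biogem_config.par would widen the range of mid-points that fit.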

Exercise 2. Basic ocean carbon cycling
Expected learning outcomes:
 * 1) 	Configure and run a full ocean carbon cycle
 * 2) 	Configure and explore further data saving capabilities
 * 3) 	Explore surface processes such as biological export and air-sea gas exchange
 * 4) 	Explore useful global diagnostics

Instructions

 * 1) 	Download and unpack configuration set [http://www.seao2.org/genie_workshop/biogem_exp_2.tar.gz]
 * 2) 	Try running the model:
 * 3) $ ./rungenie genie_eb_go_gs_ac_bg_itfclsd_08l n 0 1 biogem_exp_2
 * Oh, too bad – you don’t have the correct biological options configuration file … ?
 * 1) You have been provided with a basic phosphate-based configuration of ocean biogeochemical cycling – biogem_bio_1N1T_PO4MM_config.par. You need to change the biological option selected in biogem_config.par to 1N1T_PO4MM (the string takes the form OPTION, with the corresponding complete file name being biogem_bio_OPTION_config.par).
 * 2) Try running again.
 * 3) 	Still no luck … ?
 * 4) You don’t have any gaseous (atm), dissolved (ocn), or particulate (sed) tracers selected yet.
 * 5) The minimum useful selection of atmospheric tracers (in file gem_config_atm.par) would be pCO2. Select it (by a t in the first column).
 * 6) The corresponding dissolved tracer (gem_config_ocn.par) to CO2 gas is DIC (dissolved inorganic carbon) – select it. You will also need alkalinity (ALK) and the (single) nutrient: PO4. Some of the organic matter is partitioned into dissolved form (indicated by a non-zero value for the parameter: fraction of export production in the form of DOC in the biological configuration file: biogem_bio_1N1T_PO4MM_config.par), so you will need to select the dissolved tracers: DOM_C (the carbon component of dissolved organic matter) and DOM_P (the phosphate component).
 * 7) Finally, organic matter is produced in particulate form as well of course, so you will need to select some sediment tracers (gem_config_sed.par): POC, POP, and CaCO3.
 * 8) 	Now try re-running the model.
 * 9) Whoa! Now it wants to go and completely re-compile itself :-(
 * 10) (Because the number of tracers selected dictates some fundamental array sizes in the ocean circulation model, which must be compiled in, everything has to start from scratch.)
 * 11) But after it has finished compiling, it does at least now run :-)
 * 12) Note that the selected tracers are listed during the initialization of biogem in the run-time output.
 * 13) Ignore the 1st ‘error’ message (about the POC and O2 tracers) – ‘tis a bug to be corrected …
 * 14) Also ignore the 2nd error message, which is simply telling you that you do not have dissolved calcium (Ca2+) selected as a tracer, so biogem does not know where to get Ca2+ from when creating CaCO3 and where to put it when it dissolves. However, biogem will happily estimate oceanic Ca2+ concentrations from salinity in the absence of an explicit Ca2+ cycle, and takes into account the alkalinity removal and release during CaCO3 precipitation and dissolution anyway.
 * 15) (The fact that genie does not halt upon either ‘error’ message indicates that biogem does not consider your configuration to be fundamentally incorrect, but rather it just contains issues that you might like to know about.)
 * 16) 	Now go to the results directory:
 * 17) $HOME/genie_output/genie_eb_go_gs_ac_bg_itfclsd_08l.biogem_exp_2/biogem
 * 18) As before, you don’t have any output saved yet. Start by defining a time-slice at 0.5 years and selecting ocean composition time-slice data to be saved. Re-run. You will now see a list of ocean (dissolved) tracers in fields_biogem_3d.nc, both related to the physical climate system (temperature and salinity) and biogeochemistry (ALK, DIC, PO4, and dissolved organic matter constituents). You can plot these both as vertical (latitude-depth) and horizontal (longitude-latitude) sections, as well as global averages, and view the raw data in the Panoply viewer.
 * 19) 	Go back to biogem_config.par and select aqueous carbonate system properties time-slice data to be saved. Re-run. You now have information about the aqueous carbonate system available to you.
 * 20) 	Selecting 'biological' fluxes will give you fields of particulate matter fluxes.
 * 21) 	Selecting miscellaneous properties will give you some further information based on what you have already been saving.
 * 22) 	Selecting ocean-atmosphere flux gives you new information in fields_biogem_2d.nc about 2-D property fields, such as the surface ocean to atmosphere pCO2 difference, and CO2 gas exchange.
 * 23) 	Similarly, time-series can be added to the data that is saved, try adding some …
 * 24) 	However, to save having to search through multiple time-series files, some simple global diagnostics are provided at the time-slice intervals in a single ASCII file. You can request these by selecting save global diagnostics under I/O – MISC in biogem_config.par. You will get a file with a name of the format: biogem_year_x_yyy_diag_GLOBAL.res (where x is the integer component of the year (mid-point) and yyy is the fractional part). Diagnostics include:
 * 25) 	time mid-point and integration interval
 * 26) 	global ocean surface area and volume
 * 27) 	mean global air-sea gas exchange coefficient (for CO2)
 * 28) 	mean atmospheric tracer concentrations + total inventory
 * 29) 	mean ocean tracer concentrations + total inventory
 * 30) 	mean + total global productivity
 * 31) 	mean + total global sedimentation
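
The tracer selections asked for in this exercise are easy to get partly wrong, so here is a minimal checklist sketch. It only encodes the tracer names listed in the instructions above and the OPTION file-name convention. It does not read the real gem_config_*.par files (their layout is not reproduced here), and REQUIRED, bio_config_filename and missing_tracers are hypothetical names used for illustration.

```python
# Tracers named in the Exercise 2 instructions, keyed by selection file.
# (Temperature, salinity and humidity from Exercise 1 are not repeated here.)
REQUIRED = {
    "gem_config_atm.par": {"pCO2"},
    "gem_config_ocn.par": {"DIC", "ALK", "PO4", "DOM_C", "DOM_P"},
    "gem_config_sed.par": {"POC", "POP", "CaCO3"},
}


def bio_config_filename(option):
    """Build the biological configuration file name for an OPTION string."""
    return f"biogem_bio_{option}_config.par"


def missing_tracers(selected):
    """Return {file name: sorted list of required tracers not yet selected}.

    `selected` maps a selection file name to the set of tracer names
    you have marked with a t in that file (entered by hand here).
    """
    missing = {}
    for file_name, required in REQUIRED.items():
        gap = required - selected.get(file_name, set())
        if gap:
            missing[file_name] = sorted(gap)
    return missing


# Example: only pCO2 and DIC selected so far
selected = {
    "gem_config_atm.par": {"pCO2"},
    "gem_config_ocn.par": {"DIC"},
}
print(bio_config_filename("1N1T_PO4MM"))
print(missing_tracers(selected))
```

An empty result means every tracer named in the instructions has been selected.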

Exercise 2a. Extended ocean carbon cycling [optional]

 * 1) 	Using the same model configuration as in Exercise #2, submit a 1000-year-long model run to the cluster queue, e.g.:
 * 2) qsub -S /bin/bash subgenie genie_eb_go_gs_ac_bg_itfclsd_08l n 0 1000 biogem_exp_2
 * 3) Don’t forget to set a time-slice save for the last year of the run (mid-point: 999.5).
 * 4) While this runs on the queue continue on to the next Exercise.
 * 5) 	When it is done, browse the distributions of carbon cycle tracers or whatever takes your interest, and note that a large-scale circulation pattern has started to become well established. You can tell how close you are to steady state by looking at the contents of some of the time-series files, particularly those relating to biological productivity, ocean DIC and/or atmospheric pCO2 – the values will asymptote to a constant value at steady state.
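
One rough way to put a number on "asymptoting to a constant value" is to compare the mean of the last few time-series points with the mean of the few before them. The sketch below assumes you have already pulled one column out of a time-series file into a list of numbers. The window length and tolerance are arbitrary choices, and the demonstration series is synthetic, not model output.

```python
import math


def relative_drift(values, window):
    """Relative change between the mean of the last `window` points
    and the mean of the `window` points immediately before them."""
    if len(values) < 2 * window:
        raise ValueError("need at least 2 * window points")
    recent = sum(values[-window:]) / window
    previous = sum(values[-2 * window:-window]) / window
    if previous == 0:
        return abs(recent)  # fall back to absolute change
    return abs(recent - previous) / abs(previous)


def near_steady_state(values, window=5, tolerance=1e-3):
    """True if the recent drift is below the (arbitrary) tolerance."""
    return relative_drift(values, window) < tolerance


# Synthetic series relaxing towards 278: NOT model output
series = [278.0 + 10.0 * math.exp(-year / 100.0) for year in range(0, 1000, 10)]
print(near_steady_state(series[:10]))  # early part of the run
print(near_steady_state(series))       # whole run
```

A tighter tolerance or a longer window makes the test stricter. Slowly adjusting tracers may need both.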

Exercise 3. Transient operation
Expected learning outcomes:
 * 1) 	Use of forcing files
 * 2) 	Configure and analyse transient model runs

Instructions

 * 1) 	Download and unpack configuration set [http://www.seao2.org/genie_workshop/biogem_exp_3.tar.gz].
 * 2) 	Select the following atmospheric (gaseous) tracers: CFC-11, CFC-12, together with the equivalent dissolved tracers in the ocean.
 * 3) 	Select some data to save and then run the model for 10 years. It should run … but there is nothing (interesting) going on – the ocean and atmosphere have been initialized with a zero concentration of CFCs, and irrespective of the establishment of large scale ocean circulation and adjustment of climate, zero is where their concentrations are going to remain everywhere (hopefully!).
 * 4) 	You can initialize the composition of the ocean or atmosphere with any concentration you like – these are the parameter values in the 6th column of the tracer configuration/selection files (gem_config_ocn.par and gem_config_atm.par). In fact, in the previous exercise you already did this (in that the values of DIC, ALK, and PO4 in the ocean were: 2.244x10-3, 2.363x10-3, and 2.159x10-6 mol kg-1, respectively, and with 278x10-6 (ppm) CO2 in the atmosphere).
 * 5) Note that the ocean and atmosphere start out completely homogeneous at the prescribed tracer concentrations in the absence of a re-start file.
 * 6) 	Now edit gem_config_atm.par to give 1x10-6 (ppm) of CFC-11 in the atmosphere, select ocean composition time-slice saving (and a time-slice year as well), as well as both ocean composition and atmospheric (interface) composition time-series data saving, and re-run the model for 10 years.
 * 7) 	View the ocean and atmospheric time-series files for CFC-11. Note that CFC-11 is invading the ocean, thereby depleting the atmospheric inventory and increasing the amount in the ocean. Explore the time-slice netCDF file, in vertical and/or horizontal sections.
 * 8) 	Now we are going to run the model with a prescribed (observed) transient of CFC concentration in the atmosphere. The application of a prescribed time-dependent boundary condition is termed a forcing:
 * 9) For the atmosphere (atm) tracers, atmospheric composition is forced from the 2-D surface ocean grid of biogem. This means that no forcing is applied to the atmosphere overlying 'dry' land cells. For the ocean (ocn) tracers, all the 'wet' cells of the full 3-D ocean grid of biogem can be forced. For the sediment (sed) tracers, all the cells comprising the 3-D ocean grid of biogem (rather than the 2-D sedgem grid) are forced. However, for the sed tracers, no provision is currently made for restoring forcing.
 * 10) Secondly, two different types of forcings of the system are recognized - a flux forcing and a restoring forcing. The first type, the flux forcing, is pretty self-evident - it represents a flux of tracer (in units of mol yr-1) that is applied to each cell. The restoring forcing is less intuitive - a flux is applied to the grid cells with a value calculated to bring the tracer values closer to the prescribed restoring value. A time-scale (in years) determines the rate at which the tracer values are brought towards the boundary condition and is set in the appropriate tracer configuration file (gem_config_*.par). This is termed the restoring constant. The smaller the value of the restoring constant the 'harder' the restoring, and the more rapidly the model will be constrained to approach the boundary condition.
 * 11) A tracer forcing of cb/s-goldstein is defined by a set of three files. Two files, with filenames of the format:
 * 12) biogem_force_*_I.dat
 * 13) biogem_force_*_II.dat
 * 14) hold information about spatial distributions. The data format of these is either 2-D for the atmosphere or surface ocean only - rows (j) and columns (i), or 3-D (for the whole ocean) with the successive depth layers (k) repeated as (i,j) blocks down through the file. The third file has a filename of the format:
 * 15) biogem_force_*_sig.dat
 * 16) and contains two columns of information - the first is a time marker (year) and is paired with a corresponding magnitude modifier value in the second column.
 * 17) The values assigned to the time markers in the signal file define the time-varying information. For a model year falling outside of the maximum or minimum time markers specified in the signal file, no forcing will be applied (this in effect allows a restoring forcing to be turned 'on' and then 'off' again later). For model years in between specified time points, the bounding modifier values are linearly interpolated. It should be noted that the biogem_force_*_sig.dat file must contain at least one (row) entry.
 * 18) The way in which these three files are employed to define a spatially-explicit time-varying input field is as follows. The first field file (*_I.dat) defines the baseline distribution and the second (*_II.dat) an alternative distribution. The difference between the two fields defines the spatial component of a time-varying forcing. The 2nd value of each data pair in the signal file is used to modify the difference between the two spatial fields.
 * 19) Putting it another way - the magnitude of the forcing that is applied at any point in time is equal to the baseline field, defined in:
 * 20) biogem_force_*_I.dat
 * 21) plus the time-dependent modifier times the difference between the two fields.
 * 22) Remember that the value of the time-dependent modifier is interpolated from the contents of the forcing signal file biogem_force_*_sig.dat.
 * 23) Because what I have just written above is probably quite close to complete gobbledygook, a couple of examples might help (or not):
 * 24) 	One way of using the forcing functionality of biogem would be to assign a minimum forcing field to *_I.dat, a maximum forcing field to *_II.dat, and specify a normalized (i.e., taking values between 0.0 and 1.0) series of modifier values in the signal file. For example, one could continually vary the surface temperature distribution forcing of the marine carbon cycle over the deglacial transition, based on available end-member reconstructions for glacial maximum and modern [CLIMAP, 1980] and by assuming a semi-representative (normalized) signal with which to interpolate between these two time-slice reconstructions (see Ridgwell [2007]).
 * 25) 	A second way of using the functionality of tracer forcing in biogem would be to set every ‘wet’ cell in *_I.dat to a value of 0.0, and every corresponding location in *_II.dat to a value of 1.0. The final forcing field applied to the model will then be the same as applying the value of the interpolated modifier (in the signal file) equally to each and every cell. An example of this usage would be in applying a spatially uniform time-varying change in atmospheric composition, such as restoring atmospheric CO2 to an observed historical or predicted future atmospheric concentration trajectory.
 * 26) NOTE: Although this scheme is generic and can equally be applied to any tracer (atm, ocn, or sed and including isotopic properties) as well as offering reasonable flexibility in representing the time-varying characteristics of a boundary condition, it does have limitations and cannot cover all possible eventualities. For instance, historical changes in atmospheric CFC concentrations and CO2 radiocarbon activity vary not only with time, but spatial heterogeneity also changes in a complex way that cannot be represented as an interpolation between two alternative end-member distributions.
 * 27) 	You have been provided with restoring forcings for the (mean global) atmospheric CFC-11 and CFC-12 concentrations from the years 1931 to 2001. Select these forcings in gem_config_atm.par (column #7) and use a restoring time-constant of 0.1 years (column #8; already set). Now run the model, starting in the year 1931 and finishing in 2001 (a run length of 70 years):
 * 28) $ ./rungenie genie_eb_go_gs_ac_bg_itfclsd_08l n 1931 70 biogem_exp_3
 * 29) Ensure that you have appropriate time-slice year(s) set – year 1994 would be good as a start, as there is observed data for around this time (setting 1994.5 as the year mid-point and retaining the default 1.0 year integration time will give you an averaged slice corresponding to year 1994). You may (i.e., will …) also want to edit the time-series file (biogem_save_sig.dat) to resolve the interval in question (1931 to 2001) in somewhat finer detail (or at all, in fact).
 * 30) 	Explore the resulting atmospheric and oceanic inventory time-series files as well as the netCDF output.
 * 31) Note that this does NOT produce a realistic simulation of the (observed) penetration of CFCs into the ocean because ocean circulation has not yet been spun-up (which would require either you waiting around for a couple of hours or a set of pre-calculated re-start files for the climate model).
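
The three-file forcing scheme described in this exercise boils down to: forcing = field I + modifier(t) x (field II - field I), with modifier(t) linearly interpolated from the signal file and no forcing outside its time markers. The sketch below illustrates that arithmetic on small in-memory lists. It does not read the real biogem_force_*.dat files, all function names are hypothetical, and the simple (target - current) / restoring constant relaxation used for the restoring forcing is an assumption about its form, not biogem's actual code.

```python
def interpolate_modifier(signal, year):
    """Linearly interpolate the modifier for `year`.

    `signal` is a list of (time marker, modifier) pairs sorted by time.
    Returns None when `year` lies outside the time markers, i.e. when
    no forcing is applied.
    """
    if not signal:
        raise ValueError("the signal must contain at least one entry")
    if year < signal[0][0] or year > signal[-1][0]:
        return None
    for (t0, m0), (t1, m1) in zip(signal, signal[1:]):
        if t0 <= year <= t1:
            if t1 == t0:
                return m0
            return m0 + (m1 - m0) * (year - t0) / (t1 - t0)
    return signal[0][1]  # single-entry signal, year equals its marker


def forcing_field(field_1, field_2, modifier):
    """Baseline field plus modifier times the difference between fields.

    Fields are 2-D lists of rows (j) and columns (i).
    """
    return [
        [a + modifier * (b - a) for a, b in zip(row_1, row_2)]
        for row_1, row_2 in zip(field_1, field_2)
    ]


def restoring_tendency(current, target, restoring_constant):
    """ASSUMED form of a restoring forcing: relax towards `target`
    with a time-scale of `restoring_constant` years."""
    return (target - current) / restoring_constant


# Made-up signal and uniform 0.0 / 1.0 fields (second usage example above)
signal = [(1931.0, 0.0), (2001.0, 70.0)]
field_1 = [[0.0, 0.0], [0.0, 0.0]]
field_2 = [[1.0, 1.0], [1.0, 1.0]]

modifier = interpolate_modifier(signal, 1966.0)
print(modifier)
print(forcing_field(field_1, field_2, modifier))
print(interpolate_modifier(signal, 1900.0))  # outside the markers
```

With the 0.0 / 1.0 fields, every cell simply receives the interpolated modifier value. In the assumed relaxation form, a smaller restoring constant gives a larger tendency for the same mismatch, which is what 'harder' restoring means.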

Exercise 3a. Transient run from spin-up [optional]

 * 1) 	Try repeating Exercise #3, but using the re-start of the run generated in Exercise #2a, in which a large-scale pattern of circulation was already starting to be well established. You can use a re-start by adding an extra parameter to the list passed to the run script, e.g.:
 * 2) $ ./rungenie genie_eb_go_gs_ac_bg_itfclsd_08l n 1931 70 biogem_exp_2 biogem_exp_3
 * 3) 	Re-analyze the year 1994 CFC distributions in the ocean.