BISMG:StephenC

Wed Jan 27
We had a discussion about MPI and CHOMBO, and we were unsure as to how something far down the call stack might find out MPI context information. I've had a look and the good news is: provided the main program initilizes and finalizes the MPI data, as all MPI programs must, this data is globally available. Example program below - note that   the function ignorant does parallel work, extracting all its data from the global pool  MPI_Init and MPI_Finalize must be called very first and very last - there might be some issues here to dow ith the way C++ compilers write the code to construct and destroy objects - we will have to see. 


 * 1) include "mpi.h"
 * 2) include "FArrayBox.H"
 * 3) include "DisjointBoxLayout.H"
 * 4) include "LoadBalance.H"
 * 5) include "LevelData.H"
 * 6) include "BRMeshRefine.H"

//! ignorant doesn't get passed any arguments regarding MPI void ignorant{

int m = 128; int maxsize = 65; ProblemDomain pd( Box(IntVect(0,0), IntVect(m,m))) ;

//split problem into blocks with maxsize degress of freedom Vector boxes; domainSplit(pd, boxes, maxsize, 1); Vector procs(boxes.size); LoadBalance(procs, boxes);

if (procID == procs[0]){ for (size_t i =0; i < procs.size;++i){ std::cout << "processor = " << procs[i] << " handles box " << i << " : " << boxes[i] << std::endl; } }

DisjointBoxLayout grids(boxes, procs, pd); LevelData f;

f.define(grids, 1, IntVect::Zero);

//set the value in each processors portion to its ID number for (DataIterator dit (f.disjointBoxLayout); dit.ok; ++dit){ f[dit].setVal(procID); }

//per processor max Real fmax = 0.0; for (DataIterator dit (f.disjointBoxLayout); dit.ok; ++dit){ fmax = std::max(fmax, f[dit].max); } std::cout << " procID = " << procID << " local max(f) = " << fmax << std::endl;

//actual max (using broadcast & gather) Real gmax = 0.0; Vector pmax(numProc); gather(pmax, fmax, procs[0]); if (procID == procs[0]){ for (size_t i =0; i < pmax.size;++i){ gmax=std::max(pmax[i],gmax); } }  broadcast(gmax, procs[0]);

std::cout << " procID = " << procID << " global max(f) = " << gmax << std::endl;

}

int main(int argc, char* argv[]){

MPI_Init(&argc,&argv); //{   ignorant;
 * 1) ifdef CH_MPI
 * 1) endif

//} MPI_Finalize; return 0; }
 * 1) ifdef CH_MPI
 * 1) endif

running as scalar process: > ./mpitest processor = 0 handles box 0 : ((0,0) (64,64) (0,0)) processor = 0 handles box 1 : ((0,65) (64,128) (0,0)) processor = 0 handles box 2 : ((65,0) (128,64) (0,0)) processor = 0 handles box 3 : ((65,65) (128,128) (0,0)) procID = 0 local max(f) = 0 procID = 0 global max(f) = 0

and with two processors mpirun -n 2 ./mpitest processor = 0 handles box 0 : ((0,0) (64,64) (0,0)) processor = 0 handles box 1 : ((0,65) (64,128) (0,0)) processor = 1 handles box 2 : ((65,0) (128,64) (0,0)) processor = 1 handles box 3 : ((65,65) (128,128) (0,0)) procID = 0 local max(f) = 0 procID = 1 local max(f) = 1 procID = 0 global max(f) = 1 procID = 1 global max(f) = 1

and six mpirun -n 6 ./mpitest processor = 0 handles box 0 : ((0,0) (64,64) (0,0)) processor = 1 handles box 1 : ((0,65) (64,128) (0,0)) processor = 2 handles box 2 : ((65,0) (128,64) (0,0)) processor = 3 handles box 3 : ((65,65) (128,128) (0,0)) procID = 0 local max(f) = 0 procID = 2 local max(f) = 2 procID = 1 local max(f) = 1 procID = 3 local max(f) = 3 procID = 4 local max(f) = 0 procID = 5 local max(f) = 0 procID = 0 global max(f) = 3 procID = 2 global max(f) = 3 procID = 4 global max(f) = 3 procID = 1 global max(f) = 3 procID = 3 global max(f) = 3 procID = 5 global max(f) = 3

Fri Jan 22
We also talked about FORTRAN 90 calling C++ functions which had some kind of persistent state, and I suggested this could be done using void* pointers. That isn't as easy a I assumed (gfortan f90 for example doesn't seem to preserve the pointers as i expected, so some additonal fiddling is needed), and is also a bit difficult to port between platforms (which have a different idea of what address is and so on). There is a simple way round this which ought to be sufficent. The idea is this: Fortan code asks C++ code to create new contexts, with integer keys, then act upon them, and ultimately destroy them.

Here is an example, where two contexts are created.

! file b.f90 program b

integer n1,n2; n1 = 1 call create(n1) n2 = 2 call create(n2) call act(n1,10) call act(n2,100) call act(n1,10) call act(n2,10) call destroy(n1) call destroy(n2)

end program b

//! file a.hpp

extern "C" { void create_( int*); void destroy_(int* ); void act_(int*, int*);
 * 1) include
 * 2) include
 * 3) include

};

class A { std::vector v;

public:

void grow(int n){ v.resize(std::max(0,int(v.size) + n)); std::cout << "this = " << this << " this->v.size = " << v.size << std::endl; } };

std::map globals; // a global store of heap pointers with an integer key to look them up

//! file a.cpp
 * 1) include "a.hpp"

void create_( int* n) { if (globals.find(*n) == globals.end){ globals[*n] = new A; } }

void destroy_(int* n) { if (globals.find(*n) != globals.end){ delete (globals[*n]); globals.erase(*n); } }

void act_(int* n, int* m) { if (globals.find(*n) != globals.end){ globals[*n]->grow(*m); }

}

compiling and linking is done like this (gfortran does the linking) gfortran -g -o f90part.o -c b.f90 g++ -g -o cpppart.o -c a.cpp gfortran -g f90part.o cpppart.o -lstdc++

and then running, we get

this = 0x4c315d0 this->v.size = 10 this = 0x4c31678 this->v.size = 100 this = 0x4c315d0 this->v.size = 20 this = 0x4c31678 this->v.size = 110

in other words, we are managing two classes which can offer various functionality including memory management. valgrind reports that all the memory is properly freed. Of course, usually we will only need one key,pointer pair - the cache for one ice sheet, but we can also cope with the case where more might be run.

Fri Jan 15
Last week's bugs are fixed, so I can begin to run Rupert's models

CHOMBO's ViscousTensorOp class treats velocity values at cell centres - they are colocated with values of $$H$$. This isn't ideal in normal circumstances because (1) to compute advective fluxes, we need velocity values at cell faces, so have to interpolate (2) the pressure gradient - $$\rho g H h_{,i} $$ in the Shelfy-stream model must be approximated (in the x direction) by $$\rho g H h_{,x} \approx \rho g H_P (h_E - h_W)/2 \Delta x$$, which is then insensitive to oscillations with period $$\Delta x$$. But there is a bigger problem at the grounding line. The diagram below shows a profile where the grounding line passes through point P. In this cell, the approximate pressure gradient will be large (because $$h_W$$ is), but the basal stress coefficient small - so the velocity gradient can be huge. The ice then this at the grounding line and it never advances.



On the other hand, a staggered velocity mesh doesn't suffer this problem - either the pressure gradient and basal stress coefficient are both large (left of the grounding line), or neither are (right of it).

In the long term, we won't be using CHOMBO's ViscousTensorOp, but our own linear operator class based on the Blatter-Pattyn model, so we will use a staggered mesh (with some attendant complications) but for now I am using a kludge term

$$\rho g H h_{,x} \approx \rho g H_P [ \frac{(t + \beta _W + \beta _P)(h_E - h_P) + (t + \beta _E + \beta _P)(h_P - h_W)}{ ( 2t + \beta _W + 2 \beta _P + \beta _E) \Delta x} ] $$

with small $$t$$, so that the pressure gradient is taken from the floating ice in the grounding line cell and the grounding line can advance.

8 Jan 2009
Trying to reproduce Rupert's grounding line results using a periodic (in y) 2D domain. Currently have some bugs to fix. The first image shows, for a 32x32 grid, ice height, $$H$$, the x velocity  $$u_x$$ and a flotation number $$H - \rho _w / \rho b$$ which is zero at the grounding line. All these are averaged in y. Empty circles are at t = 1800 years, full circles at t = 1820 years.



There are two obvious issues. (1) Oscillation in the ice height and (2) An incorrect nearly flat velocity profile at the later time, despite it looking sensible at the earlier time.

In the second image, I've used the condition described previously (ie and estimate of the error in the velocity field) to refine the mesh. While it has the desired result (the grounding line advances smoothly), the original issues remain.



Friday Dec 18 2009
Added a new Richardson error estimator, based on

$$ R_{2h} = L_{2h}^{-1}C(r_h,2) - C(L_h ^{-1} r_h,2) $$

where $$L_h$$ is the linearized viscous tensor operator discretized at resolution $$h(x,y)$$ and $$r_h$$ is the right hand side in the most recent linear system solved.

ie

$$ R_{2h} = L_{2h}^{-1}C(r_h,2) - C(u_h,2) $$

where $$u_h$$ is the velocity function at the end of the Picard iterations

The previous estimator was producing some odd results in the nonlinear case (growing with resolution). This one appears to work well

The major advantage is that error estimates are just that: estimates of the error in the velocity field (rather than truncation order). Of course, they are only reliable close to the solution, but should act as decent error indicators otherwise.

The major disadvantage is computational cost - an additional linear system needs to be solved. But it is a smaller one than the main problem (by a factor of 4 in 2D and 8 in 3D), we have a decent initial guess for its solution ($$C(u_h,2)$$), and it is just a linear problem, to be solved after a nonlinear problem. Profiling suggest that it takes about 5% of the CPU time when running the ISMIP-HOM C test at L = 160 km. It is necessary to insist on blocks of at least 4x4 cells in any patch (for the multigrid solver to work) which might result in some extra cost.

Some graphs [[Media:slcdec18.pdf]] show the error estimator and adaptive refinement at work. In all these the problem is ISMIP-HOM C but at L = 1024 km. The first pair (x-component of u and the order of magnitude of its error etimate) are at 32km resolution - the fast flowing region confined to a couple of cells. But most importantly, it only grows to about 1km/a. In the subsequent graphs, the peak speed grows to 2 km / a over much of a cell, and the error estimates shrink, first in extent and then in magnitude. I think is is interesting that a broad variation in basal stress (easily resolved by the 32km mesh) can lead to a velocity profile that depends so strongly on resolution. If the fast flowing regions were connected to mass sinks, about twice as much mass per unit time would be lost through them in the adaptively refined mesh compared to the coarse mesh.

Friday 11 Dec
Periodic boundary conditions and nonlinear stresses (which are solved by Picard iteration) are now included, so I can compare with Experiment C from Pattyn et al 2008, Crysophere 2 p95. Pattyn et al 2008 is a comparison study of many higher order stress models, including full Stokes models, Blatter-Pattyn (LMLa) type models, and Shelfy-Stream (L1L1) type models. Experiment C prescribes a periodic domain of side $$L$$, a constant ice thickness 1km, a base elevation

$$ z_b = - x \tan (0.1/180 \pi) $$

and basal drag

$$ \beta ^2 = 1000 + 1000 \sin(2 \pi x / L) \sin (2 \pi y / L) \mbox{Pa a m}^{-1} $$

My results (for $$L \in \{5,10,20,40,80,160\} \mbox{km}$$ are similar to those in the paper, but consistently with a slightly higher flow rate than the mean plus standard deviation reuslts published in there - they state that L1L1 approximations are outliers.

Also added a Richardson error estimator, that is an error indicator $$ e(x,y) = |T_{2h} C(u_h) - C(T_{h} u_h)| $$ where $$T$$ is the viscous tensor operator, C(.) is a coarse averaging operator, and the $$h$$ subscript means 'discretised with mesh spacing h'. This is applied at the end of the Picard iterations

Friday 4th Dec
Can now solve (linear) stress balance equations with arbitrary Dirichlett conditions, and advance ice thickness given the solution, u. Set up a test problem where the initial ice thickness is

$$H = 1 + 12 x(1-x)y(1-y) \exp( - \left [ (x-1/2)^2 + (y-1/2)^2 \right] / s$$

and at the boundaries, $$u_i = 0$$ and $$H = 1$$. As time passes, the ice bump spreads across the domain at much the same rate if I refine the majority of the mesh or just a region where $$ |H_{,i}| $$ is largest.

Image shows ice thickness along $$y = 1/2$$ at several times, black for adaptive mesh, purple for mostly-refined mesh



must find a way of generating decent graphics - visit only exports rasters and some odd formats

Friday 27th Nov
Started to write classes AMRLevelIceSheet, IceSheetPhysics, and IceSheetPhysicsIBC which take advantage of CHOMBO's explicit treatment of advection and implicit treatment of diffusion.

Advection

I've got the ice thickness $$H$$ advecting around, using a first-order upwind scheme. With a given velocity field $$u = (1,1)^T$$, $$H = 1$$ on the boundaries and initial $$ H=1 + x(1-x)y(1-y)\exp( ((x-1/2)^2 + (y-1/2)^2)/s)$$, on a 32x32 mesh, not only is the central bump in the initial state not well resolved (below left), but numerical diffusion broadens it a good deal over time (below right)



Numerical diffusion due to the first order upwind scheme decays like $$ |u_x| |u_y| h^2 |\nabla H| $$, so adaptive refinement can be use to reduce it, as in the images below



Stress model

I also started to implement a stress model, currently linear, but otherwise similar to the usual Shelfy-Stream model (equation 2, in Nov 13 notes). This is an elliptic equation, so must be solved on all the patches at the same time. This is working for no-slip boundary conditions ($$u_i = 0 $$) only for now, other Dirichlett boundary conditions cause a failure. The graph below shows velocity vectors superimposed on the initial state.



The associated class, ViscousTensorOp, might be adapted to use external stress models in two ways: first by called external code to evalutate the flux, and second, by using whatever solvers already exists in a preconditioning step.

Friday 20 Nov
I'm still getting to grips with Chombo. I've written a very simple application, approximately solving a first order equation

$$\frac{\partial u}{\partial t} = \sin(10(x+y) - t)$$

with initial values

$$u(x,y,t=0) = \cos(10(x+y))$$

To implement this in Chombo, I wrote a C++ class, AMRLevelSinWave which is a subclass of Chombo's AMRLevel base class. Any subclass of AMRLevel can be passed to the main driver for time dependent calculations, and needs to implement a few methods: advance, initialGrid(g), regrid(g), initialTagCells(a), tagCells(a).

Of these, it is advance that would interface with Glimmer-CISM. In this app, it updates the solution at each level like so

DataIterator dit(_uNew.disjointBoxLayout); for (dit.begin; dit.ok; ++dit){ const Box& b = _uNew.disjointBoxLayout[dit]; BoxIterator bit(b); for (bit.begin; bit.ok; ++bit){ const IntVect& iv = bit; Real x = (iv[0] + 0.5) * dx; Real y = (iv[1] + 0.5) * dx; _uNew[dit](iv,0) += dt * 0.5*(sin( 10.0 * (x+y) - time) + sin( 10.0 * (x+y) - time - dt)); } }

So, iterate through each of a level's set of rectangular regions, using the DataIterator dit. Then fish out _uNew[dit], which contains the data for just one rectangular box (conveniently including a block of data arranged as a FORTRAN multidimensional array would be), and updates it. It is easy enough to imagine passing this block (plus some boundary data, which could be derived from the solution on a coarser level) to some fixed mesh code to update (although something extra needs to be done to match both solution and fluxes at the boundaries)

By implementing the other methods, a series of refined meshes are generated as the solution is evolved, three timesteps worth shown below







Friday 13th November
I've been investigating Chombo, attempting to decide how we might use it to build an ice sheet model with adaptive mesh refinement. Ideally, we would like to integrate it with the GLIMMER-CISM model, but at the moment I don't know enough about either to say how that might be done. I think that the best way to start would be to look at how a Chombo application needs to be written by trying to do just that: implementing a simple ice sheet model, essentially a 2D analogue of the model Rupert used in his 1D grounding line work.

The (nonlinear) PDEs are (1), a vertically integrated mass balance equation $$\frac{\partial H}{\partial t} + (Hu_j)_{,j} = a$$ and (2) a vertically integrated momentum balance equation $$\left [ \nu \left (u_{i,j} + u_{j,i} + 2 \delta _{ij} u_{k,k} \right ) \right]_{,j} = \beta ^2 u_i + \rho g H s_{,i}$$. $$H$$ is the ice height, and $$u$$ the vertically averaged velocity vector. $$ \nu = b \left [ (u_{i,j} + u_{j,i})(u_{j,i} + u_{i,j}) \right ]^\frac{n-1}{2n} $$ is a viscosity given by Glen's flow law for ice, $$\beta^2(x,y)$$ is a basal drag coefficient, $$s(x,y)$$ is the surface elevation of ice.

(1) is a first-order hyperbolic conservation equation and (2) is second-order elliptic. On top of this there will be second-order parabolic equations (heat equation), which can be broken into hyperbolic and elliptic parts, so I think that building a model that solves these should demonstrate that Chombo can do the basic job, and help me decide what kind of bridge between the two code bases might be needed. I expect that the most effort will be due to elliptic operators, which need to be inverted on the whole mesh. Chombo provides a fairly involved multigrid procedure (AMRVCycleMG(l)) which does the job, and only seems to need definitions of the operators on uniform grids, which looks promising. There is a C++ class, ViscousTensorOp which can be used to solve a Stokes problem, with $$A u_i + B \left [ C \left ( u_{i,j} + u_{j,i} \right ) + D \delta _{ij} u_{k,k} \right ] _{,j} = E $$ That is, a linear, constant coefficient elliptic PDE that is, basically, (2) linearised. Chombo also has an example application (SelfGravity) where a system of hyperbolic equations are to be advanced in time alongside solutions to one elliptic equation (Poisson's equation). So, it seems that I have a template to work from.

If anyone would like see the sort of thing Chombo can do, let me know. There is a nice advection-diffusion (heat equation) example included but it has a minor flaw which renders the results insensible - it uses $$|\phi _{,j} / \phi| > \epsilon$$ to decide where to refine the mesh. Since $$\phi$$ is close to zero across much of the domain, the mesh is refined everywhere. In my copy, I have changed the criterion to $$|\phi _{,j} / (1 + \phi)| > \epsilon$$, and the resulting adaptive refinement works nicely