Revision as of 15:24, 17 December 2012

Benchmarking UM Version 4.5 on different Architectures

Preamble

Cluster/Parallel file systems are often a bottleneck.
If the model is not filesystem-bound, it is often bound by MPI message latency.
Only the master process writes output, which can lead to load-balance issues that hinder scaling.

MPI message latency
	0 bytes	128 bytes	1024 bytes
between sockets	~0.70us	~1.15us	~2.0us

@@ Line 13: / Line 13: @@
 =Intel SandyBridge=
+* Test system: quad-socket, 8-core Intel Xeon E5-4650L (2.60 GHz; the L suffix denotes low power)
+* 20 MB L3 cache
+{| border="1" cellpadding="10"
+!colspan=4|MPI message latency
+|-
+||  || 0 bytes || 128 bytes || 1024 bytes
+|-
+|| between sockets || ~0.70us || ~1.15us || ~2.0us
+|-
+|}
+==FAMOUS==
+{| border="1" cellpadding="10"
+|| Domain Decomposition || Model-years/day
+|-
+|| 4x2 || ~327
+|-
+|| 8x2 || ~450
+|-
+|| 8x4 || ~480
+|-
+|}
+==HadCM3==
+{| border="1" cellpadding="10"
+|| Domain Decomposition || Model-years/day
+|-
+|| 8x2 || ~48
+|-
+|| 8x4 || ~65
+|-
+|}
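
The throughput figures above can be turned into relative speedup and parallel efficiency. This is a minimal sketch, not part of the original benchmark. It assumes that a decomposition written AxB runs on A*B MPI processes and that the smallest decomposition listed for each model is the baseline. The names `procs` and `scaling` are hypothetical helpers introduced here for illustration.

```python
# Approximate model-years/day figures copied from the tables above.
FAMOUS = [("4x2", 327), ("8x2", 450), ("8x4", 480)]
HADCM3 = [("8x2", 48), ("8x4", 65)]


def procs(decomp):
    """Number of MPI processes, assuming 'AxB' means A*B processes."""
    a, b = decomp.split("x")
    return int(a) * int(b)


def scaling(results):
    """Return (decomposition, processes, speedup, efficiency) per entry.

    Speedup and efficiency are relative to the first entry in `results`,
    so the baseline always has speedup 1.0 and efficiency 1.0.
    """
    base_decomp, base_rate = results[0]
    base_procs = procs(base_decomp)
    rows = []
    for decomp, rate in results:
        n = procs(decomp)
        speedup = rate / base_rate
        efficiency = speedup / (n / base_procs)
        rows.append((decomp, n, round(speedup, 2), round(efficiency, 2)))
    return rows


if __name__ == "__main__":
    for name, data in (("FAMOUS", FAMOUS), ("HadCM3", HADCM3)):
        print(name)
        for decomp, n, speedup, efficiency in scaling(data):
            print(f"  {decomp:>4} {n:>3} procs  speedup {speedup:.2f}  efficiency {efficiency:.2f}")
```

Under these assumptions, FAMOUS going from 8 to 32 processes gives a speedup of about 1.47 (efficiency about 0.37), and HadCM3 going from 16 to 32 processes gives about 1.35 (efficiency about 0.68). Because the inputs are approximate, the derived values are indicative only.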