Difference between revisions of "UM version4.5 benchmarks"
Revision as of 15:24, 17 December 2012
Benchmarking UM Version4.5 on different Architectures
Preamble
- Cluster/Parallel file systems are often a bottleneck.
- If the model is not filesystem-bound, it is often latency-bound (MPI message latency).
- Only the master process writes output, which can lead to load imbalance and hinder scaling.
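The last point can be illustrated with Amdahl's law: if some fraction of each run is work that only the master process performs (such as writing output), that fraction bounds the achievable speedup. The following is a minimal sketch; the function name and the 5% serial fraction are illustrative assumptions, not measured values for the UM.

```python
def amdahl_speedup(serial_fraction: float, n_procs: int) -> float:
    """Ideal speedup on n_procs when serial_fraction of the work cannot be parallelised."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)


if __name__ == "__main__":
    # Hypothetical case: 5% of the runtime is master-only output.
    for n in (8, 16, 32):
        print(f"{n:3d} processes: speedup <= {amdahl_speedup(0.05, n):.2f}")
```

Under this simple model, even a small master-only fraction caps the speedup well below the process count, so the benefit of each additional process shrinks as the count grows.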
AMD Bulldozer
Intel Westmere
Intel SandyBridge
- Test system: quad-socket, 8-core Xeon E5-4650L (2.60 GHz; the L suffix denotes low power)
- 20 MB L3 cache
MPI message latency
| | 0 bytes | 128 bytes | 1024 bytes |
|---|---|---|---|
| between sockets | ~0.70 µs | ~1.15 µs | ~2.0 µs |
FAMOUS
| Domain Decomposition | Model-years/day |
|---|---|
| 4x2 | ~327 |
| 8x2 | ~450 |
| 8x4 | ~480 |
HadCM3
| Domain Decomposition | Model-years/day |
|---|---|
| 8x2 | ~48 |
| 8x4 | ~65 |
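The throughput figures for FAMOUS and HadCM3 can be turned into relative speedup and parallel efficiency. The sketch below assumes that the number of MPI processes equals the product of the two decomposition dimensions (for example, 8x4 = 32) and treats the smallest decomposition of each model as the baseline. The function names are hypothetical and the input values are the approximate figures from the tables.

```python
def decomposition_size(decomp: str) -> int:
    """Process count for a decomposition such as '8x4' (assumed to be the product)."""
    a, b = decomp.lower().split("x")
    return int(a) * int(b)


def relative_efficiency(results):
    """Return (decomp, procs, speedup, efficiency) relative to the first entry."""
    base_decomp, base_rate = results[0]
    base_procs = decomposition_size(base_decomp)
    rows = []
    for decomp, rate in results:
        procs = decomposition_size(decomp)
        speedup = rate / base_rate
        efficiency = speedup / (procs / base_procs)
        rows.append((decomp, procs, speedup, efficiency))
    return rows


# Approximate model-years/day from the tables above.
FAMOUS = [("4x2", 327.0), ("8x2", 450.0), ("8x4", 480.0)]
HADCM3 = [("8x2", 48.0), ("8x4", 65.0)]

if __name__ == "__main__":
    for name, data in (("FAMOUS", FAMOUS), ("HadCM3", HADCM3)):
        print(name)
        for decomp, procs, speedup, eff in relative_efficiency(data):
            print(f"  {decomp:>4} ({procs:2d} procs): "
                  f"speedup {speedup:.2f}, efficiency {eff:.2f}")
```

Going from 8x2 to 8x4, FAMOUS gains about 7% in throughput while HadCM3 gains about 35%, so the lower-resolution model stops scaling sooner on this system.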