Difference between revisions of "UM version4.5 benchmarks"
Revision as of 15:39, 17 December 2012
Benchmarking UM Version4.5 on different Architectures
Preamble
- Cluster/Parallel file systems are often a bottleneck. Timings are for writing to local disk, unless specified otherwise.
- If the model is not filesystem-bound, it is often bound by MPI message latency.
- Only the master process writes output. This can lead to load-balance issues, which hinder scaling.
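As a rough illustration of why output written by a single master process can limit scaling, the following sketch applies Amdahl's law. The function name `amdahl_speedup` and the 5% serial fraction are hypothetical values chosen for illustration; they are not measurements from the UM.

```python
def amdahl_speedup(serial_fraction: float, n_procs: int) -> float:
    """Ideal speedup on n_procs when serial_fraction of the runtime is not parallel.

    serial_fraction stands in for work done by one process only,
    e.g. output written by the master process (an assumption for this sketch).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)


# Hypothetical example: 5% of the runtime is serial output.
for n in (8, 16, 32):
    print(f"{n:2d} processes: speedup <= {amdahl_speedup(0.05, n):.2f}")
```

Under this simple model the speedup can never exceed 1 / serial_fraction, however many cores are added, which is one possible reading of the scaling behaviour noted in the bullet above.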
AMD Bulldozer
Intel Westmere
- Emerald.
- QDR Infiniband (non-RoCE)
- GCOMv3.1
IMB ping-pong message latency
Message size | 0 bytes | 128 bytes | 1024 bytes |
between sockets | ~2.0us | ~2.4us | ~4.7us |
FAMOUS
Domain Decomposition | Number of Cores | Model-years/day |
4x3 | 12 | ~313 |
6x4 | 24 | ~360 |
12x3 | 36 | ~424 |
HadCM3
Domain Decomposition | Number of Cores | Model-years/day |
4x3 | 12 | ~24 |
6x4 | 24 | ~40 |
12x3 | 36 | ~60 |
Intel SandyBridge
- Test system: quad-socket, 8-core Xeon E5-4650L (2.60GHz; L denotes low power)
- 20MB L3 cache
- GCOMv3.1
IMB ping-pong message latency
Message size | 0 bytes | 128 bytes | 1024 bytes |
between sockets | ~0.70us | ~1.15us | ~2.0us |
FAMOUS
Domain Decomposition | Number of Cores | Model-years/day |
4x2 | 8 | ~327 |
8x2 | 16 | ~450 |
8x4 | 32 | ~480 |
- The last line of this table shows a real problem scaling beyond 16 cores: going from 16 to 32 cores raises throughput only from ~450 to ~480 model-years/day (about 7%). Load balance?
- Would like to try to improve file writing performance and re-run.
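The scaling problem in the FAMOUS table above can be quantified with a small sketch that computes speedup and parallel efficiency relative to the smallest run. The helper name `scaling_table` is hypothetical; the throughput figures are the approximate values from the table.

```python
def scaling_table(results):
    """Return (cores, speedup, efficiency) for each (cores, throughput) pair.

    Speedup and efficiency are measured relative to the first entry,
    so the first row is always (cores, 1.0, 1.0).
    """
    base_cores, base_rate = results[0]
    rows = []
    for cores, rate in results:
        speedup = rate / base_rate
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


# Approximate FAMOUS throughput on SandyBridge (cores, model-years/day).
famous_sandybridge = [(8, 327.0), (16, 450.0), (32, 480.0)]

for cores, speedup, efficiency in scaling_table(famous_sandybridge):
    print(f"{cores:2d} cores: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

Relative to the 8-core run, efficiency falls to roughly 69% at 16 cores and roughly 37% at 32 cores, which is consistent with the concern raised in the notes above.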
HadCM3
Domain Decomposition | Number of Cores | Model-years/day |
8x2 | 16 | ~48 |
8x4 | 32 | ~65 |