Difference between revisions of "UM version4.5 benchmarks"
		
		
		
		
		
Latest revision as of 14:36, 24 May 2013
Benchmarking UM Version4.5 on different Architectures
Preamble
- Cluster/Parallel file systems are often a bottleneck. Timings are for writing to local disk, unless specified otherwise.
 - If the model is not filesystem-bound, it is often latency-bound (MPI message latency).
 - Only the master process writes output. This can lead to load-balance issues, which hinder scaling.
 - Worst case message latencies for a cohort of processors are what matter for scaling. The vast majority of messages are either ~100 bytes or ~1KB in size. Latencies are reported for these key message sizes.
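As a rough illustration of why master-only output can limit scaling, the sketch below applies Amdahl's law, treating the time the master spends writing as a serial fraction of the run. This is a simplifying assumption for illustration only: the function name `amdahl_speedup` and the 5% serial fraction are hypothetical and are not measurements from these benchmarks.

```python
def amdahl_speedup(cores: int, serial_fraction: float) -> float:
    """Ideal speedup on `cores` cores when `serial_fraction` of the
    single-core run time cannot be parallelised (Amdahl's law)."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


if __name__ == "__main__":
    # Hypothetical 5% serial (output) fraction; not a measured value.
    for cores in (8, 16, 32, 64):
        print(f"{cores:3d} cores: speedup <= {amdahl_speedup(cores, 0.05):.1f}")
```

Under this model the speedup can never exceed `1 / serial_fraction`, however many cores are added, which is one way a serial output step shows up as a scaling plateau.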
 
Emerald
- Intel Westmere E5649 (2.53GHz)
 - QDR Infiniband (non-RoCE)
 - GCOMv3.1
 
| IMB ping-pong message latency | 0 bytes | 128 bytes | 1024 bytes |
|---|---|---|---|
| between nodes | ~2.0us | ~2.4us | ~4.7us |
FAMOUS
| Domain Decomposition | Number of Cores | Model-years/day |
|---|---|---|
| 4x3 | 12 | ~313 |
| 6x4 | 24 | ~360 |
| 12x3 | 36 | ~424 |
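In these tables the core count is the product of the two factors in the domain decomposition. A minimal helper for checking that, where the name `cores_for` is hypothetical and no assumption is made about which factor corresponds to which grid direction:

```python
def cores_for(decomposition: str) -> int:
    """Return the core count implied by a decomposition label such as '12x3'."""
    first, second = decomposition.lower().split("x")
    return int(first) * int(second)


if __name__ == "__main__":
    # Decomposition labels taken from the table above.
    for label in ("4x3", "6x4", "12x3"):
        print(label, "->", cores_for(label), "cores")
```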
HadCM3
| Domain Decomposition | Number of Cores | Model-years/day |
|---|---|---|
| 4x3 | 12 | ~24 |
| 6x4 | 24 | ~40 |
| 12x3 | 36 | ~60 |
Intel SandyBridge Test System
- Test system: quad-socket, 8-core E5-4650L (2.60GHz) (L for low power)
 - 20MB L3 cache
 - GCOMv3.1
 
| IMB ping-pong message latency | 0 bytes | 128 bytes | 1024 bytes |
|---|---|---|---|
| between sockets | ~0.70us | ~1.15us | ~2.0us |
FAMOUS
| Domain Decomposition | Number of Cores | Model-years/day |
|---|---|---|
| 4x2 | 8 | ~327 |
| 8x2 | 16 | ~450 |
| 8x4 | 32 | ~480 |
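- Doubling from 16 to 32 cores raises throughput only from ~450 to ~480 model-years/day (about 7%), so scaling effectively stalls beyond 16 cores. Load imbalance is the suspected cause, since message latencies here are much lower than on QDR Infiniband.
 - To do: improve file-writing performance and re-run.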
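The scaling behaviour in the table above can be quantified as speedup and parallel efficiency relative to the smallest run. The sketch below is one way to do this. The function name `scaling_table` is hypothetical, and the input figures are the approximate ("~") values from the table, so the results are indicative only.

```python
def scaling_table(results):
    """Compute speedup and parallel efficiency relative to the first entry.

    `results` is a list of (cores, model_years_per_day) pairs,
    ordered with the smallest core count first.
    """
    base_cores, base_rate = results[0]
    rows = []
    for cores, rate in results:
        speedup = rate / base_rate                   # throughput gain over the baseline run
        efficiency = speedup / (cores / base_cores)  # gain per unit of added cores
        rows.append((cores, speedup, efficiency))
    return rows


# Approximate FAMOUS figures from the SandyBridge table above.
famous_sandybridge = [(8, 327.0), (16, 450.0), (32, 480.0)]

for cores, speedup, efficiency in scaling_table(famous_sandybridge):
    print(f"{cores:3d} cores: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

With these inputs, efficiency relative to the 8-core run drops to roughly 69% at 16 cores and 37% at 32 cores.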
 
HadCM3
| Domain Decomposition | Number of Cores | Model-years/day |
|---|---|---|
| 8x2 | 16 | ~48 |
| 8x4 | 32 | ~65 |
Polaris
- Intel E5-2670 @ 2.60GHz
 - Infiniband: Mellanox Technologies MT27500 Family [ConnectX-3]
 - Lustre
 
FAMOUS
| Domain Decomposition | Number of Cores | Model-years/day |
|---|---|---|
| 4x4 | 16 | ~330 |
| 8x4 | 32 | ~330 |
HadCM3
| Domain Decomposition | Number of Cores | Model-years/day |
|---|---|---|
| 4x4 | 16 | ~51 |
| 8x4 | 32 | ~73 |
| 16x4 | 64 | ~73 |