WRF EMS Benchmark results and hardware

Looking for new hardware to run WRF? Intel or AMD? Check this forum.
Boogie
Posts: 17
Joined: Sun Jan 02, 2011 12:24 pm

Re: WRF EMS Benchmark results and hardware

Post by Boogie » Tue May 06, 2014 9:23 am

Some benchmark results for the NMM large simulation with no nesting.
Basic System Information for WRF

System Date : Tue May 6 11:16:24 2014 UTC
System Hostname : WRF
System Address : xxx

System OS : Linux
Linux Distribution : CentOS release 6.5 (Final)
OS Kernel : 2.6.32-431.11.2.el6.x86_64
Kernel Type : x86_64

Processor and Memory Information for WRF

CPU Name : Intel(R) Xeon(R) CPU E5-2430 v2 @ 2.50GHz
CPU Instructions : sandybridge
CPU Type : 64-bit
CPU Speed : 2500.14 MHz

EMS Determined Processor Count
Physical CPUs : 2
Cores per CPU : 6
Total Processors : 12

Hyper-Threading : Off

System Memory : 47 Gbytes

EMS Release Information for WRF

EMS Release : 3.4.1.14.16
EMS Binaries : x64


EMS NMM core benchmark simulation completed in 11 minutes 58 seconds
And with the same setup the ARW large simulation (no nesting):
EMS ARW core benchmark simulation completed in 25 minutes 46 seconds

Ben Lankamp
Posts: 9
Joined: Thu Jun 17, 2010 1:18 pm

Re: WRF EMS Benchmark results and hardware

Post by Ben Lankamp » Wed Jun 18, 2014 7:45 pm

New system; we only run the ARW core, so here are the results for the ARW large benchmark simulation:
Basic System Information for WRF

System OS : Linux
Linux Distribution : Debian 7.5
OS Kernel : 3.2.0-4-amd64
Kernel Type : x86_64

Processor and Memory Information for WRF

CPU Name : Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
CPU Instructions : nehalem
CPU Type : 64-bit
CPU Speed : 2660.05 MHz

EMS Determined Processor Count
Physical CPUs : 2
Cores per CPU : 6
Total Processors : 12

Hyper-Threading : On

System Memory : 23.5 Gbytes

EMS Release Information for WRF

EMS Release : 3.4.1.14.16
EMS Binaries : x64

EMS ARW core benchmark simulation completed in 30 minutes 7 seconds
We found that turning HT on, although off is commonly recommended, actually improved our performance by ~4-5%. We traced this to improved I/O efficiency with hyper-threading enabled. With our old SAS storage the CPU was always faster than the I/O, but we now use SSD storage, which benefits from the CPU being able to keep the storage device properly fed. Reflecting this, raw performance is up by about 170%, but I/O performance is up by 500%, leaving us with a respectable ~215% increase on our operational domain (257x217, 45 levels) over our old system. So before turning off HT in a new system with SSD storage, try running with HT on and see how it performs.
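As a back-of-envelope check of those numbers, here is a sketch that assumes a simplistic model in which the old runtime splits into a compute part and a fully serialized I/O part, and solves for the compute fraction that reproduces the overall gain. The model and the solved-for split are my assumptions, not measurements from this system.

```python
# Model (assumption): total runtime = compute time + I/O time, serialized.
def overall_speedup(compute_frac, compute_speedup, io_speedup):
    """Overall speedup when the old runtime splits into a compute
    fraction and an I/O fraction (the two fractions sum to 1)."""
    io_frac = 1.0 - compute_frac
    return 1.0 / (compute_frac / compute_speedup + io_frac / io_speedup)

# Quoted gains: raw (compute) +170% -> x2.7, I/O +500% -> x6.0,
# overall +215% -> ~x3.15. Find the compute fraction that fits.
best = min((abs(overall_speedup(c / 100, 2.7, 6.0) - 3.15), c / 100)
           for c in range(1, 100))[1]
print(f"implied compute share of old runtime: {best:.2f}")
```

With the quoted figures this lands at roughly a 74% compute / 26% I/O split of the old runtime, which is consistent with the claim that the old system was I/O-bound enough for SSD storage to matter.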

Viking
Posts: 1
Joined: Fri Sep 26, 2014 2:13 pm

Re: WRF EMS Benchmark results and hardware

Post by Viking » Sat Sep 27, 2014 2:23 pm

Here are my results for a cheap and not very exotic build I put together a year ago for $400 (its primary purpose was not numerical computing).
All options are stock; I made no changes (4 cores are available, but only 3 are used by the default EMS configuration).
Basic System Information for wrf

System Date : Sat Sep 27 02:12:40 2014 UTC
System Hostname : wrf
System Address : 127.0.0.1

System OS : Linux
Linux Distribution : Scientific Linux release 6.5 (Carbon)
OS Kernel : 2.6.32-431.29.2.el6.x86_64
Kernel Type : x86_64

Processor and Memory Information for wrf

CPU Name : Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz
CPU Instructions : sandybridge
CPU Type : 64-bit
CPU Speed : 3400.07 MHz

EMS Determined Processor Count
Physical CPUs : 1
Cores per CPU : 4
Total Processors : 4

Hyper-Threading : Off

System Memory : 7.6 Gbytes

EMS Release Information for wrf

EMS Release : 3.4.1.14.16
EMS Binaries : x64
_______________________________________LARGE_________________________________________
EMS NMM core .................. 23 minutes 24 seconds
EMS NMM core nested ........... 49 minutes 54 seconds

EMS ARW core .................. 47 minutes 38 seconds
EMS ARW core nested ........... 1 hour 51 minutes 5 seconds
_______________________________________SMALL_________________________________________
EMS NMM core .................. 5 minutes 10 seconds
EMS NMM core nested ........... 14 minutes 7 seconds

EMS ARW core .................. 8 minutes 3 seconds
EMS ARW core nested ........... 27 minutes 2 seconds

fog
Posts: 1
Joined: Fri Feb 20, 2015 7:29 am

Re: WRF EMS Benchmark results and hardware

Post by fog » Fri Feb 20, 2015 8:13 am

After reading this complete thread several times and checking other resources regarding hardware for WRF, it seems clear to me that you have to find a good balance between CPU performance and RAM performance.

What I ask myself now, especially after reading that going from a single-channel RAM configuration to dual-channel makes a huge difference:

Wouldn't it be best to give each physical core its "own" RAM channel, so that no core has to fight for memory access? As I understand Intel's i7/Xeon line, you are currently limited to four RAM channels per physical processor, so this would be my first (naive?) configuration:

one fast quad core CPU with four RAM channels, 4(8) RAM sticks (single socket system)
two fast quad core CPUs with four RAM channels each, 8(16) RAM sticks (dual socket system)
...

Regarding raw computing performance (ignoring disk I/O for now), it seems to me that WRF can't benefit from CPUs with >> 4 cores, given a reasonably fast CPU (3-4GHz?).

I know that eventually I have to run benchmarks on different configurations to find out, but as dual socket xeons with 20 cores are quite costly at the moment, I appreciate every comment!

meteoadriatic
Posts: 1510
Joined: Wed Aug 19, 2009 10:05 am

Re: WRF EMS Benchmark results and hardware

Post by meteoadriatic » Fri Feb 20, 2015 1:03 pm

Hello!
fog wrote:After reading this complete thread several times and checking other resources regarding hardware for WRF, it seems clear to me that you have to find a good balance between CPU performance and RAM performance.

What I ask myself now, especially after reading that going from a single-channel RAM configuration to dual-channel makes a huge difference:

Wouldn't it be best to give each physical core its "own" RAM channel, so that no core has to fight for memory access?
That is what NUMA is about (the numad daemon on CentOS), and it works and makes sense only in a multi-socket environment, where you limit access to the memory controller of a particular CPU socket. It would make no difference on single-socket computers.

As I understand Intel's i7/Xeon line, you are currently limited to four RAM channels per physical processor, so this would be my first (naive?) configuration:

one fast quad core CPU with four RAM channels, 4(8) RAM sticks (single socket system)
two fast quad core CPUs with four RAM channels each, 8(16) RAM sticks (dual socket system)
...
fog wrote:Regarding raw computing performance (ignoring disk I/O for now), it seems to me that WRF can't benefit from CPUs with >> 4 cores, given a reasonably fast CPU (3-4GHz?).
It can, because not all computation goes through the northbridge (the CPU-RAM path); some happens in the CPU cache, which means that the more computing cores you have, the faster the computation, at least in theory. However, computing speed won't increase linearly as you add cores, but more logarithmically.
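That diminishing-returns behaviour can be illustrated with Amdahl's law: with a serial fraction s, the speedup on n cores is 1 / (s + (1 - s) / n). The 5% serial fraction below is an assumed figure for illustration, not a measured WRF value.

```python
# Amdahl's law: speedup on n cores given a serial fraction of the work.
# The 5% serial fraction is an assumption for illustration only.
def amdahl_speedup(n_cores, serial_frac=0.05):
    return 1.0 / (serial_frac + (1.0 - serial_frac) / n_cores)

for n in (1, 4, 8, 16, 32):
    print(f"{n:3d} cores -> speedup {amdahl_speedup(n):5.2f}")
```

Even with only 5% serial work, 32 cores deliver roughly a 12.5x speedup rather than 32x, matching the "more logarithmic than linear" observation above.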
fog wrote:I know that eventually I have to run benchmarks on different configurations to find out, but as dual socket xeons with 20 cores are quite costly at the moment, I appreciate every comment!
The best you can do with multi-CPU computers is to make sure the memory configuration is the fastest you can get. This usually means populating all memory banks with chips to achieve "performance mode", but you need to consult the user manual of the particular computer. Also make sure you use latest-generation hardware if you can, because each new generation is ~10-20% faster than the previous one, given the same number of cores and the same frequency.

Of course, if costs are a concern, a single-socket i5/i7 system of the latest generation is the safest choice and has the best overall performance/price ratio for WRF.

Boogie
Posts: 17
Joined: Sun Jan 02, 2011 12:24 pm

Re: WRF EMS Benchmark results and hardware

Post by Boogie » Fri Mar 27, 2015 1:54 pm

Some new results with a fairly powerful 28 core system.

This is the NMM-large domain with no nesting.
Basic System Information for wrf

System Date : Fri Mar 27 17:20:49 2015 UTC
System Hostname : wrf
System Address :

System OS : Linux
Linux Distribution : CentOS Linux release 7.0.1406 (Core)
OS Kernel : 3.10.0-123.20.1.el7.x86_64
Kernel Type : x86_64

Processor and Memory Information for wrf

CPU Name : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
CPU Instructions : sandybridge
CPU Type : 64-bit
CPU Speed : 1200.06 MHz

EMS Determined Processor Count
Physical CPUs : 2
Cores per CPU : 14
Total Processors : 28

Hyper-Threading : Off

System Memory : 62.6 Gbytes

EMS Release Information for wrf

EMS Release : 3.4.1.14.16
EMS Binaries : x64


Benchmark simulation length was 24 hours

Summary of nodes and processors used for benchmark simulation:

* 27 Processors on wrf
------------------------
* 27 Total Processors
* 1 Tile per Processor

* Internal Domain Decomposition

EMS NMM core benchmark simulation completed in 4 minutes 46 seconds
The result for NMM-small domain with no nesting:
EMS NMM core benchmark simulation completed in 1 minute 26 seconds
And finally the ARW large simulation with no nesting:
EMS ARW core benchmark simulation completed in 8 minutes 41 seconds

Antonix
Posts: 256
Joined: Fri Oct 16, 2009 8:53 am

Re: WRF EMS Benchmark results and hardware

Post by Antonix » Thu Apr 02, 2015 8:22 am

NMM small No Nesting


* Hey, hey! Your simulation appears to have been successful, just like you!


* The simulation output files have been moved to the wrfprd directory

Basic System Information for bora

System Date : Thu Apr 2 08:18:52 2015 UTC
System Hostname : bora
System Address : 192.168.0.195

System OS : Linux
Linux Distribution : jessie/sid
OS Kernel : 3.16.0-30-generic
Kernel Type : x86_64

Processor and Memory Information for bora

CPU Name : Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz
CPU Instructions : sandybridge
CPU Type : 64-bit
CPU Speed : 1200 MHz

EMS Determined Processor Count
Physical CPUs : 2
Cores per CPU : 14
Total Processors : 28

Hyper-Threading : On

Note: Attempting to use virtual "Hyper-threaded" CPUs while
running the EMS may result in a degradation in performance.

System Memory : 125.7 Gbytes

EMS Release Information for bora

EMS Release : 3.4.1.14.16
EMS Binaries : x64


EMS NMM core benchmark simulation completed in 2 minutes


* Benchmark information is available in static/ems_benchmark.info

Larf
Posts: 16
Joined: Tue Dec 02, 2014 10:03 am
Location: Hamburg

Re: WRF EMS Benchmark results and hardware

Post by Larf » Fri Jun 12, 2015 7:58 am

Hi everybody,
I'm trying to analyze the costs of a new workstation and the corresponding time needed to generate a 10-year time series.
For that purpose, I have a fairly old workstation for comparison and for analyzing run times based on the posted benchmark tests.
So, in case anyone cares about the results, it would be a shame to do all this calculating without posting them, even if the hardware is obsolete.
Anyway, this is the hardware:

Basic System Information for etf4-wrf-testing2
System Date : Wed Jun 10 15:52:43 2015 UTC
System Hostname : etf4-wrf-testing2
System Address : 10.64.6.163
System OS : Linux
Linux Distribution : CentOS Linux release 7.1.1503 (Core)
OS Kernel : 3.10.0-229.1.2.el7.x86_64
Kernel Type : x86_64
Processor and Memory Information for etf4-wrf-testing2
CPU Name : Intel(R) Xeon(R) CPU X5272 @ 3.40GHz
CPU Instructions : penryn
CPU Type : 64-bit
CPU Speed : 2400 MHz
EMS Determined Processor Count
Physical CPUs : 2
Cores per CPU : 2
Total Processors : 4
Hyper-Threading : Off
System Memory : 31.4 Gbytes
EMS Release Information for etf4-wrf-testing2
EMS Release : 3.4.1.14.16
EMS Binaries : x64


And those are the times:

NMM large, no nest    56.0 min
NMM large, 1 nest    132.4 min
ARW large, no nest   114.2 min
ARW large, 1 nest    264.8 min
NMM small, no nest    12.8 min
NMM small, 1 nest     63.2 min
ARW small, no nest    19.0 min
ARW small, 1 nest     63.4 min

Does anyone have a suggestion for the best price/performance ratio?
I want to generate a 10-year historical time series for Europe at a discretisation of 50 km. Is that reasonable/possible without clustering?
So far I have extrapolated the calculation time by comparing my old test workstation to a high-end workstation (16 cores, sufficient RAM, no swapping)
and found that a single non-nested 10-year time series for one domain would take 86 days!!?!!
Impossibly long!
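For reference, a sketch of this kind of extrapolation, assuming the large benchmarks in this thread cover 24 hours of model time. The 34 minutes per simulated day is an assumed figure chosen to roughly reproduce the 86-day estimate, not a measured value.

```python
# Extrapolate a 10-year run from per-simulated-day wall time (assumed model:
# wall time scales linearly with simulated time, one simulated day per
# benchmark run). The 34 min/day input is an illustrative assumption.
def series_walltime_days(minutes_per_sim_day, sim_years=10):
    sim_days = sim_years * 365.25
    return minutes_per_sim_day * sim_days / (60 * 24)

print(f"{series_walltime_days(34):.0f} days")  # ~86 days at 34 min/day
```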
Feel free to comment!
Greets
Larf

meteoadriatic
Posts: 1510
Joined: Wed Aug 19, 2009 10:05 am

Re: WRF EMS Benchmark results and hardware

Post by meteoadriatic » Fri Jun 12, 2015 11:35 am

Larf wrote:Does anyone have a suggestion for the best price-performance-ratio?
Hi, I believe that a 4-core Intel i5-based system gives the best ratio.
Larf wrote:I want to generate 10 year historical time series in Europe in a discretisation of 50 km.
That means 50 km horizontal resolution over the whole of Europe?


If I'm right, then without testing or calculating, just from a quick mental estimate, I would say you can do a 10-year reanalysis over Europe at 50 km resolution in a few months of constant running on a Core i5. If you need it completed faster, then of course you will need more computers.

windyweek
Posts: 28
Joined: Thu Aug 21, 2014 8:46 am

Re: WRF EMS Benchmark results and hardware

Post by windyweek » Fri Jun 12, 2015 4:58 pm

NMM large (no-nesting) benchmark for 3 different systems as I played with them all these days:

Intel(R) Xeon(R) CPU E5540 @ 2.53GHz, 2 CPUs, 8 cores, HT off - 18 minutes 28 seconds
Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz, 1 CPU, 6 cores, HT off - 16 minutes 6 seconds
Intel(R) Xeon(R) CPU E3-1245 v2 @ 3.40 GHz, 1 CPU, 4 cores, HT off - 25 minutes 31 seconds

What is interesting to note is that according to a conventional CPU benchmark (http://www.cpubenchmark.net):
- the 3930K should be much faster than a dual E5540;
- the E3-1245 v2 should be similar in performance to a dual E5540.

Neither holds true for WRF.

My explanation (and perhaps the official verdict ;) ) is, again, memory bandwidth, and especially the ratio of memory channels per core. The E5540, despite being the oldest processor, had a total of 6 memory channels in the configuration above (0.75 channels per core), the 3930K had 4 (0.67), and the E3-1245 v2 had only 2 (0.5).

So, yeah: don't look at CPU frequency or the amount of memory, but at memory channels per core when running WRF.
Just sharing my observations ;)
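A quick sketch tabulating those channels-per-core ratios. The channel counts are assumptions based on the configurations described above (dual E5540 with triple-channel memory per socket, quad-channel 3930K, dual-channel E3-1245 v2):

```python
# Channels-per-core ratios for the three benchmarked systems above.
# Channel counts are assumed from the described configurations.
systems = {
    "2x E5540":   {"cores": 8, "channels": 6},  # 18 min 28 s
    "i7-3930K":   {"cores": 6, "channels": 4},  # 16 min  6 s
    "E3-1245 v2": {"cores": 4, "channels": 2},  # 25 min 31 s
}
for name, s in systems.items():
    ratio = s["channels"] / s["cores"]
    print(f"{name:12s} channels/core = {ratio:.2f}")
```

Note the ranking of benchmark times tracks the channels-per-core ratio here, not the raw CPU benchmark scores.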

Cheers,
Ivan
