WRF EMS Benchmark results and hardware

Looking for new hardware to run WRF? Intel or AMD? Check this forum.
meteoadriatic
Posts: 1587
Joined: Wed Aug 19, 2009 10:05 am

Re: WRF EMS Benchmark results and hardware

Post by meteoadriatic » Wed Sep 18, 2013 4:27 pm

Hello,

Low power consumption and high processing speed are opposing requirements; you can't satisfy both :)

theocarter2911
Posts: 70
Joined: Fri Mar 08, 2013 5:22 am

Re: WRF EMS Benchmark results and hardware

Post by theocarter2911 » Fri Sep 20, 2013 2:36 am

Hi All,
I am setting up an operational machine to run NMM and am doing benchmarks in order to tweak my new dual octa-core Xeon machine to its optimum (without breaking anything). It has to grind along for the next 3 years running 4 times a day, and uptime is essential.

When I did my first set of benchmarks, I used nmm_small without nesting to compare against an i5 quad-core machine running the exact same OS (Scientific Linux 6.2, which is the same as CentOS 6.2 and RHEL 6.2) and the same install.pl from Robert, in order to compare apples with apples. That raised an alarm bell, as the 16-core finished in ~5 minutes and the 4-core in ~6 minutes. After I consulted Robert, he explained that one should run the larger nested tests on bigger hardware, and that the new benchmarks now use one core fewer than is available on the system, as that is most likely to give a speed improvement, and he was spot on!

I did a whole load of benchmarks and will post the very last one from the 16-core here, but the summary is that the 16-core is about twice as fast as the 4-core, around 30 minutes versus 60 minutes, and that using only 15 of the 16 cores with dcomp 1 (not 0) set gives a 10% improvement over using all 16 cores. Funnily enough, even with 15 cores set and dcomp 0 it is 10% slower. (As it is NMM, one cannot change numtiles.)

Code: Select all

    Basic System Information for localhost

        System Date           : Thu Sep 19 07:45:15 2013 UTC
        System Hostname       : localhost
        System Address        : 127.0.0.1

        System OS             : Linux
        Linux Distribution    : Scientific Linux release 6.2 (Carbon)
        OS Kernel             : 2.6.32-358.18.1.el6.x86_64
        Kernel Type           : x86_64

    Processor and Memory Information for localhost

        CPU Name              : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
        CPU Instructions      : sandybridge
        CPU Type              : 64-bit
        CPU Speed             : 2900 MHz

        EMS Determined Processor Count
            Physical CPUs     : 2
            Cores per CPU     : 8
            Total Processors  : 16

        Hyper-Threading       : Off

        System Memory         : 31.3 Gbytes

    EMS Release Information for localhost

        EMS Release           : 3.4.1.13.37
        EMS Binaries          : x64

    Benchmark simulation length was 24 hours

    Summary of nodes and processors used for benchmark simulation:

        * 15 Processors on localhost
        ------------------------------
        * 15 Total Processors
        * 1 Tile per Processor
        * 1 x 15 Domain Decomposition

    EMS NMM core (LARGE) benchmark nested simulation completed in 29 minutes 42 seconds

What I did notice while running these benchmarks is that the 16th (unused) processor runs continuously at 5-10%, which suggests there is a fair bit of work being done on it; taking it out of the queue of CPUs working on tiles lets the other 15 just number-crunch without interruptions, making the whole thing faster.
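
For anyone who wants to watch the same thing, here is a rough sketch of how the per-core load can be sampled from Python. It assumes the psutil package is installed (it is not part of EMS) and is only an illustration, not part of the benchmark itself:

Code: Select all

# Rough sketch only: assumes the psutil package is installed (pip install psutil).
# It samples per-core utilisation so you can see whether the core left out of
# the EMS run is still carrying OS/MPI housekeeping work, as described above.
import time
import psutil

SAMPLES = 12      # how many samples to take
INTERVAL = 5      # seconds between samples

for _ in range(SAMPLES):
    # One utilisation percentage per logical core, averaged over INTERVAL seconds
    per_core = psutil.cpu_percent(interval=INTERVAL, percpu=True)
    stamp = time.strftime("%H:%M:%S")
    busiest = max(range(len(per_core)), key=lambda i: per_core[i])
    print("%s  %s  (busiest: core %d)" %
          (stamp, " ".join("%5.1f" % p for p in per_core), busiest))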

I have a question, though: does anyone have further ideas for speed improvements that will not break the machine? I would rather have an optimized tractor than a short-lived Ferrari!

Cheers,
Theo Carter - Dubai

meteoadriatic
Posts: 1587
Joined: Wed Aug 19, 2009 10:05 am

Re: WRF EMS Benchmark results and hardware

Post by meteoadriatic » Fri Sep 20, 2013 6:45 am

Hello, this is not good. It must run faster.

First question and first suspicion: you have 32 GB of RAM, but what is your RAM configuration (how many memory modules, and in which slots)? It would also be useful to know what motherboard (or computer model) it is.

Explanation: you have a multi-channel memory controller, but if you populate it as only a single channel, possibly at a lower clock speed than the maximum, you severely limit your WRF performance. It doesn't matter what CPUs you have, or how many, if memory bandwidth is the bottleneck. I got a 50% speed improvement when I moved from single-channel to triple-channel with two Xeon X5570 CPUs. With faster CPUs the difference could be even bigger.
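
If you don't want to open the case, one rough way to check the DIMM population from Linux is to parse the output of dmidecode -t memory (run as root). The sketch below is only an illustration; the field names ("Locator", "Size", "Speed") follow typical dmidecode output and may differ slightly on your board:

Code: Select all

# Rough sketch only: lists which DIMM slots are populated and at what speed
# by parsing `dmidecode -t memory` (run with root privileges, e.g. via sudo).
import re
import subprocess

out = subprocess.check_output(["dmidecode", "-t", "memory"]).decode()

# Every "Memory Device" paragraph in the output describes one DIMM slot.
for block in out.split("Memory Device")[1:]:
    size  = re.search(r"^\s*Size:\s*(.+)$",    block, re.M)
    slot  = re.search(r"^\s*Locator:\s*(.+)$", block, re.M)
    speed = re.search(r"^\s*Speed:\s*(.+)$",   block, re.M)
    if size and "No Module Installed" not in size.group(1):
        print("%-12s %-10s %s" % (slot.group(1).strip(),
                                  size.group(1).strip(),
                                  speed.group(1).strip() if speed else "unknown"))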

Second, a piece of advice: you will get faster calculations if you build your own WRF executables with the Intel compiler (it is free for non-commercial use). Your CPUs support the AVX instruction set, and if you optimize the binary code to use AVX instructions it will run around 10% faster than code compiled for the SSE instruction set. I don't know how well the new EMS binaries are optimized for new CPUs (Robert uses PGI), but when I compared those from v3.1 and v3.2, my binaries built with the Intel compiler and optimized for AVX were 10-20% faster than the EMS binaries.
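
Before building anything, you can also quickly confirm that the CPU really advertises AVX by reading /proc/cpuinfo; a small sketch, illustration only:

Code: Select all

# Rough sketch only: checks /proc/cpuinfo for the instruction-set flags the
# compiler can target. Safe to run anywhere on Linux, no root needed.
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            print("AVX supported    :", "avx" in flags)
            print("SSE4.2 supported :", "sse4_2" in flags)
            break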

theocarter2911
Posts: 70
Joined: Fri Mar 08, 2013 5:22 am

Re: WRF EMS Benchmark results and hardware

Post by theocarter2911 » Fri Sep 20, 2013 9:51 am

Hi Meteoadriatic,

Thanks for the fast response and suggestions! I will certainly follow up on the triple-channel RAM idea. I have two days off and will sort that out on Monday.

As for the idea of compiling my own executables, I remember loads of issues when I built WRF myself some years back: if you built one part of WRF with PGI, you had to use the same compiler for all the other parts. Is that no longer the case?

Another quick question while we converse - have you tried ingesting your own observations (obsproc or such)? I will post a query on these pages soon to beg for a how-to.

Thanks again!
Theo

meteoadriatic
Posts: 1587
Joined: Wed Aug 19, 2009 10:05 am

Re: WRF EMS Benchmark results and hardware

Post by meteoadriatic » Fri Sep 20, 2013 11:59 am

theocarter2911 wrote:As for the idea of compiling my own executables, I remember loads of issues when I built WRF myself some years back: if you built one part of WRF with PGI, you had to use the same compiler for all the other parts. Is that no longer the case?
You only need to compile with the same compiler the things that are linked into the real and wrf executables when WRF is built, and that is basically just netCDF and MPICH2. Everything other than the wrf and real binaries themselves can be used from the EMS distribution without recompiling. You can, if you want, recompile the MPICH binaries, WPS (geogrid, metgrid), emspost, and so on, but none of that is needed to work with recompiled real/wrf.

But I suggest you optimize the hardware first; then you can play with software optimization.
theocarter2911 wrote:Another quick question while we converse - have you tried ingesting your own observations (obsproc or such)? I will post a query on these pages soon to beg for a how-to.
I'm considering using local observations with obsgrid; however, it does not look like a basic task, so I haven't tried it yet.

meteoadriatic
Posts: 1587
Joined: Wed Aug 19, 2009 10:05 am

Re: WRF EMS Benchmark results and hardware

Post by meteoadriatic » Fri Sep 20, 2013 3:35 pm

I decided to see how my dual Xeon X5570 with triple-channel memory holds up against those results.

So this is a fresh, completely unmodified EMS installation, without changing anything in the domain decomposition, etc...


NMM LARGE:

Code: Select all

            Basic System Information for wrf
            
                System Date           : Fri Sep 20 15:18:33 2013 UTC
                System Hostname       : wrf
                System Address        : 127.0.0.1
            
                System OS             : Linux
                Linux Distribution    : CentOS release 6.4 (Final)
                OS Kernel             : 2.6.32-358.14.1.el6.x86_64
                Kernel Type           : x86_64
            
            Processor and Memory Information for wrf
            
                CPU Name              : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
                CPU Instructions      : nehalem
                CPU Type              : 64-bit
                CPU Speed             : 2933.65 MHz
            
                EMS Determined Processor Count
                    Physical CPUs     : 2
                    Cores per CPU     : 4
                    Total Processors  : 8
            
                Hyper-Threading       : Off
                  
                System Memory         : 11.6 Gbytes
            
            EMS Release Information for wrf
            
                EMS Release           : 3.4.1.13.37
                EMS Binaries          : x64


         EMS NMM core benchmark nested simulation completed in 42 minutes 38 seconds
How does it compare to the fastest Haswell i5 system with overclocked memory (2133 MHz)? Well... 42 minutes (dual Xeon) vs 48 minutes (i5 Haswell) for the large benchmark. Those cheap i5 CPUs, especially the Haswells, are really the best you can buy for what they cost. But if you need more... you must pay a lot more.

How does it compare to your results? 42 minutes vs 30 minutes (but you did tune the domain decomposition?). Anyway, looking at this, your result does not seem too bad! It should be faster, but not by much.

I like to check this table:
http://www.cpubenchmark.net/multi_cpu.html

It seems to me that those benchmarks can be taken as a pretty close approximation of how fast your WRF will run. If you take 20,750 for the dual E5-2690 versus 9,800 for the dual X5570, the estimate is that your WRF runs should take around half the time mine take (somewhere between 20 and 25 minutes for NMM large). I guess that is the difference you can get with a faster memory setup, and that will be it.
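
For illustration, the arithmetic is just scaling my measured time by the ratio of the two PassMark scores; a crude assumption, since memory bandwidth matters as much as raw CPU throughput for WRF:

Code: Select all

# Rough sketch of the scaling estimate above: assume runtime scales inversely
# with the multi-CPU PassMark score (a crude assumption, see the caveat above).
measured_minutes_dual_x5570 = 42 + 38 / 60.0   # my NMM large benchmark result
passmark_dual_x5570   = 9800
passmark_dual_e5_2690 = 20750

estimate = measured_minutes_dual_x5570 * passmark_dual_x5570 / float(passmark_dual_e5_2690)
print("Estimated NMM large runtime on dual E5-2690: ~%.0f minutes" % estimate)
# -> roughly 20 minutes, consistent with the 20-25 minute guess above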

theocarter2911
Posts: 70
Joined: Fri Mar 08, 2013 5:22 am

Re: WRF EMS Benchmark results and hardware

Post by theocarter2911 » Sat Sep 21, 2013 5:23 am

Hi Meteoadriatic,

My domain decomposition took me from 34 minutes to 29 minutes, around 10%, which, strangely enough, was about the same gain as going from 16 to 15 processors. In any case, I also note that my quad-core i5 was around 60 minutes with stock settings (but with 3 processors), and you got 42 minutes, which is a lot better.

I will certainly have a look at the RAM; it had been in the back of my mind, but as it was an agency-built Dell workstation I had put it at the bottom of my mental priority list (my first thought was that they would know enough to make it triple-channel). I will make sure about this tomorrow afternoon.

As for rebuilding the software from source, I probably would not go there. The whole reason I went the WRF EMS route is that Robert puts it all together in such a brilliant way, and building all the pieces myself took me a month the last time I tried it, 5 years ago. I am an operational forecaster, not a programmer, and am seriously out of my depth with dependency-hell problems and with tweaking compiler configuration settings.

I will do basic tweaks like trying to make reads/writes to the hard disks faster, making sure the RAM is good, and making sure the settings that Robert suggests are optimized (like 15 instead of 16 processors). Then my time must go to tweaking the USGS land-use tables (I have land where there is sea locally, and MODIS is even worse here now, which is strange because MODIS used to be pretty good over the UAE), and maybe looking at ingesting local observations if I can find a very good beginner's how-to from someone who has done it for this system. I know that will be a big task for me personally, but the benefit may be worth the time cost.

In any case, I will look at the ram tomorrow. I appreciate the help and suggestions from you!
Theo

theocarter2911
Posts: 70
Joined: Fri Mar 08, 2013 5:22 am

Re: WRF EMS Benchmark results and hardware

Post by theocarter2911 » Tue Sep 24, 2013 2:25 am

Hi Meteoadriatic,

My assumption was right: the RAM is tri-channel and fitted correctly, and I ran memtest86+ with zero errors, so the memory appears to be fine. Four 8 GB modules. But it was a really good suggestion to make sure of that!

So apart from building the executables on the machine, I am out of further ideas to speed up my system. I have noticed a sluggishness in the mouse and desktop graphics (moving windows around and such) that I am not used to seeing on a Linux machine, and will see if I can trace that. Perhaps it has something to do with the "new" processors and system, but it could also be a poor graphics card or display, which would not affect model runtime.

Thanks for all your suggestions!
Theo

meteoadriatic
Posts: 1587
Joined: Wed Aug 19, 2009 10:05 am

Re: WRF EMS Benchmark results and hardware

Post by meteoadriatic » Tue Sep 24, 2013 6:41 am

First, what is your exact computer model?

Secondly, did you upgrade the system from 6.2 to the latest release? If you didn't, you may not have up-to-date kernel modules.

theocarter2911
Posts: 70
Joined: Fri Mar 08, 2013 5:22 am

Re: WRF EMS Benchmark results and hardware

Post by theocarter2911 » Tue Sep 24, 2013 2:52 pm

It is a Dell PowerEdge T620, the Dell dual-CPU motherboard running two 8-core E5-2690 Xeons and 32 GB of RAM. My Linux kernel is 2.6.32-358.18.1.el6.x86_64. The operating system is Scientific Linux 6.2 "Carbon" (the same thing as CentOS or RHEL, but I like the idea of supporting the scientific community).
