suggestions and curiosities about running WRF on HPC.

Antonix
Posts: 256
Joined: Fri Oct 16, 2009 8:53 am

Post by Antonix » Sat Apr 04, 2015 2:20 pm

I have been running WRF for a few days on a new cluster of 3 nodes.
Each node consists of:
2 x Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz (2 x 14 cores)
256 GB DDR4 RAM, quad-channel
10 Gigabit Ethernet NICs and switch (9.9 Gbit/s throughput)
Ubuntu 14.04 LTS Server


Performance is good, but not as good as I expected. Do you have any advice for improving it?
Something very strange I have noticed (maybe a stupid question) is that the time steps do not all take the same time. Once every six minutes of model time, one step is VERY slow (about 25 sec.), as the following lines show.
What is going on?

WRF TILE 1 IS 1 IE 114 JS 1 JE 5
WRF TILE 2 IS 1 IE 114 JS 6 JE 10
WRF TILE 3 IS 1 IE 114 JS 11 JE 15
WRF TILE 4 IS 1 IE 114 JS 16 JE 19
WRF TILE 5 IS 1 IE 114 JS 20 JE 23
WRF TILE 6 IS 1 IE 114 JS 24 JE 27
WRF TILE 7 IS 1 IE 114 JS 28 JE 31
WRF TILE 8 IS 1 IE 114 JS 32 JE 35
WRF TILE 9 IS 1 IE 114 JS 36 JE 39
WRF TILE 10 IS 1 IE 114 JS 40 JE 43
WRF TILE 11 IS 1 IE 114 JS 44 JE 47
WRF TILE 12 IS 1 IE 114 JS 48 JE 52
WRF TILE 13 IS 1 IE 114 JS 53 JE 57
WRF TILE 14 IS 1 IE 114 JS 58 JE 62
WRF NUMBER OF TILES = 14
Timing for main: time 2015-04-02_06:00:30 on domain 1: 66.62803 elapsed seconds
Timing for main: time 2015-04-02_06:01:00 on domain 1: 3.03232 elapsed seconds
Timing for main: time 2015-04-02_06:01:30 on domain 1: 3.02832 elapsed seconds
Timing for main: time 2015-04-02_06:02:00 on domain 1: 3.04470 elapsed seconds
Timing for main: time 2015-04-02_06:02:30 on domain 1: 3.06013 elapsed seconds
Timing for main: time 2015-04-02_06:03:00 on domain 1: 3.02011 elapsed seconds
Timing for main: time 2015-04-02_06:03:30 on domain 1: 3.01019 elapsed seconds
Timing for main: time 2015-04-02_06:04:00 on domain 1: 3.02287 elapsed seconds
Timing for main: time 2015-04-02_06:04:30 on domain 1: 3.02146 elapsed seconds
Timing for main: time 2015-04-02_06:05:00 on domain 1: 3.14987 elapsed seconds
Timing for main: time 2015-04-02_06:05:30 on domain 1: 3.03700 elapsed seconds
Timing for main: time 2015-04-02_06:06:00 on domain 1: 3.09038 elapsed seconds
Timing for main: time 2015-04-02_06:06:30 on domain 1: 26.79108 elapsed seconds
Timing for main: time 2015-04-02_06:07:00 on domain 1: 3.09991 elapsed seconds
Timing for main: time 2015-04-02_06:07:30 on domain 1: 3.09688 elapsed seconds
Timing for main: time 2015-04-02_06:08:00 on domain 1: 3.09161 elapsed seconds
Timing for main: time 2015-04-02_06:08:30 on domain 1: 3.10296 elapsed seconds
Timing for main: time 2015-04-02_06:09:00 on domain 1: 3.10976 elapsed seconds
Timing for main: time 2015-04-02_06:09:30 on domain 1: 3.06776 elapsed seconds
Timing for main: time 2015-04-02_06:10:00 on domain 1: 3.23266 elapsed seconds
Timing for main: time 2015-04-02_06:10:30 on domain 1: 3.09183 elapsed seconds
Timing for main: time 2015-04-02_06:11:00 on domain 1: 3.09651 elapsed seconds
Timing for main: time 2015-04-02_06:11:30 on domain 1: 3.10491 elapsed seconds
Timing for main: time 2015-04-02_06:12:00 on domain 1: 3.09970 elapsed seconds
Timing for main: time 2015-04-02_06:12:30 on domain 1: 26.51385 elapsed seconds
Timing for main: time 2015-04-02_06:13:00 on domain 1: 3.12292 elapsed seconds
Timing for main: time 2015-04-02_06:13:30 on domain 1: 3.12404 elapsed seconds
Timing for main: time 2015-04-02_06:14:00 on domain 1: 3.11552 elapsed seconds
Timing for main: time 2015-04-02_06:14:30 on domain 1: 3.13329 elapsed seconds
Last edited by Antonix on Sun Apr 05, 2015 9:05 am, edited 1 time in total.

meteoadriatic
Posts: 1510
Joined: Wed Aug 19, 2009 10:05 am

Re: suggestions and curiosities about running WRF on HPC.

Post by meteoadriatic » Sat Apr 04, 2015 8:40 pm

Antonix wrote: "Performance is good, but not as good as I expected."
I would guess network congestion (either throughput or latency), or too many threads for the domain size, or... I just don't like network clusters.
Antonix wrote: "Do you have any advice for improving it?"
Break your domain into three separate pieces (domains) and run each one on its own computer, if that is an acceptable solution.
Antonix wrote: "Something very strange I have noticed (maybe a stupid question) is that the time steps do not all take the same time. Once every six minutes of model time, one step is VERY slow (about 25 sec.), as the following lines show.
What is going on?"
When a physics scheme is called, it has to do its job... The radiation scheme is probably what causes those slowdowns every 6 minutes (360 s) in your case.
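
One way to check this guess is to pull the per-step timings out of the log and see how far apart the slow steps are in model time. The following is a minimal Python sketch, not part of WRF; the function names and the "3 times the median" threshold are illustrative assumptions. It uses a few lines copied from the log posted above.

```python
import re
from datetime import datetime

# Matches lines such as:
# Timing for main: time 2015-04-02_06:06:30 on domain 1: 26.79108 elapsed seconds
TIMING_RE = re.compile(
    r"Timing for main: time (\S+) on domain\s+(\d+):\s+([\d.]+) elapsed seconds"
)


def parse_timings(text):
    """Return a list of (model_time, elapsed_seconds) for every timing line."""
    rows = []
    for match in TIMING_RE.finditer(text):
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d_%H:%M:%S")
        rows.append((stamp, float(match.group(3))))
    return rows


def slow_steps(rows, factor=3.0):
    """Steps whose wall time exceeds `factor` times the median step time.

    The threshold is an arbitrary choice for this sketch.
    """
    times = sorted(elapsed for _, elapsed in rows)
    median = times[len(times) // 2]
    return [(stamp, elapsed) for stamp, elapsed in rows if elapsed > factor * median]


def slow_step_intervals(rows, factor=3.0):
    """Model-time gaps, in seconds, between consecutive slow steps."""
    stamps = [stamp for stamp, _ in slow_steps(rows, factor)]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# A few lines taken from the log in the first post.
sample = """\
Timing for main: time 2015-04-02_06:06:00 on domain 1: 3.09038 elapsed seconds
Timing for main: time 2015-04-02_06:06:30 on domain 1: 26.79108 elapsed seconds
Timing for main: time 2015-04-02_06:07:00 on domain 1: 3.09991 elapsed seconds
Timing for main: time 2015-04-02_06:12:00 on domain 1: 3.09970 elapsed seconds
Timing for main: time 2015-04-02_06:12:30 on domain 1: 26.51385 elapsed seconds
Timing for main: time 2015-04-02_06:13:00 on domain 1: 3.12292 elapsed seconds
"""

print(slow_step_intervals(parse_timings(sample)))  # prints [360.0]
```

If the radiation scheme is indeed the cause, the 360 s gap would be expected to match the radt setting (radiation call interval, in minutes) in the &physics section of namelist.input, so radt = 6 in this case. That is an inference from the timings, not something confirmed in this thread.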

Antonix
Posts: 256
Joined: Fri Oct 16, 2009 8:53 am

Re: suggestions and curiosities about running WRF on HPC.

Post by Antonix » Sun Apr 05, 2015 7:17 am

Hi meteoadriatic

That would be a good solution for me, but how is this decomposition done?
Right now I use a 14x6 tiling and numtiles=14 (14 cores per CPU; I have 28 physical cores per node).
How can I divide the domain into 3 parts, one for each node?
The domain size is 900x800, so I do not think the problem is a domain that is too small for so many CPUs.

For the Ethernet configuration, netperf gives this result:

netperf -H 10.0.0.2 -f m
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.2 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    9733.03

9733.03 Mbit/s is a very high (and good) value for 10 Gigabit Ethernet.
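
To put that number in context, here is a small Python sketch that reads the result row of the netperf output and expresses it as a fraction of the nominal line rate. The helper name is made up for illustration, and the 10000 Mbit/s line rate is an assumption based on the "10 giga" hardware described in the first post.

```python
def parse_netperf_throughput(output):
    """Return the throughput (last column) of a netperf TCP stream result.

    The result row is the only line made of exactly five numeric fields:
    recv socket size, send socket size, message size, elapsed time, throughput.
    """
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        try:
            values = [float(field) for field in fields]
        except ValueError:
            continue  # a header line such as "bytes bytes bytes secs. ..."
        return values[4]
    raise ValueError("no netperf result row found")


netperf_output = """\
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.2 () port 0 AF_INET : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec

87380 16384 16384 10.00 9733.03
"""

LINE_RATE_MBIT = 10000.0  # assumed nominal rate of 10 Gigabit Ethernet

throughput_mbit = parse_netperf_throughput(netperf_output)
utilisation = throughput_mbit / LINE_RATE_MBIT
print(f"{throughput_mbit} Mbit/s = {utilisation:.1%} of line rate")
```

Note that this is a TCP stream test, so it only measures bulk throughput. It says nothing about latency, which was the other possibility raised above; netperf's request/response test (netperf -H 10.0.0.2 -t TCP_RR) would probably be the closer check for that.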

meteoadriatic
Posts: 1510
Joined: Wed Aug 19, 2009 10:05 am

Re: suggestions and curiosities about running WRF on HPC.

Post by meteoadriatic » Sun Apr 05, 2015 11:11 am

Antonix wrote: "Hi meteoadriatic

That would be a good solution for me, but how is this decomposition done?
How can I divide the domain into 3 parts, one for each node?"
Just use three separate geographic areas that together cover the whole area of interest, create three different domains, and run each on its own computer. With three you will probably not be able to cover a square area nicely. With four you could do it ideally.
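
As a rough illustration of the three-versus-four point, the Python sketch below cuts the 900x800 grid mentioned earlier into equal rectangles. It is only index arithmetic under simple assumptions: the function names are invented for this example, and it ignores any overlap you might want between neighbouring domains as well as map projection details.

```python
def split_evenly(n_points, n_parts):
    """Split n_points into n_parts contiguous 1-based inclusive index ranges."""
    base, extra = divmod(n_points, n_parts)
    ranges, start = [], 1
    for part in range(n_parts):
        # The first `extra` parts get one additional point each.
        size = base + (1 if part < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def split_area(nx, ny, parts_x, parts_y):
    """Return (i_start, i_end, j_start, j_end) for each sub-area of an nx x ny grid."""
    return [
        (i0, i1, j0, j1)
        for (j0, j1) in split_evenly(ny, parts_y)
        for (i0, i1) in split_evenly(nx, parts_x)
    ]


# Three computers: three side-by-side strips of 300 x 800 points each.
three_strips = split_area(900, 800, 3, 1)

# Four computers: a 2 x 2 layout of 450 x 400 points each.
four_quadrants = split_area(900, 800, 2, 2)

for name, areas in (("3 strips", three_strips), ("4 quadrants", four_quadrants)):
    print(name)
    for i0, i1, j0, j1 in areas:
        print(f"  i {i0}-{i1}, j {j0}-{j1}  ({i1 - i0 + 1} x {j1 - j0 + 1})")
```

With three pieces each sub-area comes out as a long 300 x 800 strip, while four pieces give 450 x 400 blocks that are much closer to square.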
