Inconsistent time per timestep using adaptive ts

meteoadriatic
Posts: 1510
Joined: Wed Aug 19, 2009 10:05 am

Inconsistent time per timestep using adaptive ts

Post by meteoadriatic » Fri Aug 11, 2017 3:07 pm

Hello.

I am seeing strange behaviour on one computer, a dual Xeon E5-2620 v3. It is generally nice and fast, but if I use the adaptive time step (as I always want to), its performance is very inconsistent.

The actual time step is just fine; what is not fine is the time per time step!

For example, this is the rsl.out.0000 output at the start of a run:

 Tile Strategy is not specified. Assuming 1D-Y
WRF TILE   1 IS      1 IE     66 JS      1 JE      3
WRF TILE   2 IS      1 IE     66 JS      4 JE      6
WRF TILE   3 IS      1 IE     66 JS      7 JE      8
WRF TILE   4 IS      1 IE     66 JS      9 JE     11
WRF NUMBER OF TILES =   4
Timing for main (dt= 30.00): time 2017-08-11_00:00:30 on domain   1:    1.60754 elapsed seconds
Timing for main (dt= 31.50): time 2017-08-11_00:01:01 on domain   1:    0.10679 elapsed seconds
Timing for main (dt= 33.08): time 2017-08-11_00:01:34 on domain   1:    0.10773 elapsed seconds
Timing for main (dt= 34.73): time 2017-08-11_00:02:09 on domain   1:    0.11009 elapsed seconds
Timing for main (dt= 36.47): time 2017-08-11_00:02:45 on domain   1:    0.10913 elapsed seconds
Timing for main (dt= 38.29): time 2017-08-11_00:03:24 on domain   1:    0.10669 elapsed seconds
Timing for main (dt= 40.20): time 2017-08-11_00:04:04 on domain   1:    0.10780 elapsed seconds
Timing for main (dt= 42.21): time 2017-08-11_00:04:46 on domain   1:    0.11021 elapsed seconds
Timing for main (dt= 44.32): time 2017-08-11_00:05:30 on domain   1:    0.10720 elapsed seconds
Timing for main (dt= 46.54): time 2017-08-11_00:06:17 on domain   1:    0.10789 elapsed seconds
Timing for main (dt= 48.87): time 2017-08-11_00:07:06 on domain   1:    0.10854 elapsed seconds
Timing for main (dt= 51.31): time 2017-08-11_00:07:57 on domain   1:    0.10734 elapsed seconds
Timing for main (dt= 53.88): time 2017-08-11_00:08:51 on domain   1:    0.10791 elapsed seconds
.....
Then if I stop it and start it again (with everything the same!), the time step increases in the same way, BUT the last column, which shows the elapsed time needed to compute one time step, differs. Sometimes it is just fine, but frequently (and without any regularity) it goes wild. For example, the elapsed seconds per time step here are ~0.11, but when it goes crazy they can be up to 3x higher (~0.3 or so) for the same steps. Yes, the same steps: I'm not talking about elapsed time growing as the time step grows, or when the radiation scheme kicks in, or anything like that. I'm talking about the same forecast time steps (i.e. the start of the same run, as shown above).
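Comparing runs by eye gets tedious, so one option is to pull the "Timing for main" lines out of each rsl.out.0000 and summarize them. Below is a minimal Python sketch, assuming the line format shown in the log above; the helper names (parse_timings, summarize, dt_growth, compare_runs) are hypothetical and not part of WRF or UEMS. It skips the first step, which in the log above is far slower (1.6 s) than the rest, and it also reports the ratio between consecutive adaptive steps (about 1.05 in the log, i.e. the step grows by roughly 5% each time).

```python
import re
import statistics

# Matches the "Timing for main" lines in rsl.out.0000, e.g.
# Timing for main (dt= 31.50): time 2017-08-11_00:01:01 on domain   1:    0.10679 elapsed seconds
TIMING_RE = re.compile(
    r"Timing for main \(dt=\s*([\d.]+)\): time (\S+) on domain\s+(\d+):"
    r"\s+([\d.]+) elapsed seconds"
)


def parse_timings(lines):
    """Return (dt, model_time, domain, elapsed_seconds) for each timing line."""
    records = []
    for line in lines:
        match = TIMING_RE.search(line)
        if match:
            dt, model_time, domain, elapsed = match.groups()
            records.append((float(dt), model_time, int(domain), float(elapsed)))
    return records


def summarize(records, domain=1, skip_first=1):
    """Mean and max elapsed seconds per step for one domain.

    The first step(s) are skipped because they are much slower than the rest."""
    elapsed = [r[3] for r in records if r[2] == domain][skip_first:]
    return statistics.mean(elapsed), max(elapsed)


def dt_growth(records, domain=1):
    """Ratio of each adaptive time step to the previous one."""
    dts = [r[0] for r in records if r[2] == domain]
    return [later / earlier for earlier, later in zip(dts, dts[1:])]


def compare_runs(reference_lines, suspect_lines, domain=1):
    """Ratio of mean elapsed seconds per step: suspect run / reference run."""
    reference_mean, _ = summarize(parse_timings(reference_lines), domain)
    suspect_mean, _ = summarize(parse_timings(suspect_lines), domain)
    return suspect_mean / reference_mean
```

With two rsl.out.0000 files from otherwise identical runs (paths are up to you; an open file object works as the line iterable), a result near 1 means the runs behaved alike, while the slowdown described above would show up as a ratio of roughly 3. The comparison is only fair if both files cover the same forecast steps.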

Of course there is nothing obvious like something else running on the server; nothing else runs. The system load is exactly 22.0 (I use 22 of the 24 available cores). Nothing obvious happens, but the run slows down significantly.

I was banging my head against the wall for several days, until I tried a fixed time step. Guess what? It runs at exactly the expected speed: total run times are within seconds of each other from run to run, the elapsed seconds per time step are identical to the 2nd or 3rd decimal place, and so on... no problems whatsoever.

I should also mention that I tried 3.9 and 3.8.1, with the same result. I stripped the namelist down to the essentials, same result. Every other computer runs just fine with the adaptive time step; this one does not, and as a result its adaptive time step runs are usually slower than fixed ones. And I can't count on a run reliably finishing in the expected time frame at all :(

Has anybody encountered anything similar?

meteoadriatic
Posts: 1510
Joined: Wed Aug 19, 2009 10:05 am

Re: Inconsistent time per timestep using adaptive ts

Post by meteoadriatic » Fri Aug 11, 2017 4:43 pm

It also happens with a fixed time step :( It seems it was a coincidence that led me to the wrong conclusion.

Probably faulty hardware... we'll see.

emsiwx
Posts: 75
Joined: Sun Aug 12, 2012 11:07 am

Re: Inconsistent time per timestep using adaptive ts

Post by emsiwx » Tue Aug 22, 2017 4:01 pm

Hi Ivan,

I did not think of this as a problem before you posted about your experience.

I see the same thing on my Intel Xeon E5-2630 v4 (10-core processor, 2.20 GHz (85 W), Turbo Boost 3.1 GHz, HT, 25 MB L3 cache, socket 2011-3, 14 nm, Broadwell) with 2x Corsair 16GB KIT DDR4 2133MHz CL15 Vengeance LPX, quad channel.

Moreover, I think I am not getting the speed I expected when I bought and assembled my WRF machine :/

Marian

meteoadriatic
Posts: 1510
Joined: Wed Aug 19, 2009 10:05 am

Re: Inconsistent time per timestep using adaptive ts

Post by meteoadriatic » Tue Aug 22, 2017 4:20 pm

Are your run times also inconsistent if you use a fixed time step?

emsiwx
Posts: 75
Joined: Sun Aug 12, 2012 11:07 am

Re: Inconsistent time per timestep using adaptive ts

Post by emsiwx » Tue Aug 22, 2017 6:39 pm

I will try it on the next run; I have one running now. I will let you know.

Marian

emsiwx
Posts: 75
Joined: Sun Aug 12, 2012 11:07 am

Re: Inconsistent time per timestep using adaptive ts

Post by emsiwx » Wed Aug 23, 2017 3:05 pm

Hi Ivan,

So here are my findings:

Running just a single domain: times are inconsistent with both settings (adaptive/fixed).

Running the mother domain + one nested domain (12/4 km): with adaptive, times are inconsistent while computing the mother domain; all times for the nested domain are OK.
With fixed, both mother and nested domain times are inconsistent (including the final/total run time)... maybe because I selected a bad fixed time step.

Anyway, I found out that my time step setting was Auto_S (which is the default, but I did not know that :/ ).
Running the 12+4 km setup with the adaptive time step saved me 2 hours of computational time. With the default Auto_S, the run time varied between 5 and 5.5 hours.
With the adaptive time step, the total run time was 3h 16m.
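To put numbers on that saving, here is a small Python sketch of the arithmetic, using only the run times reported in this post (5 to 5.5 hours with Auto_S, 3h 16m with the adaptive time step); the helper names are made up for illustration.

```python
def to_minutes(hours, minutes=0):
    """Convert a wall-clock duration to whole minutes."""
    return hours * 60 + minutes


def format_hm(total_minutes):
    """Format minutes as e.g. '3h 16m'."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


# Run times reported in the post above.
auto_s_range = (to_minutes(5), to_minutes(5, 30))  # Auto_S: 5 to 5.5 hours
adaptive = to_minutes(3, 16)                       # adaptive time step: 3h 16m

for baseline in auto_s_range:
    saved = baseline - adaptive
    print(f"{format_hm(baseline)} -> {format_hm(adaptive)}: "
          f"saved {format_hm(saved)} ({saved / baseline:.0%})")
# 5h 00m -> 3h 16m: saved 1h 44m (35%)
# 5h 30m -> 3h 16m: saved 2h 14m (41%)
```

So the saving works out to roughly 1h 44m to 2h 14m, about 35 to 41 percent of the Auto_S run time.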

Could this have a negative impact on the output? (I mean using adaptive instead of Auto_S or fixed.)

Thank you.

Marian

meteoadriatic
Posts: 1510
Joined: Wed Aug 19, 2009 10:05 am

Re: Inconsistent time per timestep using adaptive ts

Post by meteoadriatic » Wed Aug 23, 2017 3:12 pm

emsiwx wrote:
Wed Aug 23, 2017 3:05 pm
With default Auto_S, run time varied between 5 to 5 and a half hours.
That should not happen. It should be pretty much the same, within minutes if not seconds of difference, provided of course that the computer does only that job.
Can this have a negative impact on the output? (I mean using adaptive instead of Auto_S, or fixed)
Possibly, but probably nothing to worry about.

emsiwx
Posts: 75
Joined: Sun Aug 12, 2012 11:07 am

Re: Inconsistent time per timestep using adaptive ts

Post by emsiwx » Wed Aug 23, 2017 3:33 pm

Yes, the PC is dedicated only to WRF, and even more, only to pure computing; all GrADS and other jobs are handled by another PC.

I use all 10 cores; when I left 1 core free, the run was slower. (But that was with Auto_S; I can try with the adaptive time step.)

emsiwx
Posts: 75
Joined: Sun Aug 12, 2012 11:07 am

Re: Inconsistent time per timestep using adaptive ts

Post by emsiwx » Wed Aug 23, 2017 7:44 pm

Since I switched to the adaptive time step, I have done 5 runs so far, with total run times ranging from 3:14 to 3:16. I realize you said it should be the same within a minute if not within a few seconds, but for me this is not only almost half the time compared to Auto_S; the range is also no longer half an hour like before.

Anyway, I am thinking about switching to an Intel Core i9-7980XE:
18-core processor, HT, 2.6 GHz (165 W), Turbo Boost 4.2 GHz, Turbo Boost 3.0 4.4 GHz, 24.75 MB L3 cache, 44 PCIe lanes, DDR4 SDRAM 2666 MHz, socket 2066, Skylake-X.

At the pre-order price it is much cheaper than Xeon CPUs. I know the power consumption is double what I have now, but I hope the speed will make it worth it.
