Intel libraries
Re: Intel libraries
OK, I'm still using 19.4 - I haven't tried 19.5.
What was the output you received when you ran the recommended test (ldd wrfm_arw.exe.intel)?
Are you only running on a single machine? (specs?)
Can you post the contents of your /etc/hosts file?
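If it helps, here is a minimal sketch of that check (run it from whichever directory holds the Intel-built binary; the path is an assumption):
Code: Select all
# list the shared libraries the Intel binary links against
ldd ./wrfm_arw.exe.intel
# any missing Intel runtime library shows up as "not found"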
Re: Intel libraries
@meteoadriatic
I didn't try yet...
Re: Intel libraries
Thank god, it's working!!
Oh dear, I am shocked!!
I mean, it is working with the UEMS Intel-compiled binaries! They are approximately 20-30% faster than the PGI-compiled binaries on my system.
I have found the solution by digging deeper and deeper into the internet.
See here : https://software.intel.com/en-us/forums ... pic/270043
And here : http://wiki.seas.harvard.edu/geos-chem/ ... n_Compiler
So the point is the stack size. On my system it was set to 64 MB by default, which is obviously not enough for the Intel binaries. In the terminal window I typed:
ulimit -s unlimited
export OMP_STACKSIZE=500M
And then run !
Thanks to all of you guys for your support,
Alain
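A small follow-up sketch for making those two settings survive new terminal sessions, assuming a bash shell (~/.bashrc is an assumption here; use your own shell's startup file):
Code: Select all
# persist the stack settings for future shells (assumed ~/.bashrc)
echo 'ulimit -s unlimited' >> ~/.bashrc
echo 'export OMP_STACKSIZE=500M' >> ~/.bashrc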


Re: Intel libraries
Hi,
I didn't realize until now that the post on the official WRF forum was from you.
Great that you solved the problem. Yes, stack size is usually the first thing that has to be adjusted for WRF in general; it is strange that UEMS does not set it high enough by default.

Re: Intel libraries
Hello,
I also see a really good speedup of at least about 30% when using the Intel binaries - a really huge improvement! I am only running on one machine, and I only ran into one error. Fortunately the solution was printed in the error message; leaving it here for future reference.
I had to set:
Code: Select all
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
...and to make this change permanent on my system (Ubuntu 18.04 on AWS) I had to edit /etc/sysctl.d/10-ptrace.conf, setting "0" instead of "1" there.
Best regards,
Jonas
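For reference, a rough sketch of what that permanent change looks like (kernel.yama.ptrace_scope is the standard sysctl name corresponding to /proc/sys/kernel/yama/ptrace_scope):
Code: Select all
# /etc/sysctl.d/10-ptrace.conf after the edit described above
kernel.yama.ptrace_scope = 0
Running sudo sysctl --system (or rebooting) then reapplies the setting.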
Re: Intel libraries
I'm trying to use the Intel executables on a new cluster:
Old configuration: just one master server, 28 cores, 1 non-nested domain (4 km resolution), Intel executables: no problem.
New config: 1 master + 1 node (same servers), 48 cores, 1 domain (same config but 3 km), Intel executables: sometimes I get an error:
Code: Select all
Simulation Failed (101)! I hate when this #%^!#!!% happens.
System Signal Code (SN) : 101 (Unknown Signal)
Here is some information from rsl.error.0000:
----------------------------------------------------------------------------------------
Error Log: forrtl: severe (174): SIGSEGV, segmentation fault occurred
----------------------------------------------------------------------------------------
System Signal Code (SN) : 101 (Unknown Signal)
Here is some information from rsl.error.0024:
----------------------------------------------------------------------------------------
Error Log: MPIDU_Complete_posted_with_error(1710): Process failed
In rsl.error.0000:
Code: Select all
Timing for main (dt= 24.00): time 2019-10-12_01:12:24 on domain 1: 0.55257 elapsed seconds
forrtl: severe (174): SIGSEGV, segmentation fault occurred
In rsl.error.0024:
Code: Select all
WRF NUMBER OF TILES = 4
Fatal error in PMPI_Wait: Unknown error class, error stack:
PMPI_Wait(219)........................: MPI_Wait(request=0x5a33b7c, status=0x7ffe94cb6d80) failed
MPIR_Wait_impl(100)...................: fail failed
MPIDU_Complete_posted_with_error(1710): Process failed
If I retry the run, I get exactly the same bug at the same time.
Of course I have typed:
ulimit -s unlimited
export OMP_STACKSIZE=500M
At the next run it's OK... or not, and the same bug appears again. Sometimes it happens at the beginning, the middle, or the end of the run, and if I retry the run, the bug occurs again at the same time.
It's very random.
I will try a couple of runs of the old domain (4 km) on this cluster.
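One thing that might be worth checking (an assumption, not something mentioned in the post): a ulimit set in an interactive shell on the master does not automatically reach MPI processes started over ssh on the second node. A minimal sketch of making the limit system-wide on both machines; the limits.conf path is standard, but whether your launcher picks it up depends on the setup:
Code: Select all
# /etc/security/limits.conf on both the master and the node (assumed edit):
# unlimited stack for all users, including the non-interactive ssh sessions
# that the MPI launcher opens on the remote node
*    soft    stack    unlimited
*    hard    stack    unlimited
After logging in again, something like ssh node1 'ulimit -s' should report "unlimited" (node1 is a placeholder hostname).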
Re: Intel libraries
Hello meteo60,
With the newest UEMS version I have a similar problem.
With the previous UEMS version, the Intel binaries worked fine after setting the stack size to unlimited and OMP_STACKSIZE to an arbitrarily high value. But with this newest UEMS version it doesn't work: I get the same error 174 (SIGSEGV), always after the first wrfout time step.
For the moment I have given up and use the PGI-compiled libraries.
Robert might have changed something in the compilation options, or a new bug was introduced somewhere in the code.
Alain
Re: Intel libraries
Which OS do you use?
I still get that issue (174) sometimes, despite OMP_STACKSIZE etc.
It's very random again: most runs are OK, but sometimes it crashes.
When a run crashes, I retry it with other data. I usually use GFSP25PT; if it crashes, I retry with GFSP25 and it's OK... strange...
Re: Intel libraries
Hello meteo60,
I use CentOS 7.
I have tried the standard kernel 3.10 and also the newest kernel 5.2. No change.
I can set OMP_STACKSIZE to an arbitrarily high value, but there is no change.
Another strange behaviour: I have a very large domain, but with DX = 25 km. This one runs, but all the wrfout files seem corrupted.
With a 3-domain run, it always crashes.
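A hedged way to check whether those wrfout files are really corrupted, assuming the netCDF tools are installed (the file name below is a placeholder):
Code: Select all
# dump only the netCDF header; a badly corrupted file usually fails even this step
ncdump -h wrfout_d01_2019-10-12_00:00:00 | head -n 20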

Re: Intel libraries
I use CentOS 7 too.
I think we have to wait for Robert to resolve this issue....