Fatal Error in PMPI_Wait?

Posted: Mon Dec 19, 2016 12:17 pm
by fsugrizz
For the past three days I've been getting the following error in a model run that we have been running daily without issues for a couple of months.

taskid: 9 hostname:node1
Fatal error in PMPI_Wait: Unknown error class, error stack:
PMPI_Wait(203)..................: MPI_Wait(request=0x4e52440, status=0x7ffff288ffc0) failed
MPIDU_Complete_posted_with_error(1149):Process failed
The cluster still passes MPI_check, and I can provide the rest of the logs, but I'm curious whether anyone has an idea what causes this. I did notice that the NASA SPoRT database changed its FTP location the other day, which made my gribinfo file incorrect, but beyond that I'm still at a loss.

Re: Fatal Error in PMPI_Wait?

Posted: Thu Jun 18, 2020 9:50 am
by Hachi-ait
Did you solve this problem? If so, could you please share the solution?
I've run into the same error. I have run successfully on the cluster several times using this command:

mpiexec --machinefile machinefile.txt ./wrf.exe
But now that I am running a larger nested domain, the run fails after about 20 minutes with this error:

WRF TILE   1 IS      1 IE    154 JS     41 JE     80
Fatal error in PMPI_Wait: Unknown error class, error stack:
PMPI_Wait(216)........................: MPI_Wait(request=0x87a719c, status=0x7ffe675bdca0) failed
MPIDU_Complete_posted_with_error(1137): Process failed