EMS_RUN Failed - MPI Errors

Forum dedicated to older versions of EMS package (WRFEMS v3.2, v3.1 or older). Support is user-to-user based, so please help others if you can.
John.Brost
Posts: 5
Joined: Mon Aug 20, 2012 8:46 pm

EMS_RUN Failed - MPI Errors

Post by John.Brost » Mon Aug 20, 2012 8:56 pm

Greetings all,

I recently installed WRF EMS on my system. I was able to run the benchmark and configure my domain without problems, and EMS_PREP also ran fine. However, when I run EMS_RUN, I receive the following error while the script is trying to produce my initial and boundary conditions:
Creating the WRF initial and boundary condition files


EMS ERROR : UGH! Creation of model initial and boundary conditions failed!

I hate when this #%^!#!!% happens.

Hopefully nobody was hurt during this attempt!


! Here are the last few lines of the the log/real.0000.err log file:

Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(392)..............:
MPID_Init(139).....................: channel initialization failed
MPIDI_CH3_Init(38).................:
MPID_nem_init(234).................:
MPID_nem_tcp_init(108).............:
MPID_nem_tcp_get_business_card(346):
MPID_nem_tcp_init(305).............: gethostbyname failed, warf1 (errno 1)
[0]0:Return code = 0, signaled with Interrupt
[0]1:Return code = 1



* Gathering information for localhost warf1

! System information is also available from static/wrfems_system.info


WRF EMS Program ems_run failure (3) at Mon Aug 20 20:53:25 2012 UTC
Has anybody experienced this before and if so, how do I resolve this issue?

Thank you for your assistance,

JJ

pattim
Posts: 186
Joined: Sun Jun 24, 2012 8:42 pm
Location: Los Angeles, CA, USA

Re: EMS_RUN Failed - MPI Errors

Post by pattim » Sun Aug 26, 2012 12:26 am

Wow, no - have you checked the number of CPUs being used? Just a guess.

Robert Dewey
Posts: 2
Joined: Fri Aug 17, 2012 3:13 pm

Re: EMS_RUN Failed - MPI Errors

Post by Robert Dewey » Mon Aug 27, 2012 2:39 pm

John.Brost wrote: MPID_nem_tcp_init(305).............: gethostbyname failed, warf1 (errno 1)
[0]0:Return code = 0, signaled with Interrupt
[0]1:Return code = 1
I'm just guessing, but this seems to indicate a problem resolving or reaching the machine warf1. Double-check that the hostname is correct. Can you ping it?
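
If it helps, name resolution can also be tested outside of MPI. Below is a minimal sketch in Python using only the standard library. The function name `resolve` is made up for this example and is not part of EMS or MPICH. It performs a hostname-to-address lookup similar to the gethostbyname call that failed in the log, and returns None instead of raising an error when the name cannot be resolved.

```python
import socket


def resolve(name):
    """Return the IPv4 address for name, or None if the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError
        return None


# Example use on the affected machine (hostname taken from the error log):
#   print(resolve("warf1"))
#   print(resolve(socket.gethostname()))
```

If `resolve("warf1")` comes back as None on the machine itself, the problem is probably name resolution rather than anything in WRF EMS.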

Are you running a cluster? If so, can you issue the command "ssh warf1 hostname" and get a response? If not, this is most likely a networking issue.

I'd check your /etc/hosts file to make sure warf1 has the appropriate entry.
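
To check the hosts file without reading it by eye, something like the following could work. This is only a sketch: `hosts_names` is a hypothetical helper, and the 192.168.1.10 address in the comments is a placeholder, not the real address of warf1. It assumes the usual hosts-file layout of an IP address followed by one or more names, with `#` starting a comment.

```python
def hosts_names(text):
    """Map each hostname or alias found in hosts-file text to its IP address."""
    names = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *aliases = line.split()
        for alias in aliases:
            names.setdefault(alias, ip)  # keep the first entry seen for a name
    return names


# Example use (reads the real file on the machine being checked):
#   with open("/etc/hosts") as f:
#       print(hosts_names(f.read()).get("warf1"))
# A line such as "192.168.1.10  warf1" (placeholder address) would make
# the lookup above print 192.168.1.10; a missing entry prints None.
```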
