New Install - Benchmark hangs

All issues/questions about EMS v3.4 package, please ask here.
johnbasham
Posts: 14
Joined: Thu Jun 07, 2012 2:51 pm
Location: Fort Worth, Texas

New Install - Benchmark hangs

Post by johnbasham » Fri Jan 03, 2014 3:20 am

Running commands as user 'emsuser'.

I ran 'ems_autorun --verbose 3 --length 24' from the '/util/benchmark/arw_small/' directory.
'ems_prep' runs as it should in 1 minute and 11 seconds, but the script hangs in 'ems_run' at part II: '7 processors on Godzilla (1 tile per processor) * Creating WRF initial and boundary condition files'

It hangs there for hours at a time until I give up and press Ctrl-C.

I have also tried running it manually and without any modifiers. It still hangs in the same place.

This is a new machine and a new install, with 16 GB of RAM and an 8-processor CPU.
Any thoughts, or similar problems?
Warmest Regards,
John Basham
Project Director/Senior Meteorologist
The Storm Spotter Project
Fort Worth, Texas

www.JohnBasham.com

meteoadriatic
Posts: 1565
Joined: Wed Aug 19, 2009 10:05 am

Re: New Install - Benchmark hangs

Post by meteoadriatic » Fri Jan 03, 2014 8:52 am

Hello, is anything useful in log directory there inside arw_small directory?

Post by johnbasham » Sat Jan 04, 2014 3:23 am

meteoadriatic,

In fact there is useful information. The run_real.log shows the following error:
ssh: connect to host Godzilla.StormSpotter.com port 22: Connection timed out
[mpiexec@Godzilla.StormSpotter.com] Sending Ctrl-C to processes as requested

[mpiexec@Godzilla.StormSpotter.com] Press Ctrl-C again to force abort

[mpiexec@Godzilla.StormSpotter.com] HYDU_sock_write (./utils/sock/sock.c:291): write error (Bad file descriptor)

[mpiexec@Godzilla.StormSpotter.com] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:170): unable to write data to proxy
[mpiexec@Godzilla.StormSpotter.com] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:79): unable to send signal downstream

[mpiexec@Godzilla.StormSpotter.com] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@Godzilla.StormSpotter.com] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec@Godzilla.StormSpotter.com] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
This appears to me to be an issue with the naming convention of the machine itself. This machine is going to be taken from development to operational use, so the machine's hostname and domain are set for that future use. The 'c alias' of 'Godzilla' is what will send all commands (in the future) to this machine. Right now, though, there is no DNS pointing to this machine in any way. Since an SSH connection is attempted to a domain that does not exist (in the real world), I think this is where it breaks. I'm going to change the /etc/sysconfig/network file on my CentOS system back to 'localhost.localdomain' and see if that solves my problem.

What do you think?
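
One way to test this theory before touching any config is to ask the system resolver directly whether the names the machine reports for itself can be resolved. The following is only a minimal sketch: `check_resolves` and `report_local_names` are hypothetical helper names, not part of EMS or MPICH, and the assumption that mpiexec uses these exact names is inferred from the log above rather than confirmed.

```python
import socket


def check_resolves(name, resolver=socket.gethostbyname):
    """Return (True, address) if `name` resolves, else (False, error text).

    `resolver` is injectable so the function can be exercised without
    touching the real resolver; it defaults to the system lookup.
    """
    try:
        return True, resolver(name)
    except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
        return False, str(exc)


def report_local_names():
    """Print the lookup result for each name this machine reports for itself."""
    # Assumption: these are the names a launcher would try to reach over SSH.
    for name in (socket.gethostname(), socket.getfqdn(), "localhost"):
        ok, detail = check_resolves(name)
        status = "OK" if ok else "FAILED"
        print(f"{name}: {status} ({detail})")
```

Calling `report_local_names()` as 'emsuser' on the affected machine should show a FAILED entry for any name that has neither a DNS record nor an /etc/hosts entry.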

Post by johnbasham » Sat Jan 04, 2014 3:50 am

Okay, I made the change and the machine's hostname is now the default 'localhost.localdomain', but I still get the following:
ssh: Could not resolve hostname localhost.localdomain: Name or service not known
[mpiexec@localhost.localdomain] Sending Ctrl-C to processes as requested

[mpiexec@localhost.localdomain] Press Ctrl-C again to force abort

[mpiexec@localhost.localdomain] HYDU_sock_write (./utils/sock/sock.c:291): write error (Bad file descriptor)

[mpiexec@localhost.localdomain] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:170): unable to write data to proxy
[mpiexec@localhost.localdomain] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:79): unable to send signal downstream

[mpiexec@localhost.localdomain] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@localhost.localdomain] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event

[mpiexec@localhost.localdomain] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
So now I'm scratching my head. Ideas, anyone?

Post by johnbasham » Sat Jan 04, 2014 4:49 am

It was a problem with the local system's naming convention.

I made the following changes (on CentOS 6) to the following files:
/etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
localhost.localdomain localhost
and:
/etc/sysconfig/network
NETWORKING=yes
HOSTNAME=Godzilla
That got the benchmark test running beyond where it was hanging. It has not yet completed, but I'm confident that this issue is solved, and I wanted to share the fix with others in case they run into anything similar.
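
For anyone checking their own /etc/hosts after a similar change, a small parser can list which names map to which addresses and flag lines whose first field is not an IP address. This is a sketch under stated assumptions: `parse_hosts` is a hypothetical helper, and the sample text simply reuses the lines as they appear in the post above. The second entry there has no address in its first column, which may just be something the forum formatting dropped.

```python
import ipaddress


def parse_hosts(text):
    """Split hosts-file text into (mappings, malformed_lines).

    mappings: lower-cased name -> list of addresses it is mapped to
    malformed_lines: raw lines that do not look like "<address> <name> ..."
    """
    mappings = {}
    malformed = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        try:
            ipaddress.ip_address(fields[0])  # first field must be an address
        except ValueError:
            malformed.append(raw)
            continue
        if len(fields) < 2:  # an address with no names is not a usable entry
            malformed.append(raw)
            continue
        for name in fields[1:]:
            mappings.setdefault(name.lower(), []).append(fields[0])
    return mappings, malformed


# Sample text copied from the post; on a real system, read /etc/hosts instead.
SAMPLE = """\
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
localhost.localdomain localhost
"""

if __name__ == "__main__":
    names, bad = parse_hosts(SAMPLE)
    print("mapped names:", sorted(names))
    print("lines without a leading address:", bad)
```

If the machine's own hostname is missing from the mapped names and there is no DNS record for it either, a lookup of that name is likely to fail in the way the earlier logs show.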

beunprepared
Posts: 1
Joined: Mon Nov 10, 2014 1:48 am

Re: New Install - Benchmark hangs

Post by beunprepared » Mon Nov 10, 2014 2:04 am

Thank you very much. I encountered the same thing when I was using boost.mpi and boost.serialization.
http://eolaus.blogspot.com/2014/11/boos ... oblem.html
