cluster - UEMS install required on each node?

Questions and other topics related to UEMS 18.
pattim
Posts: 171
Joined: Sun Jun 24, 2012 8:42 pm
Location: Los Angeles, CA, USA

Post by pattim » Thu Oct 04, 2018 4:26 am

Dear UEMS users: I'm having an odd issue on a cluster. I set up and tested passwordless ssh on my mini-cluster (two 12-CPU boxes). It works well, except that UEMS crashes with an error I don't understand. Does UEMS have to be installed on all nodes in a cluster? I didn't see that suggested anywhere in the docs.
If so, this could cause issues on heterogeneous clusters. But I think it may be an issue with my calling syntax, which isn't clear from the docs or the .conf file text. Is this a known error, and is my calling syntax (NODECPUS) correct?

The error originally was:
MPICH executables not found on linux-68t1 (/run/media/patti/00_uems/uems/util/mpich2/bin)!

So I copied that directory tree to node1 (linux-68t1) and received this error:
UEMS executables not found on linux-68t1 (/run/media/patti/dc648309-128b-489f-a9c7-387482966210/Users/00_uems/uems/bin)!
(UEMS is installed on a removable drive on linux-68t0)

My head-node is linux-68t0 - and that's where UEMS is installed.

In run-ncpus.conf
=================
REAL_NODECPUS = local:10
WRFM_NODECPUS = linux-68t0:10,linux-68t1:10

also tried...
=============
ems_run --nodes linux-68t0:10,linux-68t1:10 --length 24 --cycle 6 --domain 4

Result:
=======

           *  Simulation start and end times:

              Domain         Start                   End              Parent
              ----------------------------------------------------------------
                 1     2018-10-03_06:00:00     2018-10-04_06:00:00       
                 2     2018-10-03_06:00:00     2018-10-04_06:00:00      1
                 3     2018-10-03_06:00:00     2018-10-04_06:00:00      2
                 4     2018-10-03_06:00:00     2018-10-04_06:00:00      3

              Primary domain simulation length will be 24 hours.

           *  Gathering system information for running WRF REAL

           *  Gathering system information for running WRF ARW

           ☠  UEMS executables not found on linux-68t1 (/run/media/patti/dc648309-128b-489f-a9c7-387482966210/Users/00_uems/uems/bin)!

         !  Oh Poop! There is a problem with one or more hosts requested - Exit
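
For anyone hitting the same message, the host check that fails above can be approximated by hand before launching a run. The following is only a minimal sketch, not what UEMS itself does: the helper names `parse_nodes` and `hosts_missing_dir` are made up for illustration, the `host:ncpus` format is taken from the NODECPUS lines in this post, and the special value `local` is not handled. It assumes passwordless ssh is already working and simply asks each host whether the given directory exists.

```python
import shlex
import subprocess


def parse_nodes(spec):
    """Split 'hostA:10,hostB:10' into [('hostA', 10), ('hostB', 10)].

    Assumes every entry has the host:ncpus form shown in the post.
    """
    nodes = []
    for item in spec.split(","):
        host, _, ncpus = item.strip().partition(":")
        nodes.append((host, int(ncpus)))
    return nodes


def hosts_missing_dir(spec, path, run=subprocess.run):
    """Return the hosts on which `path` is not a directory, checked over ssh.

    `run` is injectable so the logic can be exercised without a real cluster.
    """
    missing = []
    for host, _ncpus in parse_nodes(spec):
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        remote_cmd = "test -d " + shlex.quote(path)
        cmd = ["ssh", "-o", "BatchMode=yes", host, remote_cmd]
        if run(cmd).returncode != 0:
            missing.append(host)
    return missing
```

Calling `hosts_missing_dir("linux-68t0:10,linux-68t1:10", uems_bin)`, with `uems_bin` set to the bin directory named in the error message, would list any node that cannot see that path. A non-zero ssh exit status is treated as "missing", so an unreachable host shows up in the list too.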

Re: cluster - UEMS install required on each node?

Post by pattim » Sun Oct 07, 2018 1:02 am

Nope - not required. Turns out the UEMS directory must be shared ("exported") to all compute nodes. Problem solved. Did I miss that in the docs somewhere or is that just something a cluster-builder knows intrinsically? :roll:
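
One way to confirm on a compute node that the UEMS directory really is arriving over a network share rather than from a local copy is to look it up in the mount table. This is a rough sketch under assumptions: the post does not say which sharing mechanism was used, so the set of network filesystem types below (NFS, CIFS, sshfs) is a guess, the function names are hypothetical, and mount points containing spaces (escaped as `\040` in `/proc/mounts`) are not handled.

```python
# Filesystem types treated as "shared" here; an assumption, extend as needed.
NETWORK_FS = {"nfs", "nfs4", "cifs", "fuse.sshfs"}


def fs_type_for(path, mounts_text):
    """Return the fstype of the longest mount point that contains `path`.

    `mounts_text` is the content of /proc/mounts, whose fields are:
    device, mount point, fstype, options, dump, pass.
    """
    target = path.rstrip("/") + "/"
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount on the same point override an earlier one.
        if target.startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type


def is_network_mount(path, mounts_text):
    """True if `path` lives on one of the filesystem types in NETWORK_FS."""
    return fs_type_for(path, mounts_text) in NETWORK_FS
```

On a compute node this could be used as `is_network_mount(uems_dir, open("/proc/mounts").read())`, where `uems_dir` is the UEMS path from the error message. A result of `False` would suggest the node is looking at a local directory, or at nothing at all.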

lcana
Posts: 68
Joined: Wed Nov 30, 2011 4:34 pm

Post by lcana » Wed Oct 10, 2018 4:31 pm

Hi Pattim,

Thanks for sharing this with us. It’s interesting how you built up UEMS in a cluster. BTW, which distro did you use for building it? Rocks Cluster perhaps?

Best,

Luis
