
UEMS in clusters info

Posted: Sat Dec 26, 2020 10:19 am
by dominic
Hi everyone, I'm trying to figure out how to run UEMS on a cluster. I have one server called "master" and another that I called "node1".

The idea is to use node1 to run clustered UEMS simulations, with both machines on the same local network.

On the master PC I set the IP address 192.168.1.200, while on the node I set the IP address 192.168.1.201.


But when I change the WRFM_NODECPUS setting in run_ncpus.conf like this:

WRFM_NODECPUS = 192.168.1.201

UEMS gives me this error:

    II.  Creating initial and boundary condition files for domain 1

           !  Number of cores reduced to 12 due to risk of over-decomposition

           *  The WRF REAL will use the following nodes and cores:

                12 processors on localhost (localhost) (1 tile per processor)

           *  Creating initial and boundary conditions  - Success! Let's do it again!

              Initial and boundary condition files created in 3 seconds

           ☺  Moving on to bigger and better delusions of grandeur
            !  Could not map "192.168.1.201" to a hostname
Use of uninitialized value $maxcpus in foreach loop entry at /home/dominic/Desktop/uems/strc/Uutils/Others.pm line 194.
Use of uninitialized value $maxcpus in concatenation (.) or string at /home/dominic/Desktop/uems/strc/Urun/Rutils.pm line 46.


         ☺  About the size of your domain:
            
            The UEMS was unable to determine a viable domain decomposition for domain 1 (100
            (NX) x 126 (NY)) using or fewer CPUs. This is likely due to the inadequate size
            (I'm looking at you) of your domain. The UEMS looks forward to working you again
            once you've increased the dimensions to suitable values.


    Your UEMS Simulation rave was busted at Sat Dec 26 10:08:13 2020 UTC - Ya know, 'cause stuff just happens

  As the alchemists at Cambridge are fond of stating: "Think Globally, Model Locally!"

On the node I have installed UEMS in the same directory, with the same username and password. It runs normally there, but the master PC does not recognize the node when running the simulation.

What could it be? It is still not clear to me how to set up a cluster run. The UEMS guide says nothing about how to configure it; Robert was a little superficial on this aspect, I don't know.

Does anyone have any suggestions?
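
The "Could not map "192.168.1.201" to a hostname" line in the log above suggests the master could not turn the node's IP address into a name. How UEMS performs that lookup internally is not stated in this thread, so the following is only a minimal Python sketch of how the same kind of lookup can be tested by hand from the master. The function names `reverse_lookup` and `forward_lookup` are made up for this illustration.

```python
import socket


def reverse_lookup(ip_address):
    """Return the hostname the system resolver maps ip_address to, or None."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError;
        # either one means the resolver found no mapping for this address.
        return None


def forward_lookup(hostname):
    """Return the IPv4 address the system resolver maps hostname to, or None."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None
```

Calling `reverse_lookup("192.168.1.201")` and `forward_lookup("node1")` on the master shows whether the resolver knows the node at all. On a small LAN without a DNS server, the usual place to define such a mapping is an entry in /etc/hosts on each machine; whether that is sufficient for UEMS is an assumption here.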

Re: UEMS in clusters info

Posted: Sat Dec 26, 2020 6:13 pm
by dominic
I'm trying to find the node on the network. From the terminal on the master PC I run the command netcheck node1

However, netcheck does not exist on my system, as if it were not installed.

In fact, it gives me the error: netcheck: command not found
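
Since netcheck is not available here, a rough reachability test can be done with a few lines of Python instead. This is only a sketch: it assumes node1 runs an SSH server on the default port 22, which is not confirmed anywhere in this thread, and `can_connect` is a hypothetical helper name.

```python
import socket


def can_connect(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers an unresolvable name, a refused connection, and a timeout.
        return False
```

Running `can_connect("node1")` and `can_connect("192.168.1.201")` from the master separates two cases: if only the IP address works, the name is not being resolved; if neither works, the node is not reachable on that port.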

Re: UEMS in clusters info

Posted: Mon Dec 28, 2020 11:36 am
by dominic
OK, with a little luck I did it. Now the PCs communicate over the network. But I still have a problem: when I run the simulation I get an error. It probably can't communicate with the CPU.

Re: UEMS in clusters info

Posted: Tue Dec 29, 2020 12:11 pm
by dominic
When I start the run now, I get this error:

    III. Running WRF ARW while thinking happy thoughts

           *  The WRF ARW core will use the following nodes and cores:

                1  processors on servermaster            (1 tile per processor)


           *  A large time step of 75 seconds will be used for this simulation


           *  Output Frequency    Primary wrfout
              --------------------------------------------------------
                Domain 01      :   1 hour          
              --------------------------------------------------------

           *  Running your simulation like I'm the "Little Engine that Could"!
              
              You can tap dance to the progress of the simulation while watching:
              
                %  tail -f /home/dominic/Desktop/uems/runs/test/rsl.out.0000
              
              Unless you have something better to do with your time.

           !  Your simulation failed (127 - File Not Found, File IO Error, or Disk full (You decide))

              Found in RunProgramARW2.log : [mpiexec@servermaster] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion


         ☺  You're so close. You can almost taste the bitter smell of success.


    Your UEMS Simulation rave was busted at Tue Dec 29 12:08:26 2020 UTC - Ya know, 'cause stuff just happens

  Smoke signals from the Vatican are often interpreted as: "Think Globally, Model Locally!"
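
UEMS labels status 127 as "File Not Found, File IO Error, or Disk full". In POSIX shells, 127 is also the conventional exit status for "command not found", and 126 for "found but not executable". Whether that is what happened in this run cannot be told from the log alone, so the following Python sketch only shows how to reproduce and interpret such a status by hand. The helper names `describe_exit_status` and `run_and_describe` are invented for this example.

```python
import subprocess


def describe_exit_status(code):
    """Translate a shell-style exit status into a short description."""
    if code == 0:
        return "success"
    if code < 0:
        # subprocess reports a child killed by signal N as -N.
        return f"terminated by signal {-code}"
    if code == 126:
        return "command found but not executable"
    if code == 127:
        return "command not found"
    if code > 128:
        # Shells report a command killed by signal N as 128 + N.
        return f"terminated by signal {code - 128}"
    return f"failed with exit status {code}"


def run_and_describe(command):
    """Run command through /bin/sh and return (exit status, description)."""
    result = subprocess.run(
        ["/bin/sh", "-c", command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode, describe_exit_status(result.returncode)
```

One way to use it, assuming passwordless SSH to the node is already set up, is to pass the command you expect to exist on the other machine, for example `run_and_describe("ssh node1 command -v mpiexec")`, and check whether the description reports that the command was not found.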