wrong CPU cores number and much more

alfe
Posts: 86
Joined: Thu Nov 25, 2010 8:13 pm

Post by alfe » Sun Jan 14, 2018 10:23 am

Hello,
During installation, UEMS 18 does not count the CPUs of my machine correctly. That would not be a problem if changing the value in the UEMS.profile file (for bash) were enough. But UEMS refuses to run on all 16 of my cores and insists that I have only 2.
UEMS 15 detected the cores correctly on the same machine.
------UEMS 15 --------------

Processor and Memory Information for calcul2
          
              CPU Name              : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
              CPU Instructions      : sandybridge
              CPU Type              : 64-bit
              CPU Speed             : 2000.07 MHz
          
              EMS Determined Processor Count
                  Physical CPUs     : 2
                  Cores per CPU     : 8
                  Total Processors  : 16
          
              EMS.cshrc Specified Processor Count
                  Physical CPUs     : 2 
                  Cores per CPU     : 8
                  Total Processors  : 16
          
              Hyper-Threading       : Off
                
              System Memory         : 31.3 Gbytes
          
          EMS User Information for uems on calcul2
          
              User  ID              : 503
              Group ID              : 503
              Home Directory        : /home/uems
              Home Directory Mount  : Local
              User Shell            : /bin/tcsh
              Shell Installed       : Yes
              Shell Login Files     : .cshrc
              EMS.cshrc Sourced     : .cshrc
              EMS.cshrc Port Range  : None Defined
          
          EMS Installation Information for calcul2
          
              EMS Release           : 15.98.1,WRF3.7.1
              EMS Home Directory    : /home/uems/uems
              EMS Home Mount        : Local
              EMS User ID           : 503
              EMS Group ID          : 503
              EMS Binaries          : x64
          
              EMS Run Directory     : /home/uems/uems/runs
              EMS Run Dir Mount     : Local
              EMS Run Dir User ID   : 503
              EMS Run Dir Group ID  : 503
          
              Run Dir Avail Space   : 166.22 Gb
              Run Dir Space Used    : 69%
          
              EMS Util Directory    : /home/uems/uems/util


    Your awesome EMS sysinfo party is complete - Sun Jan 14 10:10:06 2018 UTC

  As the alchemists at Cambridge are fond of stating: "Think Globally, Model Locally!"


------UEMS 18 --------------

Processor and Memory Information for calcul2
        
            CPU Name              : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
            CPU Microarchitecture : Sandybridge
            CPU Type              : x86_64
            CPU Speed             : 1.999 MHz
        
            UEMS Determined Processor Count
                Sockets           : 2
                Cores per Socket  : 1
                Total Cores       : 2
        
            UEMS.profile Specified Processor Count
                Sockets           : 2 
                Cores per Socket  : 8
                Total Cores       : 16
        
            Hyper-Threading       : Off  
        
            System Memory         : 32 Gbytes
        
        UEMS User Information for uems_18 on calcul2
        
            Home Directory        : /home/uems_18
            User  ID              : 504 (uems_18)
            Group ID              : 504 (uems_18)
            Home Directory Mount  : Local
            User Shell            : /bin/bash
            Shell Login Files     : .bash_profile
        
        UEMS Installation Information for calcul2
        
            UEMS Home Directory   : /home/uems_18/uems
            UEMS User ID          : 504 (uems_18)
            UEMS Group ID         : 504 (uems_18)
            UEMS Home Mount       : Local
            UEMS Binaries         : x64
            UEMS Release          : 18.1.0
            UEMS WRF Release      : 3.9.1
        
            UEMS Run Directory    : /home/uems_18/uems/runs
            UEMS Run Dir User ID  : 504 (uems_18)
            UEMS Run Dir Group ID : 504 (uems_18)
            UEMS Run Dir Mount    : Local
        
            Run Dir Total Space   : 560.84 Gb
            Run Dir Space Used    : 366.4 Gb
            Run Dir Avail Space   : 165.95 Gb



    Your awesome UEMS Run Information party is complete - Sun Jan 14 10:11:25 2018 UTC

  As Shields and Yarnell loved to gesticulate: "Think Globally, Model Locally!"

What is happening?

alfe
Posts: 86
Joined: Thu Nov 25, 2010 8:13 pm

Re: wrong CPU cores number and much more

Post by alfe » Sat Jan 20, 2018 5:28 pm

Just to push the subject to the top.

I went through the UEMS program files, including the install script, but could not find how the core count ends up wrong. I tried hardcoding the correct number of CPUs in the strc files, but that only got the preprocessor to use them. For the real and wrf programs I could not find where the number of CPUs is specified or calculated, so the system still runs with only 1 core per socket.

Does anybody know which file in the UEMS system launches the REAL and WRF programs?

Any help would be highly appreciated !

meteo60
Posts: 103
Joined: Tue Apr 17, 2012 4:50 pm

Re: wrong CPU cores number and much more

Post by meteo60 » Mon Jan 22, 2018 10:07 am

Did you check in runs/xxxxx/conf/ems_run/run_ncpus.conf and conf/ems_post too?

alfe
Posts: 86
Joined: Thu Nov 25, 2010 8:13 pm

Re: wrong CPU cores number and much more

Post by alfe » Mon Jan 22, 2018 9:19 pm

Hello meteo60,
Thank you for your reply.
Yes, I have set the total number of CPUs in the corresponding config files:
REAL_NODECPUS = local:16
WRFM_NODECPUS = local:16

EMSUPP_NODECPUS = local:16

Both lscpu and /proc/cpuinfo report the correct configuration, but the UEMS install script gets it wrong.

I have just tried once again, with the same result: REAL and the ARW core run on only 2 processors instead of 16.
Crazy!
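
For anyone who wants to repeat the /proc/cpuinfo check by hand, here is a minimal sketch in Python. It is not part of UEMS; the function name `count_cores` and the sample text are made up for illustration. It counts sockets from the distinct `physical id` values and physical cores from the distinct (`physical id`, `core id`) pairs, assuming an x86 Linux layout where both fields are present.

```python
def count_cores(cpuinfo_text):
    """Return (sockets, total_physical_cores) from /proc/cpuinfo-style text."""
    sockets = set()
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            # A line without a colon (the blank separator) ends a processor block.
            physical_id = None
            continue
        key = key.strip()
        value = value.strip()
        if key == "physical id":
            physical_id = value
            sockets.add(value)
        elif key == "core id" and physical_id is not None:
            # Hyper-threaded siblings share the same pair and are counted once.
            cores.add((physical_id, value))
    return len(sockets), len(cores)


# Synthetic sample (not copied from a real machine): 2 sockets with 8 cores each.
SAMPLE = "".join(
    f"processor\t: {s * 8 + c}\nphysical id\t: {s}\ncore id\t\t: {c}\n\n"
    for s in range(2)
    for c in range(8)
)

sockets, total = count_cores(SAMPLE)
print(sockets, total // sockets, total)  # prints: 2 8 16
```

On a real machine, pass the contents of `/proc/cpuinfo` instead of `SAMPLE`, for example `count_cores(open("/proc/cpuinfo").read())`.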

alfe
Posts: 86
Joined: Thu Nov 25, 2010 8:13 pm

Re: wrong CPU cores number and much more [solved]

Post by alfe » Tue Jan 23, 2018 7:42 pm

OK,
I have found the solution in the post viewtopic.php?f=12&t=1207
Thank you Meteosciez !! :)

The problem lies in the text strings output by lscpu and /proc/cpuinfo. On a non-English OS the detection will NOT work, or will work only at random if you are lucky.
Setting the variable described in the post linked above solves the problem.
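
To illustrate the kind of failure described above, here is a rough Python sketch. It is not the UEMS code, and several things in it are assumptions: the function names, the fallback to 1 when a label is missing (a guess that happens to match the "Cores per Socket : 1" symptom), the invented "localized" sample text, and the use of `LC_ALL=C`. The variable from the linked post is not quoted in this thread, so `LC_ALL` only stands in for the general idea of forcing untranslated tool output.

```python
import os
import re


def parse_topology(lscpu_text):
    """Return (sockets, cores_per_socket) by matching English lscpu labels.

    Each value falls back to 1 when its label is not found (assumed behavior).
    """
    def field(label):
        match = re.search(
            rf"^{re.escape(label)}\s*:\s*(\d+)", lscpu_text, re.MULTILINE
        )
        return int(match.group(1)) if match else 1

    return field("Socket(s)"), field("Core(s) per socket")


def c_locale_env():
    """Copy the current environment and force the untranslated "C" locale."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"  # assumption: stands in for the variable in the linked post
    return env


ENGLISH = "Core(s) per socket:  8\nSocket(s):           2\n"
# Invented stand-in for translated labels, NOT real lscpu output.
LOCALIZED = "Coeur(s) par socket : 8\nSocket(s) :           2\n"

print(parse_topology(ENGLISH))    # prints: (2, 8)
print(parse_topology(LOCALIZED))  # prints: (2, 1)
```

A parser written this way would see 2 sockets with 1 core each as soon as one label is translated. Running the tool with the forced environment, for example `subprocess.run(["lscpu"], env=c_locale_env(), capture_output=True, text=True)`, should give it the English labels it expects.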
