Slurm is a resource manager and job scheduler.
Users can submit jobs (i.e. scripts containing execution instructions) to slurm so that it can schedule their execution and allocate the appropriate resources (CPU, RAM, etc.) on the basis of a user's preferences or the limits imposed by the system administrators.
Using slurm on a computational cluster has many advantages. For an overview of them please read [[https://slurm.schedmd.com/|these pages]].

Slurm is **free** software distributed under the [[http://www.gnu.org/licenses/gpl.html|GNU General Public License]].

==== What is parallel computing? ====

//A parallel job consists of tasks that run simultaneously.// Parallelization can be achieved in different ways; please read the relevant [[https://en.wikipedia.org/wiki/Parallel_computing|Wikipedia page]] to learn more.
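In practice, slurm distinguishes between running several tasks (separate processes, e.g. MPI ranks, possibly spread over several nodes) and giving one task several CPUs on a single node (e.g. for a multi-threaded program). The sketch below uses only standard ''srun'' options; ''hostname'' is just a convenient test command.
<code>
# multi-process style: four independent tasks, which slurm may place on different nodes
srun --ntasks=4 -l hostname

# multi-threaded style: one task that may use four CPUs on a single node
# (a single task is never split across nodes)
srun --ntasks=1 --cpus-per-task=4 -l hostname
</code>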
==== Slurm's architecture ====
  
<code>
$ sinfo
</code>
A * near a partition name indicates the default partition. See ''man sinfo''.
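To inspect a single partition in more detail you can, for example, ask for a node-oriented long listing; the partition name below is a placeholder.
<code>
$ sinfo -p <partition> -N -l
</code>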
  
**Display all active jobs owned by a given user**
  
<code>
$ squeue -u <username>
</code>
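''squeue'' can also filter by job state; for instance, to list only a user's pending jobs (see ''man squeue'' for the available states):
<code>
$ squeue -u <username> -t PENDING
</code>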
  
<code>
$ scontrol show partition notebook
</code>
  
<code>
$ scontrol show node maris004
</code>
  
<code>
$ scontrol show job 1052
</code>
  
0: maris005
1: maris006
</code>

**Create three tasks running on the same node**
<code>
srun -n3 -N1 -l /bin/hostname
</code>
**Create three tasks running on different nodes, specifying which nodes must __at least__ be used**

<code>
srun -N3 -w "maris00[5-6]" -l /bin/hostname
**Create a job script and submit it to slurm for execution**
  
Suppose ''batch.sh'' has the following contents
<code>
#!/usr/bin/env bash
#SBATCH -n 2
#SBATCH -w maris00[5-6]
srun hostname
</code>

then submit it using ''sbatch batch.sh''.
  
See ''man sbatch''.
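A batch script usually carries a few more directives. The sketch below is only illustrative: the job name, partition, time limit and memory values are assumptions to adapt to your site (run ''sinfo'' to list the available partitions).
<code>
#!/usr/bin/env bash
#SBATCH --job-name=myjob         # name shown by squeue
#SBATCH --partition=computation  # assumed partition name
#SBATCH --ntasks=4               # number of tasks (processes)
#SBATCH --time=01:00:00          # wall-clock limit hh:mm:ss
#SBATCH --mem-per-cpu=2G         # memory per allocated CPU
#SBATCH --output=myjob-%j.out    # %j expands to the job id

srun hostname
</code>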
  
==== Less-common user commands ====
  * **sshare**
  * **sprio**
  * **sacct**
  
=== sacctmgr ===
<code>
$ sacctmgr show qos format=Name,MaxCpusPerUser,MaxJobsPerUser,Flags
</code>
  
=== sstat ===
It displays status information (CPU time, memory usage, etc.) of running jobs and job steps, for instance as in the sketch below.
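The job id ''8749'' and the chosen fields below are only illustrative; see ''man sstat'' for the complete list of fields.
<code>
$ sstat -j 8749.batch --format=JobID,AveCPU,AveRSS,MaxRSS
</code>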
  
:!: Note that in the example above the job is identified as ''8749.batch'': the word `batch' is appended to the id shown by the squeue command. This suffix is needed whenever the running program is serial, i.e. not started with `srun'.
  
=== sshare ===
  
<code>
$ sshare -U -u <username>
             Account       User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare 
-------------------- ---------- ---------- ----------- ----------- ------------- ---------- 
xxxxx                    yyyyyy          1    0.024390    37733389      0.076901   0.112428
  
</code>
  
  
=== sprio ===
It shows the factors that determine a pending job's scheduling priority; a minimal example is sketched below.
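For a per-job breakdown of the priority factors you can run, for example, the long listing (the ''-l'' option is optional):
<code>
$ sprio -l
</code>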
  
To find out what priority a running job was given, type
<code>
squeue -o %Q -j <jobid>
</code>

=== sacct ===
It displays accounting data for all jobs and job steps in the Slurm job accounting log or Slurm database. For instance

<code>
sacct -o JobID,JobName,User,AllocNodes,AllocTRES,AveCPUFreq,AveRSS,Start,End -j 13180,13183
       JobID    JobName      User AllocNodes  AllocTRES AveCPUFreq     AveRSS               Start                 End 
------------ ---------- --------- ---------- ---------- ---------- ---------- ------------------- ------------------- 
13180             test2   xxxxxxx          1 cpu=8,mem+                       2017-04-10T13:34:33 2017-04-10T14:08:24 
13180.batch       batch                    1 cpu=8,mem+    116.13M    354140K 2017-04-10T13:34:33 2017-04-10T14:08:24 
13183             test3   xxxxxxx          1 cpu=8,mem+                       2017-04-10T13:54:52 2017-04-10T14:26:34 
13183.batch       batch                    1 cpu=8,mem+         2G     10652K 2017-04-10T13:54:52 2017-04-10T14:26:34 
13183.0         xpyxmci                    1 cpu=8,mem+      1.96G     30892K 2017-04-10T13:54:53 2017-04-10T14:26:34 
</code>

:!: Use ''--noconvert'' if you want sacct to display consistent units across jobs.
===== Tips =====
  
To minimize the time your job spends in the queue you can specify multiple partitions, so that it starts as soon as one of them has free resources. Use ''--partition=notebook,playground,computation'' for instance.
  
 To have a rough estimate of when your queued job will start type ''squeue --start'' To have a rough estimate of when your queued job will start type ''squeue --start''
  
To monitor the nodes' state, free memory, and CPU load at five-second intervals you can use
<code>
sinfo -i 5 -S"-O" -o "%.9n %.6t %.10e/%m %.10O %.15C"
</code>
  
For instance ''#SBATCH --nodelist=maris0xx''
  
=== Environment variables available to slurm jobs ===
  
Type ''printenv | grep -i slurm'' inside a job to display them.
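The variables used below are standard slurm job variables; the ''sbatch'' directives and the program name are illustrative assumptions (''SLURM_CPUS_PER_TASK'' is only set when ''--cpus-per-task'' is requested).
<code>
#!/usr/bin/env bash
#SBATCH -n 4
#SBATCH -c 2

echo "Job $SLURM_JOB_ID ($SLURM_JOB_NAME) was submitted from $SLURM_SUBMIT_DIR"
echo "Running $SLURM_NTASKS tasks on: $SLURM_JOB_NODELIST"

# tell an OpenMP/pthreads program how many threads it may use
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_program   # hypothetical executable
</code>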
  