slurm_tutorial — last revised 2019/01/16 08:35 by lenocil
Slurm is **free** software distributed under the [[http://
==== What is a parallel job? ====
//A parallel job consists of tasks that run simultaneously.// Tasks can be parallelized in two ways:
  * by running a multi-process program, for example using [[https://
  * by running a multi-threaded program, for example see [[http://

A multi-process program consists of multiple tasks orchestrated by MPI and possibly executed on different nodes. A multi-threaded program, on the other hand, consists of multiple tasks that use several CPUs on the same node.
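In Slurm terms the two models translate into different resource requests. A minimal sketch (the program names are placeholders, not part of this cluster's setup, and the commands need a running Slurm cluster):

<code bash>
# multi-process (e.g. MPI): several tasks, possibly spread over nodes
srun --ntasks=8 ./my_mpi_program

# multi-threaded (e.g. OpenMP): one task with several CPUs on one node
srun --ntasks=1 --cpus-per-task=8 ./my_threaded_program
</code>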
==== Slurm' ====
<code>
$ sinfo
PARTITION          AVAIL  TIMELIMIT  NODES  STATE  NODELIST
playground*        ...
computation        ...
emergency          ...
notebook           ...
gpu                up     ...
computation-intel  ...
</code>
A * near a partition name indicates the default partition. See ''
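''sinfo'' also accepts standard filtering and formatting options; the sketches below reuse a partition name from the table above and are illustrations only (they require a running Slurm cluster):

<code bash>
# summary view: one line per partition with node-state counts
sinfo -s

# node-oriented long listing (sockets, memory, state per node)
sinfo -N -l

# restrict the output to a single partition
sinfo -p computation
</code>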
<code>
$ squeue -u <username>
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  ...
</code>
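''squeue'' can filter and reformat its output; a sketch (illustration only — the ''%'' codes are standard squeue format specifiers):

<code bash>
# only your pending jobs, including the reason they still wait
squeue -u $USER -t PENDING

# custom columns: job id, name, state, reason/nodelist
squeue -u $USER -o "%.10i %.20j %.8T %R"
</code>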
<code>
$ scontrol show partition notebook
PartitionName=notebook
   ...
</code>
<code>
$ scontrol show node maris004
NodeName=maris004 Arch=x86_64 CoresPerSocket=4
   ...
</code>
<code>
novamaris [1087] $ scontrol show jobs 1052
JobId=1052 JobName=slurm_engine.sbatch
   ...
</code>
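''scontrol'' can also modify queued jobs. A sketch reusing the job id from the example above (illustration only):

<code bash>
# put a queued job on hold, then release it again
scontrol hold 1052
scontrol release 1052

# modify a pending job, e.g. lower its time limit
scontrol update JobId=1052 TimeLimit=01:00:00
</code>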
0: maris005
1: maris006
</code>
**Create three tasks running on the same node**
<code>
...
</code>
**Create three tasks running on different nodes, specifying which nodes should __at least__ be used**

<code>
srun -N3 -w "
</code>
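The interactive ''srun'' calls above can equally be wrapped in a batch script and submitted with ''sbatch''. A minimal sketch (the partition and task counts are illustrative):

<code bash>
#!/bin/bash
#SBATCH --job-name=hostname-test
#SBATCH --partition=playground
#SBATCH --nodes=3
#SBATCH --ntasks=3

# srun inherits the allocation created by sbatch
srun hostname
</code>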
<code>
$ sacctmgr show qos format=Name,MaxCPUsPU,MaxJobsPU,...
      Name MaxCPUsPU MaxJobsPU ...
---------- --------- --------- --------------------
    normal       ...
playground       ...
  notebook       ...
</code>
<code>
...
</code>
:!: Note that in the example above the job is identified by its id.
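Accounting data of finished jobs can be queried with ''sacct''; a sketch reusing the job id from the ''scontrol'' example (illustration only):

<code bash>
# resource usage of job 1052 and its steps
sacct -j 1052 --format=JobID,JobName,Partition,Elapsed,MaxRSS,State

# all of your jobs since a given date, allocations only
sacct -u $USER -S 2019-01-01 -X
</code>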
=== sshare ===
<code>
$ sshare -U -u <username>
             Account       User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
xxxxx                    yyyyyy        ...
</code>
=== sprio ===
<code>
...
</code>
+ | |||
+ | :!: Use '' | ||
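Typical ''sprio'' invocations (illustration only, they require a running Slurm cluster):

<code bash>
# every priority factor of all pending jobs, long format
sprio -l

# priority factors of your own pending jobs
sprio -u $USER
</code>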
===== Tips =====
To minimize the time your job spends in the queue, you can specify multiple partitions so that the job can start as soon as possible. Use ''
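With ''sbatch'', several partitions can be given as a comma-separated list; the job then runs in the first partition where resources become available. A sketch using partition names from this cluster (otherwise illustrative):

<code bash>
#!/bin/bash
#SBATCH --partition=playground,computation
#SBATCH --ntasks=1

srun hostname
</code>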
To get a rough estimate of when your queued job will start, type ''
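One command that provides such an estimate (the exact command intended above is cut off in this revision) is ''squeue'' with its ''--start'' option:

<code bash>
# the scheduler's expected start times of your pending jobs
squeue --start -u $USER
</code>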
<code>
sinfo -i 5 -S"
</code>
For instance ''#
=== Environment variables available to slurm jobs ===
Type ''
<code>
$ salloc -p playground -N 10
salloc: Granted job allocation 13709
$ printenv | grep -i slurm_
SLURM_NODELIST=maris[031-033,
SLURM_JOB_NAME=bash
SLURM_NODE_ALIASES=(null)
SLURM_JOB_QOS=normal
SLURM_NNODES=10
SLURM_JOBID=13709
SLURM_TASKS_PER_NODE=1(x10)
SLURM_JOB_ID=13709
SLURM_SUBMIT_DIR=/
SLURM_JOB_NODELIST=maris[031-033,
SLURM_CLUSTER_NAME=maris
SLURM_JOB_CPUS_PER_NODE=1(x10)
SLURM_SUBMIT_HOST=novamaris.lorentz.leidenuniv.nl
SLURM_JOB_PARTITION=playground
SLURM_JOB_ACCOUNT=yuyuysu
SLURM_JOB_NUM_NODES=10
SLURM_MEM_PER_NODE=32174
</code>
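These variables are only defined inside an allocation, so a job script can use shell fallbacks to stay runnable outside Slurm as well; a sketch:

<code bash>
#!/bin/sh
# Report the allocation we run in. Outside a Slurm job the SLURM_*
# variables are unset, so fall back to sensible defaults.
echo "job id: ${SLURM_JOB_ID:-none}"
echo "nodes:  ${SLURM_JOB_NODELIST:-$(hostname)}"
echo "tasks:  ${SLURM_NTASKS:-1}"
</code>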