====== Slurm on the Maris Cluster ======
  
All maris nodes have been configured to use [[http://slurm.schedmd.com/|slurm]] as a workload manager. Its use is enforced on all nodes. Direct access to any node other than the headnode `novamaris' is not allowed.
A snapshot of the cluster usage can be found at http://slurm.lorentz.leidenuniv.nl/ (only accessible within the IL workstations network).
  
Maris runs Slurm v17.11.12.
  
Suggested readings:
Account information for a given <username> can be displayed using
  
<code bash>
sacctmgr list associations cluster=maris user=<username> format=Account,Cluster,User,Fairshare
</code>
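For instance, for a hypothetical user ''jdoe'' (a placeholder, not a real account) the command becomes:

<code bash>
sacctmgr list associations cluster=maris user=jdoe format=Account,Cluster,User,Fairshare
</code>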
  
^ Name ^ CPUs ^ Memory ^ Default Job Memory, Time ^ GRES ^ Nodes ^ # Nodes ^ MaxCPUsPU ^ MaxJobsPU ^ Max JobTime ^ QOS ^ Access ^
|playground| 288|851832M|all| |maris0[04-22,29-33,35-46]|36| | | | playground | all |
|notebook| 48|193044M|all| |maris0[23-28]|6|4|1| | notebook | all |
|computation| 1552|6578050M|400M, 3 Days| |maris0[47-74]|28| | | | normal | all |
|compintel| 192|1030000M|400M, 1 Day| |maris0[76-77]|2| | | 3 days | normal | beenakker |
|ibintel| 96|512000M|400M, 1 Day| |maris078|1| | | 10 days | normal | beenakker |
|emergency| 384|2773706M|all| |maris0[69-74]|6| | | | normal | NOBODY |
|gpu| 56|256000M|400M, 3 Days|2 gpu|maris075|1| | | | normal | all |
  
The `playground' partition should be used for test runs.
The `gpu' partition should be used only for jobs requiring GPUs. Note that GPUs must be requested from slurm explicitly, for instance using ''--gres=gpu:1''.
  
The `computation' and `compintel' partitions should be used for production runs. Note that the `compintel' partition is made of Intel CPUs.

The `ibintel' partition is made of nodes with InfiniBand connections to an iSCSI scrap storage system, allowing efficient I/O operations.
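As a sketch of how a partition is selected, the batch script below requests the `computation' partition with explicit memory and time limits. The program name ''my_program'' and the resource values are placeholders, not recommendations:

<code bash>
#!/bin/env bash
## placeholder resource request: 1 GB of memory and one day of walltime
#SBATCH --partition=computation
#SBATCH --mem=1G
#SBATCH --time=1-00:00:00
srun ./my_program
</code>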
  
  
  ---------- --------- --------- -------- -------- ----------- --------------------
      normal
  playground
    notebook         4         1                                        DenyOnLimit
      guests       128                                                  DenyOnLimit
  
    
Maris075 is the only GPU node in the maris cluster. GPUs are configured as generic resources or GRES. In order to use a GPU in your calculations, you must explicitly request it as a generic resource using the ''--gres'' option supported by the salloc, sbatch and srun commands. For instance, if you are submitting a batch script to slurm, then use the format ''#SBATCH --gres=gpu:tesla:1'' to request one GPU.
  
:!: Please note that on maris GPUs are configured as __non-consumable__ generic resources (i.e. multiple jobs can use the same GPU).
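A minimal GPU batch script, combining the partition choice and the GRES request, might look as follows; the program name ''my_gpu_program'' is a placeholder:

<code bash>
#!/bin/env bash
## request the GPU node and one Tesla GPU as a generic resource
#SBATCH --partition=gpu
#SBATCH --gres=gpu:tesla:1
srun ./my_gpu_program
</code>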
  
To compile your CUDA application on maris using slurm, note that in your submission script you might have to export the libdevice library path and include the path in which the CUDA headers can be found, for instance

<code bash>
#!/bin/env bash
....
NVVMIR_LIBRARY_DIR=/usr/local/cuda/lib64/ /usr/local/cuda/bin/nvcc -I/usr/local/cuda/include my_code.cu
</code>
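Once compiled, the resulting binary (''a.out'' by nvcc's default naming) can then be run on the GPU node, again requesting the GPU explicitly. This is only a sketch:

<code bash>
srun --partition=gpu --gres=gpu:tesla:1 ./a.out
</code>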
  
  
  
In principle, to run an MPI application you could just execute it using mpirun as shown in the session below.
<code bash>
novamaris$ cat slurm_script.sh
#!/bin/env bash
...
</code>
However, __**it is highly advised you use slurm's ''srun'' to submit a parallel job under any circumstances**__.
<code bash>
novamaris$ cat slurm_script.sh
#!/bin/env bash
...
</code>
At the moment maris supports only OpenMPI with slurm, so you are required to load a particular openmpi/slurm module to get things to work, for instance
  
<code bash>
# load openMPI
module load openmpi-slurm/2.0.2
...
</code>
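Putting the pieces together, a minimal MPI submission script might look as follows; the program name ''my_mpi_program'' and the task count are placeholders:

<code bash>
#!/bin/env bash
## placeholder: run 8 MPI tasks
#SBATCH --ntasks=8
module load openmpi-slurm/2.0.2
srun ./my_mpi_program
</code>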
An example of a batch script is given below:
  
<code bash>
#!/bin/env bash
##comment out lines by adding at least two `#' at the beginning
...
</code>
Consider the batch script below
  
<code bash>
$ cat slurmcp.sh
#!/bin/env bash
...
</code>
  
Slurm can be instructed to email any job state changes to a chosen email address. This is accomplished by using the ''--mail-type'' option in sbatch, for instance
<code bash>
...
#SBATCH --mail-user=myemail@address.org
</code>
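A minimal sketch combining the two mail options; the address is a placeholder, and ''--mail-type=ALL'' covers begin, end and failure notifications:

<code bash>
#SBATCH --mail-user=myemail@address.org
#SBATCH --mail-type=ALL
</code>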
institute_lorentz/institutelorentz_maris_slurm.txt · Last modified: 2019/01/31 11:45 by lenocil