institute_lorentz:institutelorentz_maris_slurm, revision of 2018/10/08 07:44 by lenocil (page removed 2020/01/15 13:34)
====== Slurm on the Maris Cluster ======
All maris nodes have been configured to use [[http://

Maris' slurm has been configured to manage consumable resources, such as CPUs and RAM, and generic resources (GPUs) using cgroups.

A snapshot of the cluster usage can be found at http://

Maris runs SLURM v17.02.

Suggested readings:
  * [[https://
  * [[:
===== Accounting =====
The Maris accounting scheme has been set up such that each principal investigator (PI) at IL has their own slurm account. Collaborators,

Accounting allows system managers to track cluster usage to improve services and enables the assignment of different powers/

Account information for a given <

<code bash>
sacctmgr list associations cluster=maris user=<
</code>

If no results are returned, please contact ''

Similarly, if you encounter the following error message upon submission of your batch job
<code>
error: Unable to allocate resources: Invalid account or account/
</code>
please make sure that you have specified the account associated with your user name.
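As a sketch, the account can be passed on the command line when submitting (the account name ''pi_account'' and the script name ''myjob.sh'' are hypothetical placeholders; ''--account'' is the standard slurm option):

```shell
# submit specifying the account explicitly (account and script names are hypothetical)
sbatch --account=pi_account myjob.sh
```

The same can be achieved inside the script itself with a ''#SBATCH --account=pi_account'' directive.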
===== Available partitions and nodes =====

Available partitions and their configurations can be listed by typing ''sinfo''

A similar type of information, plus all jobs currently in a queue, can be seen using the GUI program ''

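For a quick text overview, ''sinfo'' accepts a format string; the following session fragment uses standard sinfo format specifiers (the partition name ''computation'' is taken from the table below):

```shell
# list each partition with its time limit, node count and node list
sinfo -o "%P %l %D %N"

# show per-node state, CPUs and memory for the computation partition
sinfo -p computation -N -o "%N %t %c %m"
```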
==== Maris partitions ====
`playground'

^ Name ^ CPUs ^ Memory ^ Default Job Memory,
|playground|288|851832M|all| |maris0[04-22,
|notebook| 48|193044M|all| |maris0[23-28]|6|4| 1| | notebook | all |
|computation| 1552|6578050M |400M, 3Days | |maris0[47-74] | 28 | | | | normal | all |
|compintel| 192 |1030000M|400M,
|ibintel| 192 |1030000M|400M,
|emergency| 384 |2773706M|all| |maris0[69-74] |6| | | | normal | NOBODY |
|gpu| 56 |256000M|400M,

The ''playground''

The ''notebook''

The ''emergency''

The ''gpu'' partition should be used only for jobs requiring GPUs. Note that GPUs must be requested from slurm explicitly using ''
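As a sketch, a batch-script fragment requesting one GPU on this partition could look as follows (''--gres=gpu:1'' is the standard slurm syntax for generic resources; the executable name is hypothetical):

```shell
#!/bin/env bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1        # request one GPU as a generic resource
#SBATCH --time=01:00:00

srun ./my_cuda_app          # hypothetical CUDA executable
```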

The ''computation''
- | |||
===== Maris QOS =====

A quality of service (QOS) can be associated with a job, user, partition, etc. and can modify

  * priorities
  * limits

Maris uses the concept of QOS to impose usage limits on the `notebook',

To display all defined QOS use ''

<code>
#sacctmgr show qos format=Name,
Name MaxCPUsPU MaxJobsPU GrpNodes
---------- --------- --------- -------- -------- ----------- --------------------
normal
playground
notebook
guests
</code>

Any user can submit jobs specifying a QOS via the option ''
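For example, a session fragment submitting to the playground partition with its matching QOS (names taken from the QOS listing above; the script name is hypothetical):

```shell
# request the playground QOS explicitly at submission time
sbatch --partition=playground --qos=playground myjob.sh
```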
===== Slurm job priority on Maris =====

Maris' slurm uses the [[https://
  * Age
  * Fairshare
  * Job size and TRES
  * Partition
  * QOS
Furthermore,

In summary, Maris' slurm has been set up such that:

  * small jobs are given high priority.
  * jobs submitted to the ''
  * QOS have no influence on job priority.
  * fairshare is an important factor when ordering the queue.
  * after a wait of 7 days, a job will be given the maximum Age factor weight.
  * fairshare is only based on the past 14 days. That is, usage decays to 0 within 14 days.

The relevant configuration options can be displayed via the command ''
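One way to inspect these settings is the standard ''scontrol show config'' command; the grep pattern below is just an illustration of filtering for the priority-related options:

```shell
# dump the scheduler configuration and keep the Priority* options
scontrol show config | grep -i '^Priority'
```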
===== Using GPUs with slurm on Maris =====

Maris075 is the only GPU node in the maris cluster. GPUs are configured as generic resources or GRES. In order to use a GPU in your calculations,

:!: Please note that on maris GPUs are configured as __not-consumable__ generic resources (i.e. multiple jobs can use the same GPU).

To compile your CUDA application on maris using slurm, note that in your submission script you might have to export the libdevice library path and include the path in which the CUDA headers can be found, for instance
<code bash>
#!/bin/env bash
...
NVVMIR_LIBRARY_DIR=/
</code>
- | |||
===== slurm and MPI =====

OpenMPI on the maris cluster supports launching parallel jobs in all three methods that SLURM supports:

  * with //salloc//
  * with //sbatch//
  * with //srun//

Please read https://

In principle, to run an MPI application you could just execute it using mpirun, as shown in the session below
<code bash>
novamaris$ cat slurm_script.sh
#!/bin/env bash
mpirun mpi_app.exe
novamaris$ sbatch -N 4 slurm_script.sh
srun: jobid 1234 submitted
novamaris$
</code>
However, __**it is highly advised you use slurm's ''srun'' instead**__:
<code bash>
novamaris$ cat slurm_script.sh
#!/bin/env bash
srun mpi_app.exe
novamaris$ sbatch -N 4 slurm_script.sh
srun: jobid 1234 submitted
novamaris$
</code>
At the moment maris supports only OpenMPI with slurm, so you are required to load a particular openmpi/
- | |||
<code bash>
# load openMPI
module load openmpi-slurm/

# run on 1 node using 3 CPUs
srun -n 3 <

# run on 4 nodes using 4 CPUs
srun -N 4 -n 4 mpi_example

# if the job is multithreaded and requires more than one CPU per task
srun -c 4 mpi_example
</code>

:!: ''

:!: Any application that uses MPI with slurm must be compiled against the MPI in the module openmpi-slurm, otherwise it will behave erratically.

==== module: openmpi-slurm ====
It includes a version of openMPI built with slurm support. It also includes mpi-enabled ''
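As a sketch, compiling and launching a small MPI program against this module (the source file name and module version are hypothetical; ''mpicc'' is the standard OpenMPI compiler wrapper):

```shell
module load openmpi-slurm/<version>     # pick the version installed on maris
mpicc -o mpi_example mpi_example.c      # compile against the slurm-aware OpenMPI
srun -n 4 mpi_example                   # launch 4 tasks under slurm
```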
===== Example batch script =====

Whenever writing a batch script, users are HIGHLY advised to explicitly specify resources (mem, cpus, etc.).

maris offers a helper program to get users started with their first batch script; just type `swizard'

Batch scripts come in handy when you have several options you would like to pass to slurm. Instead of using a very long command line, you can create a batch script and submit it using `sbatch'
An example of a batch script is given below:
- | |||
<code bash>
#!/bin/env bash
## comment out lines by adding at least two `#' at the beginning
#SBATCH --job-name=lel-rAUtV
#SBATCH --account=wyxxl
#SBATCH --partition=computation
#SBATCH --output=/
#SBATCH --error=/
#SBATCH --time=1-00:
#SBATCH --mem=400
#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=1

module load openmpi-slurm/

srun a.out
</code>
- | |||
===== Example: how to use a node's scratch disks =====

How can I transfer

slurm provides the command ''

Consider the batch script below

<code bash>
$ cat slurmcp.sh
#!/bin/env bash
## SBATCH directives must come before any executable line
#SBATCH -N 1
#SBATCH --nodelist=maris066

DEFAULT_SOURCE=${SLURM_SUBMIT_DIR}
SOURCE=${1:

SCRATCH=/

srun mkdir -p ${SCRATCH} || exit $?
# note that srun cp is equivalent to a loop over each node copying the files
srun cp -r ${DEFAULT_SOURCE}/

# now do whatever you need to do with the local data

# do NOT forget to remove data that are no longer needed
srun rm -rf ${SCRATCH} || exit $?
</code>

and its invocation ''
- | |||
===== Example: instruct slurm to send emails upon job state changes =====

Slurm can be instructed to email any job state changes to a chosen email address. This is accomplished by using the ''
<code bash>
...
#SBATCH --mail-user=myemail@address.org
#SBATCH --mail-type=ALL
...
</code>

If the ''

In the event of a job failure (exit status different from zero), maris will include in the notification email a few lines from the job's stderr. Please note that this feature will **only** work if a job's stdout and stderr were not
specified using ''
===== Python Notebooks on maris =====
We have set up a jupyterhub environment that uses the slurm facilities to launch users' notebooks. Please
refer to [[institute_lorentz:
===== Notes =====

:!: ssh-ing from novamaris to a maris compute node produces a top-like output.