====== slurm on the Maris Cluster ======
All maris nodes have been configured to use [[https://slurm.schedmd.com|slurm]] to schedule and manage jobs.

Maris' slurm has been configured to manage consumable resources, such as CPUs and RAM, and generic resources (GPUs) using cgroups.

A snapshot of the cluster usage can be found at http://

Maris runs SLURM v17.02.

Suggested readings:
  * [[https://slurm.schedmd.com/quickstart.html|slurm quickstart user guide]]
  * the slurm-related pages elsewhere on this wiki
===== Accounting =====
The Maris accounting scheme has been set up such that each principal investigator (PI) at IL has their own slurm account. Collaborators and students are added as users under the account of the PI they work with.

Accounting allows system managers to track cluster usage to improve services and enables the assignment of different powers/privileges to users.

Account information for a given <username> can be retrieved as follows:

<code>
sacctmgr list associations cluster=maris user=<username>
</code>

If no results are returned, then please contact the system administrators to have your account set up.

Similarly, if you encounter the following error message upon submission of your batch job
<code>
error: Unable to allocate resources: Invalid account or account/partition combination specified
</code>
please make sure that you have specified the account associated with your user name.
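
For instance, assuming ''sacctmgr'' reports an account named ''pi_account'' (a placeholder used here for illustration), the account can be specified like this:
<code>
# on the command line (account and script names are placeholders)
sbatch --account=pi_account myscript.sh

# or, equivalently, inside the batch script itself
#SBATCH --account=pi_account
</code>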
===== Available partitions and nodes =====

Available partitions and their configurations can be listed by typing `sinfo' on the command line.

A similar type of information, plus all jobs currently in a queue, can be seen using the GUI program ''sview''.

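For example, a compact overview of the partitions, their time limits and their nodes can be obtained with ''sinfo'' output formatting (the format string below is just one possible choice):
<code>
# one line per partition: name, time limit, node count, node list
sinfo -o "%P %l %D %N"
</code>
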
==== Maris partitions ====
`playground' is the default partition, used when a job does not specify one explicitly.

^ Name ^ CPUs ^ Memory ^ Default Job Memory, Time ^ Nodes ^ # Nodes ^ ^ ^ QOS ^ Allowed users ^
|playground|288|851832M|all|maris0[04-22,| | | | | |
|notebook|48|193044M|all|maris0[23-28]|6|4|1|notebook|all|
|computation|1552|6578050M|400M, 3Days|maris0[47-74]|28| | |normal|all|
|compintel|192|1030000M|400M,| | | | | | |
|emergency|384|2773706M|all|maris0[69-74]|6| | |normal|NOBODY|
|gpu|56|256000M|400M,| | | | | | |

The `playground' partition is open for testing and short interactive experiments.

The `notebook' partition is reserved for jupyter notebook sessions spawned through the jupyterhub service (see below); its usage limits are enforced via the `notebook' QOS.

The `emergency' partition is not accessible to regular users and is reserved for urgent cases at the discretion of the system managers.

The `gpu' partition should be used only for jobs requiring GPUs. Note that these must be requested from slurm explicitly using ''--gres'' (see the GPU section below).

The `computation' partition is intended for production runs; by default jobs there are assigned 400M of memory and a maximum run time of 3 days.

===== Maris QOS =====

A quality of service (QOS) can be associated with a job, user, partition, etc. and can modify

  * priorities
  * limits

Maris uses the concept of QOS to impose usage limits, for instance on the `notebook' and `playground' partitions and on guest accounts.

To display all defined QOS use ''sacctmgr show qos''.

<code>
# sacctmgr show qos format=Name,MaxCPUsPU,MaxJobsPU,GrpNodes
      Name MaxCPUsPU MaxJobsPU GrpNodes
---------- --------- --------- --------
    normal
playground
  notebook
    guests
</code>

Any user can submit jobs specifying a QOS via the option ''--qos''.
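
For example, using the `playground' QOS listed above:
<code>
# request a QOS on the command line (script name is a placeholder)...
sbatch --qos=playground myscript.sh

# ...or inside the batch script
#SBATCH --qos=playground
</code>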
===== Slurm job priority on Maris =====

Maris' slurm uses the [[https://slurm.schedmd.com/priority_multifactor.html|multifactor priority plugin]] to compute job priorities from the following factors:
  * Age
  * Fairshare
  * Job size and TRES
  * Partition
  * QOS

In summary, Maris' slurm has been set up such that:

  * small jobs are given high priority.
  * QOS have no influence on job priority.
  * fairshare is an important factor when ordering the queue.
  * after a wait of 7 days, a job is given the maximum Age factor weight.
  * fairshare is based only on the past 14 days; that is, usage decays to 0 within 14 days.

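To see how these factors combine for pending jobs, slurm's ''sprio'' and ''sshare'' utilities can be consulted (a quick illustration; the columns shown depend on the configured weights):
<code>
# list the priority factors (age, fairshare, job size, partition, qos)
# of all pending jobs, in long format
sprio -l

# show fairshare usage per account and user
sshare -a
</code>
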
The relevant configuration options can be displayed via the command ''scontrol show config'' (look for the ''Priority'' parameters).
===== Using GPUs with slurm on Maris =====

Maris075 is the only GPU node in the maris cluster. GPUs are configured as generic resources or GRES. In order to use a GPU in your calculations, you must request one explicitly with the ''--gres'' option.

:!: Please note that on maris GPUs are configured as __consumable__ generic resources (i.e. multiple jobs cannot use the same GPU).

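A minimal batch script requesting one GPU could look as follows (the executable name and the memory/time values are placeholders chosen for illustration):
<code>
#!/bin/env bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --mem=4000
#SBATCH --time=01:00:00

# slurm exports CUDA_VISIBLE_DEVICES so the job only sees the GPU(s) it was granted
srun ./my_gpu_app.exe
</code>
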
===== slurm and MPI =====

OpenMPI on the maris cluster supports launching parallel jobs in all three methods that SLURM supports:

  * with //salloc//
  * with //sbatch//
  * with //srun//

Please read https://slurm.schedmd.com/mpi_guide.html for further details; a minimal //salloc// example follows this paragraph.

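For instance, the //salloc// method runs the MPI program interactively inside a fresh allocation (''mpi_app.exe'' is the placeholder executable used throughout this section):
<code>
# allocate 2 nodes and launch the MPI program under the allocation
salloc -N 2 srun mpi_app.exe
</code>
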
In principle, to run an MPI application you could just execute it using mpirun, as shown in the session below
<code>
novamaris$ cat slurm_script.sh
#!/bin/env bash
mpirun mpi_app.exe
novamaris$ sbatch -N 4 slurm_script.sh
Submitted batch job 1234
novamaris$
</code>
However, __**it is highly advised that you use slurm's ''srun'' command to launch your MPI applications**__, as shown below
<code>
novamaris$ cat slurm_script.sh
#!/bin/env bash
srun mpi_app.exe
novamaris$ sbatch -N 4 slurm_script.sh
Submitted batch job 1234
novamaris$
</code>
At the moment maris supports only OpenMPI with slurm, so you are required to load an openmpi-slurm module before compiling and running your applications, for example:

<code>
# load openMPI
module load openmpi-slurm

# run on 1 node using 3 CPUs
srun -n 3 mpi_example

# run on 4 nodes using 4 tasks (one per node)
srun -N 4 -n 4 mpi_example

# if the job is multithreaded and requires more than one CPU per task
srun -c 4 mpi_example
</code>

:!: Any application that uses MPI with slurm must be compiled against the MPI installation in the openmpi-slurm module, otherwise it will behave erratically.

==== module: openmpi-slurm ====
It provides a version of openMPI built with slurm support, together with mpi-enabled helper tools.
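
A typical compile-and-submit cycle against this module might look like this (the source file name is illustrative):
<code>
# compile against the slurm-aware openMPI
module load openmpi-slurm
mpicc -o mpi_example mpi_example.c

# submit a 2-node run, one task per node
sbatch -N 2 --ntasks-per-node=1 --wrap "srun mpi_example"
</code>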
===== Example batch script =====

Whenever writing a batch script, users are HIGHLY advised to explicitly specify resources (memory, CPUs, run time, etc.).

maris offers a helper program to get users started with their first batch script: just type `swizard' on the command line.

Batch scripts come in handy when you have several options you would like to pass to slurm. Instead of typing a very long command line, you can collect the options in a batch script and submit it using `sbatch'.
An example of a batch script is given below:

<code>
#!/bin/env bash
## comment out lines by adding at least two `#' at the beginning
#SBATCH --job-name=lel-rAUtV
#SBATCH --account=wyxxl
#SBATCH --partition=computation
#SBATCH --output=/
#SBATCH --error=/
#SBATCH --time=1-00:00:00
#SBATCH --mem=400
#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=1

module load openmpi-slurm

srun a.out
</code>

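After saving the script (say as ''myjob.sh'', a name chosen here for illustration), submit and monitor it as follows:
<code>
# submit the script to slurm
sbatch myjob.sh

# check the state of your jobs in the queue
squeue -u $USER
</code>
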
===== Example: how to use a node's scratch disks =====

How can I transfer data to and from a node's local scratch disks from within a job?

slurm provides the command ''sbcast'' to distribute files to the local storage of the nodes allocated to a job; alternatively, ordinary copy commands can be executed on every node via ''srun'', as in the example below.

Consider the batch script below

<code>
$ cat slurmcp.sh
#!/bin/env bash
## note: #SBATCH directives must precede any executable line
#SBATCH -N 1
#SBATCH --nodelist=maris066

DEFAULT_SOURCE=${SLURM_SUBMIT_DIR}
SOURCE=${1:-$DEFAULT_SOURCE}

# local scratch area; the path below is assumed for illustration,
# adjust it to the node's actual scratch disk
SCRATCH=/scratch/${SLURM_JOB_ID}

srun mkdir -p ${SCRATCH} || exit $?
# note that srun cp is equivalent to a loop over each node copying the files
srun cp -r ${SOURCE}/. ${SCRATCH} || exit $?

# now do whatever you need to do with the local data

# do NOT forget to remove data that are no longer needed
srun rm -rf ${SCRATCH} || exit $?
</code>

and its invocation, e.g. ''sbatch slurmcp.sh /path/to/data'' (the source directory argument is optional and defaults to the submission directory).

===== Example: instruct slurm to send emails upon job state changes =====

Slurm can be instructed to email any job state changes to a chosen email address. This is accomplished by using the ''--mail-user'' and ''--mail-type'' options in your batch scripts, for instance
<code>
...
#SBATCH --mail-user=myemail@address.org
#SBATCH --mail-type=ALL
...
</code>

If the ''--mail-user'' option is not specified, notifications are sent to the local mailbox of the submitting user.

In the event of a job failure (exit status different from zero), maris will include in the notification email a few lines from the job's stderr. Please note that this feature will **only** work if the job's stdout and stderr were not specified using ''--output'' and ''--error''.
===== Python Notebooks on maris =====
We have set up a jupyterhub environment that uses the slurm facilities to launch users' notebooks. Please refer to the jupyterhub page in the [[institute_lorentz:|institute_lorentz]] namespace of this wiki for instructions.
===== Notes =====

:!: ssh-ing from novamaris to a maris compute node produces a top-like output.