Xmaris is the successor of the maris cluster, renamed with the prefix ''x'' because its node deployment is automated using the [[https://www.xcat.org/|xCAT]] software. Less formally, the ''x'' prefix also hints at the time of year when xmaris was first made available to IL users, namely Christmas (Xmas).
  
[[https://www.gnu.org/|{{https://www.gnu.org/graphics/heckert_gnu.transp.small.png?50 }}]][[https://wiki.centos.org/|{{https://wiki.centos.org/ArtWork/Brand/Logo?action=AttachFile&do=get&target=centos-logo-light.png?200 }}]] [[https://openondemand.org/|{{https://www.osc.edu/sites/default/files/OpenOnDemand_horiz_RGB.png?200  }}]] [[https://slurm.schedmd.com|{{https://slurm.schedmd.com/slurm_logo.png?60  }}]] [[https://easybuild.readthedocs.io/en/latest/|{{https://docs.easybuild.io/img/easybuild_logo.png?100  }}]]
===== Xmaris features and expected cluster lifetime =====
  
^Mount Point^ Type ^Notes^
|/scratch | HD | **temporary**, local|
|/marisdata |NetApp| 2TB/user quota, medium-term storage, remote|
|/home |NetApp| 10GB/user quota, medium-term storage, remote|
|/ilZone/home | [[institute_lorentz:irods_fair_storage|iRODS]]| 20GB/user quota, archive storage, remote|
  
Additional, highly efficient scratch spaces are available to all nodes on the InfiniBand network (''ibIntel'')
  
^Mount Point^ Type^ Notes^
|/IBSSD| SSD |**DISCONTINUED**, InfiniBand/iSER((iSER stands for “iSCSI Extensions for RDMA”, an extension of the iSCSI protocol that adds RDMA (Remote Direct Memory Access) support. BeeGFS is a parallel filesystem. IBSSD has been discontinued in favour of PIBSSD.))|
|/PIBSSD| SSD|**temporary**, InfiniBand/BeeGFS|
  
xmaris users are strongly advised to delete their data from the compute nodes' scratch disks (or at least move it to the shared data disk) upon completion of their calculations. All data on the scratch disks __might be deleted without prior notice__.
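For instance, a minimal sketch of moving results off a node-local scratch disk to the shared data disk before cleaning up (the ''myrun'' directory name and the per-user layout under ''/scratch'' are illustrative):

<code bash>
# copy results from the node-local scratch disk to the shared data disk
rsync -av /scratch/$USER/myrun/ /marisdata/$USER/myrun/
# then remove them from scratch
rm -rf /scratch/$USER/myrun
</code>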
  
Note that **disk policies might change at any time at the discretion of the cluster owners**.
  
  
{{ :institute_lorentz:oodxmaris1.png?direct&1000 }}
  
As with traditional shell access, Xmaris OpenOnDemand is available only for connections within the __IL intranet__. IL users who wish to access OpenOnDemand from their remote home locations can, for example, use the [[:vpn#lorentz_institute|IL VPN]] or instruct their browsers to SOCKS-proxy their connections via our SSH server.
Open a local terminal and type (substitute username with your IL username)
  
<code bash>
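# A minimal sketch (the port is arbitrary and the hostname is a placeholder for the IL SSH server)
# -N opens no remote shell, -D 8080 starts a SOCKS proxy on local port 8080
ssh -N -D 8080 username@<IL-ssh-server>
# then configure your browser to use localhost:8080 as a SOCKS proxy
</code>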
  * Submit batch jobs to the slurm scheduler/resource manager.
  * Open a terminal.
  * Launch interactive applications such as jupyter notebooks, tensorboard, virtual desktops, etc.
  * Monitor cluster usage.
  * Create and launch your very own OnDemand application (read [[https://osc.github.io/ood-documentation/master/app-development/tutorials-passenger-apps.html|here]]).
  
^Partition^ Number of nodes ^ Time limit ^  Notes^
|compAMD*| 8 | 15 days | |
|compAMDlong| 3 | 60 days | |
|compIntel| 2 | 5 days and 12 hours| |
|gpuIntel| 1 | 3 days and 12 hours | GPU |
|ibIntel | 5 | 7 days | InfiniBand, Multiprocessing |
  
*: default partition
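To inspect the partitions and their limits yourself you can use slurm's ''sinfo''; a minimal sketch (output columns chosen for illustration):

<code bash>
# show each partition with its time limit, node count and node list
sinfo -o "%P %l %D %N"
</code>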
|maris075 |gpuIntel|2 x Nvidia Tesla P100 16GB | 6.0|
  
Xmaris GPUs must be allocated using slurm's ''--gres'' option, for instance
<code bash>
srun -p gpuIntel --gres=gpu:1 --pty bash -i
</code>
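For batch jobs, the same ''--gres'' request can be placed in the submission script; a minimal sketch (job name, time limit and module choice are illustrative):

<code bash>
#!/bin/bash
#SBATCH --partition=gpuIntel
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --job-name=gpu-test

# load a CUDA-aware module (name taken from the pre-installed software below) and check the GPU
ml load TensorFlow/2.1.0-fosscuda-2019b-Python-3.7.4
nvidia-smi
</code>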
===== Xmaris scientific software =====
  
Any pre-installed software can be made available in your environment via the ''module load <module_name>'' command.
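For example, assuming the Lmod module system (the module name below is purely illustrative):

<code bash>
module avail             # list all pre-installed modules
module load gompi/2019b  # load one of them into your environment (illustrative name)
module list              # show the modules currently loaded
</code>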
  
You can save a list of modules you use often in a //module collection// and load them all with a single command
<code bash>
module load mod1 mod2 mod3 mod4 mod5 mod6
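# save the currently loaded modules as a collection (the collection name is just an example)
module save mycollection
# later, restore the whole collection with a single command
module restore mycollection
</code>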
|TensorFlow-1.15.0-Miniconda3/4.7.10| CPU| All| |
  
The following example shows how to create a tensorflow-aware jupyter notebook kernel that you can use, for instance, via the OpenOnDemand interface.
  
<code bash>
# We use maris075 (GPU node) and load the optimised tf module
ml load TensorFlow/2.1.0-fosscuda-2019b-Python-3.7.4

# We install ipykernel, which is necessary to run python notebooks
python -m pip install ipykernel --user

# We create a kernel called TFQuantum based on the python from TensorFlow/2.1.0-fosscuda-2019b-Python-3.7.4
python -m ipykernel install --name TFQuantum --display-name "TFQuantum" --user

# We edit the kernel so that it does not execute python directly
# but via a custom wrapper script
cat $HOME/.local/share/jupyter/kernels/tfquantum/kernel.json

{
 "argv": [
  "/home/lenocil/.local/share/jupyter/kernels/tfquantum/wrapper.sh",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "TFQuantum",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}

# The wrapper script calls python, but only after loading the
# appropriate module
cat /home/lenocil/.local/share/jupyter/kernels/tfquantum/wrapper.sh

#!/bin/env bash
ml load TensorFlow/2.1.0-fosscuda-2019b-Python-3.7.4

exec python "$@"

# DONE. TFQuantum will appear in the dropdown list of kernels
# upon creating a new notebook
</code>
  
  
=== TensorFlow with Graphviz ===
    * via a traditional //configure/make// procedure
  
Whatever installation method you choose, please note that you **do not have** administrative rights on the cluster.
  
  
</code>
  
|:!: The environment variable ''EASYBUILD_OPTARCH'' instructs EasyBuild to compile software in a generic way so that it can be used on different CPUs. This is rather convenient in heterogeneous clusters such as xmaris because it avoids recompiling the same software on different compute nodes. This convenience of course comes at a cost: the executables produced this way will not be as efficient as ones optimised for a specific CPU. For more info read [[https://easybuild.readthedocs.io/en/latest/Controlling_compiler_optimization_flags.html|here]].|
  
|:!: When compiling OpenBLAS it is not sufficient to set ''EASYBUILD_OPTARCH'' to ''GENERIC'' to achieve portability of the executables. Some extra steps must be taken, as described in https://github.com/easybuilders/easybuild/blob/master/docs/Controlling_compiler_optimization_flags.rst. A list of targets supported by OpenBLAS can be found [[https://github.com/xianyi/OpenBLAS/blob/develop/TargetList.txt|here]].|
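As a minimal sketch, a portable user-space build could then look like this (the prefix path and easyconfig name are illustrative, and the EasyBuild module is assumed to be loaded already):

<code bash>
# compile with generic, CPU-portable optimisation flags
export EASYBUILD_OPTARCH=GENERIC
# install software and module files under your own prefix
export EASYBUILD_PREFIX=$HOME/easybuild
# build an easyconfig together with its missing dependencies
eb --robot SomeSoftware-1.0-foss-2019b.eb
</code>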
  
Then execute
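<code bash>
# the path below is illustrative; point it at the modules directory of your EasyBuild prefix
module use $HOME/easybuild/modules/all
</code>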
to make available to the ''module'' command any of the software built in your EasyBuild userspace.
  
|:!: ''module use <path>'' will prepend <path> to your ''MODULEPATH''. Should you want to append it instead, add the option ''-a''. To remove <path> from ''MODULEPATH'' execute ''module unuse <path>''.|
  
Should you want to customise the building process of a given software, please read how to implement [[https://easybuild.readthedocs.io/en/latest/Implementing-easyblocks.html|EasyBlocks]] and write [[https://easybuild.readthedocs.io/en/latest/Writing_easyconfig_files.html|EasyConfig]] files or
===== Suggested readings =====
  
  * https://slurm.schedmd.com/archive/slurm-21.08.8-2/
  * https://osc.github.io/ood-documentation/master/
  * https://www.gnu.org/gnu/linux-and-gnu.en.html