Page ''institute_lorentz:xmaris'', current revision 2023/03/29 06:54 by lenocil (section "Compute nodes data disks"); previous revision 2022/06/10 09:40 by lenocil (section "Xmaris scientific software").
Xmaris is the successor of the maris cluster. It is named with the prefix ''x'' because the deployment of its nodes is automated using the [[https://www.xcat.org/|xCAT]] software. Less formally, the ''x'' prefix also hints at the time of year when xmaris was first made available to IL users, that is, Christmas (Xmas).

[[https://www.gnu.org/|{{https://www.gnu.org/graphics/heckert_gnu.transp.small.png?50 }}]][[https://wiki.centos.org/|{{https://wiki.centos.org/ArtWork/Brand/Logo?action=AttachFile&do=get&target=centos-logo-light.png?200 }}]] [[https://openondemand.org/|{{https://www.osc.edu/sites/default/files/OpenOnDemand_horiz_RGB.png?200 }}]] [[https://slurm.schedmd.com|{{https://slurm.schedmd.com/slurm_logo.png?60 }}]] [[https://easybuild.readthedocs.io/en/latest/|{{https://docs.easybuild.io/img/easybuild_logo.png?100 }}]]
===== Xmaris features and expected cluster lifetime =====

^Mount Point^ Type ^Notes^
|/scratch | HD | **temporary**, local|
|/marisdata |NetApp| 2TB/user quota, medium-term storage, remote|
|/home |NetApp| 10GB/user quota, medium-term storage, remote|
|/ilZone/home | [[institute_lorentz:irods_fair_storage|iRODS]]| 20GB/user quota, archive storage, remote|

Additional high-performance scratch spaces are available to all nodes on the InfiniBand network (''ibIntel''):

^Mount Point^ Type^ Notes^
|/IBSSD| SSD |**DISCONTINUED**, InfiniBand/iSER((iSER stands for “iSCSI Extensions for RDMA”. It is an extension of the iSCSI protocol that adds RDMA (Remote Direct Memory Access) support. BeeGFS is a parallel filesystem. IBSSD was scheduled to be discontinued by the end of 2022 in favour of PIBSSD.))|
|/PIBSSD| SSD|**temporary**, InfiniBand/BeeGFS|

xmaris users are strongly advised to delete their data from the compute nodes' scratch disks (or at least move it to the shared data disk) once their calculations have completed. All data on the scratch disks __may be deleted without prior notice__.

Note that **disk policies may change at any time at the discretion of the cluster owners**.
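
The clean-up step can be sketched as follows. This sketch assumes you work in a per-job directory on ''/scratch'' and keep your results under ''/marisdata''. The function name ''stage_out_and_clean'' and the paths in the usage comment are hypothetical, so adapt them to your own setup. The scratch directory is removed only if the copy to the destination succeeded, so a failed transfer does not lose data.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: copy everything from a scratch work directory to a
# destination on the shared data disk, then delete the scratch directory.
# The "&&" chain removes the scratch copy only if mkdir and cp both succeeded.
# Usage: stage_out_and_clean WORKDIR DEST
stage_out_and_clean() {
  local workdir="$1" dest="$2"
  if [ ! -d "$workdir" ]; then
    echo "stage_out_and_clean: no such directory: $workdir" >&2
    return 1
  fi
  # "$workdir/." also copies hidden files
  mkdir -p "$dest" && cp -r "$workdir/." "$dest/" && rm -rf "$workdir"
}

# Example use at the end of a job script (paths are assumptions):
#   workdir=/scratch/$USER/$SLURM_JOB_ID
#   ... run the calculation inside "$workdir" ...
#   stage_out_and_clean "$workdir" "/marisdata/$USER/results/$SLURM_JOB_ID"
</code>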


|TensorFlow-1.15.0-Miniconda3/4.7.10| CPU| All| |

The following example shows how to create a TensorFlow-aware Jupyter notebook kernel that can be used, for instance, via the OpenOnDemand interface.

<code bash>
# Run on maris075 (GPU node) and load the optimised TensorFlow module
ml load TensorFlow/2.1.0-fosscuda-2019b-Python-3.7.4

# Install ipykernel, which is needed to run Python notebooks
python -m pip install ipykernel --user

# Create a kernel called TFQuantum based on the python from TensorFlow/2.1.0-fosscuda-2019b-Python-3.7.4
# (Jupyter stores it in a lower-case directory: .../kernels/tfquantum)
python -m ipykernel install --name TFQuantum --display-name "TFQuantum" --user

# Edit kernel.json so that the kernel does not execute python directly
# but via a custom wrapper script. After editing it should read as follows
# (replace /home/lenocil with your own home directory; the path must be absolute)
cat $HOME/.local/share/jupyter/kernels/tfquantum/kernel.json

{
  "argv": [
    "/home/lenocil/.local/share/jupyter/kernels/tfquantum/wrapper.sh",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ],
  "display_name": "TFQuantum",
  "language": "python",
  "metadata": {
    "debugger": true
  }
}

# Create the wrapper script: it loads the appropriate module(s)
# and only then calls python, forwarding all arguments
cat $HOME/.local/share/jupyter/kernels/tfquantum/wrapper.sh

#!/usr/bin/env bash
ml load TensorFlow/2.1.0-fosscuda-2019b-Python-3.7.4

exec python "$@"

# Make the wrapper executable, otherwise Jupyter cannot launch the kernel
chmod +x $HOME/.local/share/jupyter/kernels/tfquantum/wrapper.sh

# DONE. TFQuantum will appear in the dropdown list of kernels
# when creating a new notebook
</code>
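
You can also generate the two files with a small shell function instead of editing them by hand. This is a sketch only. The function name ''write_tf_kernel'' is hypothetical, and it assumes the kernel directory was first created with ''python -m ipykernel install'', because the function overwrites ''kernel.json''. The wrapper path is derived from the directory you pass in, so no other user's home directory ends up hard-coded in ''kernel.json''.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: (re)write kernel.json and wrapper.sh for a Jupyter kernel
# that loads an environment module before starting python.
# Usage: write_tf_kernel KERNEL_DIR MODULE
write_tf_kernel() {
  local kdir module="$2"
  mkdir -p "$1" || return 1
  # Jupyter does not expand variables in kernel.json, so store an absolute path
  kdir="$(cd "$1" && pwd)" || return 1

  # Wrapper: load the module, then replace the shell with python.
  # "\$@" is escaped so that "$@" is written literally into the script.
  cat > "$kdir/wrapper.sh" <<EOF
#!/usr/bin/env bash
ml load $module
exec python "\$@"
EOF
  chmod +x "$kdir/wrapper.sh" || return 1

  # Kernel spec: Jupyter runs argv and fills in {connection_file} itself
  cat > "$kdir/kernel.json" <<EOF
{
  "argv": [
    "$kdir/wrapper.sh",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ],
  "display_name": "TFQuantum",
  "language": "python",
  "metadata": {
    "debugger": true
  }
}
EOF
}

# Example:
#   write_tf_kernel "$HOME/.local/share/jupyter/kernels/tfquantum" \
#       TensorFlow/2.1.0-fosscuda-2019b-Python-3.7.4
</code>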

=== TensorFlow with Graphviz ===
  * via a traditional //configure/make// procedure

Whatever installation method you choose, please note that you **do not have** administrative rights on the cluster.


</code>

|:!: The environment variable ''EASYBUILD_OPTARCH'' instructs EasyBuild to compile software in a generic way so that it can be used on different CPUs. This is convenient in heterogeneous clusters such as xmaris, because it avoids recompiling the same software for different compute nodes. This convenience comes at a cost: the resulting executables will not be as efficient as they would be if optimised for a specific CPU. For more info read [[https://easybuild.readthedocs.io/en/latest/Controlling_compiler_optimization_flags.html|here]].|
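
For example, you can export the variable in your shell or job script before calling ''eb''. This is a sketch: the easyconfig file name in the comments is hypothetical. ''GENERIC'' is the value referred to in the OpenBLAS note.

<code bash>
# Ask EasyBuild to produce portable (not CPU-specific) binaries.
# EASYBUILD_OPTARCH is the environment-variable form of EasyBuild's --optarch option.
export EASYBUILD_OPTARCH=GENERIC

# Build as usual (hypothetical easyconfig name):
#   eb SomeSoftware-1.0-foss-2019b.eb --robot
# Check that the setting is picked up:
#   eb --show-config | grep -i optarch
</code>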

|:!: When compiling OpenBLAS it is not sufficient to set ''EASYBUILD_OPTARCH'' to ''GENERIC'' to make the executables portable. Some extra steps must be taken, as described in https://github.com/easybuilders/easybuild/blob/master/docs/Controlling_compiler_optimization_flags.rst. A list of targets supported by OpenBLAS can be found [[https://github.com/xianyi/OpenBLAS/blob/develop/TargetList.txt|here]].|

Then execute
to make any of the software built in your EasyBuild userspace available to the ''module'' command.

|:!: ''module use <path>'' will prepend <path> to your ''MODULEPATH''. Should you want to append it instead, add the option ''-a''. To remove <path> from ''MODULEPATH'' execute ''module unuse <path>''.|
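
The sketch below uses plain shell functions to show what these three commands do to a colon-separated list such as ''MODULEPATH''. It is an illustration only. The function names are hypothetical, and the real ''module'' command does additional bookkeeping, so always use ''module use'' and ''module unuse'' in practice.

<code bash>
#!/usr/bin/env bash
# Illustration only (hypothetical helpers): how `module use`, `module use -a`
# and `module unuse` change a colon-separated search path such as MODULEPATH.
# Each function prints the new list; nothing is modified in place.

# path_prepend DIR LIST   ~ module use DIR
path_prepend() { printf '%s\n' "$1${2:+:$2}"; }

# path_append DIR LIST    ~ module use -a DIR
path_append() { printf '%s\n' "${2:+$2:}$1"; }

# path_remove DIR LIST    ~ module unuse DIR
path_remove() {
  local dir="$1" out="" p
  local IFS=:
  for p in $2; do
    # keep every entry except the one being removed
    [ "$p" = "$dir" ] || out="${out:+$out:}$p"
  done
  printf '%s\n' "$out"
}

# Example:
#   path_prepend /some/dir "$MODULEPATH"
</code>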

Should you want to customise the building process of a given software package, please read how to implement [[https://easybuild.readthedocs.io/en/latest/Implementing-easyblocks.html|EasyBlocks]] and write [[https://easybuild.readthedocs.io/en/latest/Writing_easyconfig_files.html|EasyConfig]] files or
===== Suggested readings =====

  * https://slurm.schedmd.com/archive/slurm-21.08.8-2/
  * https://osc.github.io/ood-documentation/master/
  * https://www.gnu.org/gnu/linux-and-gnu.en.html