Submitting jobs
Whatever you read here may need to be adjusted to fit your specific case.
Do not hesitate to ask for help when needed.
Filesystems
Only some filesystems are available to the compute nodes. Compute nodes cannot access data on any filesystem that is not listed here:
- /work
- /scratch
- /home/
Slurm partitions
There are currently 2 partitions, normal and bigmem.
The normal partition is the default: if you submit a job without specifying which partition should be used, your job will be placed in the normal partition.
The normal partition is limited to 250GB of RAM; if you need more than that, please use the bigmem partition.
Use the -p option to specify the partition you need.
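As a minimal sketch of the rule above, the helper below picks a partition name from an estimated memory need. The function name pick_partition is hypothetical and not part of Slurm; the 250GB threshold is taken from the text above.

```shell
#!/bin/bash

# Hypothetical helper: print the partition to request for a given
# memory need in GB (whole number). Threshold taken from the text above.
pick_partition() {
    local mem_gb="$1"
    if [ "$mem_gb" -gt 250 ]; then
        echo "bigmem"
    else
        echo "normal"
    fi
}

# Example: a job expected to need about 400GB of RAM.
echo "Partition for 400GB: $(pick_partition 400)"
echo "Partition for 64GB:  $(pick_partition 64)"
```

On the cluster the result could then be passed to sbatch, for example sbatch -p "$(pick_partition 400)" followed by the script name.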
Load necessary software
By default only some software is available when you log in. To use other software in your scripts, you must first load it.
The module command helps you manage modules and their dependencies.
To check which modules are currently loaded and ready to be used,
module list
Expected output is,
Currently Loaded Modules:
  1) autotools   2) prun/1.3   3) gnu8/8.3.0   4) openmpi3/3.1.4   5) ohpc
To check which software is installed and can be used after loading it, use the command below. The same command can be used to search for a specific package.
module avail
Expected output is,
--------------------------------------------------- /opt/ohpc/pub/moduledeps/gnu8-openmpi3 ----------------------------------------------------
   adios/1.13.1    hypre/2.18.1   netcdf-cxx/4.3.1       petsc/3.12.0       py2-scipy/1.2.1    scorep/6.0           trilinos/12.14.1
   boost/1.71.0    imb/2018.1     netcdf-fortran/4.5.2   phdf5/1.10.5       py3-mpi4py/3.0.1   sionlib/1.7.4
   dimemas/5.4.1   mfem/4.0       netcdf/4.7.1           pnetcdf/1.12.0     py3-scipy/1.2.1    slepc/3.12.0
   extrae/3.7.0    mpiP/3.4.1     omb/5.6.2              ptscotch/6.0.6     scalapack/2.0.2    superlu_dist/6.1.1
   fftw/3.3.8      mumps/5.2.1    opencoarrays/2.8.0     py2-mpi4py/3.0.2   scalasca/2.5       tau/2.28

-------------------------------------------------------- /opt/ohpc/pub/moduledeps/gnu8 --------------------------------------------------------
   hdf5/1.10.5    metis/5.1.0   mvapich2/2.3.2   openblas/0.3.7       pdtoolkit/3.25     py3-numpy/1.15.3
   likwid/4.3.4   mpich/3.3.1   ocr/1.0.1        openmpi3/3.1.4 (L)   py2-numpy/1.15.3   superlu/5.2.1

------------------------------------------------------------- /tools/modulefiles --------------------------------------------------------------
   MEGAHIT/1.2.9

---------------------------------------------------------- /opt/ohpc/pub/modulefiles ----------------------------------------------------------
   EasyBuild/3.9.4     clustershell/1.8.2   gnu7/7.3.0       llvm5/5.0.1   pmix/2.2.2          valgrind/3.15.0
   autotools     (L)   cmake/3.15.4         gnu8/8.3.0 (L)   ohpc    (L)   prun/1.3      (L)
   charliecloud/0.11   gnu/5.4.0            hwloc/2.1.0      papi/5.7.0    singularity/3.4.1

  Where:
   L:  Module is loaded

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
To search for a module,
module avail <<keyword>>
# OR
module spider <<keyword>>
To load a module do,
module load <<MODULENAME/VERSION>>
Loading a module can be done by following these 3 steps,
1. Locate the module, module avail
2. Check how to load it, module spider <<MODULENAME/VERSION>>
3. Load your module using the instructions from step 2
Read more about module usage https://lmod.readthedocs.io/en/latest/010_user.html
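The three steps above can be sketched as one small shell function. This is only an illustration: the function name load_module_steps is hypothetical, and the sketch assumes that the module command from the cluster environment is available in the current shell. It stops with a message when that is not the case.

```shell
#!/bin/bash

# Hypothetical helper: walk through the three steps for one module.
# Argument: module name, optionally with a version (e.g. MEGAHIT/1.2.9).
load_module_steps() {
    local name="$1"
    # Assumption: "module" is provided by the cluster environment.
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; are you logged in on the cluster?" >&2
        return 1
    fi
    module avail "$name"     # step 1: locate the module
    module spider "$name"    # step 2: check how to load it
    module load "$name"      # step 3: load it
}

# Example usage with the module shown in the listing above.
load_module_steps MEGAHIT/1.2.9 || echo "MEGAHIT was not loaded"
```

If step 2 reports that other modules must be loaded first, load those before repeating step 3.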
Prototype of batch script
This prototype should be in a script file, for example, my_first_script.sbatch
#!/bin/bash

#SBATCH -J test                      # Job name
#SBATCH -o /work/<<UID>>/job.%j.out  # Name of stdout output file (%j expands to jobId)
#SBATCH -e /work/<<UID>>/job.%j.err  # Name of stderr output file (%j expands to jobId)
#SBATCH -p normal                    # Partition to use, another possible value is bigmem
#SBATCH -N 1                         # Total number of nodes requested
#SBATCH -n 16                        # Total number of cpu requested or total number of mpi tasks
#SBATCH -t 01:30:00                  # Run time ([d-]hh:mm:ss) - 1.5 hours
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your.email@wur.nl

# Load your software/command
module load CMD/version

# Run your command
CMD [OPTIONS] ARGUMENTS
To submit an sbatch script, use
sbatch <<script name>>
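As a hedged sketch, submission can be wrapped in a small function that also returns the job ID, which is the number that replaces %j in the output file names. The function name submit_job is hypothetical. It relies on the --parsable option of sbatch, which prints the job ID, optionally followed by a semicolon and the cluster name.

```shell
#!/bin/bash

# Hypothetical helper: submit a batch script and print only the job ID.
submit_job() {
    local script="$1"
    if [ ! -f "$script" ]; then
        echo "error: $script not found" >&2
        return 1
    fi
    local out
    # --parsable prints "jobid" or "jobid;clustername".
    out=$(sbatch --parsable "$script") || return 1
    # Keep only the part before the first semicolon.
    echo "${out%%;*}"
}

# Example usage on the cluster:
#   jobid=$(submit_job my_first_script.sbatch)
#   squeue -j "$jobid"    # check the state of the submitted job
```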
Here are some explanations of the less obvious elements,
- Lines starting with #SBATCH
  - These lines are options given to sbatch. They are completely separate from the command you are running.
- %j in the -o option
  - This is a placeholder; it will be replaced by the job ID of your run. This makes it easier to find out which standard output corresponds to which job. It could be removed, but then make sure that every job still has its own output file.
- -N 1
  - Given the limited number of nodes, all users are asked to use only 1 node. Most bioinformatics software cannot run on more than 1 node, so don't waste resources.
- -t option
  - This sets a time limit for your job. With 00:05:00 your job will run for at most 5 minutes. If it has not finished by then, it will be stopped and you will have to run it again with a higher limit. If the command you are running can continue from a checkpoint, you can use that ability to reduce the running time of the rerun. This parameter is difficult to estimate in most cases, so do not hesitate to overestimate at the beginning.
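To illustrate the [d-]hh:mm:ss format expected by the -t option, here is a small sketch that converts a number of minutes into that format. The function name to_slurm_time is hypothetical and only meant as an example.

```shell
#!/bin/bash

# Hypothetical helper: convert whole minutes to the [d-]hh:mm:ss
# format accepted by the -t option.
to_slurm_time() {
    local total_min="$1"
    local days=$(( total_min / 1440 ))
    local hours=$(( (total_min % 1440) / 60 ))
    local mins=$(( total_min % 60 ))
    if [ "$days" -gt 0 ]; then
        printf '%d-%02d:%02d:00\n' "$days" "$hours" "$mins"
    else
        printf '%02d:%02d:00\n' "$hours" "$mins"
    fi
}

to_slurm_time 90      # prints 01:30:00
to_slurm_time 1500    # prints 1-01:00:00
```

The first value matches the 1.5 hours used in the prototype script; the second shows the optional day prefix for longer jobs.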
Example of sbatch script
Let's assume a few things here,
- You are logged in
- Your data is available
- The needed software is available
- The test will be run in folder /work/test/
Preparing for the run,
mkdir /work/test/
Let's try to run an assembly using megahit,
#!/bin/bash

#SBATCH -J test                    # Job name
#SBATCH -o /work/test/job.%j.out   # Name of stdout output file (%j expands to jobId)
#SBATCH -e /work/test/job.%j.err   # Name of stderr output file (%j expands to jobId)
#SBATCH -p normal                  # Partition to use, another possible value is bigmem
#SBATCH -N 1                       # Total number of nodes requested
#SBATCH -n 16                      # Total number of cpu or total number of mpi tasks
#SBATCH -t 01:30:00                # Run time ([d-]hh:mm:ss) - 1.5 hours
#SBATCH --mail-type=ALL
#SBATCH --mail-user=felix.homa@wur.nl

# Load the available megahit module
module load MEGAHIT/1.2.9

# Defining variable for work directory
based=/work/test
# Defining variable for temporary directory
# We do this because our command uses a tmp folder
tmp_dir=$based/tmp
# Defining variable for output directory
output_dir=$based/output
# Defining variables for forward and reverse read files
f_read=/tools/test_data/assembly/r3_1.fa.gz
r_read=/tools/test_data/assembly/r3_2.fa.gz

# Creating temporary folder (-p: no error if it already exists)
# The output folder is not created here: megahit will complain if it already exists
mkdir -p "$tmp_dir"

# Command to run
# We use the previously defined variables to set the values of megahit options
megahit -1 "$f_read" -2 "$r_read" --tmp-dir "$tmp_dir" --out-dir "$output_dir" --out-prefix r3
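Because megahit complains when the output folder already exists, a rerun of the example can fail before doing any work. The sketch below is one possible check to run before submitting again. The function name prepare_run_dirs is hypothetical, and the tmp and output folder names follow the script above.

```shell
#!/bin/bash

# Hypothetical helper: prepare a work directory in the layout used above.
# Creates <base>/tmp and refuses to continue if <base>/output exists.
prepare_run_dirs() {
    local base="$1"
    if [ -e "$base/output" ]; then
        echo "error: $base/output already exists; remove or rename it first" >&2
        return 1
    fi
    mkdir -p "$base/tmp"
}

# Example usage before submitting the job:
#   prepare_run_dirs /work/test && sbatch <<script name>>
```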