Space management


Space usage

Every user should bear in mind that space management is tricky: on the one hand it is hard to know how much space is needed, and on the other hand there is never enough. The amount of space cannot be infinite (for the time being). However, with good management it is possible to extend the life span of the available storage and make sure everyone can work. On our system we distinguish two types of storage: the administrative storage and the working storage.

Administrative storage

The seq and home folders are administrative partitions and access to them is limited. The seq partition is where all the sequencing runs are saved. Only admins can modify the folders in this partition, which means that only admins can save sequencing runs there. Users can see and use all sequencing runs.

In the home partition, users have access to their personal folders. These folders should not be used to store sequencing runs, output files from sequencing run analyses, or big files in general.

Another administrative partition is tools. This folder is only meant to host the bioinformatics tools: no data and no databases. Users can install tools via 'conda'. Please send a message to the admins to install a tool that is not available on conda.

Working storage

The working partitions are projects and work. These two folders are meant to store users' project files. To start working in the work partition, make a personal folder with the same name as your login name. By default, other users can see the files in that folder but cannot modify them; feel free to adjust the permissions of the folder. That personal folder should host your analysis data only. After running their analyses, all users are expected to clean their work space and remove any files/folders that are not needed for the upcoming analysis steps (intermediate files). After finishing all analyses, it is important to take the time to delete any file/folder that is not worth keeping for publications, and to request a project folder from bioinfo.mib@wur.nl to save the important ones. For those who want to keep intermediate output files until the publication is accepted, several options are available.
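The personal folder setup described above could be sketched as follows. This is a minimal example, not the official procedure: the function name `make_work_dir` is hypothetical, the mount point `/work` is an assumption to adjust for the real system, and the permissions shown simply reproduce the "others can see but not modify" default mentioned above.

```shell
# Hypothetical helper: create a personal folder named after the login name.
# $1 = root of the work partition (assumed to be /work if omitted)
# $2 = login name (defaults to the current user)
make_work_dir() {
    work_root="${1:-/work}"
    login="${2:-$(id -un)}"
    target="$work_root/$login"
    mkdir -p "$target" || return 1
    # Owner gets full access; group and others may read and traverse only.
    chmod u=rwx,go=rx "$target" || return 1
    printf '%s\n' "$target"
}
```

Calling `make_work_dir` without arguments would create `/work/<your login>`. To hide the folder from others, something like `chmod go-rwx` on it afterwards should work.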

The work partition IS NOT backed up. It is meant to be temporary storage. All users are asked to sort their files regularly, back up the relevant files, and delete the remaining ones.
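To help with this regular sorting, one possible approach is to list files that have not been modified for a while and to check how much space a folder takes. The sketch below only lists files and deletes nothing. The function names and the 90-day default are illustrative assumptions, not a site policy.

```shell
# Hypothetical helper: list regular files under a folder that have not been
# modified in the last N days (N defaults to 90, an arbitrary example value).
list_stale_files() {
    dir="$1"
    days="${2:-90}"
    find "$dir" -type f -mtime +"$days" -print
}

# Hypothetical helper: show the total size of a folder in human-readable form.
folder_size() {
    du -sh "$1"
}
```

Review the list before removing anything, and copy the files worth keeping elsewhere first, since the work partition has no backup.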

The projects partition is meant to host post-publication files, that is, the files that should be kept for 10 years. These folders should be kept clean and should contain a README file that describes their content. Make sure to compress all text data files using either gzip or bzip2. gzip is faster and more suitable for non-redundant files; the multithreaded version pigz can speed up the compression. bzip2 offers a better compression level but takes a bit more time; it can be accelerated with the multithreaded version pbzip2. Use bzip2 by default. DO NOT STORE RAW SEQUENCING FILES HERE!
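The compression advice above could be wrapped in a small helper like the one below. It is a sketch under assumptions: the function name `compress_text_file` is hypothetical, and it assumes the multithreaded tools (pbzip2, pigz) may or may not be installed, falling back to the standard ones when they are missing. As with plain gzip and bzip2, the original file is replaced by the compressed one.

```shell
# Hypothetical helper: compress one file, using bzip2 by default.
# $1 = file to compress
# $2 = method, either "bzip2" (default) or "gzip"
compress_text_file() {
    file="$1"
    method="${2:-bzip2}"
    case "$method" in
        bzip2) fast=pbzip2 ;;   # multithreaded bzip2
        gzip)  fast=pigz ;;     # multithreaded gzip
        *) echo "unknown method: $method" >&2; return 2 ;;
    esac
    # Prefer the multithreaded tool when it is installed.
    if command -v "$fast" >/dev/null 2>&1; then
        "$fast" "$file"
    else
        "$method" "$file"
    fi
}
```

For example, `compress_text_file results.tsv` would produce `results.tsv.bz2`, and `compress_text_file results.tsv gzip` would produce `results.tsv.gz` (the file name here is only an illustration).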

Documentation

Documentation is a very important part of data management. It helps your supervisor and anyone interested in your data to know what you have already done, what is left to do, and where to find the data. Many platforms provide a wiki that allows users to document their work. Here are a few options:

- GitLab, managed by FB-IT @WUR. There is a MIB repository that could also be used
- Jupyter notebook
- Redmine, ask bioinfo.mib@wur.nl for questions
- Galaxy with Galaxy pages
- And many external ones...

It is recommended to use internal tools for documentation, as they are easily accessible to all and are not subject to external rules. A good tip is to export your documentation to PDF and place it at the root of the project folder. This is a handy way to ensure that the documentation is always next to the data. The downside is that it involves manual work.

Databases

Bioinformatics databases can be very large, so it is recommended to handle them with care. A database partition exists on the server, where admins store databases of common interest. More specific databases are temporarily stored in the /work/database partition and REMOVED after use. After creating a database, its owner must give permissions to all users (except for students). To do that, use the following command: chmod -R g+rwx PATH_TO_FOLDER. This helps others keep the database up to date.
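As an illustration, the permission command above could be wrapped in a small function that first checks that the folder exists. The function name `share_database` is hypothetical; the chmod call itself is the one given in this section.

```shell
# Hypothetical helper: give the group read, write and execute access
# to a database folder and everything inside it.
# $1 = path to the database folder
share_database() {
    db_dir="$1"
    if [ ! -d "$db_dir" ]; then
        echo "not a directory: $db_dir" >&2
        return 1
    fi
    chmod -R g+rwx "$db_dir"
}
```

Running `ls -ld PATH_TO_FOLDER` afterwards should show `rwx` in the group part of the permission string.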