mibPotatoWiki

Monitor jobs

We assume that you have already submitted jobs and that they are currently running. This page shows how to check their status.

Email notifications

If you used the options --mail-user and --mail-type when submitting, you should receive an email notification when your job reaches a given status. When --mail-type is set to ALL, you are notified for the statuses listed under ALL below. The possible values for --mail-type are:

BEGIN
When job is allocated
END
When job is done
FAIL
When job has failed
REQUEUE
When a failed or finished job is put back into the queue
STAGE_OUT
Burst buffer stage out and teardown completed
TIME_LIMIT
When job has reached 100% of time limit
TIME_LIMIT_90
When job has reached 90% of time limit
TIME_LIMIT_80
When job has reached 80% of time limit
TIME_LIMIT_50
When job has reached 50% of time limit
ARRAY_TASKS
Send emails for each array task
ALL
Equivalent to BEGIN, END, FAIL, REQUEUE and STAGE_OUT

Have a look at the help of the sbatch command (man sbatch) to learn more about notifications.
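As an illustration, a job script requesting notifications could look like the sketch below. The job name and the email address are placeholders and not values from this cluster. The script only prints one line, so it can also be run by hand outside of Slurm.

```shell
#!/bin/bash
# Minimal job script sketch; job name and address are hypothetical placeholders.
#SBATCH --job-name=mail_demo
#SBATCH --mail-user=jane.doe@example.org
#SBATCH --mail-type=END,FAIL,TIME_LIMIT_90

# Slurm sets SLURM_JOB_ID inside a job; fall back to "none" when run by hand.
job_id="${SLURM_JOB_ID:-none}"
echo "Job ${job_id} running on $(hostname)"
```

With these values you should be notified when the job ends, fails, or reaches 90% of its time limit, but not when it starts.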

Status of submitted jobs

squeue

Check the status of the queue using squeue.

    # Print all jobs currently in the queue
    squeue

    # Check owned jobs in the queue
    squeue -u <<USERNAME>>

    # Check a list of jobs
    squeue --jobs <<JOBID or comma separated list of JOBIDs>>

    # You can format the output of squeue. For example, to get more details about the running jobs use
    squeue -o "%.7i %.9P %.8j %.8u %.2t %.10M %.6D %C %.10m"

    # For more info about squeue
    man squeue
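If you want a script to wait until a job has left the queue, squeue can be polled in a loop. The sketch below is only one possible way to do it. wait_for_job is a hypothetical helper name, and the loop also stops if squeue itself fails (for example when the scheduler is unreachable), so treat the result with care.

```shell
#!/bin/bash
# Sketch: block until a job is no longer pending or running.
# wait_for_job is a hypothetical helper, not a Slurm command.
wait_for_job() {
    local jobid="$1"
    local interval="${2:-60}"   # seconds between two checks
    # -h drops the header line, so the output is empty once the job
    # is not listed in one of the requested states any more.
    while [ -n "$(squeue -h -j "$jobid" -t PENDING,RUNNING,SUSPENDED,COMPLETING 2>/dev/null)" ]; do
        sleep "$interval"
    done
    echo "Job $jobid is no longer in the queue"
}
```

For example, wait_for_job <<JOBID>> 30 checks every 30 seconds. Use sacct afterwards to see whether the job completed or failed.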

scancel

Cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.

    scancel <<JOBID or comma separated list of JOBIDs>>

    # Learn more about scancel options
    man scancel
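scancel can also select jobs by user and state, or send a signal instead of cancelling. The sketch below wraps both cases. The function names and the DRY_RUN variable are hypothetical and only exist in this example. When DRY_RUN is set, the commands are printed instead of executed, which is a safe way to check what would happen.

```shell
#!/bin/bash
# Sketch: scancel by user/state and scancel with a signal.
# run, cancel_my_pending, signal_job and DRY_RUN are hypothetical names.

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Cancel all of your own jobs that are still pending.
cancel_my_pending() {
    run scancel --user="${USER:-$(id -un)}" --state=PENDING
}

# Send a signal (USR1 by default) to a running job instead of cancelling it.
# The job only reacts if the program running in it handles that signal.
signal_job() {
    run scancel --signal="${2:-USR1}" "$1"
}
```

For example, DRY_RUN=1 cancel_my_pending shows the scancel command without cancelling anything.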

sstat

Prints information about the resources utilized by a running job or job step.

    sstat -j <<JOBID or comma separated list of JOBIDs>>

    # Learn more about sstat
    man sstat
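The default sstat output is wide, so it is often easier to ask for a few fields only. The sketch below assumes that CPU time and memory are what you are interested in. job_usage is a hypothetical helper name, and the chosen fields are just one possible selection.

```shell
#!/bin/bash
# Sketch: show CPU and memory usage of the running steps of a job.
# job_usage is a hypothetical helper, not a Slurm command.
job_usage() {
    local jobid="$1"
    # --allsteps asks for every step of the job rather than a single one.
    # --format limits the output to the listed fields.
    sstat --allsteps --format=JobID,AveCPU,MaxRSS,MaxVMSize -j "$jobid"
}
```

If the output is empty, the job has probably finished already. In that case use sacct instead.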

sacct

Reports job or job step accounting information about active or completed jobs.

    # Print accounting info about a job
    sacct -j <<JOBID or comma separated list of JOBIDs>>

    # Print CPU time and memory usage of a job
    sacct --format="CPUTime,MaxRSS" -j <<JOBID>>

    # Print a list of all the jobs a user has been running
    sacct -u <<USERNAME>>

    # Learn more about sacct options
    man sacct
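The output of sacct can be made script-friendly and then summarised with standard tools. As a sketch, the helper below counts the jobs of a user per state since a given start time. jobs_by_state is a hypothetical name, and the default of the last 7 days is an arbitrary choice for this example.

```shell
#!/bin/bash
# Sketch: count the jobs of a user per state since a given start time.
# jobs_by_state is a hypothetical helper, not a Slurm command.
jobs_by_state() {
    local user="$1"
    local since="${2:-now-7days}"   # a date such as 2023-04-01 also works
    # -S: start of the time window, -X: one line per job (no job steps),
    # -n: no header, -P: output separated by '|' without padding
    sacct -u "$user" -S "$since" -X -n -P --format=State \
        | sort | uniq -c
}
```

For example, jobs_by_state <<USERNAME>> 2023-04-01 prints one line per state, preceded by the number of jobs in that state.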

Status of nodes

Use the command sinfo to report the state of partitions and nodes.
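As with squeue, the output of sinfo can be filtered and formatted. The sketch below lists the names of the nodes that are currently idle. idle_nodes is a hypothetical helper name used only for this example.

```shell
#!/bin/bash
# Sketch: list the names of the nodes that are currently idle.
# idle_nodes is a hypothetical helper, not a Slurm command.
idle_nodes() {
    # -N: one line per node, -h: no header, -t idle: idle nodes only,
    # -o "%N": print only the node name.
    # sort -u removes duplicates for nodes that belong to several partitions.
    sinfo -N -h -t idle -o "%N" | sort -u
}
```

Run sinfo without any option first to get an overview of the partitions, then man sinfo for the other filters.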

mibPotatoWiki: Monitor jobs (last edited 2023-04-25 11:29:10 by fhoma)