Monitor jobs
This section assumes that you have already submitted jobs and that they are pending or running. It shows how to check their status.
Email notifications
If you used the options --mail-user and --mail-type when submitting, you will receive an email notification when your job reaches a given status. Setting --mail-type to ALL covers the main state changes (BEGIN, END, FAIL, REQUEUE, STAGE_OUT), but not the time-limit warnings. The possible values for --mail-type are:
- BEGIN
  - When the job is allocated
- END
  - When the job is done
- FAIL
  - When the job has failed
- REQUEUE
  - When a failed or finished job is put back into the queue
- STAGE_OUT
  - When burst buffer stage-out and teardown have completed
- TIME_LIMIT
  - When the job has reached 100% of its time limit
- TIME_LIMIT_90
  - When the job has reached 90% of its time limit
- TIME_LIMIT_80
  - When the job has reached 80% of its time limit
- TIME_LIMIT_50
  - When the job has reached 50% of its time limit
- ARRAY_TASKS
  - Send an email for each array task (otherwise notifications apply to the job array as a whole)
- ALL
  - Equivalent to BEGIN, END, FAIL, REQUEUE, STAGE_OUT
See the sbatch manual page (man sbatch) to learn more about notifications.
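As a minimal sketch, a batch script that requests notifications might look like the following. The job name, time limit, and the address user@example.org are placeholders rather than values from this documentation, and the workload is a stand-in echo.

```shell
#!/bin/bash
# Placeholder job name and time limit
#SBATCH --job-name=mail-demo
#SBATCH --time=00:10:00
# Placeholder address: replace with your own
#SBATCH --mail-user=user@example.org
# Notify on start, end, failure, and at 80% of the time limit
#SBATCH --mail-type=BEGIN,END,FAIL,TIME_LIMIT_80

# Stand-in workload: SLURM_JOB_ID is set by Slurm inside a job.
job_id="${SLURM_JOB_ID:-unknown}"
echo "Job ${job_id} started"
```

Several --mail-type values can be combined in a comma-separated list, as in the last directive.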
Status of submitted jobs
squeue
Check the status of jobs in the queue.
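A few common squeue invocations, as a sketch. The options shown (--user, --states, --start, --jobs) are standard squeue options; the job ID 123456 is a made-up placeholder, and the guard is only there so the snippet does nothing on a machine without Slurm.

```shell
# Resolve the current user name even if $USER is unset.
me="${USER:-$(id -un)}"

if command -v squeue >/dev/null 2>&1; then
    # All of your jobs, pending and running
    squeue --user="$me"
    # Only your pending jobs, with the scheduler's estimated start time
    squeue --user="$me" --states=PENDING --start
    # A single job (123456 is a placeholder job ID)
    squeue --jobs=123456
else
    echo "squeue not found: run these commands on a cluster login node"
fi
```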
scancel
Cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
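Because scancel acts immediately, the sketch below prints each command instead of running it unless RUN=1 is set. The run helper and the job ID 123456 are hypothetical and exist only for this illustration; the scancel options themselves (--signal, --user, --state) are standard.

```shell
# Placeholder job ID; pass a real one as the first argument.
jobid="${1:-123456}"
me="${USER:-$(id -un)}"

# Hypothetical helper: echo the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Cancel one job
run scancel "$jobid"
# Send SIGUSR1 to the job instead of cancelling it
run scancel --signal=USR1 "$jobid"
# Cancel all of your jobs that are still pending
run scancel --user="$me" --state=PENDING
```

Once the printed commands look right, the same script can be rerun with RUN=1 to execute them.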
sstat
Prints information about the resources utilized by a running job or job step.
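For example, the following sketch queries CPU and memory usage of a running job. The job ID 123456 is a placeholder, and the selection of fields is just one reasonable choice among those listed in man sstat.

```shell
# 123456 is a placeholder; pass the ID of one of your running jobs.
jobid="${1:-123456}"

if command -v sstat >/dev/null 2>&1; then
    # CPU and memory usage of every step of the job
    sstat --allsteps --format=JobID,AveCPU,AveRSS,MaxRSS -j "$jobid"
    # The batch script itself is accounted under the ".batch" step
    sstat --format=JobID,AveCPU,MaxRSS -j "${jobid}.batch"
else
    echo "sstat not found: run these commands on a cluster login node"
fi
```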
sacct
Reports job or job step accounting information about active or completed jobs.
# Print accounting info about a job
sacct -j <JOBID or comma-separated list of JOBIDs>

# Print CPU time and memory usage of a job
sacct --format="CPUTime,MaxRSS" -j <JOBID>

# Print a list of all the jobs a user has been running
sacct -u <USERNAME>

# Learn more about sacct options
man sacct
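When sacct output feeds another tool, the pipe-delimited format is easier to parse than the default table. The sketch below counts jobs per state; the count_states function and the sample records are hypothetical, written only for this illustration.

```shell
# Count jobs per state from "JobID|State" lines read on stdin.
count_states() {
    awk -F'|' 'NF >= 2 { n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# On a cluster: one line per job allocation (-X), pipe-delimited (-P), no header (-n)
if command -v sacct >/dev/null 2>&1; then
    sacct -u "${USER:-$(id -un)}" -X -P -n --format=JobID,State | count_states
fi

# Off-cluster demonstration with made-up records
printf '101|COMPLETED\n102|FAILED\n103|COMPLETED\n' | count_states
```

By default sacct only reports jobs since midnight of the current day; the -S (--starttime) option widens that window.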
Status of nodes
Use the command sinfo to report the state of partitions and nodes.
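A short sketch of typical sinfo calls follows. The cluster_overview function name is made up for this example; the options used (--Node, --long, --states) are standard sinfo options.

```shell
# Hypothetical wrapper that prints three views of the cluster state.
cluster_overview() {
    if command -v sinfo >/dev/null 2>&1; then
        # One line per partition and node state
        sinfo
        # One line per node, with extra detail such as CPUs and memory
        sinfo --Node --long
        # Only nodes that are currently idle
        sinfo --states=idle
    else
        echo "sinfo not found: run these commands on a cluster login node"
    fi
}

cluster_overview
```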