We assume that you have already submitted jobs and that they are currently running. This page shows how to check their status.
Email notifications
If you submitted your job with the --mail-user and --mail-type options, you receive an email notification when the job reaches one of the selected states. Setting --mail-type to ALL selects BEGIN, END, FAIL, REQUEUE and STAGE_OUT. The possible values for --mail-type are:
- BEGIN: when the job is allocated
- END: when the job is done
- FAIL: when the job has failed
- REQUEUE: when a failed or finished job is put back into the queue
- STAGE_OUT: when burst buffer stage-out and teardown have completed
- TIME_LIMIT: when the job has reached 100% of its time limit
- TIME_LIMIT_90: when the job has reached 90% of its time limit
- TIME_LIMIT_80: when the job has reached 80% of its time limit
- TIME_LIMIT_50: when the job has reached 50% of its time limit
- ARRAY_TASKS: send an email for each array task
- ALL: equivalent to BEGIN, END, FAIL, REQUEUE, STAGE_OUT
See the manual page of the sbatch command (man sbatch) to learn more about notifications.
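As an illustration, the two options could be set in a batch script along the lines of the minimal sketch below. The file name notify_job.sh, the job name, the time limit and the email address are placeholders chosen for this example, not values required by the cluster.

```shell
# Write a small batch script; every value below is a placeholder (assumption).
cat > notify_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=notify-demo
#SBATCH --time=00:10:00
#SBATCH --mail-user=user@example.org
#SBATCH --mail-type=END,FAIL,TIME_LIMIT_90

echo "Running on $(hostname)"
EOF

# Submit it on a machine where Slurm is available:
# sbatch notify_job.sh
```

Several --mail-type values can be combined in a comma-separated list, as done here, so that only the events of interest trigger an email.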
Status of submitted jobs
squeue
Check the status of the queue using the commands below.
# Print all jobs currently in the queue
squeue

# List the jobs owned by a given user
squeue -u <<USERNAME>>

# Check a list of jobs
squeue --jobs <<JOBID or comma separated list of JOBIDs>>

# You can format the output of squeue. For example, to get more details about the running jobs, use
squeue -o "%.7i %.9P %.8j %.8u %.2t %.10M %.6D %C %.10m"

# For more information about squeue
man squeue
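The listing can also be filtered by job state. The sketch below wraps one such call in a shell function. The function name pending_jobs and the example username are hypothetical, while the -t and --start options are standard squeue options.

```shell
# Hypothetical helper: list one user's pending jobs with their expected start times.
# Defaults to the current user; pass another username as the first argument.
pending_jobs() {
    local user="${1:-$USER}"
    squeue -u "$user" -t PENDING --start
}

# Usage: pending_jobs            (your own jobs)
#        pending_jobs alice      (jobs of the hypothetical user "alice")
```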
scancel
Cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
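A few common invocations are sketched below as small shell functions. The function names are hypothetical and only serve to label each case. The job ID is whatever squeue reported for your job.

```shell
# Hypothetical helpers wrapping common scancel invocations.

# Cancel one job, or a single job step such as 12345.0
cancel_job() {
    scancel "$1"
}

# Cancel all pending jobs of the current user
cancel_my_pending() {
    scancel --user="$USER" --state=PENDING
}

# Send SIGUSR1 to a running job instead of cancelling it
signal_job() {
    scancel --signal=USR1 "$1"
}
```

Whether a job reacts usefully to a signal such as SIGUSR1 depends on the application, so check its documentation before relying on this.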
sstat
Prints information about the resources utilized by a running job or job step.
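For example, the following sketch reports CPU and memory figures for every step of a running job. The function name job_usage is hypothetical, and the selection of fields is only one possibility among those listed in man sstat.

```shell
# Hypothetical helper: show CPU and memory usage for all steps of a running job.
# The first argument is the job ID.
job_usage() {
    sstat --allsteps --format=JobID,NTasks,AveCPU,MaxRSS,MaxVMSize -j "$1"
}

# Usage: job_usage 12345
```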
sacct
Reports job or job step accounting information about active or completed jobs.
# Print accounting information about a job
sacct -j <<JOBID or comma separated list of JOBIDs>>

# Print CPU time and memory usage of a job
sacct --format="CPUTime,MaxRSS" -j <<JOBID>>

# Print a list of the jobs a user has run (by default, since midnight of the current day)
sacct -u <<USERNAME>>

# Learn more about sacct options
man sacct
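To look further back in time or to post-process the result, a start date and a machine-readable output format can be requested. The sketch below assumes a hypothetical function name, jobs_since, and an arbitrary choice of fields.

```shell
# Hypothetical helper: summarise a user's jobs since a given date (YYYY-MM-DD).
# -X reports one line per job allocation; --parsable2 prints '|'-separated fields.
jobs_since() {
    local user="$1" since="$2"
    sacct -u "$user" --starttime="$since" -X --parsable2 \
          --format=JobID,JobName,State,Elapsed,ExitCode
}

# Usage: jobs_since alice 2024-01-01
```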
Status of nodes
Use the command sinfo to report the state of partitions and nodes.
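Some typical invocations are sketched below as shell functions. The function names are hypothetical, and the partition name passed to idle_nodes depends on how your cluster is configured.

```shell
# Hypothetical helpers wrapping common sinfo invocations.

# One line per node, in long format
node_list() {
    sinfo --Node --long
}

# Idle nodes only, optionally restricted to one partition
idle_nodes() {
    if [ -n "${1:-}" ]; then
        sinfo --states=idle --partition="$1"
    else
        sinfo --states=idle
    fi
}

# Nodes that are down or drained, together with the recorded reason
down_reasons() {
    sinfo --list-reasons
}
```

See man sinfo for the full list of options and output fields.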