Job Management
Monitoring jobs
There are two ways to check the status of your jobs: via the command line and via the web browser.
Command line
The command gb2_project_summary summarizes the status of your most recent projects. For example:
$ gb2_project_summary
Project           Owner   Status  Done  Fail  Run  Wait  Submission Time(UTC)  Duration
=======================================================================================
awesome_analysis  myuser  Good    300   0     0    0     2020-01-31 01:21:17   00:05:33
BSM_discovery     myuser  Bad     20    50    0    0     2020-01-31 01:21:17   00:05:33
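If you follow many projects, it can help to read this table from a script. The sketch below is a hypothetical helper, not part of gbasf2. It assumes the column layout shown above and project names without spaces, and parses each data row. It then suggests a next step using the download and reschedule commands described later in this section. The names `ProjectSummary`, `parse_project_summary` and `next_step` are illustrative only.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class ProjectSummary:
    """One data row of the gb2_project_summary table (hypothetical helper type)."""
    name: str
    owner: str
    status: str
    done: int
    fail: int
    run: int
    wait: int
    submitted_utc: str  # "YYYY-MM-DD HH:MM:SS"
    duration: str       # "HH:MM:SS"


def fetch_summary() -> str:
    """Run gb2_project_summary (requires a gbasf2 environment) and return its text output."""
    return subprocess.run(["gb2_project_summary"], capture_output=True,
                          text=True, check=True).stdout


def parse_project_summary(text: str) -> list[ProjectSummary]:
    """Parse the table layout shown above; assumes project names contain no spaces."""
    projects = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 10 fields with a numeric 'Done' column; this skips
        # the header, the '=====' separator and blank lines.
        if len(fields) != 10 or not fields[3].isdigit():
            continue
        name, owner, status, done, fail, run, wait, day, clock, duration = fields
        projects.append(ProjectSummary(name, owner, status, int(done), int(fail),
                                       int(run), int(wait), f"{day} {clock}", duration))
    return projects


def next_step(p: ProjectSummary) -> str:
    """Suggest a follow-up command (a simple heuristic, not gbasf2 logic)."""
    if p.run or p.wait:
        return "jobs still running or waiting: check again later"
    if p.fail:
        return f"gb2_job_reschedule -p {p.name}"
    return f"gb2_ds_get {p.name}"


# Copy of the example output above, used when no grid environment is available.
SAMPLE = """\
Project Owner Status Done Fail Run Wait Submission Time(UTC) Duration
==================================================
awesome_analysis myuser Good 300 0 0 0 2020-01-31 01:21:17 00:05:33
BSM_discovery myuser Bad 20 50 0 0 2020-01-31 01:21:17 00:05:33
"""

if __name__ == "__main__":
    # Replace SAMPLE with fetch_summary() inside a gbasf2 environment.
    for project in parse_project_summary(SAMPLE):
        print(f"{project.name}: {next_step(project)}")
```

On the sample it prints `awesome_analysis: gb2_ds_get awesome_analysis` and `BSM_discovery: gb2_job_reschedule -p BSM_discovery`. Splitting on whitespace is fragile if the output format changes, so treat this as a convenience, not a replacement for reading the summary yourself.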
To look into the details of a given project, use gb2_job_status with the flag -p <project_name>:
$ gb2_job_status -p awesome_analysis
300 jobs are selected.
Job id     Status  MinorStatus         ApplicationStatus  Site
====================================================================
133833224  Done    Execution Complete  Done               ARC.KIT.de
133833225  Done    Execution Complete  Done               ARC.KIT.de
133833226  Done    Execution Complete  Done               ARC.KIT.de
...
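For projects with hundreds of jobs, a quick tally is easier to read than the full listing. The sketch below is a hypothetical helper, not a gbasf2 tool. It assumes the layout shown above: the job id first, Status second and Site last. MinorStatus can contain spaces, so it is not parsed. The helper counts jobs per status and groups failed jobs by site, which can help you spot a site-related problem before rescheduling.

```python
from collections import Counter


def summarize_jobs(text: str) -> tuple[Counter, Counter]:
    """Count jobs per Status, and failed jobs per Site, from gb2_job_status output.

    Assumes the column layout shown above (job id first, Status second,
    Site last); MinorStatus may contain spaces, so it is not parsed.
    """
    status_counts: Counter = Counter()
    failed_by_site: Counter = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Job rows start with a numeric job id; skip the header, the separator,
        # the "N jobs are selected." line and any "..." placeholder.
        if len(fields) < 3 or not fields[0].isdigit() or fields[1] == "jobs":
            continue
        status, site = fields[1], fields[-1]
        status_counts[status] += 1
        if status == "Failed":
            failed_by_site[site] += 1
    return status_counts, failed_by_site


# Copy of the example output above (the listing is truncated there as well).
SAMPLE_STATUS = """\
300 jobs are selected.
Job id Status MinorStatus ApplicationStatus Site
========================================================================
133833224 Done Execution Complete Done ARC.KIT.de
133833225 Done Execution Complete Done ARC.KIT.de
133833226 Done Execution Complete Done ARC.KIT.de
...
"""

if __name__ == "__main__":
    counts, failed = summarize_jobs(SAMPLE_STATUS)
    print("per status:", dict(counts))
    print("failed per site:", dict(failed))
```

Inside a gbasf2 environment you could feed it the captured output of `gb2_job_status -p <project_name>` instead of the sample text.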
Web browser
Alternatively, you can check your jobs with the Job Monitor in the DIRAC web portal (https://dirac.cc.kek.jp:8443/DIRAC).
Open the portal, go to the Menu and then to Applications/Job Monitor. Click ‘Submit’ to display the list of your jobs and their status.
Once the status of your jobs is ‘Done’, you can download the output using gb2_ds_get <project_name>. See the next section, Data Management, for details.
Rescheduling jobs
When some of your jobs end with status Failed, the most likely cause is a temporary issue at the site where they were being executed. In that case, reschedule these jobs using gb2_job_reschedule:
$ gb2_job_reschedule -p <project_name>
Do you want to reschedule <project_name> project and N jobs?
Please type [Y] or [N]: y
N jobs rescheduled
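If several projects need rescheduling, you may want to script the confirmation. The sketch below is hypothetical. It assumes that the [Y]/[N] prompt of gb2_job_reschedule reads from standard input, so that piping "y" answers it. This behaviour is not documented here, so check it on a single project first. By default the helper `reschedule_projects` only prints the commands it would run.

```python
import shlex
import subprocess


def reschedule_projects(projects: list[str], dry_run: bool = True) -> list[list[str]]:
    """Reschedule the failed jobs of each project with gb2_job_reschedule.

    Assumption: the confirmation prompt reads standard input, so "y" can be
    piped in. With dry_run=True (the default) nothing is executed; the
    commands are only printed and returned.
    """
    commands = []
    for name in projects:
        cmd = ["gb2_job_reschedule", "-p", name]
        commands.append(cmd)
        if dry_run:
            print("would run:", shlex.join(cmd))
            continue
        # Answer the [Y]/[N] prompt automatically (assumed to read stdin).
        result = subprocess.run(cmd, input="y\n", capture_output=True, text=True)
        print(result.stdout.strip())
        if result.returncode != 0:
            print(f"rescheduling {name} failed:", result.stderr.strip())
    return commands


if __name__ == "__main__":
    reschedule_projects(["BSM_discovery"])  # dry run: prints the command only
```

Keeping the dry run as the default makes it harder to reschedule the wrong project by accident. Scripting does not bypass the reschedule limit described below.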
After confirmation, the jobs are rescheduled and keep the same job ID.
You can also reschedule jobs from the Job Monitor in the DIRAC web portal: select the failed jobs and click the ‘Reschedule’ button.
The number of reschedules is limited to 20; once a job reaches this limit, it will not be rescheduled again. In that case, contact the users forum, where an expert shifter will review whether your jobs can still be recovered.
Killing and deleting jobs
To kill jobs running on the grid, use the command gb2_job_kill:
$ gb2_job_kill -p <project_name>
To delete the jobs' information from the job databases (for example, because the project was submitted with an error), use gb2_job_delete:
$ gb2_job_delete -p <project_name>
Jobs being executed will be killed before the deletion.
Note
Please be aware that gb2_job_delete will not delete output files, only the information displayed by monitoring tools such as gb2_project_summary or the DIRAC Job Monitor in the web portal.
All available options are described in Job management tools.