Job Management

Monitoring jobs

There are two ways to check the status of your jobs: from the command line and from the web browser.

Command line

The command-line tool gb2_project_summary summarizes the status of your most recent projects. For example:

$ gb2_project_summary

Project            Owner    Status   Done   Fail   Run   Wait   Submission Time(UTC)   Duration
==================================================================================================
awesome_analysis   myuser   Good     300    0      0     0      2020-01-31 01:21:17    00:05:33
BSM_discovery      myuser   Bad      20     50     0     0      2020-01-31 01:21:17    00:05:33
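
If you want to spot problematic projects automatically, the table can be parsed with a few lines of Python. The following is only a minimal sketch: it assumes the column layout of the example above and project names without spaces, and the names ProjectSummary, parse_project_summary and projects_with_failures are hypothetical helpers, not part of the gb2 tools.

```python
from typing import List, NamedTuple


class ProjectSummary(NamedTuple):
    project: str
    owner: str
    status: str
    done: int
    fail: int
    run: int
    wait: int


def parse_project_summary(text: str) -> List[ProjectSummary]:
    """Parse a gb2_project_summary table (layout assumed from the example)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the ===== separator and the header row.
        if len(fields) < 7 or fields[0] == "Project":
            continue
        project, owner, status = fields[:3]
        done, fail, run, wait = (int(value) for value in fields[3:7])
        rows.append(ProjectSummary(project, owner, status, done, fail, run, wait))
    return rows


def projects_with_failures(rows: List[ProjectSummary]) -> List[str]:
    """Return the names of projects that report at least one failed job."""
    return [row.project for row in rows if row.fail > 0]


# Sample text copied from the example output above.
SAMPLE = """\
Project            Owner    Status   Done   Fail   Run   Wait   Submission Time(UTC)   Duration
==================================================================================================
awesome_analysis   myuser   Good     300    0      0     0      2020-01-31 01:21:17    00:05:33
BSM_discovery      myuser   Bad      20     50     0     0      2020-01-31 01:21:17    00:05:33
"""

print(projects_with_failures(parse_project_summary(SAMPLE)))  # ['BSM_discovery']
```

In practice the text would come from the saved output of gb2_project_summary rather than from a hard-coded string.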

To look into the details of a given project, use gb2_job_status with the flag -p <project_name>:

$ gb2_job_status -p awesome_analysis
300 jobs are selected.

Job id      Status   MinorStatus          ApplicationStatus   Site
========================================================================
133833224   Done     Execution Complete   Done                ARC.KIT.de
133833225   Done     Execution Complete   Done                ARC.KIT.de
133833226   Done     Execution Complete   Done                ARC.KIT.de
...
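
For projects with many jobs, it can help to aggregate this listing, for instance by status or by site. The sketch below is illustrative only: it assumes that columns are separated by at least two spaces (as in the example, where MinorStatus itself contains a single space) and that no column is empty. The function name parse_job_status is hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List


def parse_job_status(text: str) -> List[Dict[str, str]]:
    """Parse the per-job rows printed by gb2_job_status (layout assumed)."""
    jobs = []
    for line in text.splitlines():
        # Columns are assumed to be separated by two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        # Keep only rows with five columns whose first column is a numeric job id.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        job_id, status, minor_status, application_status, site = fields
        jobs.append({
            "job_id": job_id,
            "status": status,
            "minor_status": minor_status,
            "application_status": application_status,
            "site": site,
        })
    return jobs


# Sample text copied from the example output above.
SAMPLE = """\
300 jobs are selected.

Job id     Status      MinorStatus       ApplicationStatus      Site
========================================================================
133833224   Done     Execution Complete   Done                ARC.KIT.de
133833225   Done     Execution Complete   Done                ARC.KIT.de
133833226   Done     Execution Complete   Done                ARC.KIT.de
"""

jobs = parse_job_status(SAMPLE)
print(dict(Counter(job["status"] for job in jobs)))  # {'Done': 3}
print(dict(Counter(job["site"] for job in jobs)))    # {'ARC.KIT.de': 3}
```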

Web browser

Alternatively, you can check your jobs with the Job Monitor in the DIRAC web portal (https://dirac.cc.kek.jp:8443/DIRAC).

Open the portal, go to the Menu and then to Applications/Job Monitor. You will have to click on ‘Submit’ to display the information. You should be able to see something like this:

[Figure: Job Monitor in the DIRAC web portal (_images/BelleDIRACportal.png)]

Once the status of your jobs is ‘Done’, you can download the output using gb2_ds_get <project_name>. See the next section Data Management for details.

Rescheduling jobs

When some of your jobs end with the status Failed, the most likely cause is a temporary issue at the site where they were running. In that case, reschedule these jobs with gb2_job_reschedule:

$ gb2_job_reschedule -p <project_name>

Do you want to reschedule <project_name> project and N jobs?
Please type [Y] or [N]: y
N jobs rescheduled

After the confirmation, your jobs will be rescheduled with the same jobID.
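
When several projects have failed jobs, the reschedule commands can be generated from the failure counts. The following is a rough sketch under stated assumptions: the mapping fail_counts is filled in by hand (or from whatever monitoring output you have), and reschedule_commands is a hypothetical helper. It only prints the commands, using the -p flag shown above, and leaves the interactive confirmation to you.

```python
import shlex
from typing import Dict, List


def reschedule_commands(fail_counts: Dict[str, int]) -> List[str]:
    """Build one gb2_job_reschedule command per project with failed jobs."""
    return [
        # shlex.join quotes the project name if it contains special characters.
        shlex.join(["gb2_job_reschedule", "-p", project])
        for project, failed in fail_counts.items()
        if failed > 0
    ]


# Hypothetical input: project name -> number of failed jobs.
fail_counts = {"awesome_analysis": 0, "BSM_discovery": 50}

for command in reschedule_commands(fail_counts):
    print(command)  # gb2_job_reschedule -p BSM_discovery
```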

Another way to reschedule jobs is the Job Monitor in the DIRAC web portal. Select the failed jobs and click the ‘Reschedule’ button.

The number of reschedules is limited to 20; once the limit is reached, your jobs will not be rescheduled again. In that case, you can contact the users forum, where an expert shifter will review whether your jobs can still be recovered.

Killing and deleting jobs

To kill jobs that are running on the grid, use the command gb2_job_kill:

$ gb2_job_kill -p <project_name>

To delete the job information from the job databases (for example, because the project was submitted with an error), use gb2_job_delete:

$ gb2_job_delete -p <project_name>

Jobs being executed will be killed before the deletion.

Note

Please be aware that gb2_job_delete does not delete output files. It removes only the information displayed by monitoring tools such as gb2_project_summary or the DIRAC job monitor in the web portal.

Check all the available options in the Job management tools.