.. highlight:: shell

.. _running-jobs:

Submitting jobs to the grid
***************************

GBasf2 is an extension of Basf2, from your desktop to the grid.
The same steering files used with Basf2 in your local environment can be used with GBasf2 on the grid.

The usual workflow is
developing a Basf2 steering file,
testing it locally,
and then submitting the jobs to the grid with the same steering file.

.. note::
   Before starting, please understand the following:

    * The grid is NOT a local computing system like KEKCC.

    * Once you submit jobs, your jobs will be assigned to computing systems around the world.

    * If you job is problematic, it will be distributed to the world and all sites will be affected.

    * **Therefore, you must check your jobs in a local computing system carefully before you submit them to the grid!**


.. note::

  If any issues occur, contact the `users forum <comp-users-forum@belle2.org>`_ for assistance.

    To receive assistance as quickly as possible, before posting in the forum, you can:

    * Check the `gbasf2 troubleshooting <https://xwiki.desy.de/xwiki/bin/view/BI/Belle%20II%20Internal/Computing%20WebHome/DistributedComputing/Computing%20GBasf2/GBasf2%20Troubleshooting/>`_ page for solutions to the issue.
    * Look to see if your issue has already been posted in the `gbasf2 FAQ <https://xwiki.desy.de/xwiki/bin/view/BI/Belle%20II%20Internal/Computing%20WebHome/DistributedComputing/Computing%20GBasf2/GBasf2%20FAQ/>`_ page or at `questions.belle2.org <https://questions.belle2.org/questions/>`_.

    You should also include all the details the experts will need to diagnose the problem, such as your user name, project name, etc.


Job submission
==============

A command line client, ``gbasf2``, is used for submitting grid-based Basf2 jobs. The basic usage is::

    $ gbasf2 -p <project_name> -s <basf2_release> <steering_file>

where ``<project_name>`` is a name assigned by you, and ``<basf2_release>`` is the available Basf2 software version to use.

.. note::

  Your project name must be unique and cannot be reused, even if the project is deleted.

.. note::

  Please, do not use special characters in the project names (``$, #, %, /,`` etc.),
  as they could create problems with file names in some sites and in the databases.

You can always use the flags ``-h`` and ``--usage`` to see a full list of available options and examples.

If the submission is correct, you will get a summary of your project with information about the number of jobs,
etc.

.. warning::
  Once again: before submitting jobs to the grid, be sure that your script works well on a local computer!
  

.. warning::
  If you do not set the CPU time or event throughput of your jobs manually, ``gbasf2`` sets a value as the CPU time for the jobs. Usually the estimated time is much larger than the actual time required for your jobs, for it needs to cover heavier use cases. This may prevent your jobs of being started on some sites! So you should consider using either the ``--cputime`` or the ``--evtpersec`` option. 


Jobs with input files
=====================

Submitting jobs with a single dataset/datablock is performed using the argument ``-i``::

    $ gbasf2 <steering_file> -p <project_name> -s <basf2_release> -i <input_LFN>

For example::

    $ gbasf2 myscript.py -p myproject_v1 -s light-2110-tartarus  -i /belle/MC/release-05-02-11/DB00001363/SkimM14ri_ax1/prod00020320/e1003/4S/r00000/charged/18360100/udst


List of datasets as input
-------------------------

If you want to use a list of datasets, like the ones obtained with the :ref:`dataset-searcher`,
you can store the list in a file and submit jobs with ``--input_dslist``::

    $ gbasf2 <steering_file> -p myproject -s release --input_dslist <LFN_list>


.. note::
    If gbasf2 warns that there is no input data, it may indicate that all of the input files you specified are marked as “bad”
    (not part of the good run list).


Input from the Dataset Searcher
-------------------------------

If the metadata of the datasets desired is known, one additional possibility is performing queries
to the Dataset Searcher directly during the gbasf2 submission.

The metadata can be specified with ``--input_ds_search``::

    $ gbasf2 <steering_file> -p myproject -s release --input_ds_search='metadata1=value;metadata2=value;exp=expLow:expHigh;run=runLow:runHigh'

gbasf2 will search the use datasets matching the query as input. The available attributes and values for performing the
queries are the same described at :ref:`bin_dssearcher`.

Dataset Collections
===================

Collection path works same as any other path, except only a single collection can be used in a single project.
Submitting jobs with a collection is performed using the argument ``-i``::

    $ gbasf2 <steering_file> -p <project_name> -s <basf2_release> -i <input_collection_path>

Additional Options
==================

Additional options for advanced usage of gbasf2 are described here, such as adding files into the input sandbox
or selection of the environment for the execution. The full list of available options for gbasf2 (and any gb2 tools)
can be always retrieved with ``--help`` and ``--usage``.

You can also check the command-line reference :ref:`bin_gbasf2`.

Submit jobs with multiple input files per job
---------------------------------------------

The option ``-n`` specifies the number of input files per job.

You may specify the number of input files to be fed to each job with in this limit, as far as the job finishes
within ~10 hours on a standard node. There is a limit of 5 GB on the total of the input file size.

Suggested maximum number of input files is 10. Otherwise your project could
hammer down a grid site by heavy file access.

Per default, input files are grouped by site (``--input_grouping site``). You can change it with ``--input_grouping size``, which groups input files per site and production until the maximum number of input files (``-n``) or the 5GB limit is reached.

.. note::
    Be aware that the meaning of the gbasf2 option ``-n`` is different from that of the basf2 option (number of events).

.. warning::

    If the input files come from multiple campaigns (like proc12 + bucket21, for example), you NEED to change the grouping of input files with ``--input_grouping size``. Otherwise, jobs will fail.
    See `comp-users-forum:2255 <https://lists.belle2.org/sympa/arc/comp-users-forum/2021-09/msg00078.html>`_ for details.

Passing basf2 parameters
------------------------

If you need to pass arguments to the basf2 steering file, option ``--basf2opt`` is available. For example::

    $ gbasf2 <steeringFile> --basf2opt='-n 100'

will process only 100 events per job.


Setting the CPU time
--------------------

To prevent your jobs from being stuck in waiting caused by an overestimated CPU time, you can set either the ``--cputime`` or the ``--evtpersec`` option.

* The option ``--cputime`` is to be set the expected cpu time consumption of the individual jobs in minutes in the normalized unit, common to all the jobs in the project. Good for processing run-independent data.
* The option ``--evtpersec`` is to set the expected the throughput, that is, the number of events processed per second in the normalized unit, the cputime of each job is calculated with the average number of events in the input (total number of events divided by the number of input files).

To get a proper esimate of the required CPU time one has to multiply the jobs runtime on KEKCC with the normalization factor of the KEKCC nodes of 20, e.g.::

    cputime = 20 * <total time on KEKCC in minutes>

So if a job is expected to run 1 hour on KEKCC you should specify 60 min. * 20 = 1200::

    $ gbasf2 <steeringFile> <other options...> --cputime 1200

The same normalization factor 20 needs be applied to the value for the option ``--evtpersec``. This time can be calculated in a similar way::

    evtpersec = nevents / (20 * <total time on KEKCC in seconds>)

.. note::
    The estimation of the run time on KEKCC can be done via copying one of the input files to KEKCC and running your script over it locally.


Adding files in the input sandbox
---------------------------------

The input sandbox is delivered in the sites that will execute the jobs of your project. It contains all the files
required during the execution, such as your steering file and additional dependencies.

If you need to attach a file in the input sandbox, like a ``.dec`` file or a required library, you can use the option
``-f`` (``--input_sandboxfiles``) together with the option ``--lfn_sandboxfiles``. The project summary will display the files attached for confirmation.

Using the option ``--lfn_sandboxfiles``, the files are uploaded on the Grid before job submission. 
Otherwise they are uploaded to the DIRAC SandboxStore, that can be overloaded.
You can also specify files that are already on the Grid as input sandbox files to the option 
``-f`` (``--input_sandboxfiles``) by adding the prefix ``LFN:`` to the grid name of the files (logical file names, or LFNs).


Submit jobs with my own basf2 module
------------------------------------

Executing basf2 on the grid with your own module is possible attaching the required libraries in the
input sandbox.

Create your own module and compile it following the instructions as in the
`basf2 manual <https://software.belle2.org/development/sphinx/build/tools_doc/index-01-tools.html#development-setup>`_.
Once compiled, find the ``.so`` and ``.b2mod`` files (usually below ``modules/Linux_x86_64/opt/``) and copy them in your
local directory.

You need to add a reference to your module in your basf2 steering file, like::

    import basf2 as b2
    b2.register_module('myModule' ,shared_lib_path='./myModule.so')
    path.add_module('myModule')

and include the ``.so`` and ``.b2mod`` files in the input sandbox using the option ``-f`` during the gbasf2 submission::

    $ gbasf2 <steeringFile> -f="myModule.so, myModule.b2mod" ... --lfn_sandboxfiles


.. note::
    Submitting jobs with compiled modules require to specify the platform in which the libraries were compiled using
    ``--resource_tag``. For example, for EL8::

        $ gbasf2 <steeringFile> --resource_tag EL8 ...

    Read more about the resource requirements in the section below.

.. note::
    If you have written a new module or variable, please consider to share it with other collaborators
    `submitting a pull request <https://software.belle2.org/development/sphinx/build/tools_doc/index-01-tools.html#opening-a-pull-request>`_.
    Then the new feature will be available in coming basf2 releases.
    

Resource requirements
---------------------

DIRAC chooses the site to run your job, based on the input file location and the environment of the job slot.
Additionally, resource requirements can be set with the options ``--resource_tag`` or ``--platform``. This can be particularly relevant if
a compiled library is used, for example, as the library must be compatible with the site's environment, 
or an old release is used, as it may not run at some sites with rather new OSes.

.. _resource-tag-label:

Resource Tag
++++++++++++

The option ``--resource_tag`` is used to specify the environment in which the job will be executed. The available options are:
``EL7``, ``EL7orEL8``, ``EL8``, ``EL9``.

When a resource tag is set, the job will be executed either (i) in sites with the specified environment, or (ii) inside an `Apptainer <https://apptainer.org/>`_
container when the site has a different environment than the one specified. This will ensure that the job is executed in the specified environment.

.. note::
    Some sites may not allow running containers with an OS after its end-of-life (such as CentOS7). Specifying such an OS would result in less resources available for the jobs.
    We recommend to use a valid (non-obsolete) OS for compiling any libraries attached to the input sandbox.

For example, if you need to use a Basf2 release not available for EL9, such as release-06, light-2210-devonrex or light-2212-foldex, you can avoid your jobs being assigned to the EL9 sites by setting the resource tag to EL7orEL8::

    $ gbasf2 <steeringFile> --resource_tag EL7orEL8 ...

Platform
++++++++

The option ``--platform`` is used to specify the operating system of the site where the job will be executed. The available options are:
``EL7``, ``Linux_x86_64_glibc2.17``, ``EL8``, ``Linux_x86_64_glibc-2.28``, ``EL9``, ``Linux_x86_64_glibc-2.34``, ``ANY``

Other values with the the form ``Linux_x86_64_glibc-X.YY`` may work for newer OSes, but are not guaranteed.

Using ``--platform`` will NOT launch a container with the specified platform, but will only match the jobs with
the sites that have the specified platform. Keep in mind it will reduce the number of sites where your jobs can run,
possibly increasing the waiting time, or even disabling some of the jobs to be assigned to any sites.