Data Management

Jobs executed on the grid produce output files organized into datasets, which are hosted on storage elements (SEs) around the world. A dataset is located using its Logical Path Name (LPN).

For the output dataset of jobs submitted with gbasf2, the LPN always starts with /belle/user/<your_username>, followed by the dataset name, which is the name of the project. A dataset on the grid is subdivided into datablocks named subXX, where XX is a two-digit number incremented sequentially.
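To make the naming scheme concrete, here is a minimal shell sketch that assembles these paths. The functions dataset_lpn and datablock_lpn are hypothetical helpers, not gbasf2 commands, and myUser/myProject are placeholder names. Numbering datablocks from sub00 is an assumption based on the example output later on this page.

```shell
# Hypothetical helpers (not part of gbasf2): build the LPN of a gbasf2
# output dataset and of its datablocks, following the layout
# /belle/user/<username>/<project>/subXX described above.

# Print the dataset LPN for a given user and project.
dataset_lpn() {
    user="$1"
    project="$2"
    printf '/belle/user/%s/%s\n' "$user" "$project"
}

# Print the LPN of datablock number N, zero-padded to two digits.
# Assumption: datablocks are numbered from 0 (sub00, sub01, ...).
datablock_lpn() {
    n="$3"
    printf '%s/sub%02d\n' "$(dataset_lpn "$1" "$2")" "$n"
}

# Example usage with placeholder names.
dataset_lpn myUser myProject        # /belle/user/myUser/myProject
datablock_lpn myUser myProject 0    # /belle/user/myUser/myProject/sub00
datablock_lpn myUser myProject 1    # /belle/user/myUser/myProject/sub01
```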

A set of gb2 tools, provided with the gbasf2 installation, is intended for managing output datasets on the grid. Operations such as downloading, replication or deletion can be performed.

Downloading files

The tool intended for downloading files from the grid is gb2_ds_get. Once the jobs of a project have finished with status Done, the output can be downloaded.

If you provide the project name, the gb2_ds tools conveniently resolve the LPN of the associated output dataset:

$ gb2_ds_get <project_name>

After you confirm, the files are downloaded by default into a directory named after the dataset. You can skip the confirmation with the -f flag. Check Dataset management tools for details and the available options.

Note

If some of the files were not downloaded because of errors, this will be reported at the end of the execution of gb2_ds_get. Be particularly careful to ensure that all the files have been downloaded, because an incomplete dataset will lead to incorrect physics results.

If some of the files in the dataset were not downloaded, gb2_ds_get can be executed again. Files that were already downloaded are skipped after their integrity is confirmed using the checksum or the file size.
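Because already downloaded files are skipped, re-running the download is a natural candidate for a small retry wrapper. The sketch below is a hypothetical helper named retry, not a gbasf2 command, and it rests on an assumption you should verify for your gbasf2 version: that gb2_ds_get exits with a non-zero status when some files fail to download. Even when it reports success, still read the summary printed at the end of each run.

```shell
# Hypothetical helper (not part of gbasf2): re-run a command up to a
# maximum number of attempts, stopping as soon as it exits with status 0.
# Assumption: gb2_ds_get returns a non-zero exit status when some files
# fail to download; check this against your gbasf2 version.
retry() {
    max_attempts="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt ${attempt}/${max_attempts} failed: $*" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (requires gbasf2): up to 3 attempts, -f skips the confirmation.
# retry 3 gb2_ds_get -f myProject
```

Using -f matters here: without it, every attempt would stop and wait for confirmation.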

Note

Some datasets, such as raw data files, are not intended to be downloaded with gb2_ds_get, because random access to tape systems can potentially block all our operations. If you require such files on local resources, please contact the Belle II Data Production managers.

Deleting datasets

Once your datasets have been downloaded and are no longer required, you can delete them from the grid using gb2_ds_rm.

You can simply specify the project name or the full LPN of the dataset to delete:

$ gb2_ds_rm myProject

$ gb2_ds_rm /belle/user/myUser/myProject

Check Dataset management tools for details and additional options.

Note

Your project will be marked for deletion and processed asynchronously. This means the dataset will not disappear immediately after gb2_ds_rm is executed.

Replica management

Sometimes you may want to replicate datasets. For example:

  • You would like the output dataset of your gbasf2 project to be located at a specific site.

  • You would like your output dataset to be located on a storage element close to where you work, e.g. for easier downloading.

Use gb2_ds_rep for this; check Dataset management tools for more details. You can specify the project name or the full LPN of the dataset to replicate:

$ gb2_ds_rep -d <destination_SE> myProject

$ gb2_ds_rep -d <destination_SE> /belle/user/myUser/myProject

Once the command finishes successfully, the system takes care of the replication. To monitor the replication status, use gb2_ds_rep_status:

$ gb2_ds_rep_status myProject

$ gb2_ds_rep_status /belle/user/myUser/myProject

For example:

$ gb2_ds_rep_status /belle/user/anil123/v5r3p1_pre1_check_02
LFN/LPN                                         |  OK  |  Replicating  |  Stuck  |  Dest_SE        |
======================================================================================================
/belle/user/anil123/v5r3p1_pre1_check_02/sub00  |   5  |            0  |      0  |  ANY=true       |
/belle/user/anil123/v5r3p1_pre1_check_02/sub00  |   5  |            0  |      0  |  CESNET-TMP-SE  |

In the example above:

  • The first line, with Dest_SE: ANY=true, is the datablock registered by the gbasf2 jobs.

  • The second line, with Dest_SE: CESNET-TMP-SE, is the replica created with gb2_ds_rep.

Check Dataset management tools for more details.
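For datasets with many datablocks, scanning the status table by eye gets tedious. The following is a minimal sketch of a filter, pending_replicas, a hypothetical helper rather than a gbasf2 command. It assumes the output keeps the pipe-separated column layout shown in the example (LFN/LPN, OK, Replicating, Stuck, Dest_SE) and prints only the rows whose Replicating or Stuck count is non-zero.

```shell
# Hypothetical helper (not part of gbasf2): read gb2_ds_rep_status output on
# standard input and print the datablocks that are still replicating or stuck.
# Assumption: rows use the layout "LFN/LPN | OK | Replicating | Stuck | Dest_SE |".
pending_replicas() {
    awk -F'|' '
        # Data rows have a numeric OK column; this skips the header and the ==== line.
        $2 ~ /^ *[0-9]+ *$/ {
            lpn = $1; se = $5
            gsub(/^ +| +$/, "", lpn)
            gsub(/^ +| +$/, "", se)
            rep = $3 + 0; stuck = $4 + 0
            if (rep > 0 || stuck > 0)
                printf "%s -> %s: %d replicating, %d stuck\n", lpn, se, rep, stuck
        }'
}

# Example (requires gbasf2); empty output means no file is replicating or stuck:
# gb2_ds_rep_status myProject | pending_replicas
```

Matching on a numeric OK column, rather than skipping a fixed number of lines, keeps the filter working if extra header lines appear.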