Data Management
Jobs executed on the grid produce output files organized within datasets, hosted on storage elements around the world. A dataset is located using its Logical Path Name (LPN).
For the output dataset of jobs submitted with gbasf2, the LPN always starts with /belle/user/<your_username>, followed by the dataset name, which corresponds to the name of the project. A dataset on the grid is subdivided into datablocks named subXX, where the last two digits are incremented sequentially.
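The LPN layout described above can be sketched in Python. This is a minimal illustration and not part of gbasf2: the helper names output_lpn and datablock_lpn are hypothetical, and the numbering is assumed to start at sub00.

```python
def output_lpn(username: str, project: str) -> str:
    """Return the LPN of a gbasf2 output dataset (hypothetical helper)."""
    # Output datasets live under /belle/user/<your_username>/<project_name>.
    return f"/belle/user/{username}/{project}"


def datablock_lpn(username: str, project: str, index: int) -> str:
    """Return the LPN of one datablock of the dataset (hypothetical helper)."""
    # Datablocks are named subXX with a two-digit suffix.
    # Assumption: the sequence starts at sub00.
    return f"{output_lpn(username, project)}/sub{index:02d}"


if __name__ == "__main__":
    print(output_lpn("myUser", "myProject"))
    print(datablock_lpn("myUser", "myProject", 0))
```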
A set of gb2 tools provided with the gbasf2 installation is intended for managing the output datasets on the grid. Operations such as downloading, replication, or deletion can be performed.
Downloading files
The intended tool for downloading files from the grid is gb2_ds_get. Once the jobs of a project have finished with status Done, the output can be downloaded.
If you provide the project name, the gb2_ds tools resolve the LPN of the associated output dataset for you:
$ gb2_ds_get <project_name>
After confirmation, the files are downloaded by default into a directory named after the dataset.
You can skip the confirmation with the -f flag. See Dataset management tools for details and the available options.
Note
If some of the files were not downloaded because of errors, this is reported at the end of the execution of gb2_ds_get. Take particular care to ensure that all the files have been downloaded, because incomplete datasets will lead to incorrect physics results.
If some of the files contained in the dataset were not downloaded, gb2_ds_get can be executed again. Files already downloaded are skipped after their integrity is confirmed using the checksum or the size.
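The skip-on-rerun behaviour can be pictured with the sketch below. It is not the gb2_ds_get implementation: the function names are hypothetical, and the use of Adler-32 as the checksum is an assumption, since the text above does not name the algorithm.

```python
import os
import zlib


def adler32_of(path: str, chunk_size: int = 1 << 20) -> int:
    """Compute the Adler-32 checksum of a file, reading it in chunks."""
    value = 1  # Adler-32 starting value
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return value & 0xFFFFFFFF


def is_complete(path: str, expected_size: int, expected_adler32=None) -> bool:
    """Decide whether a local copy can be skipped on a repeated download.

    Assumption: the expected size (and optionally the checksum) come from
    the file catalogue; how they are obtained is outside this sketch.
    """
    if not os.path.isfile(path):
        return False  # never downloaded
    if os.path.getsize(path) != expected_size:
        return False  # truncated or otherwise incomplete
    if expected_adler32 is not None:
        return adler32_of(path) == expected_adler32
    return True  # size matches and no checksum was supplied
```

A file that fails this kind of check would be downloaded again, while one that passes would be left untouched.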
Note
Some datasets, such as raw data files, are not intended to be downloaded with gb2_ds_get (random access to tape systems can potentially block all our operations).
If you require such files on local resources, please contact the Belle II Data Production managers.
Deleting datasets
Once your datasets have been downloaded and are no longer required, you can delete them from the grid using gb2_ds_rm.
You can simply specify the project name or the full LPN of the dataset to delete:
$ gb2_ds_rm myProject
$ gb2_ds_rm /belle/user/myUser/myProject
Check Dataset management tools for details and additional options.
Note
Your project will be marked for deletion and processed asynchronously. This means the dataset will not disappear immediately after gb2_ds_rm is executed.
Replica management
Sometimes you may want to replicate datasets. For example:
You would like the output dataset of your gbasf2 project to be located at a specific site.
You would like your output dataset to be located on a storage element close to where you work, e.g. for easier downloading.
Use gb2_ds_rep for this. See Dataset management tools for more details.
You can specify the project name or the full LPN of the dataset to replicate:
$ gb2_ds_rep -d <destination_SE> myProject
$ gb2_ds_rep -d <destination_SE> /belle/user/myUser/myProject
Once the command finishes successfully, the replication is taken care of by the system. To monitor the replication status, use gb2_ds_rep_status:
$ gb2_ds_rep_status myProject
$ gb2_ds_rep_status /belle/user/myUser/myProject
For example:
$ gb2_ds_rep_status /belle/user/anil123/v5r3p1_pre1_check_02
LFN/LPN | OK | Replicating | Stuck | Dest_SE |
======================================================================================================
/belle/user/anil123/v5r3p1_pre1_check_02/sub00 | 5 | 0 | 0 | ANY=true |
/belle/user/anil123/v5r3p1_pre1_check_02/sub00 | 5 | 0 | 0 | CESNET-TMP-SE |
In the example above:
The first line, with Dest_SE ANY=true, is the datablock registered by the gbasf2 jobs.
The second line, with Dest_SE CESNET-TMP-SE, is the replica made using gb2_ds_rep.
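If you want to check the replication status from a script, the table printed by gb2_ds_rep_status can be parsed along the lines of the sketch below. The function names are hypothetical, the parser assumes the pipe-separated layout shown in the example output, and treating a replica as finished when its Replicating and Stuck counts are both zero is an assumption about the meaning of those columns.

```python
SAMPLE = """\
LFN/LPN | OK | Replicating | Stuck | Dest_SE |
======================================================
/belle/user/anil123/v5r3p1_pre1_check_02/sub00 | 5 | 0 | 0 | ANY=true |
/belle/user/anil123/v5r3p1_pre1_check_02/sub00 | 5 | 0 | 0 | CESNET-TMP-SE |
"""


def parse_rep_status(text: str) -> list:
    """Turn the status table into a list of dicts, one per data row."""
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if cells and cells[-1] == "":
            cells.pop()  # drop the empty cell after the trailing '|'
        # Skip the header, the '====' separator, and anything unexpected.
        if len(cells) != 5 or not cells[0].startswith("/"):
            continue
        lpn, ok, replicating, stuck, dest_se = cells
        rows.append({
            "lpn": lpn,
            "ok": int(ok),
            "replicating": int(replicating),
            "stuck": int(stuck),
            "dest_se": dest_se,
        })
    return rows


def replication_finished(rows: list, dest_se: str) -> bool:
    """True if every row for dest_se has nothing replicating or stuck."""
    matching = [row for row in rows if row["dest_se"] == dest_se]
    return bool(matching) and all(
        row["replicating"] == 0 and row["stuck"] == 0 for row in matching
    )


if __name__ == "__main__":
    print(replication_finished(parse_rep_status(SAMPLE), "CESNET-TMP-SE"))
```

In practice the text would come from the captured output of gb2_ds_rep_status rather than from a hard-coded string.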
See Dataset management tools for more details.