Data Management
Jobs executed on the grid produce output files organized within datasets, hosted on storage elements around the world. A dataset is located using its Logical Path Name (LPN).
For the output dataset of jobs submitted with gbasf2, the LPN always starts with /belle/user/<your_username>,
followed by the dataset name, which corresponds to the name of the project. A dataset on the grid is subdivided into datablocks
named subXX, where XX is a two-digit index that increases sequentially (sub00, sub01, and so on).
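The naming scheme above can be sketched in a few lines of Python. The helper names dataset_lpn and datablock_lpn are hypothetical and not part of gbasf2, and the zero-padded two-digit index is an assumption based on the subXX pattern. In practice the gb2 tools resolve the LPN for you; this only illustrates how the path is composed.

```python
def dataset_lpn(username: str, project: str) -> str:
    """Return the LPN of the output dataset of a gbasf2 project."""
    return f"/belle/user/{username}/{project}"


def datablock_lpn(username: str, project: str, index: int) -> str:
    """Return the LPN of datablock number `index` (assumed form: sub00, sub01, ...)."""
    if not 0 <= index <= 99:
        raise ValueError("a two-digit datablock index (0-99) is expected")
    return f"{dataset_lpn(username, project)}/sub{index:02d}"


print(datablock_lpn("myUser", "myProject", 0))  # prints /belle/user/myUser/myProject/sub00
```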
A set of gb2 tools provided with the gbasf2 installation is intended for managing the output datasets on the grid. They support operations such as downloading, replication, and deletion.
Downloading files
The tool intended for downloading files from the grid is gb2_ds_get. Once the jobs of a project have finished
with status Done, the output can be downloaded.
If you provide the project name, the gb2_ds tools resolve the LPN of the
associated output dataset for you:
$ gb2_ds_get <project_name>
After you confirm, the files are downloaded by default into a directory named after the dataset.
You can skip the confirmation with the -f flag. Check Dataset management tools for details and the available options.
Note
If some of the files could not be downloaded because of errors, this is reported at the end of the execution
of gb2_ds_get. Take particular care to ensure that all the files have been downloaded, because incomplete
datasets will lead to incorrect physics results.
If some of the files contained in the dataset were not downloaded, gb2_ds_get can be executed again. Files already
downloaded will be skipped after confirming their integrity using the checksum or the size.
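As an extra local cross-check after downloading, you could compare the download directory against a list of expected files. The sketch below is not how gb2_ds_get works internally. The function name incomplete_files and the expected_sizes mapping are hypothetical, and obtaining the expected file names and sizes from the grid is not covered here. It only illustrates the size-based comparison.

```python
import tempfile
from pathlib import Path


def incomplete_files(local_dir, expected_sizes):
    """Return expected file names that are missing locally or have the wrong size.

    expected_sizes maps file name -> size in bytes (hypothetical input that
    you would have to obtain from a grid-side listing).
    """
    local_dir = Path(local_dir)
    problems = []
    for name, size in sorted(expected_sizes.items()):
        path = local_dir / name
        if not path.is_file() or path.stat().st_size != size:
            problems.append(name)
    return problems


# Demo with a throwaway directory standing in for the download directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "file_a.root").write_bytes(b"abc")  # 3 bytes, as expected
    expected = {"file_a.root": 3, "file_b.root": 5}  # file_b.root was never downloaded
    print(incomplete_files(tmp, expected))  # prints ['file_b.root']
```

A non-empty result means gb2_ds_get should be run again before the dataset is used.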
Note
Some datasets are not intended to be downloaded with gb2_ds_get, like raw data files (random access to
tape systems can potentially block all our operations).
If you require such files in local resources, please contact Belle II Data Production managers.
Deleting datasets
Once your datasets have been downloaded and are no longer required, you can delete them from the grid using gb2_ds_rm.
You can simply specify the project name or the full LPN of the dataset to delete:
$ gb2_ds_rm myProject
$ gb2_ds_rm /belle/user/myUser/myProject
Check Dataset management tools for details and additional options.
Note
Your project is marked for deletion and processed asynchronously, so the dataset will not disappear immediately after
gb2_ds_rm is executed.
Replica management
Sometimes you may want to replicate datasets. For example:
- You would like the output dataset of your gbasf2 project to be located at a specific site.
- You would like your output dataset to be located on a storage element close to where you work, e.g. for easier downloading.
Use gb2_ds_rep for this. Check Dataset management tools for more details.
You can specify the project name or the full LPN of the dataset to replicate:
$ gb2_ds_rep -d <destination_SE> myProject
$ gb2_ds_rep -d <destination_SE> /belle/user/myUser/myProject
Once the command finishes successfully, the system takes care of the replication. To monitor the replication status, use gb2_ds_rep_status:
$ gb2_ds_rep_status myProject
$ gb2_ds_rep_status /belle/user/myUser/myProject
For example:
$ gb2_ds_rep_status /belle/user/anil123/v5r3p1_pre1_check_02
LFN/LPN | OK | Replicating | Stuck | Dest_SE |
======================================================================================================
/belle/user/anil123/v5r3p1_pre1_check_02/sub00 | 5 | 0 | 0 | ANY=true |
/belle/user/anil123/v5r3p1_pre1_check_02/sub00 | 5 | 0 | 0 | CESNET-TMP-SE |
In the example above:
- The first line, with Dest_SE ANY=true, is the datablock registered by the gbasf2 jobs.
- The second line, with Dest_SE CESNET-TMP-SE, is the replica made using gb2_ds_rep.
Check Dataset management tools for more details.
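If you want to check the replication status from a script, the table can be parsed into records. The sketch below assumes the output has exactly the pipe-separated layout of the example above, which may differ between gbasf2 versions. The function names parse_status_table and unfinished are hypothetical.

```python
SAMPLE = """\
LFN/LPN | OK | Replicating | Stuck | Dest_SE |
======================================================================================================
/belle/user/anil123/v5r3p1_pre1_check_02/sub00 | 5 | 0 | 0 | ANY=true |
/belle/user/anil123/v5r3p1_pre1_check_02/sub00 | 5 | 0 | 0 | CESNET-TMP-SE |
"""


def parse_status_table(text):
    """Parse pipe-separated table output into a list of dicts keyed by column name."""
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, an echoed command line, and the ===== separator.
        if not line or line.startswith("$") or set(line) == {"="}:
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def unfinished(rows):
    """Return the rows that still have files replicating or stuck."""
    return [r for r in rows if int(r["Replicating"]) or int(r["Stuck"])]


rows = parse_status_table(SAMPLE)
print(rows[1]["Dest_SE"])  # prints CESNET-TMP-SE
print(unfinished(rows))    # prints []
```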
Lifetime management
The lifetime is the time after which a datablock can be deleted: once the lifetime expires, the datablock becomes eligible for automatic deletion.
Currently, the lifetime of user output is set to 3 months from the completion of the project.
Users are advised to download the files using gb2_ds_get and then remove the project files manually using gb2_ds_rm.
If you want to keep the datasets on the grid for longer than the default value (3 months), you can use gb2_ds_rep to extend the lifetime.
To check when the lifetime expires for your datablocks, you can use gb2_ds_rep_status -l:
$ gb2_ds_rep_status /belle/user/anil123/v5r3p1_pre1_check_02 -l
RuleID | account | LFN/LPN | OK | Replicating | Stuck | Dest_SE | Src_SE | Created (UTC) | Expires at (UTC) |
===================================================================================================================================================================================================================
9a97c02aac2d4655bde7b8374982fdc5 | anil123 | /belle/user/anil123/v5r3p1_pre1_check_02/sub00 | 5 | 0 | 0 | BNL-TMP-SE | None | 2022-10-04 08:59:29 | 2022-10-11 08:59:29 |
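To turn the "Expires at (UTC)" column into a remaining time, a timestamp in the format shown above can be parsed with the Python standard library. This is a minimal sketch, and the function name time_left is hypothetical. The timestamp format is assumed to match the example output.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # format of the Created/Expires columns in the example


def parse_utc(stamp):
    """Parse a timestamp from the table as a timezone-aware UTC datetime."""
    return datetime.strptime(stamp, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def time_left(expires_at, now=None):
    """Return the time remaining until expiry (negative if already expired)."""
    now = now or datetime.now(timezone.utc)
    return parse_utc(expires_at) - now


# Using the creation time of the example rule as "now":
created = parse_utc("2022-10-04 08:59:29")
print(time_left("2022-10-11 08:59:29", now=created))  # prints 7 days, 0:00:00
```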
gb2_ds_rep sets the lifetime to a default value of 1 month and also provides an option to set the lifetime manually:
$ gb2_ds_rep -d <destination_SE> myProject --lifetime <xh/d/m>
where xh/d/m stands for x hours, days, or months (the length of time until the lifetime expires). Please be aware that the lifetime cannot exceed the maximum of 3 months.
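The lifetime rule can be illustrated with a small validator. This sketch rests on several assumptions: the value is spelled as a number directly followed by h, d, or m (for example 12h, 7d, 2m), a month is approximated as 30 days, and the 3-month maximum is approximated as 90 days. The exact syntax accepted by gb2_ds_rep may differ, so check its help output before relying on this.

```python
import re
from datetime import timedelta

MAX_LIFETIME = timedelta(days=90)  # assumption: "3 months" approximated as 90 days


def parse_lifetime(spec):
    """Convert a string such as '12h', '7d' or '2m' into a timedelta.

    Raises ValueError for malformed input or a lifetime above the maximum.
    """
    match = re.fullmatch(r"(\d+)([hdm])", spec)
    if match is None:
        raise ValueError(f"cannot parse lifetime: {spec!r}")
    value, unit = int(match.group(1)), match.group(2)
    if unit == "h":
        lifetime = timedelta(hours=value)
    elif unit == "d":
        lifetime = timedelta(days=value)
    else:
        lifetime = timedelta(days=30 * value)  # assumption: 1 month = 30 days
    if lifetime > MAX_LIFETIME:
        raise ValueError(f"lifetime {spec!r} exceeds the 3-month maximum")
    return lifetime


print(parse_lifetime("2m"))  # prints 60 days, 0:00:00
```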
Quota management
Each user has a finite quota of grid storage. There are two kinds of quota:
- Global quota: storage limit on the total usage, summed over all storage elements
- Local quota: storage limits on individual SEs
You can check your quota with gb2_ds_quota:
$ gb2_ds_quota
Global Quota:
SE | USAGE | LIMIT | QUOTA LEFT |
=======================================================
ANY=true | 350.0 GB | 1000.0 GB | 650.0 GB |
Local Quota:
SE | USAGE | LIMIT | QUOTA LEFT |
===========================================================
DESY-TMP-SE | 150 GB | 400 GB | 250.0 GB |
Once your quota is completely filled, you cannot register any files on the grid, i.e. all of your jobs will fail. To free up quota, delete your projects using gb2_ds_rm, or delete replicas with gb2_ds_rm_rep.
Two safeguards are applied at gbasf2 submission:
- If your usage is at 100% of the global quota, you cannot submit any new projects: gbasf2 will fail at submission.
- If your usage is above 90% of the global quota, gbasf2 warns you at submission to delete existing projects, but the jobs are still submitted.
Please be aware of the risk that jobs fail when registering their output because the quota limit has been reached.
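The two thresholds can be expressed as a small check on the USAGE and LIMIT values reported by gb2_ds_quota. This is an illustrative sketch, not the logic implemented in gbasf2. The function names parse_gb and submission_status are hypothetical, and only the GB unit seen in the example output is handled.

```python
def parse_gb(cell):
    """Convert a table cell such as '350.0 GB' into a number of GB."""
    value, unit = cell.split()
    if unit != "GB":  # only GB appears in the example output
        raise ValueError(f"unexpected unit: {unit!r}")
    return float(value)


def submission_status(usage_gb, limit_gb):
    """Classify the global quota usage following the thresholds described above."""
    fraction = usage_gb / limit_gb
    if fraction >= 1.0:
        return "blocked"  # submission fails
    if fraction > 0.9:
        return "warning"  # jobs are submitted, with a warning
    return "ok"


# Values from the Global Quota example: 350.0 GB used out of 1000.0 GB.
print(submission_status(parse_gb("350.0 GB"), parse_gb("1000.0 GB")))  # prints ok
```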
Warning
The global quota is not active yet. It will be enabled in the future with a limit of roughly O(TB). Please keep your usage reasonable to avoid quota issues later.
Note
Quota is already active for group data.