.. highlight:: shell

.. _data-management:

Data Management
***************

Jobs executed on the grid produce output files organized within datasets, hosted on storage elements
around the world. A dataset is located using its **Logical Path Name (LPN)**. For the output dataset
of jobs submitted with gbasf2, the LPN always starts with ``/belle/user/``, followed by your user name
and the dataset name, which corresponds to the name of the project
(e.g. ``/belle/user/myUser/myProject``).

A dataset on the grid is subdivided into datablocks named ``subXX``, where the two digits ``XX``
increase sequentially (``sub00``, ``sub01``, ...).

A set of gb2 tools provided with the gbasf2 installation is intended for managing the output datasets
on the grid. They support operations such as downloading, replication and deletion.

Downloading files
=================

The intended tool for downloading files from the grid is ``gb2_ds_get``. Once the jobs of a project
have finished with status Done, the output can be downloaded. If the project name is provided, the
``gb2_ds`` tools resolve the LPN of the associated output dataset for you::

    $ gb2_ds_get

After confirmation, the files are downloaded by default into a directory named after the dataset.
You can skip the confirmation using the flag ``-f``.

Check :ref:`bin_ds` for details and the options available.

.. note::
   If some of the files were not downloaded because of errors, this is reported at the end of the
   execution of ``gb2_ds_get``. Please take particular care to ensure that all the files have been
   downloaded, because incomplete datasets will lead to incorrect physics results.

If some of the files contained in the dataset were not downloaded, ``gb2_ds_get`` can be executed
again. Files already downloaded are skipped after their integrity has been confirmed using the
checksum or the size.

.. note::
   Some datasets are not intended to be downloaded with ``gb2_ds_get``, such as raw data files
   (random access to tape systems can potentially block all our operations).
   If you require such files in local resources, please contact the Belle II Data Production managers.

Deleting datasets
=================

Once your datasets have been downloaded and are no longer required, you can delete them from the grid
using ``gb2_ds_rm``. You can specify either the project name or the full LPN of the dataset to delete::

    $ gb2_ds_rm myProject
    $ gb2_ds_rm /belle/user/myUser/myProject

Check :ref:`bin_ds` for details and additional options.

.. note::
   Your project will be marked for deletion and processed asynchronously. This means the dataset
   will not disappear immediately after the execution of ``gb2_ds_rm``.

Replica management
==================

Sometimes you may want to replicate datasets. For example:

- You would like the output dataset of your gbasf2 project to be located at a specific site.
- You would like your output dataset to be located on a storage element close to where you work,
  e.g. for easier downloading.

Use ``gb2_ds_rep`` for this. Check :ref:`bin_ds` for more details.

You can specify the project name or the full LPN of the dataset to replicate::

    $ gb2_ds_rep -d myProject
    $ gb2_ds_rep -d /belle/user/myUser/myProject

Once the command finishes successfully, the replication is taken care of by the system.
To monitor the replication status, use ``gb2_ds_rep_status``::

    $ gb2_ds_rep_status myProject
    $ gb2_ds_rep_status /belle/user/myUser/myProject

For example::

    $ gb2_ds_rep_status /belle/user/anil123/v5r3p1_pre1_check_02
    LFN/LPN                                        | OK | Replicating | Stuck | Dest_SE       |
    ======================================================================================================
    /belle/user/anil123/v5r3p1_pre1_check_02/sub00 | 5  | 0           | 0     | ANY=true      |
    /belle/user/anil123/v5r3p1_pre1_check_02/sub00 | 5  | 0           | 0     | CESNET-TMP-SE |

In the example above:

- The first line, with Dest_SE ``ANY=true``, is the datablock registered by the gbasf2 jobs.
- The second line, with Dest_SE ``CESNET-TMP-SE``, is the replica made using ``gb2_ds_rep``.

Check :ref:`bin_ds` for more details.

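
As a rough sketch, the status table shown in this section can be reduced to a single number that
tells you whether any file is still replicating or stuck. The helper ``rep_pending`` below is
hypothetical and not part of gbasf2. It assumes the pipe-separated column layout of the example
output above (without the ``-l`` option), which may differ between gbasf2 releases.

.. code-block:: shell

    # Hypothetical helper (not part of gbasf2): reads gb2_ds_rep_status output
    # on stdin and prints the number of files still replicating or stuck.
    # Assumed column layout, taken from the example output in this section:
    #   LFN/LPN | OK | Replicating | Stuck | Dest_SE |
    rep_pending() {
        awk -F'|' '
            # Data rows carry the LPN in the first column; the header and
            # separator lines do not contain "/belle/" and are skipped.
            index($1, "/belle/") > 0 { pending += $3 + $4 }
            # "+ 0" forces a numeric 0 when no data row was seen.
            END { print pending + 0 }
        '
    }

    # Usage (requires a gbasf2 environment):
    #   gb2_ds_rep_status myProject | rep_pending

A result of ``0`` means that no row reports files in the Replicating or Stuck columns. It does not
by itself confirm that the expected number of files is present in the OK column.
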
Lifetime management
===================

Lifetime is the time after which a datablock *can* be deleted. Once the lifetime has expired, the
datablock is eligible for automatic deletion.

Currently, the lifetime of user output is set to **3 months** from the point of project completion.
Users are advised to download the files using ``gb2_ds_get`` and then remove the project files
manually using ``gb2_ds_rm``.

If you still want to keep the datasets on the grid for longer than the default value (3 months),
you can use ``gb2_ds_rep`` to extend the lifetime.

To check when the lifetime of your datablocks expires, you can use ``gb2_ds_rep_status -l``::

    $ gb2_ds_rep_status /belle/user/anil123/v5r3p1_pre1_check_02 -l
    RuleID                           | account | LFN/LPN                                        | OK | Replicating | Stuck | Dest_SE    | Src_SE | Created (UTC)       | Expires at (UTC)    |
    ===================================================================================================================================================================================================================
    9a97c02aac2d4655bde7b8374982fdc5 | anil123 | /belle/user/anil123/v5r3p1_pre1_check_02/sub00 | 5  | 0           | 0     | BNL-TMP-SE | None   | 2022-10-04 08:59:29 | 2022-10-11 08:59:29 |

``gb2_ds_rep`` sets the lifetime to a default value of 1 month and also provides an option to set
the lifetime manually::

    $ gb2_ds_rep -d myProject --lifetime

where ``xh/d/m = x hours/days/months`` (lifetime expiration time length).
Please be aware that a lifetime longer than the maximum of 3 months cannot be set.

Quota management
================

Each user has a finite quota of grid storage. There are two kinds of quota:

1. Global quota: storage limit on all the usage summed up (i.e. total storage)
2. Local quota: storage limits on individual storage elements (SEs)

You can check your quota using ``gb2_ds_quota``::

    $ gb2_ds_quota
    Global Quota:
    SE       | USAGE    | LIMIT     | QUOTA LEFT |
    =======================================================
    ANY=true | 350.0 GB | 1000.0 GB | 650.0 GB   |

    Local Quota:
    SE          | USAGE  | LIMIT  | QUOTA LEFT |
    ===========================================================
    DESY-TMP-SE | 150 GB | 400 GB | 250.0 GB   |

Once your quota is completely filled, you cannot register any files on the grid, i.e. all of your
jobs will fail. To free up quota, please delete your projects using ``gb2_ds_rm``, or delete
replicas with ``gb2_ds_rm_rep``.

The following protection is applied at gbasf2 submission:

1. If your usage is at 100% of the global quota, you cannot submit any new projects, meaning gbasf2
   will fail at the submission point.
2. If your usage is >90% of the global quota, you will see a warning at the gbasf2 submission point
   asking you to delete existing projects, but the jobs will be submitted.

   - Please be aware of the risk of jobs failing when registering their output because the quota
     limit has been reached.

.. warning::
   The quota (global) is not active yet, and it will be set in the future with a roughly O(TB)
   limit. Please keep the usage reasonable to avoid issues with the quota in the future.

.. note::
   Quota is already active for group data.
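
The submission thresholds described in this section (a warning above 90% and a failure at 100% of
the global quota) can be checked before submitting with a small filter over the ``gb2_ds_quota``
output. This is only a sketch: the helper ``quota_state`` is hypothetical and not part of gbasf2,
and it assumes the table layout of the example output above, with usage and limit reported in the
same unit.

.. code-block:: shell

    # Hypothetical helper (not part of gbasf2): reads gb2_ds_quota output on
    # stdin and prints "ok", "warning" (>90% used), "full" (>=100% used),
    # or "unknown" when no global quota row is found.
    # Assumed row layout, taken from the example output in this section:
    #   ANY=true | 350.0 GB | 1000.0 GB | 650.0 GB |
    quota_state() {
        awk -F'|' '
            # The global quota row is the one whose SE column reads ANY=true.
            index($1, "ANY=true") > 0 {
                usage = $2 + 0   # numeric prefix of a field like " 350.0 GB "
                limit = $3 + 0   # assumed to use the same unit as usage
                found = 1
            }
            END {
                if (!found || limit <= 0) { print "unknown"; exit }
                pct = 100 * usage / limit
                if (pct >= 100)    print "full"
                else if (pct > 90) print "warning"
                else               print "ok"
            }
        '
    }

    # Usage (requires a gbasf2 environment):
    #   gb2_ds_quota | quota_state

Only the global quota row is evaluated here. Local quotas on individual SEs would need a separate
check per row.
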