.. highlight:: shell

.. _data-management:

Data Management
***************

Jobs executed on the grid produce output files organized within datasets, hosted on storage elements around the world.
A dataset is located using its **Logical Path Name (LPN)**.
For the output dataset of jobs submitted with gbasf2, the LPN always starts with ``/belle/user/``, followed by the dataset name,
which corresponds to the name of the project.

A dataset on the grid is subdivided into datablocks named ``subXX``, where the two digits ``XX`` are incremented sequentially
(for example, ``sub00``).

The gbasf2 installation provides a set of gb2 tools for managing output datasets on the grid.
They cover operations such as downloading, replication, and deletion.

Downloading files
=================

Use ``gb2_ds_get`` to download files from the grid.
The output can be downloaded once the jobs of a project have finished with status Done.

If you provide the project name, the ``gb2_ds`` tools resolve the LPN of the associated output dataset::

    $ gb2_ds_get

After you confirm, the files are downloaded by default into a directory named after the dataset.
You can skip the confirmation with the flag ``-f``.

Check :ref:`bin_ds` for details and the available options.

.. note::

   If some files could not be downloaded because of errors, ``gb2_ds_get`` reports this at the end of its execution.
   Take particular care to verify that all files have been downloaded, because incomplete datasets lead to incorrect physics results.

If some of the files in the dataset were not downloaded, run ``gb2_ds_get`` again.
Files that are already present are skipped after their integrity is confirmed using the checksum or the size.

.. note::

   Some datasets, such as raw data files, are not intended to be downloaded with ``gb2_ds_get``
   (random access to tape systems can potentially block all our operations).
   If you require such files on local resources, please contact the Belle II Data Production managers.

Deleting datasets
=================

Once your datasets have been downloaded and are no longer required, you can delete them from the grid using ``gb2_ds_rm``.

Specify either the project name or the full LPN of the dataset to delete::

    $ gb2_ds_rm myProject
    $ gb2_ds_rm /belle/user/myUser/myProject

Check :ref:`bin_ds` for details and additional options.

.. note::

   Your project is marked for deletion and processed asynchronously.
   This means the dataset does not disappear immediately after ``gb2_ds_rm`` is executed.

Replica management
==================

Sometimes you may want to replicate datasets. For example:

- You would like the output dataset of your gbasf2 project to be located at a specific site.
- You would like your output dataset to be located on a storage element close to where you work, e.g., for easier downloading.

Use ``gb2_ds_rep`` for this. Check :ref:`bin_ds` for more details.

Specify either the project name or the full LPN of the dataset to replicate::

    $ gb2_ds_rep -d myProject
    $ gb2_ds_rep -d /belle/user/myUser/myProject

Once the command finishes successfully, the system takes care of the replication.

To monitor the replication status, use ``gb2_ds_rep_status``::

    $ gb2_ds_rep_status myProject
    $ gb2_ds_rep_status /belle/user/myUser/myProject

For example::

    $ gb2_ds_rep_status /belle/user/anil123/v5r3p1_pre1_check_02
    LFN/LPN                                        | OK | Replicating | Stuck | Dest_SE       |
    ===========================================================================================
    /belle/user/anil123/v5r3p1_pre1_check_02/sub00 | 5  | 0           | 0     | ANY=true      |
    /belle/user/anil123/v5r3p1_pre1_check_02/sub00 | 5  | 0           | 0     | CESNET-TMP-SE |

In the example above:

- The first line, with Dest_SE ``ANY=true``, is the datablock registered by the gbasf2 jobs.
- The second line, with Dest_SE ``CESNET-TMP-SE``, is the replica made using ``gb2_ds_rep``.

Check :ref:`bin_ds` for more details.
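
When scripting around replication, it can help to pick out the rows of the status table that are not yet complete.
The following is a minimal sketch and not part of gbasf2: ``pending_replicas`` is a hypothetical helper name, and it assumes that the output of ``gb2_ds_rep_status`` keeps the pipe-separated column order shown in the example above (LPN, OK, Replicating, Stuck, Dest_SE).
It prints every row whose Replicating or Stuck count is non-zero and returns a non-zero exit status if any such row exists.

```shell
# Hypothetical helper (not a gbasf2 tool): reads gb2_ds_rep_status-style
# table text on stdin and lists datablocks that are still replicating or stuck.
# Assumed column order: LPN | OK | Replicating | Stuck | Dest_SE |
pending_replicas() {
    awk -F'|' '
        # Only data rows are considered: they start with an LPN under /belle/
        $1 ~ /^[ \t]*\/belle\// {
            lpn = $1
            dest = $5
            gsub(/[ \t]/, "", lpn)    # strip column padding
            gsub(/[ \t]/, "", dest)
            if ($3 + 0 > 0 || $4 + 0 > 0) {
                printf "%s %s replicating=%d stuck=%d\n", lpn, dest, $3 + 0, $4 + 0
                pending = 1
            }
        }
        # Exit status 1 if at least one row is incomplete, 0 otherwise
        END { exit (pending ? 1 : 0) }
    '
}
```

Under that assumption, it could be used as ``gb2_ds_rep_status myProject | pending_replicas``. An empty output together with a zero exit status would then indicate that no datablock is still replicating or stuck.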