Datasets Collection
A collection is a single path referring to a set of datasets that is intended for analysis.
Advantages of using a collection:
- Intuitive for the user.
- A collection is immutable, which makes analyses reproducible.
- A collection has metadata associated with it, mainly the integrated luminosity.
- Collections are centrally produced, implying correctness.
Collections are made centrally by the Data Production (DP) team. Refer to https://confluence.desy.de/display/BI/Collection+summary for more information.
Collection names start with /belle/collection/
Types of Collection
- MC:
These collections are for MC datasets. The path looks like:
/belle/collection/MC/<collection_name>
- Data:
These collections are for Data datasets. The path looks like:
/belle/collection/Data/<collection_name>
There are other types of collections (such as test), but they are not intended for analysis, so do not use them.
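The path convention above can be checked in a script before a collection is used. The following is a minimal sketch and not part of gbasf2: the helper name collection_type is hypothetical, the test path in the example is made up for illustration, and the check only inspects the path string without querying the dataset searcher.

```python
from typing import Optional

COLLECTION_ROOT = "/belle/collection/"
ANALYSIS_TYPES = ("MC", "Data")  # the two types intended for analysis


def collection_type(path: str) -> Optional[str]:
    """Return "MC" or "Data" for an analysis collection path, else None.

    Hypothetical helper: it only looks at the path string and does not
    verify that the collection actually exists.
    """
    if not path.startswith(COLLECTION_ROOT):
        return None
    parts = path[len(COLLECTION_ROOT):].split("/")
    # Expect exactly <type>/<collection_name> after the root.
    if len(parts) != 2 or not parts[1]:
        return None
    return parts[0] if parts[0] in ANALYSIS_TYPES else None


print(collection_type("/belle/collection/MC/MC14rd_ccbar_Moriond2022_v1"))
# Illustrative path only; other types are not intended for analysis.
print(collection_type("/belle/collection/test/some_collection"))
```

A return value of None signals that the path is either not a collection path or belongs to a type that should not be used for analysis.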
Collection Command-Line Tools
You can use gb2_ds_search collection for collection search commands.
Tools for searching collections and getting information about them are available in the Dataset Searcher tools.
To list the available collections, use
gb2_ds_search collection --list_all_collections /belle/collection/<type>/*
For example:
$ gb2_ds_search collection --list_all_collection /belle/collection/MC/*
/belle/collection/MC/MC14rd_ccbar_Moriond2022_4S_offres_v1
/belle/collection/MC/MC14rd_ccbar_Moriond2022_v1
/belle/collection/MC/MC14rd_charged_Moriond2022_v1
/belle/collection/MC/MC14rd_ddbar_Moriond2022_4S_offres_v1
/belle/collection/MC/MC14rd_ddbar_Moriond2022_4S_offres_v1
...
To get the metadata of a collection, use
gb2_ds_search collection --get_metadata <collection_path>
The metadata contains int_luminosity and description for extra information:
$ gb2_ds_search collection --get_metadata /belle/collection/Data/proc13_chunk1_had_4S_v1
########## Metadata of Collection ###############
dataLevel: mdst
description: Collection for proc13 - exp 7,8,10 - 4S - hadron events
campaign: proc13
dataType: data
skimDecayMode:
int_luminosity: 8.609 /fb
generalSkimName: hadron
#################################################
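When the metadata is needed in a script (for example, to sum the integrated luminosity of several collections), the printed output can be parsed. The sketch below assumes the output has one "key: value" field per line between the banner lines, as in the sample above, and that int_luminosity is written as a number followed by /fb. Both function names are hypothetical, not part of gbasf2.

```python
from typing import Dict


def parse_collection_metadata(text: str) -> Dict[str, str]:
    """Parse 'key: value' lines from --get_metadata output into a dict."""
    metadata = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#####' banner lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata


def luminosity_in_inverse_fb(metadata: Dict[str, str]) -> float:
    """Return int_luminosity as a float, assuming the form '<number> /fb'."""
    number, unit = metadata["int_luminosity"].split()
    if unit != "/fb":
        raise ValueError(f"unexpected luminosity unit: {unit}")
    return float(number)


# Sample text copied from the example output above.
sample = """\
########## Metadata of Collection ###############
dataLevel: mdst
description: Collection for proc13 - exp 7,8,10 - 4S - hadron events
campaign: proc13
dataType: data
skimDecayMode:
int_luminosity: 8.609 /fb
generalSkimName: hadron
#################################################
"""

meta = parse_collection_metadata(sample)
print(meta["campaign"], luminosity_in_inverse_fb(meta))
```

Empty fields such as skimDecayMode are kept as empty strings, so a missing value can be told apart from a missing key.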
To see which datasets are in a collection, use
gb2_ds_search collection --list_datasets <collection_path>
or
gb2_ds_list <collection_path>
For example:
$ gb2_ds_search collection --list_datasets /belle/collection/MC/MC14rd_ccbar_Moriond2022_v1
/belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00694/ccbar/mdst
/belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00695/ccbar/mdst
/belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00722/ccbar/mdst
/belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00723/ccbar/mdst
/belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00724/ccbar/mdst
/belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00726/ccbar/mdst
/belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00727/ccbar/mdst
/belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00728/ccbar/mdst
....
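The dataset listing can also be collected into a Python list, for example to count the datasets in a collection. This is a minimal sketch that assumes the command prints one dataset path per line and that every dataset path starts with /belle/, as in the sample output above. The function name dataset_paths is hypothetical.

```python
from typing import List


def dataset_paths(output: str) -> List[str]:
    """Return the dataset paths found in --list_datasets output.

    Assumes one path per line; lines that do not start with '/belle/'
    (blank lines, truncation markers) are ignored.
    """
    return [
        line.strip()
        for line in output.splitlines()
        if line.strip().startswith("/belle/")
    ]


# Two lines copied from the example output above, plus its truncation marker.
sample = """\
/belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00694/ccbar/mdst
/belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00695/ccbar/mdst
....
"""

paths = dataset_paths(sample)
print(len(paths))
```

In a configured gbasf2 environment, the input text could be captured with the standard-library call subprocess.run([...], capture_output=True, text=True) and passed to this function as its stdout attribute.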
Refer to the “Dataset Collections” part of :ref:`running-jobs` for how to submit gbasf2 jobs using a collection.