Datasets Collection

A collection is a single path that refers to a set of datasets intended for analysis.


Advantages of using collections:

  • Intuitive for users: a single path stands for the whole set of datasets.

  • Collections are immutable, which makes analyses reproducible.

  • Each collection has associated metadata, most importantly the integrated luminosity.

  • Collections are produced centrally, which gives confidence in their correctness.

Collections are made centrally by the Data Production (DP) team. Refer to https://confluence.desy.de/display/BI/Collection+summary for more information. Every collection path starts with /belle/collection/.

Types of Collection

  • MC:

    These collections contain MC datasets. Their paths look like /belle/collection/MC/<collection_name>.

  • Data:

    These collections contain datasets of real data. Their paths look like /belle/collection/Data/<collection_name>.

Other types of collection exist (for example, test collections), but they are not intended for analysis and should not be used.
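The path convention described in this section can be checked mechanically, for example before passing a path to a job script. The following is a minimal bash sketch that assumes only the /belle/collection/<type>/<collection_name> layout given above; collection_type is a hypothetical helper, not a gbasf2 tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of gbasf2: classify a path according to the
# /belle/collection/<type>/<collection_name> layout described above.
# Prints the type and returns 0 only for the analysis types MC and Data.
collection_type() {
    local path="$1"
    case "$path" in
        /belle/collection/MC/?*)   echo "MC" ;;
        /belle/collection/Data/?*) echo "Data" ;;
        /belle/collection/*)
            # Other types (e.g. test) exist but are not meant for analysis.
            echo "other"
            return 1 ;;
        *)
            echo "not a collection"
            return 2 ;;
    esac
}

# Example usage:
#   collection_type /belle/collection/MC/MC14rd_ccbar_Moriond2022_v1   # prints MC
```

A case statement is used because its patterns match "/" like any other character, so a plain prefix match on the type directory is enough.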

Command-line Tools for Collections

Use gb2_ds_search collection to search for collections and to get information about them. This command is part of the Dataset Searcher tools.

  • To list the available collections of a given type, use gb2_ds_search collection --list_all_collections /belle/collection/<type>/*

    $ gb2_ds_search collection --list_all_collections /belle/collection/MC/*
    /belle/collection/MC/MC14rd_ccbar_Moriond2022_4S_offres_v1
    /belle/collection/MC/MC14rd_ccbar_Moriond2022_v1
    /belle/collection/MC/MC14rd_charged_Moriond2022_v1
    /belle/collection/MC/MC14rd_ddbar_Moriond2022_4S_offres_v1
    ...
    
  • To get the metadata of a collection, use gb2_ds_search collection --get_metadata <collection_path>. The metadata contains the integrated luminosity (int_luminosity) and a description field with extra information:

    $ gb2_ds_search collection --get_metadata /belle/collection/Data/proc13_chunk1_had_4S_v1
    ########## Metadata of Collection ###############
    dataLevel: mdst
    description: Collection for proc13 - exp 7,8,10 - 4S - hadron events
    campaign: proc13
    dataType: data
    skimDecayMode:
    int_luminosity: 8.609 /fb
    generalSkimName: hadron
    #################################################
    
  • To see which datasets are in a collection, use gb2_ds_search collection --list_datasets <collection_path> or gb2_ds_list <collection_path>:

    $ gb2_ds_search collection --list_datasets /belle/collection/MC/MC14rd_ccbar_Moriond2022_v1
    /belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00694/ccbar/mdst
    /belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00695/ccbar/mdst
    /belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00722/ccbar/mdst
    /belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00723/ccbar/mdst
    /belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00724/ccbar/mdst
    /belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00726/ccbar/mdst
    /belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00727/ccbar/mdst
    /belle/MC/release-05-02-14/DB00001457/MC14rd_c/prod00020292/s00/e0014/4S/r00728/ccbar/mdst
    ....
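When an analysis combines several collections, the output of these commands can be post-processed in a script. The following is a minimal bash sketch, not an official tool: metadata_lumi, total_lumi and count_datasets are hypothetical helpers, and they assume the plain-text output format shown in the examples above (in particular an "int_luminosity: <value> /fb" line), which may differ between gbasf2 versions. total_lumi simply adds the values, so it is only meaningful if the collections do not share datasets.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of gbasf2. They parse the plain-text output
# format shown in the examples above, which is an assumption and may change.

# Read `gb2_ds_search collection --get_metadata` output on stdin and print the
# numeric int_luminosity value, dropping the "/fb" unit.
metadata_lumi() {
    awk -F': *' '$1 == "int_luminosity" { split($2, v, " "); print v[1] }'
}

# Sum int_luminosity (in /fb) over the collection paths given as arguments.
# Only meaningful if the collections do not share datasets.
total_lumi() {
    local coll
    for coll in "$@"; do
        gb2_ds_search collection --get_metadata "$coll" | metadata_lumi
    done | awk '{ sum += $1 } END { printf "%.3f\n", sum }'
}

# Read `--list_datasets` (or gb2_ds_list) output on stdin and print how many
# dataset paths it contains.
count_datasets() {
    grep -c '^/belle/'
}

# Example usage (requires a gbasf2 environment):
#   gb2_ds_search collection --get_metadata /belle/collection/Data/proc13_chunk1_had_4S_v1 | metadata_lumi
#   gb2_ds_list /belle/collection/MC/MC14rd_ccbar_Moriond2022_v1 | count_datasets
```

Parsing on the "key: value" layout keeps the helpers independent of the order of the metadata fields.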
    

Refer to the “Dataset Collections” part of :ref:`running-jobs` for how to submit gbasf2 jobs using a collection.