Dataset management tools

This page shows the usage of all available gb2 dataset tools.

For most tools, usage examples can be printed with the --usage flag.


gb2_ds_get

Download remote files to a local directory (default: current directory). The --new option downloads using Rucio; this is experimental functionality and will eventually become the default. Examples:

$ gb2_ds_get myProject
$ gb2_ds_get "project/sub00/file00*.root"
$ gb2_ds_get myproject --new
usage: gb2_ds_get [-h] [-v] [--usage] [-o local directory] [-u USER] [-r {MC,data,user}] [-l] [--new] [-f] [--noSubDir] [-i file with lfns] [--failed_lfns filename.txt] [--se SE]

Positional Arguments


dataset

specify dataset name

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0


--usage

show detailed usage

-o, --output_dir

path to local directory

-u, --user

specify user name

-r, --subcate

Possible choices: MC, data, user

specify a dataset category

-l, --long

long listing (-ll: extra long)


--new

Enables experimental feature(s) for the tool.

Default: False

-f, --force

skip confirmation

Default: False


--noSubDir

Do not download files in subdirectories of the given dataset

Default: False

-i, --input_dslist

Input file with list of LFNs to download


--failed_lfns

Set the name of the text file where failed LFNs will be stored


--se

Select an SE
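
The -i, -o and --failed_lfns flags from the usage line above can be combined to download a fixed list of files and keep a record of failures. The following is a minimal sketch rather than an official recipe: the file names lfns.txt, failed_lfns.txt and failed_again.txt, the directory ./downloads and the helper build_get_cmd are hypothetical, and it is an assumption (not confirmed by this page) that the failed-LFN file can be passed back through -i for a retry. The helper only prints the command so that it can be inspected before it is run.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a gb2_ds_get command that
# downloads the LFNs listed in a file and records the failures.
# Only flags that appear in the usage line are used.
build_get_cmd() {
    lfn_list=$1      # text file with the LFNs to download (assumed: one per line)
    out_dir=$2       # local target directory
    failed_file=$3   # file where failed LFNs will be stored
    printf 'gb2_ds_get -i %s -o %s --failed_lfns %s\n' \
        "$lfn_list" "$out_dir" "$failed_file"
}

# First pass over the full list.
build_get_cmd lfns.txt ./downloads failed_lfns.txt
# Assumed retry pass: reuse the failure file as the new input list.
build_get_cmd failed_lfns.txt ./downloads failed_again.txt
```

Printing the command instead of executing it keeps the sketch free of side effects; once the printed line looks right it can be pasted into a shell with a valid grid proxy.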


gb2_ds_list

List datasets or files in the specified directory. By default, only files with the status 'good' in the metadata catalog are shown. Use the -s option to list files with other statuses, and '-s all' to list all files. Examples:

$ gb2_ds_list -u username
$ gb2_ds_list "/belle/MC/signal/B2DstpDstm/mcprod*/BGx*"
$ gb2_ds_list dataset -l -g
$ gb2_ds_list dataset -s all
$ gb2_ds_list dataset -s good
usage: gb2_ds_list [-h] [-v] [--usage] [-u USER] [-r {MC,data,user}] [-l] [-g] [-s STATUS] [dataset]

Positional Arguments


dataset

specify dataset name

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0


--usage

show detailed usage

-u, --user

specify user name

-r, --subcate

Possible choices: MC, data, user

specify a dataset category

-l, --long

long listing (-ll: extra long)

-g, --group_by_se

groups by SEs

-s, --status

specify status of file


gb2_ds_du

Show disk usage of specific datasets (default: /belle/user/USER/*).

All users are specified by '-u all'.

Examples:


$ gb2_ds_du 7345232
$ gb2_ds_du -u username "proj1*"
usage: gb2_ds_du [-h] [-v] [--usage] [-u USER] [-r {MC,data,user}] [--noBar] [dataset]

Positional Arguments


dataset

specify dataset name

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0


--usage

show detailed usage

-u, --user

specify user name

-r, --subcate

Possible choices: MC, data, user

specify a dataset category


--noBar

disable status bar

Default: False


gb2_ds_count_events

Prints the number of events for each file in a dataset or for each dataset in a datablock.

Accepts SQL-like syntax for metadata queries. The query string should be enclosed in single or double quotation marks.

Examples:


$ gb2_ds_count_events datasets
$ gb2_ds_count_events -u username "dataset*"
$ gb2_ds_count_events -q "status='good' and runHigh>100" dataset
$ gb2_ds_count_events --summary --output_json "output_name.json" dataset
usage: gb2_ds_count_events [-h] [-v] [--usage] [-u USER] [-q QUERY] [--summary] [--output_json OUTPUT_JSON] dataset [dataset ...]

Positional Arguments


dataset

specify dataset name(s)

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0


--usage

show detailed usage

-u, --user

specify user name

-q, --query

query for metadata


--summary

print the total number of events and number of files

Default: False


--output_json

specify JSON file name
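
Because a metadata query contains quoted values and operators such as >, the whole string has to reach the tool as a single argument. The sketch below illustrates the quoting in a POSIX shell, using the attribute names from the examples above (status, runHigh). The variable names and the threshold are illustrative only, and count_args is a hypothetical helper that merely reports how many arguments the shell produced; it does not call any gb2 tool.

```shell
#!/bin/sh
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

min_run=100   # illustrative threshold

# Double quotes on the outside let the shell expand $min_run; the single
# quotes inside are passed through literally as part of the query.
query="status='good' and runHigh>$min_run"
echo "$query"

# Quoted: the query stays a single argument, as -q expects.
count_args -q "$query" dataset
# Unquoted: word splitting breaks the query into several arguments.
count_args -q $query dataset
```

The same quoting applies to the -q option of the gb2_ds_query_* tools, which take the same kind of query string.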


gb2_ds_query_file

Query file metadata.

Accepts SQL-like syntax for metadata queries. The query string should be enclosed in single or double quotation marks.

Examples:


$ gb2_ds_query_file dataset
$ gb2_ds_query_file -u username dataset
$ gb2_ds_query_file -m "status:software" dataset
$ gb2_ds_query_file -q "status='good'" dataset
$ gb2_ds_query_file -q "runL>5 and runH<10" dataset
$ gb2_ds_query_file -q "runH<100" "/belle/MC/generic/ccbar/mcprod1405/BGx1/"
usage: gb2_ds_query_file [-h] [-v] [--usage] [-C CONF] [-m META] [-q QUERY] [-u USER] [-r {MC,data,user}] [--output_csv OUTPUT_CSV] [-l] dataset [dataset ...]

Positional Arguments


dataset

specify dataset name(s)

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0


--usage

show detailed usage

-C, --conf

specify configuration file

-m, --meta

specify metadata attribute list

-q, --query

query for metadata

-u, --user

specify user name

-r, --subcate

Possible choices: MC, data, user

specify a dataset category


--output_csv

specify CSV file name

-l, --long

long listing (-ll: extra long)


gb2_ds_query_dataset

Query dataset metadata.

Accepts SQL-like syntax for metadata queries. The query string should be enclosed in single or double quotation marks.

Examples:


$ gb2_ds_query_dataset dataset
$ gb2_ds_query_dataset -u username "dataset*"
$ gb2_ds_query_dataset -m "software:desc" dataset
$ gb2_ds_query_dataset -q "software='release-00-04-01'" dataset
usage: gb2_ds_query_dataset [-h] [-v] [--usage] [-C CONF] [-m META] [-q QUERY] [-u USER] [-r {MC,data,user}] [-l] dataset [dataset ...]

Positional Arguments


dataset

specify dataset name(s)

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0


--usage

show detailed usage

-C, --conf

specify configuration file

-m, --meta

specify metadata attribute list

-q, --query

query for metadata

-u, --user

specify user name

-r, --subcate

Possible choices: MC, data, user

specify a dataset category

-l, --long

long listing (-ll: extra long)


gb2_ds_query_datablock

Query datablock metadata.

Accepts SQL-like syntax for metadata queries. The query string should be enclosed in single or double quotation marks.

Examples:


$ gb2_ds_query_datablock datablock
$ gb2_ds_query_datablock dataset/sub00
$ gb2_ds_query_datablock -m "size:nFiles" dataset
$ gb2_ds_query_datablock -q "nFiles=1000" dataset
usage: gb2_ds_query_datablock [-h] [-v] [--usage] [-C CONF] [-m META] [-q QUERY] [-u USER] [-r {MC,data,user}] [-l] dataset [dataset ...]

Positional Arguments


dataset

specify dataset name(s)

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0


--usage

show detailed usage

-C, --conf

specify configuration file

-m, --meta

specify metadata attribute list

-q, --query

query for metadata

-u, --user

specify user name

-r, --subcate

Possible choices: MC, data, user

specify a dataset category

-l, --long

long listing (-ll: extra long)


gb2_ds_rep

Replicate datablocks to another SE. Input datasets are resolved into datablocks.

By default, the replication rule is associated with your account. If -u <username> is provided, the rule is associated with that account instead. Specify --lifetime xh/xd/xw/xm to set a custom lifetime (default: 1m, one month). The replica is deleted after the lifetime expires.

Examples:


$ gb2_ds_rep /belle/user/anil123/myproject/sub00 -d DESY-TMP-SE
$ gb2_ds_rep /belle/user/anil123/myotherproject/sub00 -d KIT-TMP-SE -u belle_dcops
$ gb2_ds_rep /belle/user/anil123/myotherproject1 -d KIT-TMP-SE --lifetime 2w
usage: gb2_ds_rep [-h] [-v] [--usage] [-s SE] -d SE [-u USER] [-f] [--lifetime LIFETIME] dataset [dataset ...]

Positional Arguments


dataset

specify dataset name(s)

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0


--usage

show detailed usage

-s, --src_se

source SE

-d, --dst_se

destination SE

-u, --user

specify user name

-f, --force

skip confirmation

Default: False


--lifetime

set lifetime for LPN: xh (hours), xd (days), xw (weeks) or xm (months). Default: 1m (one month)
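
The lifetime format described above (a number followed by h, d, w or m) is easy to mistype. The sketch below checks a value before it is passed to --lifetime. It assumes that x is a positive integer without leading zeros and that only those four single-letter units are accepted; the tool's own parser may be more or less permissive. valid_lifetime is a hypothetical helper and not part of gb2.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 looks like <positive integer><h|d|w|m>.
# The pattern is an assumption derived from the "xh/xd/xw/xm" description.
valid_lifetime() {
    printf '%s\n' "$1" | grep -Eq '^[1-9][0-9]*[hdwm]$'
}

# Try a few candidate values and report which ones pass the check.
for lt in 2w 1m 36h 2weeks 0d w; do
    if valid_lifetime "$lt"; then
        echo "$lt: ok"
    else
        echo "$lt: rejected"
    fi
done
```

A check like this can guard a wrapper script, so that a malformed value is caught locally before a replication request is submitted.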


gb2_ds_rep_status

Check the status of a replication requested with gb2_ds_rep.

Examples:


$ gb2_ds_rep_status /belle/user/anil123/myproject/sub00
$ gb2_ds_rep_status /belle/user/anil123/myotherproject

usage: gb2_ds_rep_status [-h] [-v] [--usage] [-l] dataset [dataset ...]

Positional Arguments


dataset

specify dataset name(s)

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0


--usage

show detailed usage

-l, --long

long listing (-ll: extra long)


gb2_ds_rm

Asynchronously removes the files and metadata associated with the given dataset or project name. All replicas on the SEs are deleted.

Examples:


$ gb2_ds_rm project_name
$ gb2_ds_rm "/belle/user/hideki/project_*"
$ gb2_ds_rm -u somebody project_name
$ gb2_ds_rm -f project_name
usage: gb2_ds_rm [-h] [-v] [--usage] [-f] [-u USER] [-r {MC,data,user}] [--noBar] dataset [dataset ...]

Positional Arguments


dataset

specify dataset name(s)

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0


--usage

show detailed usage

-f, --force

skip confirmation

Default: False

-u, --user

specify user name

-r, --subcate

Possible choices: MC, data, user

specify a dataset category


--noBar

disable status bar

Default: False
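
Since removal deletes all replicas and accepts wildcard patterns, it can help to look at what a pattern matches before removing anything. The sketch below prints a cautious two-step sequence. It assumes that gb2_ds_list resolves a pattern the same way gb2_ds_rm does, which this page does not confirm; preview_then_rm is a hypothetical helper that only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands for a cautious removal.
preview_then_rm() {
    pattern=$1
    # Step 1: list everything the pattern matches, in all statuses.
    printf 'gb2_ds_list -s all "%s"\n' "$pattern"
    # Step 2: remove only after reviewing the listing. -f is left out
    # on purpose so that the tool still asks for confirmation.
    printf 'gb2_ds_rm "%s"\n' "$pattern"
}

preview_then_rm "/belle/user/hideki/project_*"
```

Keeping the pattern quoted in both printed commands prevents the local shell from expanding the wildcard against local files.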


gb2_ds_siteForecast

List possible job execution sites based on replica location. Note that site availability is not considered.

Example:


$ gb2_ds_siteForecast /belle/MC/release-02-00-01/DB00000411/MC11/prod00005218/s00/e0000/4S/r00000/ddbar/mdst/sub00
usage: gb2_ds_siteForecast [-h] [-v] [--usage] [-u USER] [-r {MC,data,user}] [dataset]

Positional Arguments


dataset

specify dataset name

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0


--usage

show detailed usage

-u, --user

specify user name

-r, --subcate

Possible choices: MC, data, user

specify a dataset category