Dataset management tools

This section shows the usage of the available gb2 dataset management tools.

For most tools, usage examples can be printed with the --usage flag.

gb2_ds_get

Download remote files to a local directory (default: the current directory). The --new option downloads using Rucio; this is experimental functionality and will eventually become the default. Examples:

$ gb2_ds_get myProject
$ gb2_ds_get "project/sub00/file00*.root"
$ gb2_ds_get myproject --new
usage: gb2_ds_get [-h] [-v] [--usage] [-o local directory] [-u USER] [-r {MC,data,user}] [-l] [--new] [-f] [--noSubDir] [-i file with lfns] [--failed_lfns filename.txt] [--se SE]
                  dataset

Positional Arguments

dataset

specify dataset name

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0

--usage

show detailed usage

-o, --output_dir

path to local directory

-u, --user

specify user name

-r, --subcate

Possible choices: MC, data, user

specify a dataset category

-l, --long

long listing (-ll: extra long)

--new

Enables experimental feature(s) for the tool.

Default: False

-f, --force

skip confirmation

Default: False

--noSubDir

Do not download files in subdirectories of the given dataset

Default: False

-i, --input_dslist

Input file with list of LFNs to download

--failed_lfns

Set the name of the text file where failed LFNs will be stored

--se

Select a storage element (SE)
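The --failed_lfns and -i options can be combined to retry a partial download. The sketch below is a hypothetical wrapper: get_with_retry and the GB2_DS_GET override are illustrative names, not part of gb2. It assumes that the file written by --failed_lfns holds one LFN per line and can be passed back unchanged with -i, and it still passes the dataset argument because the usage line marks it as required. Check `gb2_ds_get --usage` before relying on it.

```shell
#!/bin/sh
# Hypothetical wrapper around gb2_ds_get: download a dataset, then retry
# any LFNs recorded as failed. Only documented options are used
# (-f, -o, --failed_lfns, -i).
# ASSUMPTIONS: the --failed_lfns file lists one LFN per line and -i accepts it
# unchanged; the dataset argument is still passed because it is required.
# GB2_DS_GET may be overridden (e.g. with a stand-in) for dry runs.
GB2_DS_GET=${GB2_DS_GET:-gb2_ds_get}

get_with_retry() {
  # $1 = dataset, $2 = output directory, $3 = file for failed LFNs
  local dataset="$1" outdir="${2:-.}" failed="${3:-failed_lfns.txt}"
  rm -f -- "$failed"   # a stale list from an earlier run must not trigger a retry
  "$GB2_DS_GET" -f -o "$outdir" --failed_lfns "$failed" "$dataset"
  if [ -s "$failed" ]; then
    echo "Retrying $(wc -l < "$failed") failed LFN(s) listed in $failed" >&2
    "$GB2_DS_GET" -f -o "$outdir" -i "$failed" "$dataset"
  fi
}
```

For example, `get_with_retry myProject ./downloads` downloads into ./downloads and retries once if any LFN failed. The GB2_DS_GET variable exists only so the function can be dry-run with a stand-in command.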

gb2_ds_list

List datasets or files in the specified directory. By default, only files with the status ‘good’ in the metadata catalog are shown. Use the -s option to list files with other statuses, and ‘-s all’ to list all files. Examples:

$ gb2_ds_list -u username
$ gb2_ds_list "/belle/MC/signal/B2DstpDstm/mcprod*/BGx*"
$ gb2_ds_list dataset -l -g
$ gb2_ds_list dataset -s all
$ gb2_ds_list dataset -s good
usage: gb2_ds_list [-h] [-v] [--usage] [-u USER] [-r {MC,data,user}] [-l] [-g] [-s STATUS] [dataset]

Positional Arguments

dataset

specify dataset name

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0

--usage

show detailed usage

-u, --user

specify user name

-r, --subcate

Possible choices: MC, data, user

specify a dataset category

-l, --long

long listing (-ll: extra long)

-g, --group_by_se

group the listing by SE

-s, --status

specify the file status (e.g. good, all)

gb2_ds_du

Show the disk usage of the specified datasets (default: /belle/user/USER/*).

Use ‘-u all’ to include all users.

Examples:

$ gb2_ds_du 7345232
$ gb2_ds_du -u username "proj1*"
usage: gb2_ds_du [-h] [-v] [--usage] [-u USER] [-r {MC,data,user}] [--noBar] [dataset]

Positional Arguments

dataset

specify dataset name

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0

--usage

show detailed usage

-u, --user

specify user name

-r, --subcate

Possible choices: MC, data, user

specify a dataset category

--noBar

disable status bar

Default: False

gb2_ds_count_events

Prints the number of events for each file in a dataset or for each dataset in a datablock.

Accepts SQL-like syntax for metadata queries. String values should be quoted with single or double quotation marks.

Examples:

$ gb2_ds_count_events datasets
$ gb2_ds_count_events -u username "dataset*"
$ gb2_ds_count_events -q "status='good' and runHigh>100" dataset
$ gb2_ds_count_events --summary --output_json "output_name.json" dataset
usage: gb2_ds_count_events [-h] [-v] [--usage] [-u USER] [-q QUERY] [--summary] [--output_json OUTPUT_JSON] dataset [dataset ...]

Positional Arguments

dataset

specify dataset name(s)

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0

--usage

show detailed usage

-u, --user

specify user name

-q, --query

query for metadata

--summary

print the total number of events and the number of files

Default: False

--output_json

specify the JSON output file name

gb2_ds_query_file

Query file metadata

Accepts SQL-like syntax for metadata queries. String values should be quoted with single or double quotation marks.

Examples:

$ gb2_ds_query_file dataset
$ gb2_ds_query_file -u username dataset
$ gb2_ds_query_file -m "status:software" dataset
$ gb2_ds_query_file -q "status='good'" dataset
$ gb2_ds_query_file -q "runL>5 and runH<10" dataset
$ gb2_ds_query_file -q "runH<100" "/belle/MC/generic/ccbar/mcprod1405/BGx1/"
usage: gb2_ds_query_file [-h] [-v] [--usage] [-C CONF] [-m META] [-q QUERY] [-u USER] [-r {MC,data,user}] [--output_csv OUTPUT_CSV] [-l] dataset [dataset ...]

Positional Arguments

dataset

specify dataset name(s)

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0

--usage

show detailed usage

-C, --conf

specify configuration file

-m, --meta

specify metadata attribute list

-q, --query

query for metadata

-u, --user

specify user name

-r, --subcate

Possible choices: MC, data, user

specify a dataset category

--output_csv

specify the CSV output file name

-l, --long

long listing (-ll: extra long)
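The quoting rule matters because your shell parses the query before gb2 sees it: an unquoted runH<100 would be read as an input redirection, and the single quotes around 'good' would be stripped. The sketch below assembles a query from shell variables and passes it as one double-quoted argument, together with a colon-separated -m attribute list as in the examples above. build_run_query and query_good_runs are hypothetical helpers, and combining status, runL and runH in one query is assumed to work the same way as the combined example shown for gb2_ds_count_events.

```shell
#!/bin/sh
# Hypothetical helpers showing how to pass gb2 metadata queries safely.
# The whole -q value is one double-quoted argument, so the shell neither
# reads < or > as redirections nor strips the single quotes around strings.
GB2_DS_QUERY_FILE=${GB2_DS_QUERY_FILE:-gb2_ds_query_file}

build_run_query() {
  # $1 = file status, $2 = lower bound for runL, $3 = upper bound for runH
  printf "status='%s' and runL>%s and runH<%s" "$1" "$2" "$3"
}

query_good_runs() {
  # $1 = dataset, $2/$3 = run bounds; -m takes a colon-separated attribute list
  "$GB2_DS_QUERY_FILE" -m "status:software" \
    -q "$(build_run_query good "$2" "$3")" "$1"
}
```

For example, `query_good_runs "/belle/MC/generic/ccbar/mcprod1405/BGx1/" 5 10` runs the same kind of query as the runL/runH example above, restricted to files with status 'good'.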

gb2_ds_query_dataset

Query dataset metadata

Accepts SQL-like syntax for metadata queries. String values should be quoted with single or double quotation marks.

Examples:

$ gb2_ds_query_dataset dataset
$ gb2_ds_query_dataset -u username "dataset*"
$ gb2_ds_query_dataset -m "software:desc" dataset
$ gb2_ds_query_dataset -q "software='release-00-04-01'" dataset
usage: gb2_ds_query_dataset [-h] [-v] [--usage] [-C CONF] [-m META] [-q QUERY] [-u USER] [-r {MC,data,user}] [-l] dataset [dataset ...]

Positional Arguments

dataset

specify dataset name(s)

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0

--usage

show detailed usage

-C, --conf

specify configuration file

-m, --meta

specify metadata attribute list

-q, --query

query for metadata

-u, --user

specify user name

-r, --subcate

Possible choices: MC, data, user

specify a dataset category

-l, --long

long listing (-ll: extra long)

gb2_ds_query_datablock

Query datablock metadata

Accepts SQL-like syntax for metadata queries. String values should be quoted with single or double quotation marks.

Examples:

$ gb2_ds_query_datablock datablock
$ gb2_ds_query_datablock dataset/sub00
$ gb2_ds_query_datablock -m "size:nFiles" dataset
$ gb2_ds_query_datablock -q "nFiles=1000" dataset
usage: gb2_ds_query_datablock [-h] [-v] [--usage] [-C CONF] [-m META] [-q QUERY] [-u USER] [-r {MC,data,user}] [-l] dataset [dataset ...]

Positional Arguments

dataset

specify dataset name(s)

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0

--usage

show detailed usage

-C, --conf

specify configuration file

-m, --meta

specify metadata attribute list

-q, --query

query for metadata

-u, --user

specify user name

-r, --subcate

Possible choices: MC, data, user

specify a dataset category

-l, --long

long listing (-ll: extra long)

gb2_ds_rep

Replicate datablocks to another SE. Input datasets are resolved into datablocks.

By default, the replication rule is associated with your account. If -u <username> is provided, the rule is associated with that account instead. Use --lifetime xh/xd/xw/xm (hours, days, weeks or months) to set a custom lifetime; the default is 1m (one month). The replica is deleted after its lifetime expires.

Examples:

$ gb2_ds_rep /belle/user/anil123/myproject/sub00 -d DESY-TMP-SE
$ gb2_ds_rep /belle/user/anil123/myotherproject/sub00 -d KIT-TMP-SE -u belle_dcops
$ gb2_ds_rep /belle/user/anil123/myotherproject1 -d KIT-TMP-SE --lifetime 2w
usage: gb2_ds_rep [-h] [-v] [--usage] [-s SE] -d SE [-u USER] [-f] [--lifetime LIFETIME] dataset [dataset ...]

Positional Arguments

dataset

specify dataset name(s)

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0

--usage

show detailed usage

-s, --src_se

source SE

-d, --dst_se

destination SE

-u, --user

specify user name

-f, --force

skip confirmation

Default: False

--lifetime

set lifetime for LPN: xh(our), xd(ay), xw(eek), xm(onth). Default: 1m(onth)
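A malformed --lifetime value can be caught before a request is submitted. This sketch assumes a valid value is a positive integer followed by h, d, w or m, which is how the xh/xd/xw/xm notation above reads; gb2_ds_rep itself remains the authority on what it accepts. valid_lifetime, replicate and the GB2_DS_REP override are hypothetical names, and -f is passed to skip the confirmation prompt, as a non-interactive script would.

```shell
#!/bin/sh
# Hypothetical pre-check for gb2_ds_rep --lifetime values.
# ASSUMPTION: a valid value is a positive integer (no leading zero) followed
# by h, d, w or m, as the "xh/xd/xw/xm" notation suggests.
GB2_DS_REP=${GB2_DS_REP:-gb2_ds_rep}

valid_lifetime() {
  local num unit
  num=${1%?}             # everything except the last character
  unit=${1#"$num"}       # the last character
  case $unit in h|d|w|m) ;; *) return 1 ;; esac
  case $num in ''|0*|*[!0-9]*) return 1 ;; esac
  return 0
}

replicate() {
  # $1 = dataset, $2 = destination SE, $3 = optional lifetime (tool default: 1m)
  if [ -z "$3" ]; then
    "$GB2_DS_REP" -f -d "$2" "$1"
  elif valid_lifetime "$3"; then
    "$GB2_DS_REP" -f -d "$2" --lifetime "$3" "$1"
  else
    echo "invalid lifetime '$3' (expected e.g. 12h, 3d, 2w, 1m)" >&2
    return 1
  fi
}
```

For example, `replicate /belle/user/anil123/myotherproject1 KIT-TMP-SE 2w` issues the same request as the third example above, without the interactive confirmation; progress can then be followed with gb2_ds_rep_status.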

gb2_ds_rep_status

Check the status of a replication requested with gb2_ds_rep.

Examples:

$ gb2_ds_rep_status /belle/user/anil123/myproject/sub00
$ gb2_ds_rep_status /belle/user/anil123/myotherproject

usage: gb2_ds_rep_status [-h] [-v] [--usage] [-l] dataset [dataset ...]

Positional Arguments

dataset

specify dataset name(s)

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0

--usage

show detailed usage

-l, --long

long listing (-ll: extra long)

gb2_ds_rm

Asynchronously removes the files and metadata associated with the given dataset or project name. All replicas on all SEs are deleted.

Examples:

$ gb2_ds_rm project_name
$ gb2_ds_rm "/belle/user/hideki/project_*"
$ gb2_ds_rm -u somebody project_name
$ gb2_ds_rm -f project_name
usage: gb2_ds_rm [-h] [-v] [--usage] [-f] [-u USER] [-r {MC,data,user}] [--noBar] dataset [dataset ...]

Positional Arguments

dataset

specify dataset name(s)

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0

--usage

show detailed usage

-f, --force

skip confirmation

Default: False

-u, --user

specify user name

-r, --subcate

Possible choices: MC, data, user

specify a dataset category

--noBar

disable status bar

Default: False
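Because removal deletes every replica, it can be worth previewing what a wildcard matches first. The sketch below lists the matching datasets with gb2_ds_list, shows their size with gb2_ds_du, and then calls gb2_ds_rm without -f so the tool still asks for confirmation. preview_and_rm and the GB2_DS_* overrides are hypothetical names, and the sketch assumes gb2_ds_list accepts -u together with a wildcard pattern in the same way gb2_ds_du does. Pass the pattern in quotes so your local shell does not expand the *.

```shell
#!/bin/sh
# Hypothetical "look before you delete" wrapper built from documented tools.
# gb2_ds_rm is called WITHOUT -f, so its own confirmation prompt still applies.
GB2_DS_LIST=${GB2_DS_LIST:-gb2_ds_list}
GB2_DS_DU=${GB2_DS_DU:-gb2_ds_du}
GB2_DS_RM=${GB2_DS_RM:-gb2_ds_rm}

preview_and_rm() {
  # $1 = user name, $2 = dataset name or (quoted) wildcard pattern
  echo "Datasets matching $2:"
  "$GB2_DS_LIST" -u "$1" "$2" || return 1   # stop if nothing can be listed
  echo "Disk usage:"
  "$GB2_DS_DU" -u "$1" "$2" || return 1
  "$GB2_DS_RM" -u "$1" "$2"
}
```

For example, `preview_and_rm hideki 'project_*'` shows what the second gb2_ds_rm example above would remove before the removal prompt appears.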

gb2_ds_siteForecast

List possible job execution sites based on replica locations. Site availability is not taken into account.

Examples:

$ gb2_ds_siteForecast /belle/MC/release-02-00-01/DB00000411/MC11/prod00005218/s00/e0000/4S/r00000/ddbar/mdst/sub00
usage: gb2_ds_siteForecast [-h] [-v] [--usage] [-u USER] [-r {MC,data,user}] [dataset]

Positional Arguments

dataset

specify dataset name

Named Arguments

-v, --verbose

increase verbosity (up to -vv)

Default: 0

--usage

show detailed usage

-u, --user

specify user name

-r, --subcate

Possible choices: MC, data, user

specify a dataset category