Gbasf2

Submit basf2 jobs to GRID sites.

All command-line options can be embedded in the steering file. See --help-steering for the available options.
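
For example, to list the embeddable options:

$ gbasf2 --help-steering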

The project name and the basf2 software version are mandatory options, specified by --project and --setuprel, respectively.

If no destination site is specified with --site, DIRAC chooses the best site; however, sites listed with --banned_site are never used. The output ROOT file is stored on the site-local SE unless another SE is specified with --default_se.
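
For instance, to let DIRAC choose among all sites except one (the site, project, and file names below are illustrative):

$ gbasf2 steer.py -p myproject --setuprel release-00-04-01 --banned_site LCG.SomeSite.xx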

--input_ds points to an input dataset for basf2, while --input_datafiles handles supplemental files downloaded from the SE, such as beam-background files. A list of LFNs can be specified with --input_dslist.
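
A hypothetical example combining an input dataset with a supplemental file fetched from the SE (all paths are placeholders):

$ gbasf2 steer.py -p bgmix --setuprel release-00-04-01 -i /belle/MC/some_dataset --input_datafiles /belle/BG/some_bg_file.root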

--input_ds_search retrieves datasets from the Dataset Searcher matching the provided metadata:

--input_ds_search='metadata1=value;metadata2=value;exp=expLow:expHigh;run=runLow:runHigh'
Metadata attributes:

'release', 'campaign', 'data_type', 'data_level', 'beam_energy', 'bkg_level', 'mc_event', 'skim_decay', 'general_skim', 'global_tag', 'exp', 'run'.

Note: To select a single exp/run, use e.g. exp=9:9 (this uses exp 9 only). To see the available attributes and values, refer to the command gb2_ds_search metadata.

Small supplemental files (e.g. decay.dec) can be transferred from your desktop using --input_sandboxfiles (less than 10 MB in total). Note that tarballs (t*z, tar.*z) among the input files are automatically extracted in the execution directory.
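
For example, to ship a local decay file with the job (the project and file names are illustrative):

$ gbasf2 steer.py -p mygen --setuprel release-00-04-01 -f decay.dec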

--query_file and --query_ds act as filters on the input data. The syntax is the same as for the gb2_ds_query_* commands.
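
A hypothetical --query_ds filter (the metadata field and paths are illustrative; consult the gb2_ds_query_* commands for the fields actually available):

$ gbasf2 steer.py -p filtered --setuprel release-00-04-01 -i /belle/MC/some_dataset --query_ds "campaign=mcprod1405"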

A parametric job, i.e. a set of repeated basf2 jobs, is initiated with the -r option. During basf2 execution, the job number can be obtained from $GBASF2_PARAMETER and is appended to the end of the output filename. By default the numbering starts from 1; an offset can be specified with --rep_start. Alternatively, --unique appends the DIRAC JobID, which is useful for non-standard basf2 output (e.g. a ROOT file generated by your own module).
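
As a minimal steering-file sketch (assuming a standard basf2 setup; the output filename and module list are illustrative), the job number can be read to build a distinct output name per job:

import os
import basf2

# Job number of this parametric job; falls back to '0' when
# running outside a gbasf2 parametric job.
job_no = os.environ.get('GBASF2_PARAMETER', '0')

path = basf2.Path()
# ... add your input and analysis modules here ...
path.add_module('RootOutput', outputFileName='myoutput_%s.root' % job_no)
basf2.process(path)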

Usually the maximum execution time is estimated from the number of events to be processed. If this estimate is insufficient, or unavailable due to a complicated steering file, you may specify it with the --cputime option.

Examples:

$ gbasf2 steer1.py -p test_project --setuprel release-00-04-01
$ gbasf2 steer2.py -p paramJob -r 5 --site=LCG.KEK2.jp --default_se=KEK2-SE
$ gbasf2 steer3.py -p envs --gridenv=AAA=1,BBB=ccc --priority 3
$ gbasf2 steer4.py -p anajob -i "/belle/MC/generic/ccbar/mcprod1405/BGx1/mc35_ccbar_BGx1_s4*" --query_file "runL<10" --input_nfiles 5
$ gbasf2 steer5.py -p anajob2 --input_dslist input.txt --cputime 1440
$ gbasf2 steer6.py -p debug --loglevel DEBUG --basf2opt "-l VERB"
$ gbasf2 steer7.py -p proc10exp8 -i '/belle/Data/release-04-01-00/DB00000748/proc10/prod00009642/e0008/4S/r*/mdst/sub*'
$ gbasf2 steer8.py -p anajob3 --input_ds_search='data_type=Data;campaign=proc11;beam_energy=4S;data_level=mdst;exp=8:9;run=1126:1135'
usage: gbasf2 [-h] [--help-steering] [--usage] [-p PROJECT] [-r NJOBS] [--rep_start OFFSET] [--site SITE] [--resource_tag RESOURCE_TAG] [--platform PLATFORM] [--banned_site SITE]
              [-d SE] [--query_file QUERY] [--query_ds QUERY] [--meta_version VERSION] [--jobtype JOBTYPE] [--priority PRIORITY] [--cputime CPUTIME | --evtpersec NEVT]
              [-i DSPATH | --input_dslist FILENAME | --input_ds_search= INPUT_DS_SEARCH=] [-n N] [-o DSPATH] [-f FILES [FILES ...]] [--lfn_sandboxfiles]
              [--output_sandboxfiles OS_FILES [OS_FILES ...]] [--input_datafiles LFNS [LFNS ...]] [--subdir_pattern SUBDIR_PATTERN] [--subdir_nfiles SUBDIR_NFILES] [--basf2opt OPTS]
              [--basf2path PATH] [-s RELEASE] [--gridenv ENV [ENV ...]] [--executable EXEC] [--prod-group PROD_GROUP] [--prod-desc DESC] [--prod-desc-long DESC] [--prod-parent TRANSID]
              [--scout] [--noscout] [--dryrun] [--force] [--devmode] [--profile] [--unique] [--forecast] [--loglevel {ALWAYS,NOTICE,INFO,VERBOSE,DEBUG,WARN,ERROR}]
              steering_file

Positional Arguments

steering_file

basf2 steering file

Named Arguments

--help-steering

show the options that can be embedded in the steering file

--usage

show detailed usage

-p, --project

Name of project

-r, --repetition

number of repetition jobs

--rep_start

offset for the repetition job numbering

--site

The site name to which you want to submit

--resource_tag

Resource property on which job runs

--platform

Job platform (e.g. Linux_x86_64_glibc-2.12)

--banned_site

Site name(s) to exclude from submission (NO_BANNED disables the entire ban list)

-d, --default_se

The default SE name on which to store output

--query_file

query for file metadata

--query_ds

query for dataset metadata

--meta_version

Specify metadata schema version

--jobtype

Type of the Job (e.g. User, Production,…)

--priority

Job priority: 0 (default) to 10 (highest)

--cputime

estimated CPUTime (in minutes)

--evtpersec

estimated number of processed events per second

-i, --input_ds

Input dataset(s) location ([path/]dataset name)

--input_dslist

Input dataset list from file

--input_ds_search=

Query to retrieve input datasets from the Dataset Searcher, e.g. "data_type=Data;campaign=proc11;beam_energy=4S;…"

-n, --input_nfiles

Number of input files per job

-o, --output_ds

Output dataset location (only path) (Default: /belle/user/USER)

-f, --input_sandboxfiles

Supplemental files to be included in input sandbox

--lfn_sandboxfiles

Use LFN method for supplemental input sandbox files

--output_sandboxfiles

Supplemental files to be saved in output sandbox, must not exceed 10MB in total

--input_datafiles

Supplemental files to be downloaded from SE

--subdir_pattern

Sub-directory naming convention in printf style

--subdir_nfiles

Sub-directory maximum number of files (0: unused)

--basf2opt

Options passed to basf2; the "=" form is required (e.g. --basf2opt="-l INFO")

--basf2path

Basf2 installation directory (/sw/belle2 at KEKCC)

-s, --setuprel

Basf2 release

--gridenv

Additional environment variables on the worker node (WN)

--executable

Executable that runs instead of default basf2 wrapper (basf2helper.py)

--prod-group

Production group (only for production job)

--prod-desc

Description of the production

--prod-desc-long

Long description of the production

--prod-parent

Make the transformation run after the parent transformation has completed

--scout

Massive jobs are submitted only after scout jobs finish successfully (user jobs only)

--noscout

Scout jobs are not submitted before the massive jobs (user jobs only)

--dryrun

Process everything except submitting a job

--force

Skip submission confirmation

--devmode

Development mode: local client scripts are uploaded and used during job execution (for testing)

--profile

Enable resource profiling

--unique

Append the DIRAC JobID to the end of the output filename

--forecast

Show the sites hosting the input files

--loglevel

Possible choices: ALWAYS, NOTICE, INFO, VERBOSE, DEBUG, WARN, ERROR

gBasf2 log level