seqgra core functionality

Generate synthetic data from a grammar, train a model on that data, and evaluate the model.

usage: seqgra [-h] [-v] (-d DATA_DEF_FILE | -f DATA_FOLDER) [-m MODEL_DEF_FILE] [-e EVALUATORS [EVALUATORS ...]] -o
              OUTPUT_DIR [-i] [-p] [-s] [-r] [-g GPU] [--no-checks] [--eval-sets EVAL_SETS [EVAL_SETS ...]]
              [--eval-n EVAL_N] [--eval-n-per-label EVAL_N_PER_LABEL] [--eval-suppress-plots]
              [--eval-fi-predict-threshold EVAL_FI_PREDICT_THRESHOLD]
              [--eval-sis-predict-threshold EVAL_SIS_PREDICT_THRESHOLD]
              [--eval-grad-importance-threshold EVAL_GRAD_IMPORTANCE_THRESHOLD]
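
As an illustration of the usage line above, the following sketch assembles one invocation from Python using only flags that appear in it. The file names and the output directory are placeholders, not files shipped with seqgra.

```python
import shlex

# Placeholder paths; substitute your own definition files and output folder.
command = [
    "seqgra",
    "-d", "data-definition.xml",   # generate synthetic data from a grammar
    "-m", "model-definition.xml",  # train this model on the generated data
    "-e", "metrics", "roc", "pr",  # conventional evaluators to run
    "-o", "output",                # required output directory
]

# shlex.join quotes arguments where needed, giving a copy-pasteable shell line.
command_line = shlex.join(command)
print(command_line)
```

The same list can be handed to `subprocess.run(command, check=True)` if seqgra is installed and on the PATH.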

Named Arguments

-v, --version

show program’s version number and exit

-d, --data-def-file

path to the seqgra XML data definition file. Use this option to generate synthetic data based on a seqgra grammar (specify either -d or -f, not both)

-f, --data-folder

experimental data folder name inside outputdir/input. Use this option to train the model on experimental or externally synthesized data (specify either -f or -d, not both)
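
The usage line groups -d and -f as a required either/or choice. A minimal sketch of that behavior with a stand-in argparse parser is shown here; the program name `seqgra-sketch` and the helper `rejects` are hypothetical, and this is not seqgra's own parser code.

```python
import argparse

# Hypothetical stand-in parser; it mirrors only the -d/-f/-o part of the usage line.
parser = argparse.ArgumentParser(prog="seqgra-sketch")
source = parser.add_mutually_exclusive_group(required=True)
source.add_argument("-d", "--data-def-file")
source.add_argument("-f", "--data-folder")
parser.add_argument("-o", "--output-dir", required=True)

# Exactly one data source is given, so parsing succeeds.
args = parser.parse_args(["-f", "my-experiment", "-o", "output"])
print(args.data_folder, args.data_def_file)


def rejects(argv):
    """Return True if the parser refuses argv (argparse exits on errors)."""
    try:
        parser.parse_args(argv)
    except SystemExit:
        return True
    return False
```

With this parser, passing both -d and -f, or neither, is rejected with a usage error.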

-m, --model-def-file

path to the seqgra XML model definition file

-e, --evaluators

evaluator ID or IDs: IDs of conventional evaluators include metrics, pr, predict, roc; IDs of feature importance evaluators include contrastive-excitation-backprop, deconv, deep-lift, excitation-backprop, feedback, grad-cam, gradient, gradient-x-input, guided-backprop, integrated-gradients, nonlinear-integrated-gradients, saliency, sis, smooth-grad
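
Because some evaluator arguments apply to feature importance evaluators only, it can help to know which group an ID belongs to. The sketch below copies the IDs from the description above into two sets; the function `evaluator_kind` is a hypothetical helper, and the sets may be incomplete, since the description only says the groups "include" these IDs.

```python
# Evaluator IDs as listed in the -e/--evaluators description.
CONVENTIONAL = {"metrics", "pr", "predict", "roc"}
FEATURE_IMPORTANCE = {
    "contrastive-excitation-backprop", "deconv", "deep-lift",
    "excitation-backprop", "feedback", "grad-cam", "gradient",
    "gradient-x-input", "guided-backprop", "integrated-gradients",
    "nonlinear-integrated-gradients", "saliency", "sis", "smooth-grad",
}


def evaluator_kind(evaluator_id):
    """Hypothetical helper: classify an evaluator ID by the documented groups."""
    if evaluator_id in CONVENTIONAL:
        return "conventional"
    if evaluator_id in FEATURE_IMPORTANCE:
        return "feature importance"
    # The documented lists may not be exhaustive, so do not raise here.
    return "unknown"


print(evaluator_kind("roc"), "/", evaluator_kind("sis"))
```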

-o, --output-dir

output directory, subdirectories are created for generated data, trained model, and model evaluation

-i, --in-memory

if this flag is set, training and validation data will be stored in-memory instead of loaded in chunks

Default: False

-p, --print

if this flag is set, data definition, model definition, and model summary are printed

Default: False

-s, --silent

if this flag is set, only warnings and errors are printed

Default: False

-r, --remove

if this flag is set, previously stored data for this grammar-model combination will be removed prior to the analysis run. This includes the folders input/[grammar ID], models/[grammar ID]/[model ID], and evaluation/[grammar ID]/[model ID].

Default: False
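
To make the scope of -r concrete, this sketch lists the three folders named above for a given grammar ID and model ID. It assumes the folders are resolved relative to the output directory; `folders_to_remove` is a hypothetical helper that only builds paths and deletes nothing.

```python
from pathlib import Path


def folders_to_remove(output_dir, grammar_id, model_id):
    """Hypothetical helper: list the folders that -r is documented to clear."""
    root = Path(output_dir)  # assumption: folders live under the output directory
    return [
        root / "input" / grammar_id,
        root / "models" / grammar_id / model_id,
        root / "evaluation" / grammar_id / model_id,
    ]


for folder in folders_to_remove("output", "my-grammar", "my-model"):
    print(folder.as_posix())
```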

-g, --gpu

ID of GPU used by TensorFlow and PyTorch (defaults to GPU ID 0); CPU is used if no GPU is available or GPU ID is set to -1

Default: 0

--no-checks

if this flag is set, examples and example annotations will not be validated before training; validation checks, for example, that DNA sequences only contain A, C, G, T, N

Default: False
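
The kind of check that --no-checks skips can be sketched as follows for the DNA case named in the description. The function `is_valid_dna` is hypothetical, and treating the check as case-sensitive is an assumption; seqgra's own validation may differ.

```python
# Alphabet named in the --no-checks description.
DNA_ALPHABET = set("ACGTN")


def is_valid_dna(sequence):
    """Return True if every character of sequence is one of A, C, G, T, N."""
    # Assumption: upper-case letters only; lower-case input is rejected.
    return set(sequence) <= DNA_ALPHABET


for seq in ["ACGTN", "ACGU"]:
    print(seq, is_valid_dna(seq))
```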

--eval-sets

one or more of the following: training, validation, test; selects the data sets used for evaluation; this evaluator argument will be passed to all evaluators

Default: ['test']

--eval-n

maximum number of examples to be evaluated per set (defaults to the total number of examples); this evaluator argument will be passed to all evaluators

--eval-n-per-label

maximum number of examples to be evaluated for each label and set (defaults to the total number of examples unless --eval-n is set; takes precedence over --eval-n if both are given); this evaluator argument will be passed to all evaluators
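
The interaction between --eval-n and --eval-n-per-label can be sketched as below. The function `select_examples` is hypothetical, and taking examples in their stored order is an assumption; the description does not say how seqgra picks the subset.

```python
from collections import defaultdict


def select_examples(examples, eval_n=None, eval_n_per_label=None):
    """Pick examples from one set; examples is a list of (x, label) pairs."""
    if eval_n_per_label is not None:
        # The per-label cap takes precedence over eval_n.
        counts = defaultdict(int)
        selected = []
        for x, label in examples:
            if counts[label] < eval_n_per_label:
                counts[label] += 1
                selected.append((x, label))
        return selected
    if eval_n is not None:
        return examples[:eval_n]
    # Neither limit set: evaluate every example in the set.
    return list(examples)


examples = [("s1", "a"), ("s2", "a"), ("s3", "b"), ("s4", "a"), ("s5", "b")]
print(select_examples(examples, eval_n=2, eval_n_per_label=1))
```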

--eval-suppress-plots

if this flag is set, plots are suppressed globally; this evaluator argument will be passed to all evaluators

Default: False

--eval-fi-predict-threshold

prediction threshold used to select examples for evaluation, only examples with predict(x) > threshold will be passed on to evaluators (defaults to 0.5); this evaluator argument will be passed to feature importance evaluators only

Default: 0.5
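
The selection rule predict(x) > threshold can be sketched as a simple filter over prediction scores. The function `indices_above_threshold` is hypothetical; the strict inequality follows the description above, so a score exactly equal to the threshold is excluded.

```python
def indices_above_threshold(scores, threshold=0.5):
    """Return indices of examples whose prediction exceeds the threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


scores = [0.2, 0.5, 0.51, 0.9]
print(indices_above_threshold(scores))
```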

--eval-sis-predict-threshold

prediction threshold for Sufficient Input Subsets; this evaluator argument is only visible to the SIS evaluator

Default: 0.5

--eval-grad-importance-threshold

feature importance threshold for gradient-based feature importance evaluators; this parameter only affects thresholded grammar agreement plots, not the feature importance measures themselves; this evaluator argument is only visible to gradient-based feature importance evaluators (defaults to 0.01)

Default: 0.01