seqgra core functionality
Generate synthetic data based on a grammar, train a model on the synthetic data, and evaluate the model
usage: seqgra [-h] [-v] (-d DATA_DEF_FILE | -f DATA_FOLDER) [-m MODEL_DEF_FILE] [-e EVALUATORS [EVALUATORS ...]] -o
OUTPUT_DIR [-i] [-p] [-s] [-r] [-g GPU] [--no-checks] [--eval-sets EVAL_SETS [EVAL_SETS ...]]
[--eval-n EVAL_N] [--eval-n-per-label EVAL_N_PER_LABEL] [--eval-suppress-plots]
[--eval-fi-predict-threshold EVAL_FI_PREDICT_THRESHOLD]
[--eval-sis-predict-threshold EVAL_SIS_PREDICT_THRESHOLD]
[--eval-grad-importance-threshold EVAL_GRAD_IMPORTANCE_THRESHOLD]
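As an illustration of the synopsis above, the following minimal Python sketch assembles a seqgra command line as an argument list. The helper `build_seqgra_command` and the file names `data-def.xml` and `model-def.xml` are hypothetical and are not part of seqgra; only the flags themselves are taken from the usage line. The helper enforces that exactly one of `-d` and `-f` is given, mirroring the `(-d DATA_DEF_FILE | -f DATA_FOLDER)` group.

```python
from typing import List, Optional, Sequence


def build_seqgra_command(
    output_dir: str,
    data_def_file: Optional[str] = None,
    data_folder: Optional[str] = None,
    model_def_file: Optional[str] = None,
    evaluators: Sequence[str] = (),
    gpu: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: build an argument list for the seqgra CLI."""
    # -d and -f are mutually exclusive and exactly one of them is required.
    if (data_def_file is None) == (data_folder is None):
        raise ValueError(
            "specify exactly one of data_def_file (-d) or data_folder (-f)"
        )

    cmd = ["seqgra"]
    if data_def_file is not None:
        cmd += ["-d", data_def_file]
    else:
        cmd += ["-f", data_folder]
    if model_def_file is not None:
        cmd += ["-m", model_def_file]
    if evaluators:
        # -e accepts one or more evaluator IDs.
        cmd += ["-e", *evaluators]
    # -o is the only other required argument.
    cmd += ["-o", output_dir]
    if gpu is not None:
        cmd += ["-g", str(gpu)]
    return cmd


# Placeholder file names; -g -1 requests CPU according to the option list.
cmd = build_seqgra_command(
    output_dir="output",
    data_def_file="data-def.xml",
    model_def_file="model-def.xml",
    evaluators=["metrics", "roc"],
    gpu=-1,
)
print(" ".join(cmd))
```

Assuming seqgra is installed and on the PATH, a list built this way could be passed to `subprocess.run(cmd, check=True)`.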
Named Arguments
- -v, --version
show program’s version number and exit
- -d, --data-def-file
path to the seqgra XML data definition file. Use this option to generate synthetic data based on a seqgra grammar (specify either -d or -f, not both)
- -f, --data-folder
experimental data folder name inside outputdir/input. Use this option to train the model on experimental or externally synthesized data (specify either -f or -d, not both)
- -m, --model-def-file
path to the seqgra XML model definition file
- -e, --evaluators
evaluator ID or IDs: IDs of conventional evaluators include metrics, pr, predict, roc; IDs of feature importance evaluators include contrastive-excitation-backprop, deconv, deep-lift, excitation-backprop, feedback, grad-cam, gradient, gradient-x-input, guided-backprop, integrated-gradients, nonlinear-integrated-gradients, saliency, sis, smooth-grad
- -o, --output-dir
output directory, subdirectories are created for generated data, trained model, and model evaluation
- -i, --in-memory
if this flag is set, training and validation data will be stored in-memory instead of loaded in chunks
Default: False
- -p, --print
if this flag is set, data definition, model definition, and model summary are printed
Default: False
- -s, --silent
if this flag is set, only warnings and errors are printed
Default: False
- -r, --remove
if this flag is set, previously stored data for this grammar-model combination will be removed prior to the analysis run. This includes the folders input/[grammar ID], models/[grammar ID]/[model ID], and evaluation/[grammar ID]/[model ID].
Default: False
- -g, --gpu
ID of GPU used by TensorFlow and PyTorch (defaults to GPU ID 0); CPU is used if no GPU is available or GPU ID is set to -1
Default: 0
- --no-checks
if this flag is set, examples and example annotations will not be validated before training (for example, the check that DNA sequences contain only A, C, G, T, N is skipped)
Default: False
- --eval-sets
one or more of the following: training, validation, test; selects the data sets used for evaluation; this evaluator argument will be passed to all evaluators
Default: ['test']
- --eval-n
maximum number of examples to be evaluated per set (defaults to the total number of examples); this evaluator argument will be passed to all evaluators
- --eval-n-per-label
maximum number of examples to be evaluated for each label and set (defaults to the total number of examples unless eval-n is set; takes precedence over eval-n when both are set); this evaluator argument will be passed to all evaluators
- --eval-suppress-plots
if this flag is set, plots are suppressed globally; this evaluator argument will be passed to all evaluators
Default: False
- --eval-fi-predict-threshold
prediction threshold used to select examples for evaluation, only examples with predict(x) > threshold will be passed on to evaluators (defaults to 0.5); this evaluator argument will be passed to feature importance evaluators only
Default: 0.5
- --eval-sis-predict-threshold
prediction threshold for Sufficient Input Subsets; this evaluator argument is only visible to the SIS evaluator
Default: 0.5
- --eval-grad-importance-threshold
feature importance threshold for gradient-based feature importance evaluators; this parameter only affects thresholded grammar agreement plots, not the feature importance measures themselves; this evaluator argument is only visible to gradient-based feature importance evaluators (defaults to 0.01)
Default: 0.01
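The selection rules described for --eval-fi-predict-threshold, --eval-n, and --eval-n-per-label can be sketched in Python as follows. This is an illustration only, not seqgra's implementation: the function `select_examples`, the `(label, score)` representation of an example, and the order of operations (threshold filter first, then caps) are assumptions. What comes from the option descriptions is the strict comparison predict(x) > threshold and the rule that the per-label cap overrules the per-set cap.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for an example: (label, prediction score).
Example = Tuple[str, float]


def select_examples(
    examples: Sequence[Example],
    threshold: float = 0.5,
    n: Optional[int] = None,
    n_per_label: Optional[int] = None,
) -> List[Example]:
    """Illustrative selection of examples for one evaluation set."""
    # Keep only examples whose score is strictly above the threshold.
    kept = [ex for ex in examples if ex[1] > threshold]

    # The per-label cap overrules the per-set cap when both are given.
    if n_per_label is not None:
        counts: Dict[str, int] = defaultdict(int)
        selected: List[Example] = []
        for label, score in kept:
            if counts[label] < n_per_label:
                counts[label] += 1
                selected.append((label, score))
        return selected

    # Otherwise apply the per-set cap, if any.
    if n is not None:
        return kept[:n]

    # No cap: all examples above the threshold are used.
    return kept


examples = [
    ("a", 0.9), ("a", 0.5), ("b", 0.7),
    ("a", 0.8), ("b", 0.6), ("b", 0.2),
]
print(select_examples(examples))
print(select_examples(examples, n=3))
print(select_examples(examples, n=3, n_per_label=1))
```

Note that the example with a score of exactly 0.5 is dropped under the default threshold, because the comparison is strict.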