seqgra ensemble
seqgra ensemble: Test a model architecture on a grammar across data set sizes, simulation seeds, and model seeds
usage: seqgrae [-h] [-v] -a ANALYSIS_ID (-d DATA_DEF_FILE | -f DATA_FOLDER) -m MODEL_DEF_FILES [MODEL_DEF_FILES ...]
               -o OUTPUT_DIR [--ds-sizes DS_SIZES [DS_SIZES ...]] [--d-seeds D_SEEDS [D_SEEDS ...]]
               [--m-seeds M_SEEDS [M_SEEDS ...]] [--seed-grid] [-g GPU]
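The usage line uses standard argparse notation: square brackets mark optional arguments, (-d ... | -f ...) marks a required choice of exactly one of the two options, and X [X ...] means one or more values. The following is a minimal sketch that reproduces this interface with Python's argparse, assuming the real parser is built in a similar way. It is not seqgra's source: the value types (float for --ds-sizes, int for seeds and the GPU ID) and the version string are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented seqgrae interface.
    # Option names and defaults follow the usage text; types are assumptions.
    parser = argparse.ArgumentParser(
        prog="seqgrae",
        description="Test a model architecture on a grammar across data set "
                    "sizes, simulation seeds, and model seeds")
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s (placeholder version)")
    parser.add_argument("-a", "--analysis-id", required=True)
    # "(-d DATA_DEF_FILE | -f DATA_FOLDER)": exactly one of the two is required
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-d", "--data-def-file")
    source.add_argument("-f", "--data-folder")
    # "X [X ...]": one or more values, i.e. nargs="+"
    parser.add_argument("-m", "--model-def-files", nargs="+", required=True)
    parser.add_argument("-o", "--output-dir", required=True)
    # float so that the same option accepts sizes (-d) and rates (-f)
    parser.add_argument("--ds-sizes", nargs="+", type=float,
                        default=[10000, 20000, 40000, 80000,
                                 160000, 320000, 640000, 1280000])
    parser.add_argument("--d-seeds", nargs="+", type=int, default=[1, 2, 3])
    parser.add_argument("--m-seeds", nargs="+", type=int, default=[1, 2, 3])
    parser.add_argument("--seed-grid", action="store_true")
    parser.add_argument("-g", "--gpu", type=int, default=0)
    return parser
```

For example, build_parser().parse_args(["-a", "demo", "-d", "grammar.xml", "-m", "m1.xml", "m2.xml", "-o", "out"]) yields a namespace whose data_folder is None, while passing both -d and -f (or neither) makes argparse exit with an error. The file names here are placeholders.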
Named Arguments
- -v, --version
show program’s version number and exit
- -a, --analysis-id
analysis ID, used for the script file name and the comparator folder names
- -d, --data-def-file
path to the seqgra XML data definition file. Use this option to generate synthetic data based on a seqgra grammar (specify either -d or -f, not both)
- -f, --data-folder
name of the experimental data folder inside OUTPUT_DIR/input (the directory given by -o). Use this option to train the model on experimental or externally synthesized data (specify either -f or -d, not both)
- -m, --model-def-files
list of paths to the seqgra XML model definition files
- -o, --output-dir
output directory; subdirectories are created for generated data and model definitions, input data, trained models, and model evaluations
- --ds-sizes
if -d is specified, list of data set sizes in number of examples, each always split 70-10-20 into training, validation, and test sets; defaults to [10000, 20000, 40000, 80000, 160000, 320000, 640000, 1280000]. If -f is specified, list of subsampling rates for the training examples, where 1.0 means the original data without subsampling; defaults to [0.05, 0.1, 0.2, 0.4, 0.8, 1.0]
Default: [10000, 20000, 40000, 80000, 160000, 320000, 640000, 1280000]
- --d-seeds
list of simulation seeds, defaults to [1, 2, 3]
Default: [1, 2, 3]
- --m-seeds
list of model seeds, defaults to [1, 2, 3]
Default: [1, 2, 3]
- --seed-grid
if this flag is set, all combinations of simulation and model seeds are evaluated; otherwise each simulation seed is tested with only one model seed: simulation seed 1 with model seed 1, simulation seed 2 with model seed 2, and so on
Default: False
- -g, --gpu
ID of the GPU used by TensorFlow and PyTorch (defaults to GPU ID 0); the CPU is used if no GPU is available or if the GPU ID is set to -1
Default: 0
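The --ds-sizes description packs two calculations into one option. With -d, each size is split 70-10-20 into training, validation, and test examples; with -f, each value is a rate applied to the training examples. The sketch below works through both. The helper names are hypothetical, and the rounding behavior (remainder to the test set, rates rounded to the nearest integer) is an assumption, since the documentation does not say how seqgra handles sizes that do not divide evenly.

```python
from __future__ import annotations

# Documented defaults for --ds-sizes in -d mode and -f mode
DEFAULT_DS_SIZES = [10000, 20000, 40000, 80000, 160000, 320000, 640000, 1280000]
DEFAULT_SUBSAMPLING_RATES = [0.05, 0.1, 0.2, 0.4, 0.8, 1.0]


def split_sizes(n_examples: int) -> tuple[int, int, int]:
    """Hypothetical helper: 70-10-20 train/validation/test counts (-d mode).

    Assumption: integer division, with any remainder going to the test set.
    """
    n_train = n_examples * 70 // 100
    n_val = n_examples * 10 // 100
    n_test = n_examples - n_train - n_val
    return n_train, n_val, n_test


def subsampled_train_size(n_train: int, rate: float) -> int:
    """Hypothetical helper: training examples kept in -f mode.

    A rate of 1.0 keeps the original data. Assumption: rounded to nearest.
    """
    return round(n_train * rate)


for size in DEFAULT_DS_SIZES:
    print(size, split_sizes(size))
```

All default sizes are multiples of 10, so the split is exact: 10,000 examples become 7,000 training, 1,000 validation, and 2,000 test examples, and 1,280,000 become 896,000, 128,000, and 256,000. In -f mode, a training set of 7,000 examples subsampled at rate 0.05 keeps 350 examples.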
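The difference between running with and without --seed-grid is the difference between a Cartesian product and a pairing of the two seed lists. The following minimal sketch enumerates the (simulation seed, model seed) combinations under the documented rule. The function name is hypothetical, and pairing seeds by list position with zip() (which drops extra seeds when the lists differ in length) is an assumption about how seqgrae treats unequal lists.

```python
from __future__ import annotations

from itertools import product


def seed_combinations(d_seeds: list[int], m_seeds: list[int],
                      seed_grid: bool = False) -> list[tuple[int, int]]:
    """Hypothetical helper: (simulation seed, model seed) pairs to evaluate."""
    if seed_grid:
        # --seed-grid: every simulation seed with every model seed
        return list(product(d_seeds, m_seeds))
    # Default: seeds paired by position (first with first, second with second).
    # Assumption: zip() silently drops extra seeds if the lengths differ.
    return list(zip(d_seeds, m_seeds))


print(seed_combinations([1, 2, 3], [1, 2, 3]))
print(seed_combinations([1, 2, 3], [1, 2, 3], seed_grid=True))
```

With the default seeds [1, 2, 3], paired mode evaluates 3 combinations and grid mode evaluates 9. If each combination is run for each of the 8 default data set sizes (an assumption about how seqgrae schedules runs), that amounts to 24 versus 72 trainings per model definition file.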
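The -g/--gpu rule reduces to a small decision: use the requested GPU unless the ID is -1 or no GPU is present. Below is a sketch of that rule as a pure function. It is a hypothetical helper, not seqgra's implementation. It returns PyTorch-style device strings ("cpu", "cuda:N"); in a real PyTorch program, the GPU count could come from torch.cuda.device_count(). The behavior for an ID beyond the number of available GPUs is not documented and is deliberately left unhandled.

```python
def select_device(gpu_id: int, num_gpus_available: int) -> str:
    """Hypothetical helper mirroring the documented -g/--gpu rule."""
    # The CPU is used when -g -1 is passed or when no GPU is available
    if gpu_id == -1 or num_gpus_available == 0:
        return "cpu"
    # Otherwise use the requested GPU (PyTorch-style device string);
    # an out-of-range gpu_id is not checked here
    return f"cuda:{gpu_id}"


print(select_device(0, 1))
print(select_device(-1, 4))
```

With the default -g 0 on a machine with one GPU, this yields "cuda:0"; with -g -1, or on a machine without a GPU, it yields "cpu".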