seqgra.learner.learner module
Contains abstract classes for all learners.
- Classes:
Learner
: abstract base class for all learners
MultiClassClassificationLearner
: abstract class for multi-class classification learners
MultiLabelClassificationLearner
: abstract class for multi-label classification learners
MultipleRegressionLearner
: abstract class for multiple regression learners
MultivariateRegressionLearner
: abstract class for multivariate regression learners
- class Learner(model_definition: seqgra.model.model.modeldefinition.ModelDefinition, data_dir: str, output_dir: str, validate_data: bool = True, gpu_id: int = 0, silent: bool = False)[source]¶
Bases:
abc.ABC
Abstract base class for all learners.
- definition¶
contains model meta info, architecture and hyperparameters
- Type
ModelDefinition
- data_dir¶
directory with data files, e.g., training.txt
- Type
str
- output_dir¶
model output directory, {OUTPUTDIR}/models/{GRAMMAR ID}/{MODEL ID}/
- Type
str
- validate_data¶
whether input data should be validated (e.g., check if valid DNA or protein sequence)
- Type
bool
- gpu_id¶
ID of GPU used by TensorFlow and PyTorch
- Type
int
- model¶
PyTorch or TensorFlow model
- optimizer¶
PyTorch or TensorFlow optimizer
- criterion¶
PyTorch or TensorFlow criterion (loss)
- metrics¶
metrics that are collected, usually loss and accuracy
- Type
List[str]
- Parameters
model_definition (ModelDefinition) – contains model meta info, architecture and hyperparameters
data_dir (str) – directory with data files, {OUTPUTDIR}/input/{GRAMMAR ID}
output_dir (str) – model output directory without model folder, {OUTPUTDIR}/models/{GRAMMAR ID}
validate_data (bool) – whether input data should be validated (e.g., check if valid DNA or protein sequence)
gpu_id (int) – ID of GPU used by TensorFlow and PyTorch
See also
MultiClassClassificationLearner
: for classification models with mutually exclusive classes
MultiLabelClassificationLearner
: for classification models with non-mutually exclusive classes
MultipleRegressionLearner
: for regression models with multiple independent variables and one dependent variable
- abstract create_model() → None[source]¶
Abstract method to create library-specific model.
Machine learning library specific implementations are provided for TensorFlow and PyTorch.
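The abstract-method pattern described above can be sketched as follows. This is a minimal illustration, not the seqgra source: `SketchLearner`, `DummyBackendLearner`, and `can_instantiate` are hypothetical names, and the dictionary stands in for a real TensorFlow or PyTorch model.

```python
from abc import ABC, abstractmethod


class SketchLearner(ABC):
    """Hypothetical stand-in for an abstract learner (not the seqgra class)."""

    def __init__(self) -> None:
        self.model = None

    @abstractmethod
    def create_model(self) -> None:
        """Library-specific subclasses are expected to set self.model here."""


class DummyBackendLearner(SketchLearner):
    """Hypothetical concrete learner with a placeholder model."""

    def create_model(self) -> None:
        # A real subclass would build a TensorFlow or PyTorch model instead.
        self.model = {"layers": ["conv", "dense"]}


def can_instantiate(cls) -> bool:
    """Return True if cls can be constructed, False if it is still abstract."""
    try:
        cls()
        return True
    except TypeError:
        return False
```

Python refuses to instantiate a class that still has unimplemented abstract methods, so only the concrete subclass can be constructed.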
- abstract evaluate_model(file_name: Optional[str] = None, x: Optional[List[str]] = None, y: Optional[List[str]] = None)[source]¶
TODO
TODO
- Parameters
file_name (Optional[str]) – TODO
x (Optional[List[str]]) – TODO
y (Optional[List[str]]) – TODO
- Returns
TODO
- Return type
array
- Raises
Exception – if neither file_name nor (x and y) are specified
- get_annotations_file(set_name: str = 'test') → str[source]¶
Get path to annotations file.
E.g., get_annotations_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training-annotation.txt, if it exists.
- Parameters
set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
- Returns
file path to annotations file
- Return type
str
- Raises
Exception – in case requested annotations file does not exist
- get_examples_file(set_name: str = 'test') → str[source]¶
Get path to examples file.
E.g., get_examples_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training.txt, if it exists.
- Parameters
set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
- Returns
file path to examples file
- Return type
str
- Raises
Exception – in case requested examples file does not exist
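The file naming convention used by get_annotations_file and get_examples_file can be sketched as below. `resolve_set_file` and `demo` are hypothetical helpers written for illustration only. The sketch assumes the files sit directly in the data directory and raises `FileNotFoundError` (a subclass of `Exception`), whereas the documented methods only promise a generic `Exception`.

```python
import os
import tempfile


def resolve_set_file(data_dir: str, set_name: str = "test",
                     annotations: bool = False) -> str:
    """Hypothetical helper, not part of seqgra: build and check a data file path."""
    if set_name not in ("training", "validation", "test"):
        raise ValueError("set_name must be training, validation, or test")
    suffix = "-annotation.txt" if annotations else ".txt"
    path = os.path.join(data_dir, set_name + suffix)
    if not os.path.isfile(path):
        # The documented methods raise a generic Exception in this case.
        raise FileNotFoundError("file does not exist: " + path)
    return path


def demo() -> str:
    """Create a throwaway data directory and resolve its training file."""
    with tempfile.TemporaryDirectory() as data_dir:
        open(os.path.join(data_dir, "training.txt"), "w").close()
        return os.path.basename(resolve_set_file(data_dir, "training"))
```

Calling `resolve_set_file(data_dir, "training", annotations=True)` would look for `training-annotation.txt` in the same directory.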
- abstract get_num_params() → seqgra.schema.ModelSize[source]¶
TODO
TODO
- abstract load_model(file_name: Optional[str] = None)[source]¶
TODO
TODO
- Parameters
file_name (str, optional) – file name in output dir; default is library-dependent
- parse_annotations_data(file_name: str) → seqgra.schema.AnnotationSet[source]¶
Method to parse annotations data file.
Checks validity of annotations.
- Parameters
file_name (str) – file name
- Returns
annotations (List[str]): annotations
y (List[str]): labels
- Return type
AnnotationSet
- abstract parse_examples_data(file_name: str) → seqgra.schema.ExampleSet[source]¶
Abstract method to parse examples data file.
Checks validity of sequences with sequence data type specific implementations provided for DNA and amino acid sequences.
- Parameters
file_name (str) – file name
- Returns
x (List[str]): sequences
y (List[str]): labels
- Return type
ExampleSet
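A sequence validity check of the kind mentioned above might look like the following sketch. The function name `is_valid_dna` and the strict uppercase alphabet `ACGT` are assumptions made for illustration; the actual seqgra check may accept other symbols (for example ambiguity codes).

```python
import re

# Assumed alphabet: uppercase A, C, G, T only (an illustration, not seqgra's rule).
_DNA_PATTERN = re.compile(r"[ACGT]+")


def is_valid_dna(sequence: str) -> bool:
    """Return True if the whole sequence consists of A, C, G, and T."""
    return _DNA_PATTERN.fullmatch(sequence) is not None


def invalid_sequences(x):
    """Return the sequences in x that fail the DNA check."""
    return [seq for seq in x if not is_valid_dna(seq)]
```

A protein-specific implementation would follow the same shape with an amino acid alphabet in place of `ACGT`.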
- abstract predict(file_name: Optional[str] = None, x: Optional[Any] = None, encode: bool = True)[source]¶
TODO
TODO
- Parameters
x (array) – TODO
encode (bool, optional) – whether x should be encoded; defaults to True
- Raises
Exception – if neither file_name nor x are specified
- abstract save_model(file_name: Optional[str] = None)[source]¶
TODO
TODO
- Parameters
file_name (str, optional) – file name in output dir; default is library-dependent
- train_model(file_name_train: Optional[str] = None, file_name_val: Optional[str] = None, x_train: Optional[List[str]] = None, y_train: Optional[List[str]] = None, x_val: Optional[List[str]] = None, y_val: Optional[List[str]] = None) → None[source]¶
Train model.
Specify either file_name_train and file_name_val or x_train, y_train, x_val, and y_val.
- Parameters
file_name_train (Optional[str]) – TODO
file_name_val (Optional[str]) – TODO
x_train (Optional[List[str]]) – TODO
y_train (Optional[List[str]]) – TODO
x_val (Optional[List[str]]) – TODO
y_val (Optional[List[str]]) – TODO
- Raises
Exception – output directory non-empty
Exception – specify either file_name_train and file_name_val or x_train, y_train, x_val, y_val
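The "either file names or in-memory data" rule of train_model can be sketched as a small argument check. `resolve_training_source` is a hypothetical helper, and the choice to prefer files when both kinds of input are given is an assumption; the documentation above does not say how that case is handled.

```python
from typing import List, Optional


def resolve_training_source(file_name_train: Optional[str] = None,
                            file_name_val: Optional[str] = None,
                            x_train: Optional[List[str]] = None,
                            y_train: Optional[List[str]] = None,
                            x_val: Optional[List[str]] = None,
                            y_val: Optional[List[str]] = None) -> str:
    """Hypothetical helper: report which input mode a call would use."""
    has_files = file_name_train is not None and file_name_val is not None
    has_arrays = all(a is not None for a in (x_train, y_train, x_val, y_val))
    if has_files:
        # Assumption: files take precedence if both modes are supplied.
        return "files"
    if has_arrays:
        return "memory"
    raise ValueError("specify either file_name_train and file_name_val "
                     "or x_train, y_train, x_val, y_val")
```

A partially specified call, such as passing only `file_name_train`, falls through to the error branch.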
- class MultiClassClassificationLearner(model_definition: seqgra.model.model.modeldefinition.ModelDefinition, data_dir: str, output_dir: str, validate_data: bool = True, gpu_id: int = 0, silent: bool = False)[source]¶
Bases:
seqgra.learner.learner.Learner
Abstract class for multi-class classification learners.
Multi-class classification learners are learners for models with mutually exclusive class labels.
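Mutually exclusive labels are commonly represented as one-hot vectors, where exactly one position is set per example. The sketch below illustrates that idea only; `one_hot` is a hypothetical function and is not claimed to be how encode_y is implemented in seqgra.

```python
from typing import List


def one_hot(labels: List[str], classes: List[str]) -> List[List[int]]:
    """Encode each label as a vector with a single 1 at its class index."""
    index = {name: i for i, name in enumerate(classes)}
    encoded = []
    for label in labels:
        row = [0] * len(classes)
        row[index[label]] = 1  # raises KeyError for an unknown label
        encoded.append(row)
    return encoded
```

Every row sums to 1, which is what distinguishes this encoding from the multi-label case.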
- definition¶
contains model meta info, architecture and hyperparameters
- Type
ModelDefinition
- data_dir¶
directory with data files, e.g., training.txt
- Type
str
- output_dir¶
model output directory, {OUTPUTDIR}/models/{GRAMMAR ID}/{MODEL ID}/
- Type
str
- validate_data¶
whether input data should be validated (e.g., check if valid DNA or protein sequence)
- Type
bool
- gpu_id¶
ID of GPU used by TensorFlow and PyTorch
- Type
int
- model¶
PyTorch or TensorFlow model
- optimizer¶
PyTorch or TensorFlow optimizer
- criterion¶
PyTorch or TensorFlow criterion (loss)
- metrics¶
metrics that are collected, usually loss and accuracy
- Type
List[str]
- Parameters
model_definition (ModelDefinition) – contains model meta info, architecture and hyperparameters
data_dir (str) – directory with data files, {OUTPUTDIR}/input/{GRAMMAR ID}
output_dir (str) – model output directory without model folder, {OUTPUTDIR}/models/{GRAMMAR ID}
validate_data (bool) – whether input data should be validated (e.g., check if valid DNA or protein sequence)
gpu_id (int) – ID of GPU used by TensorFlow and PyTorch
- check_annotations(annotations: List[str]) → bool¶
- check_labels(y: List[str], throw_exception: bool = True) → bool¶
- abstract check_sequence(x: List[str]) → bool¶
- abstract create_model() → None¶
Abstract method to create library-specific model.
Machine learning library specific implementations are provided for TensorFlow and PyTorch.
- dataset_generator(file_name: str)¶
- abstract decode_x(x)¶
TODO
TODO
- Parameters
x (array) – TODO
- abstract decode_y(y)¶
TODO
TODO
- Parameters
y (array) – TODO
- abstract encode_x(x)¶
TODO
TODO
- Parameters
x (array) – TODO
- abstract encode_y(y)¶
TODO
TODO
- Parameters
y (array) – TODO
- abstract evaluate_model(file_name: Optional[str] = None, x: Optional[List[str]] = None, y: Optional[List[str]] = None)¶
TODO
TODO
- Parameters
file_name (Optional[str]) – TODO
x (Optional[List[str]]) – TODO
y (Optional[List[str]]) – TODO
- Returns
TODO
- Return type
array
- Raises
Exception – if neither file_name nor (x and y) are specified
- get_annotations_file(set_name: str = 'test') → str¶
Get path to annotations file.
E.g., get_annotations_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training-annotation.txt, if it exists.
- Parameters
set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
- Returns
file path to annotations file
- Return type
str
- Raises
Exception – in case requested annotations file does not exist
- get_examples_file(set_name: str = 'test') → str¶
Get path to examples file.
E.g., get_examples_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training.txt, if it exists.
- Parameters
set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
- Returns
file path to examples file
- Return type
str
- Raises
Exception – in case requested examples file does not exist
- abstract get_num_params() → seqgra.schema.ModelSize¶
TODO
TODO
- get_sequence_length(file_name: str) → int¶
- abstract load_model(file_name: Optional[str] = None)¶
TODO
TODO
- Parameters
file_name (str, optional) – file name in output dir; default is library-dependent
- parse_annotations_data(file_name: str) → seqgra.schema.AnnotationSet¶
Method to parse annotations data file.
Checks validity of annotations.
- Parameters
file_name (str) – file name
- Returns
annotations (List[str]): annotations
y (List[str]): labels
- Return type
AnnotationSet
- abstract parse_examples_data(file_name: str) → seqgra.schema.ExampleSet¶
Abstract method to parse examples data file.
Checks validity of sequences with sequence data type specific implementations provided for DNA and amino acid sequences.
- Parameters
file_name (str) – file name
- Returns
x (List[str]): sequences
y (List[str]): labels
- Return type
ExampleSet
- abstract predict(file_name: Optional[str] = None, x: Optional[Any] = None, encode: bool = True)¶
TODO
TODO
- Parameters
x (array) – TODO
encode (bool, optional) – whether x should be encoded; defaults to True
- Raises
Exception – if neither file_name nor x are specified
- abstract print_model_summary() → None¶
TODO
TODO
- abstract save_model(file_name: Optional[str] = None)¶
TODO
TODO
- Parameters
file_name (str, optional) – file name in output dir; default is library-dependent
- abstract set_seed() → None¶
TODO
TODO
- train_model(file_name_train: Optional[str] = None, file_name_val: Optional[str] = None, x_train: Optional[List[str]] = None, y_train: Optional[List[str]] = None, x_val: Optional[List[str]] = None, y_val: Optional[List[str]] = None) → None¶
Train model.
Specify either file_name_train and file_name_val or x_train, y_train, x_val, and y_val.
- Parameters
file_name_train (Optional[str]) – TODO
file_name_val (Optional[str]) – TODO
x_train (Optional[List[str]]) – TODO
y_train (Optional[List[str]]) – TODO
x_val (Optional[List[str]]) – TODO
y_val (Optional[List[str]]) – TODO
- Raises
Exception – output directory non-empty
Exception – specify either file_name_train and file_name_val or x_train, y_train, x_val, y_val
- class MultiLabelClassificationLearner(model_definition: seqgra.model.model.modeldefinition.ModelDefinition, data_dir: str, output_dir: str, validate_data: bool = True, gpu_id: int = 0, silent: bool = False)[source]¶
Bases:
seqgra.learner.learner.Learner
Abstract class for multi-label classification learners.
Multi-label classification learners are learners for models with class labels that are not mutually exclusive.
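When labels are not mutually exclusive, each example can carry zero, one, or several labels, which is often represented as a multi-hot vector. The sketch below is illustrative only: `multi_hot` is a hypothetical function, and the `|` separator between labels is an assumption, not a documented seqgra format.

```python
from typing import List


def multi_hot(labels: List[str], classes: List[str],
              sep: str = "|") -> List[List[int]]:
    """Encode each separator-joined label string as a 0/1 vector."""
    index = {name: i for i, name in enumerate(classes)}
    encoded = []
    for label in labels:
        row = [0] * len(classes)
        for part in label.split(sep):
            if part:  # an empty string means "no label"
                row[index[part]] = 1
        encoded.append(row)
    return encoded
```

Unlike a one-hot row, a multi-hot row may sum to any value between 0 and the number of classes.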
- definition¶
contains model meta info, architecture and hyperparameters
- Type
ModelDefinition
- data_dir¶
directory with data files, e.g., training.txt
- Type
str
- output_dir¶
model output directory, {OUTPUTDIR}/models/{GRAMMAR ID}/{MODEL ID}/
- Type
str
- validate_data¶
whether input data should be validated (e.g., check if valid DNA or protein sequence)
- Type
bool
- gpu_id¶
ID of GPU used by TensorFlow and PyTorch
- Type
int
- model¶
PyTorch or TensorFlow model
- optimizer¶
PyTorch or TensorFlow optimizer
- criterion¶
PyTorch or TensorFlow criterion (loss)
- metrics¶
metrics that are collected, usually loss and accuracy
- Type
List[str]
- Parameters
model_definition (ModelDefinition) – contains model meta info, architecture and hyperparameters
data_dir (str) – directory with data files, {OUTPUTDIR}/input/{GRAMMAR ID}
output_dir (str) – model output directory without model folder, {OUTPUTDIR}/models/{GRAMMAR ID}
validate_data (bool) – whether input data should be validated (e.g., check if valid DNA or protein sequence)
gpu_id (int) – ID of GPU used by TensorFlow and PyTorch
- check_annotations(annotations: List[str]) → bool¶
- check_labels(y: List[str], throw_exception: bool = True) → bool¶
- abstract check_sequence(x: List[str]) → bool¶
- abstract create_model() → None¶
Abstract method to create library-specific model.
Machine learning library specific implementations are provided for TensorFlow and PyTorch.
- dataset_generator(file_name: str)¶
- abstract decode_x(x)¶
TODO
TODO
- Parameters
x (array) – TODO
- abstract decode_y(y)¶
TODO
TODO
- Parameters
y (array) – TODO
- abstract encode_x(x)¶
TODO
TODO
- Parameters
x (array) – TODO
- abstract encode_y(y)¶
TODO
TODO
- Parameters
y (array) – TODO
- abstract evaluate_model(file_name: Optional[str] = None, x: Optional[List[str]] = None, y: Optional[List[str]] = None)¶
TODO
TODO
- Parameters
file_name (Optional[str]) – TODO
x (Optional[List[str]]) – TODO
y (Optional[List[str]]) – TODO
- Returns
TODO
- Return type
array
- Raises
Exception – if neither file_name nor (x and y) are specified
- get_annotations_file(set_name: str = 'test') → str¶
Get path to annotations file.
E.g., get_annotations_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training-annotation.txt, if it exists.
- Parameters
set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
- Returns
file path to annotations file
- Return type
str
- Raises
Exception – in case requested annotations file does not exist
- get_examples_file(set_name: str = 'test') → str¶
Get path to examples file.
E.g., get_examples_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training.txt, if it exists.
- Parameters
set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
- Returns
file path to examples file
- Return type
str
- Raises
Exception – in case requested examples file does not exist
- abstract get_num_params() → seqgra.schema.ModelSize¶
TODO
TODO
- get_sequence_length(file_name: str) → int¶
- abstract load_model(file_name: Optional[str] = None)¶
TODO
TODO
- Parameters
file_name (str, optional) – file name in output dir; default is library-dependent
- parse_annotations_data(file_name: str) → seqgra.schema.AnnotationSet¶
Method to parse annotations data file.
Checks validity of annotations.
- Parameters
file_name (str) – file name
- Returns
annotations (List[str]): annotations
y (List[str]): labels
- Return type
AnnotationSet
- abstract parse_examples_data(file_name: str) → seqgra.schema.ExampleSet¶
Abstract method to parse examples data file.
Checks validity of sequences with sequence data type specific implementations provided for DNA and amino acid sequences.
- Parameters
file_name (str) – file name
- Returns
x (List[str]): sequences
y (List[str]): labels
- Return type
ExampleSet
- abstract predict(file_name: Optional[str] = None, x: Optional[Any] = None, encode: bool = True)¶
TODO
TODO
- Parameters
x (array) – TODO
encode (bool, optional) – whether x should be encoded; defaults to True
- Raises
Exception – if neither file_name nor x are specified
- abstract print_model_summary() → None¶
TODO
TODO
- abstract save_model(file_name: Optional[str] = None)¶
TODO
TODO
- Parameters
file_name (str, optional) – file name in output dir; default is library-dependent
- abstract set_seed() → None¶
TODO
TODO
- train_model(file_name_train: Optional[str] = None, file_name_val: Optional[str] = None, x_train: Optional[List[str]] = None, y_train: Optional[List[str]] = None, x_val: Optional[List[str]] = None, y_val: Optional[List[str]] = None) → None¶
Train model.
Specify either file_name_train and file_name_val or x_train, y_train, x_val, and y_val.
- Parameters
file_name_train (Optional[str]) – TODO
file_name_val (Optional[str]) – TODO
x_train (Optional[List[str]]) – TODO
y_train (Optional[List[str]]) – TODO
x_val (Optional[List[str]]) – TODO
y_val (Optional[List[str]]) – TODO
- Raises
Exception – output directory non-empty
Exception – specify either file_name_train and file_name_val or x_train, y_train, x_val, y_val
- class MultipleRegressionLearner(model_definition: seqgra.model.model.modeldefinition.ModelDefinition, data_dir: str, output_dir: str, validate_data: bool = True, gpu_id: int = 0, silent: bool = False)[source]¶
Bases:
seqgra.learner.learner.Learner
Abstract class for multiple regression learners.
Multiple regression learners are learners for models with multiple independent real-valued variables (\(x \in R^n\)) and one dependent real-valued variable (\(y \in R\)).
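The input and output shapes of multiple regression can be illustrated with a linear model that maps a vector of independent variables to a single real value. This is a sketch for orientation only: `predict_multiple` is a hypothetical function, and a linear model is an assumption chosen for brevity, not the architecture seqgra uses.

```python
from typing import List


def predict_multiple(x: List[float], weights: List[float],
                     bias: float) -> float:
    """Map n independent variables to one dependent variable (dot product plus bias)."""
    if len(x) != len(weights):
        raise ValueError("x and weights must have the same length")
    return sum(w * xi for w, xi in zip(weights, x)) + bias
```

For example, `predict_multiple([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], 0.5)` takes three inputs and returns one scalar.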
- definition¶
contains model meta info, architecture and hyperparameters
- Type
ModelDefinition
- data_dir¶
directory with data files, e.g., training.txt
- Type
str
- output_dir¶
model output directory, {OUTPUTDIR}/models/{GRAMMAR ID}/{MODEL ID}/
- Type
str
- validate_data¶
whether input data should be validated (e.g., check if valid DNA or protein sequence)
- Type
bool
- gpu_id¶
ID of GPU used by TensorFlow and PyTorch
- Type
int
- model¶
PyTorch or TensorFlow model
- optimizer¶
PyTorch or TensorFlow optimizer
- criterion¶
PyTorch or TensorFlow criterion (loss)
- metrics¶
metrics that are collected, usually loss and accuracy
- Type
List[str]
- Parameters
model_definition (ModelDefinition) – contains model meta info, architecture and hyperparameters
data_dir (str) – directory with data files, {OUTPUTDIR}/input/{GRAMMAR ID}
output_dir (str) – model output directory without model folder, {OUTPUTDIR}/models/{GRAMMAR ID}
validate_data (bool) – whether input data should be validated (e.g., check if valid DNA or protein sequence)
gpu_id (int) – ID of GPU used by TensorFlow and PyTorch
- check_annotations(annotations: List[str]) → bool¶
- abstract check_sequence(x: List[str]) → bool¶
- abstract create_model() → None¶
Abstract method to create library-specific model.
Machine learning library specific implementations are provided for TensorFlow and PyTorch.
- dataset_generator(file_name: str)¶
- abstract decode_x(x)¶
TODO
TODO
- Parameters
x (array) – TODO
- abstract decode_y(y)¶
TODO
TODO
- Parameters
y (array) – TODO
- abstract encode_x(x)¶
TODO
TODO
- Parameters
x (array) – TODO
- abstract encode_y(y)¶
TODO
TODO
- Parameters
y (array) – TODO
- abstract evaluate_model(file_name: Optional[str] = None, x: Optional[List[str]] = None, y: Optional[List[str]] = None)¶
TODO
TODO
- Parameters
file_name (Optional[str]) – TODO
x (Optional[List[str]]) – TODO
y (Optional[List[str]]) – TODO
- Returns
TODO
- Return type
array
- Raises
Exception – if neither file_name nor (x and y) are specified
- get_annotations_file(set_name: str = 'test') → str¶
Get path to annotations file.
E.g., get_annotations_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training-annotation.txt, if it exists.
- Parameters
set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
- Returns
file path to annotations file
- Return type
str
- Raises
Exception – in case requested annotations file does not exist
- get_examples_file(set_name: str = 'test') → str¶
Get path to examples file.
E.g., get_examples_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training.txt, if it exists.
- Parameters
set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
- Returns
file path to examples file
- Return type
str
- Raises
Exception – in case requested examples file does not exist
- abstract get_num_params() → seqgra.schema.ModelSize¶
TODO
TODO
- get_sequence_length(file_name: str) → int¶
- abstract load_model(file_name: Optional[str] = None)¶
TODO
TODO
- Parameters
file_name (str, optional) – file name in output dir; default is library-dependent
- parse_annotations_data(file_name: str) → seqgra.schema.AnnotationSet¶
Method to parse annotations data file.
Checks validity of annotations.
- Parameters
file_name (str) – file name
- Returns
annotations (List[str]): annotations
y (List[str]): labels
- Return type
AnnotationSet
- abstract parse_examples_data(file_name: str) → seqgra.schema.ExampleSet¶
Abstract method to parse examples data file.
Checks validity of sequences with sequence data type specific implementations provided for DNA and amino acid sequences.
- Parameters
file_name (str) – file name
- Returns
x (List[str]): sequences
y (List[str]): labels
- Return type
ExampleSet
- abstract predict(file_name: Optional[str] = None, x: Optional[Any] = None, encode: bool = True)¶
TODO
TODO
- Parameters
x (array) – TODO
encode (bool, optional) – whether x should be encoded; defaults to True
- Raises
Exception – if neither file_name nor x are specified
- abstract print_model_summary() → None¶
TODO
TODO
- abstract save_model(file_name: Optional[str] = None)¶
TODO
TODO
- Parameters
file_name (str, optional) – file name in output dir; default is library-dependent
- abstract set_seed() → None¶
TODO
TODO
- train_model(file_name_train: Optional[str] = None, file_name_val: Optional[str] = None, x_train: Optional[List[str]] = None, y_train: Optional[List[str]] = None, x_val: Optional[List[str]] = None, y_val: Optional[List[str]] = None) → None¶
Train model.
Specify either file_name_train and file_name_val or x_train, y_train, x_val, and y_val.
- Parameters
file_name_train (Optional[str]) – TODO
file_name_val (Optional[str]) – TODO
x_train (Optional[List[str]]) – TODO
y_train (Optional[List[str]]) – TODO
x_val (Optional[List[str]]) – TODO
y_val (Optional[List[str]]) – TODO
- Raises
Exception – output directory non-empty
Exception – specify either file_name_train and file_name_val or x_train, y_train, x_val, y_val
- class MultivariateRegressionLearner(model_definition: seqgra.model.model.modeldefinition.ModelDefinition, data_dir: str, output_dir: str, validate_data: bool = True, gpu_id: int = 0, silent: bool = False)[source]¶
Bases:
seqgra.learner.learner.Learner
Abstract class for multivariate regression learners.
Multivariate regression learners are used for models with multiple independent real-valued variables (\(x \in R^n\)) and multiple dependent real-valued variables (\(y \in R^m\)).
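The shapes involved in multivariate regression can be illustrated with a linear map from an input vector to an output vector, where the number of outputs may differ from the number of inputs. `predict_multivariate` is a hypothetical function, and the linear model is an assumption chosen for brevity rather than a description of seqgra's models.

```python
from typing import List


def predict_multivariate(x: List[float], weight_rows: List[List[float]],
                         biases: List[float]) -> List[float]:
    """Map n inputs to one output per weight row (matrix-vector product plus bias)."""
    if len(weight_rows) != len(biases):
        raise ValueError("need one bias per output")
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight_rows, biases)]
```

With two inputs and three weight rows, the function returns three dependent values, one per row.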
- definition¶
contains model meta info, architecture and hyperparameters
- Type
ModelDefinition
- data_dir¶
directory with data files, e.g., training.txt
- Type
str
- output_dir¶
model output directory, {OUTPUTDIR}/models/{GRAMMAR ID}/{MODEL ID}/
- Type
str
- validate_data¶
whether input data should be validated (e.g., check if valid DNA or protein sequence)
- Type
bool
- gpu_id¶
ID of GPU used by TensorFlow and PyTorch
- Type
int
- model¶
PyTorch or TensorFlow model
- optimizer¶
PyTorch or TensorFlow optimizer
- criterion¶
PyTorch or TensorFlow criterion (loss)
- metrics¶
metrics that are collected, usually loss and accuracy
- Type
List[str]
- Parameters
model_definition (ModelDefinition) – contains model meta info, architecture and hyperparameters
data_dir (str) – directory with data files, {OUTPUTDIR}/input/{GRAMMAR ID}
output_dir (str) – model output directory without model folder, {OUTPUTDIR}/models/{GRAMMAR ID}
validate_data (bool) – whether input data should be validated (e.g., check if valid DNA or protein sequence)
gpu_id (int) – ID of GPU used by TensorFlow and PyTorch
- check_annotations(annotations: List[str]) → bool¶
- abstract check_sequence(x: List[str]) → bool¶
- abstract create_model() → None¶
Abstract method to create library-specific model.
Machine learning library specific implementations are provided for TensorFlow and PyTorch.
- dataset_generator(file_name: str)¶
- abstract decode_x(x)¶
TODO
TODO
- Parameters
x (array) – TODO
- abstract decode_y(y)¶
TODO
TODO
- Parameters
y (array) – TODO
- abstract encode_x(x)¶
TODO
TODO
- Parameters
x (array) – TODO
- abstract encode_y(y)¶
TODO
TODO
- Parameters
y (array) – TODO
- abstract evaluate_model(file_name: Optional[str] = None, x: Optional[List[str]] = None, y: Optional[List[str]] = None)¶
TODO
TODO
- Parameters
file_name (Optional[str]) – TODO
x (Optional[List[str]]) – TODO
y (Optional[List[str]]) – TODO
- Returns
TODO
- Return type
array
- Raises
Exception – if neither file_name nor (x and y) are specified
- get_annotations_file(set_name: str = 'test') → str¶
Get path to annotations file.
E.g., get_annotations_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training-annotation.txt, if it exists.
- Parameters
set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
- Returns
file path to annotations file
- Return type
str
- Raises
Exception – in case requested annotations file does not exist
- get_examples_file(set_name: str = 'test') → str¶
Get path to examples file.
E.g., get_examples_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training.txt, if it exists.
- Parameters
set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
- Returns
file path to examples file
- Return type
str
- Raises
Exception – in case requested examples file does not exist
- abstract get_num_params() → seqgra.schema.ModelSize¶
TODO
TODO
- get_sequence_length(file_name: str) → int¶
- abstract load_model(file_name: Optional[str] = None)¶
TODO
TODO
- Parameters
file_name (str, optional) – file name in output dir; default is library-dependent
- parse_annotations_data(file_name: str) → seqgra.schema.AnnotationSet¶
Method to parse annotations data file.
Checks validity of annotations.
- Parameters
file_name (str) – file name
- Returns
annotations (List[str]): annotations
y (List[str]): labels
- Return type
AnnotationSet
- abstract parse_examples_data(file_name: str) → seqgra.schema.ExampleSet¶
Abstract method to parse examples data file.
Checks validity of sequences with sequence data type specific implementations provided for DNA and amino acid sequences.
- Parameters
file_name (str) – file name
- Returns
x (List[str]): sequences
y (List[str]): labels
- Return type
ExampleSet
- abstract predict(file_name: Optional[str] = None, x: Optional[Any] = None, encode: bool = True)¶
TODO
TODO
- Parameters
x (array) – TODO
encode (bool, optional) – whether x should be encoded; defaults to True
- Raises
Exception – if neither file_name nor x are specified
- abstract print_model_summary() → None¶
TODO
TODO
- abstract save_model(file_name: Optional[str] = None)¶
TODO
TODO
- Parameters
file_name (str, optional) – file name in output dir; default is library-dependent
- abstract set_seed() → None¶
TODO
TODO
- train_model(file_name_train: Optional[str] = None, file_name_val: Optional[str] = None, x_train: Optional[List[str]] = None, y_train: Optional[List[str]] = None, x_val: Optional[List[str]] = None, y_val: Optional[List[str]] = None) → None¶
Train model.
Specify either file_name_train and file_name_val or x_train, y_train, x_val, and y_val.
- Parameters
file_name_train (Optional[str]) – TODO
file_name_val (Optional[str]) – TODO
x_train (Optional[List[str]]) – TODO
y_train (Optional[List[str]]) – TODO
x_val (Optional[List[str]]) – TODO
y_val (Optional[List[str]]) – TODO
- Raises
Exception – output directory non-empty
Exception – specify either file_name_train and file_name_val or x_train, y_train, x_val, y_val