seqgra.learner.learner module¶

Contains abstract classes for all learners.

Classes:

Learner: abstract base class for all learners
MultiClassClassificationLearner: abstract class for multi-class classification learners
MultiLabelClassificationLearner: abstract class for multi-label classification learners
MultipleRegressionLearner: abstract class for multiple regression learners
MultivariateRegressionLearner: abstract class for multivariate regression learners

class Learner(model_definition: seqgra.model.model.modeldefinition.ModelDefinition, data_dir: str, output_dir: str, validate_data: bool = True, gpu_id: int = 0, silent: bool = False)[source]¶

Bases: abc.ABC

Abstract base class for all learners.

definition¶

contains model meta info, architecture and hyperparameters

Type: ModelDefinition

data_dir¶

directory with data files, e.g., training.txt

Type: str

output_dir¶

model output directory, {OUTPUTDIR}/models/{GRAMMAR ID}/{MODEL ID}/

Type: str

validate_data¶

whether input data should be validated (e.g., check if valid DNA or protein sequence)

Type: bool

gpu_id¶

ID of GPU used by TensorFlow and PyTorch

Type: int

model¶

PyTorch or TensorFlow model

optimizer¶

PyTorch or TensorFlow optimizer

criterion¶

PyTorch or TensorFlow criterion (loss)

metrics¶

metrics that are collected, usually loss and accuracy

Type: List[str]

Parameters

model_definition (ModelDefinition) – contains model meta info, architecture and hyperparameters
data_dir (str) – directory with data files, {OUTPUTDIR}/input/{GRAMMAR ID}
output_dir (str) – model output directory without model folder, {OUTPUTDIR}/models/{GRAMMAR ID}
validate_data (bool) – whether input data should be validated (e.g., check if valid DNA or protein sequence)
gpu_id (int) – ID of GPU used by TensorFlow and PyTorch

See also

MultiClassClassificationLearner: for classification models with mutually exclusive classes
MultiLabelClassificationLearner: for classification models with non-mutually exclusive classes
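MultipleRegressionLearner: for regression models with multiple independent variables and one dependent variable
MultivariateRegressionLearner: for regression models with multiple independent variables and multiple dependent variables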

check_annotations(annotations: List[str]) → bool[source]¶

check_labels(y: List[str], throw_exception: bool = True) → bool[source]¶

abstract check_sequence(x: List[str]) → bool[source]¶

abstract create_model() → None[source]¶

Abstract method to create library-specific model.

Library-specific implementations are provided for TensorFlow and PyTorch.
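The abstract-method contract can be illustrated with a minimal sketch. The class names `ToyLearner` and `ToyTorchLearner` and the helper `can_instantiate` are hypothetical stand-ins, not part of seqgra; the sketch only shows that a class deriving from abc.ABC cannot be instantiated until every abstract method is overridden.

```python
from abc import ABC, abstractmethod


class ToyLearner(ABC):
    """Hypothetical stand-in for an abstract learner base class."""

    def __init__(self) -> None:
        self.model = None

    @abstractmethod
    def create_model(self) -> None:
        """Library-specific subclasses build the model here."""


class ToyTorchLearner(ToyLearner):
    """Hypothetical concrete subclass; a real one would build a PyTorch model."""

    def create_model(self) -> None:
        # Placeholder object standing in for a library-specific model.
        self.model = {"library": "torch", "layers": []}


def can_instantiate(cls) -> bool:
    """Return True if cls() succeeds, False if abstract methods are missing."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Here `can_instantiate(ToyLearner)` is False because `create_model` is still abstract, while `can_instantiate(ToyTorchLearner)` is True.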

dataset_generator(file_name: str)[source]¶

abstract decode_x(x)[source]¶

TODO

Parameters: x (array) – TODO

abstract decode_y(y)[source]¶

TODO

Parameters: y (array) – TODO

abstract encode_x(x)[source]¶

TODO

Parameters: x (array) – TODO

abstract encode_y(y)[source]¶

TODO

Parameters: y (array) – TODO

abstract evaluate_model(file_name: Optional[str] = None, x: Optional[List[str]] = None, y: Optional[List[str]] = None)[source]¶

TODO

Parameters

file_name (Optional[str]) – TODO
x (Optional[List[str]]) – TODO
y (Optional[List[str]]) – TODO

Returns

TODO

Return type

array

Raises

Exception – if neither file_name nor (x and y) are specified

get_annotations_file(set_name: str = 'test') → str[source]¶

Get path to annotations file.

E.g., get_annotations_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training-annotation.txt, if it exists.

Parameters: set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
Returns: file path to annotations file
Return type: str
Raises: Exception – in case requested annotations file does not exist

get_examples_file(set_name: str = 'test') → str[source]¶

Get path to examples file.

E.g., get_examples_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training.txt, if it exists.

Parameters: set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
Returns: file path to examples file
Return type: str
Raises: Exception – in case requested examples file does not exist
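The path conventions described for get_examples_file and get_annotations_file can be sketched as follows. This is an illustration under assumptions, not the seqgra implementation: the free functions `examples_file` and `annotations_file` are hypothetical, and only the file naming ({set_name}.txt and {set_name}-annotation.txt inside the data directory) and the raise-if-missing behavior are taken from the descriptions above.

```python
import os
import tempfile


def examples_file(data_dir: str, set_name: str = "test") -> str:
    """Return {data_dir}/{set_name}.txt, raising if the file is missing."""
    path = os.path.join(data_dir, set_name + ".txt")
    if not os.path.isfile(path):
        raise Exception("examples file does not exist: " + path)
    return path


def annotations_file(data_dir: str, set_name: str = "test") -> str:
    """Return {data_dir}/{set_name}-annotation.txt, raising if missing."""
    path = os.path.join(data_dir, set_name + "-annotation.txt")
    if not os.path.isfile(path):
        raise Exception("annotations file does not exist: " + path)
    return path


# Usage: create an empty training file in a temporary directory and look it up.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "training.txt"), "w").close()
    found = os.path.basename(examples_file(demo_dir, "training"))
```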

abstract get_label_set(y: List[str]) → Set[str][source]¶

abstract get_num_params() → seqgra.schema.ModelSize [source]¶

TODO

get_sequence_length(file_name: str) → int[source]¶

abstract load_model(file_name: Optional[str] = None)[source]¶

TODO

Parameters: file_name (str, optional) – file name in output dir; default is library-dependent

parse_annotations_data(file_name: str) → seqgra.schema.AnnotationSet [source]¶

Method to parse annotations data file.

Checks validity of annotations.

Parameters: file_name (str) – file name
Returns: annotations (List[str]): annotations; y (List[str]): labels
Return type: AnnotationSet

abstract parse_examples_data(file_name: str) → seqgra.schema.ExampleSet [source]¶

Abstract method to parse examples data file.

Checks the validity of sequences; implementations specific to the sequence data type are provided for DNA and amino acid sequences.

Parameters: file_name (str) – file name
Returns: x (List[str]): sequences; y (List[str]): labels
Return type: ExampleSet
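A sequence validity check for DNA might look like the sketch below. The function `invalid_dna_sequences`, the alphabet restricted to A, C, G, and T, the case-insensitive comparison, and the rejection of empty sequences are all assumptions made for illustration; the actual check_sequence implementations may accept other characters or behave differently.

```python
from typing import List

# Assumption: only the four unambiguous bases count as valid DNA.
DNA_ALPHABET = frozenset("ACGT")


def invalid_dna_sequences(x: List[str]) -> List[int]:
    """Return indices of sequences that are empty or contain non-DNA characters."""
    return [
        i
        for i, seq in enumerate(x)
        if not seq or not set(seq.upper()) <= DNA_ALPHABET
    ]
```

Returning the offending indices rather than a single bool makes it easier to report which examples failed validation.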

abstract predict(file_name: Optional[str] = None, x: Optional[Any] = None, encode: bool = True)[source]¶

TODO

Parameters

x (array) – TODO
encode (bool, optional) – whether x should be encoded; defaults to True

Raises

Exception – if neither file_name nor x are specified

abstract print_model_summary() → None[source]¶

TODO

abstract save_model(file_name: Optional[str] = None)[source]¶

TODO

Parameters: file_name (str, optional) – file name in output dir; default is library-dependent

abstract set_seed() → None[source]¶

TODO

train_model(file_name_train: Optional[str] = None, file_name_val: Optional[str] = None, x_train: Optional[List[str]] = None, y_train: Optional[List[str]] = None, x_val: Optional[List[str]] = None, y_val: Optional[List[str]] = None) → None[source]¶

Train model.

Specify either file_name_train and file_name_val or x_train, y_train, x_val, and y_val.

Parameters

file_name_train (Optional[str]) – TODO
file_name_val (Optional[str]) – TODO
x_train (Optional[List[str]]) – TODO
y_train (Optional[List[str]]) – TODO
x_val (Optional[List[str]]) – TODO
y_val (Optional[List[str]]) – TODO

Raises

Exception – if the output directory is not empty
Exception – if neither file_name_train and file_name_val nor x_train, y_train, x_val, and y_val are specified
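The either/or argument rule for train_model can be sketched as a small check. The helper `resolve_training_source` is hypothetical, and the choice to prefer file names when both kinds of input are given is an assumption; the source only states that one complete set must be specified.

```python
from typing import List, Optional


def resolve_training_source(
    file_name_train: Optional[str] = None,
    file_name_val: Optional[str] = None,
    x_train: Optional[List[str]] = None,
    y_train: Optional[List[str]] = None,
    x_val: Optional[List[str]] = None,
    y_val: Optional[List[str]] = None,
) -> str:
    """Return "files" or "arrays" depending on which inputs are complete."""
    if file_name_train is not None and file_name_val is not None:
        return "files"
    arrays = (x_train, y_train, x_val, y_val)
    if all(a is not None for a in arrays):
        return "arrays"
    raise Exception(
        "specify either file_name_train and file_name_val "
        "or x_train, y_train, x_val, y_val"
    )
```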

class MultiClassClassificationLearner(model_definition: seqgra.model.model.modeldefinition.ModelDefinition, data_dir: str, output_dir: str, validate_data: bool = True, gpu_id: int = 0, silent: bool = False)[source]¶

Bases: seqgra.learner.learner.Learner

Abstract class for multi-class classification learners.

Multi-class classification learners are learners for models with mutually exclusive class labels.
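Because exactly one class applies to each example, labels for such models are commonly represented as one-hot vectors. The sketch below illustrates that idea; the functions `one_hot_encode` and `one_hot_decode` are hypothetical and are not claimed to match how seqgra's encode_y and decode_y are implemented.

```python
from typing import List, Sequence


def one_hot_encode(y: List[str], labels: List[str]) -> List[List[int]]:
    """Encode each label as a vector with a single 1 at the label's index."""
    index = {label: i for i, label in enumerate(labels)}
    encoded = []
    for label in y:
        row = [0] * len(labels)
        row[index[label]] = 1
        encoded.append(row)
    return encoded


def one_hot_decode(scores: List[Sequence[float]], labels: List[str]) -> List[str]:
    """Decode each score vector to the label with the highest score."""
    return [labels[list(row).index(max(row))] for row in scores]
```

Decoding by the highest score also works on predicted class probabilities, since only one class can be chosen per example.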

definition¶

contains model meta info, architecture and hyperparameters

Type: ModelDefinition

data_dir¶

directory with data files, e.g., training.txt

Type: str

output_dir¶

model output directory, {OUTPUTDIR}/models/{GRAMMAR ID}/{MODEL ID}/

Type: str

validate_data¶

whether input data should be validated (e.g., check if valid DNA or protein sequence)

Type: bool

gpu_id¶

ID of GPU used by TensorFlow and PyTorch

Type: int

model¶

PyTorch or TensorFlow model

optimizer¶

PyTorch or TensorFlow optimizer

criterion¶

PyTorch or TensorFlow criterion (loss)

metrics¶

metrics that are collected, usually loss and accuracy

Type: List[str]

Parameters

model_definition (ModelDefinition) – contains model meta info, architecture and hyperparameters
data_dir (str) – directory with data files, {OUTPUTDIR}/input/{GRAMMAR ID}
output_dir (str) – model output directory without model folder, {OUTPUTDIR}/models/{GRAMMAR ID}
validate_data (bool) – whether input data should be validated (e.g., check if valid DNA or protein sequence)
gpu_id (int) – ID of GPU used by TensorFlow and PyTorch

check_annotations(annotations: List[str]) → bool¶

check_labels(y: List[str], throw_exception: bool = True) → bool¶

abstract check_sequence(x: List[str]) → bool¶

abstract create_model() → None¶

Abstract method to create library-specific model.

Library-specific implementations are provided for TensorFlow and PyTorch.

dataset_generator(file_name: str)¶

abstract decode_x(x)¶

TODO

Parameters: x (array) – TODO

abstract decode_y(y)¶

TODO

Parameters: y (array) – TODO

abstract encode_x(x)¶

TODO

Parameters: x (array) – TODO

abstract encode_y(y)¶

TODO

Parameters: y (array) – TODO

abstract evaluate_model(file_name: Optional[str] = None, x: Optional[List[str]] = None, y: Optional[List[str]] = None)¶

TODO

Parameters

file_name (Optional[str]) – TODO
x (Optional[List[str]]) – TODO
y (Optional[List[str]]) – TODO

Returns

TODO

Return type

array

Raises

Exception – if neither file_name nor (x and y) are specified

get_annotations_file(set_name: str = 'test') → str¶

Get path to annotations file.

E.g., get_annotations_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training-annotation.txt, if it exists.

Parameters: set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
Returns: file path to annotations file
Return type: str
Raises: Exception – in case requested annotations file does not exist

get_examples_file(set_name: str = 'test') → str¶

Get path to examples file.

E.g., get_examples_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training.txt, if it exists.

Parameters: set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
Returns: file path to examples file
Return type: str
Raises: Exception – in case requested examples file does not exist

get_label_set(y: List[str]) → Set[str][source]¶

abstract get_num_params() → seqgra.schema.ModelSize ¶

TODO

get_sequence_length(file_name: str) → int¶

abstract load_model(file_name: Optional[str] = None)¶

TODO

Parameters: file_name (str, optional) – file name in output dir; default is library-dependent

parse_annotations_data(file_name: str) → seqgra.schema.AnnotationSet ¶

Method to parse annotations data file.

Checks validity of annotations.

Parameters: file_name (str) – file name
Returns: annotations (List[str]): annotations; y (List[str]): labels
Return type: AnnotationSet

abstract parse_examples_data(file_name: str) → seqgra.schema.ExampleSet ¶

Abstract method to parse examples data file.

Checks the validity of sequences; implementations specific to the sequence data type are provided for DNA and amino acid sequences.

Parameters: file_name (str) – file name
Returns: x (List[str]): sequences; y (List[str]): labels
Return type: ExampleSet

abstract predict(file_name: Optional[str] = None, x: Optional[Any] = None, encode: bool = True)¶

TODO

Parameters

x (array) – TODO
encode (bool, optional) – whether x should be encoded; defaults to True

Raises

Exception – if neither file_name nor x are specified

abstract print_model_summary() → None¶

TODO

abstract save_model(file_name: Optional[str] = None)¶

TODO

Parameters: file_name (str, optional) – file name in output dir; default is library-dependent

abstract set_seed() → None¶

TODO

train_model(file_name_train: Optional[str] = None, file_name_val: Optional[str] = None, x_train: Optional[List[str]] = None, y_train: Optional[List[str]] = None, x_val: Optional[List[str]] = None, y_val: Optional[List[str]] = None) → None¶

Train model.

Specify either file_name_train and file_name_val or x_train, y_train, x_val, and y_val.

Parameters

file_name_train (Optional[str]) – TODO
file_name_val (Optional[str]) – TODO
x_train (Optional[List[str]]) – TODO
y_train (Optional[List[str]]) – TODO
x_val (Optional[List[str]]) – TODO
y_val (Optional[List[str]]) – TODO

Raises

Exception – if the output directory is not empty
Exception – if neither file_name_train and file_name_val nor x_train, y_train, x_val, and y_val are specified

class MultiLabelClassificationLearner(model_definition: seqgra.model.model.modeldefinition.ModelDefinition, data_dir: str, output_dir: str, validate_data: bool = True, gpu_id: int = 0, silent: bool = False)[source]¶

Bases: seqgra.learner.learner.Learner

Abstract class for multi-label classification learners.

Multi-label classification learners are learners for models with class labels that are not mutually exclusive.
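Since several labels can apply to the same example, a common representation is a multi-hot vector with one position per label. The sketch below is an illustration only: the functions `multi_hot_encode` and `multi_hot_decode`, the `|` separator between labels, and the 0.5 decision threshold are assumptions, not documented seqgra behavior.

```python
from typing import List, Sequence


def multi_hot_encode(y: List[str], labels: List[str], sep: str = "|") -> List[List[int]]:
    """Encode each separator-joined label string as a 0/1 vector."""
    index = {label: i for i, label in enumerate(labels)}
    encoded = []
    for entry in y:
        row = [0] * len(labels)
        for label in entry.split(sep):
            if label:  # an empty string means the example has no labels
                row[index[label]] = 1
        encoded.append(row)
    return encoded


def multi_hot_decode(
    scores: List[Sequence[float]],
    labels: List[str],
    threshold: float = 0.5,
    sep: str = "|",
) -> List[str]:
    """Join all labels whose score exceeds the threshold."""
    return [
        sep.join(label for label, score in zip(labels, row) if score > threshold)
        for row in scores
    ]
```

Unlike the mutually exclusive case, each position is thresholded independently, so an example can decode to zero, one, or several labels.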

definition¶

contains model meta info, architecture and hyperparameters

Type: ModelDefinition

data_dir¶

directory with data files, e.g., training.txt

Type: str

output_dir¶

model output directory, {OUTPUTDIR}/models/{GRAMMAR ID}/{MODEL ID}/

Type: str

validate_data¶

whether input data should be validated (e.g., check if valid DNA or protein sequence)

Type: bool

gpu_id¶

ID of GPU used by TensorFlow and PyTorch

Type: int

model¶

PyTorch or TensorFlow model

optimizer¶

PyTorch or TensorFlow optimizer

criterion¶

PyTorch or TensorFlow criterion (loss)

metrics¶

metrics that are collected, usually loss and accuracy

Type: List[str]

Parameters

model_definition (ModelDefinition) – contains model meta info, architecture and hyperparameters
data_dir (str) – directory with data files, {OUTPUTDIR}/input/{GRAMMAR ID}
output_dir (str) – model output directory without model folder, {OUTPUTDIR}/models/{GRAMMAR ID}
validate_data (bool) – whether input data should be validated (e.g., check if valid DNA or protein sequence)
gpu_id (int) – ID of GPU used by TensorFlow and PyTorch

check_annotations(annotations: List[str]) → bool¶

check_labels(y: List[str], throw_exception: bool = True) → bool¶

abstract check_sequence(x: List[str]) → bool¶

abstract create_model() → None¶

Abstract method to create library-specific model.

Library-specific implementations are provided for TensorFlow and PyTorch.

dataset_generator(file_name: str)¶

abstract decode_x(x)¶

TODO

Parameters: x (array) – TODO

abstract decode_y(y)¶

TODO

Parameters: y (array) – TODO

abstract encode_x(x)¶

TODO

Parameters: x (array) – TODO

abstract encode_y(y)¶

TODO

Parameters: y (array) – TODO

abstract evaluate_model(file_name: Optional[str] = None, x: Optional[List[str]] = None, y: Optional[List[str]] = None)¶

TODO

Parameters

file_name (Optional[str]) – TODO
x (Optional[List[str]]) – TODO
y (Optional[List[str]]) – TODO

Returns

TODO

Return type

array

Raises

Exception – if neither file_name nor (x and y) are specified

get_annotations_file(set_name: str = 'test') → str¶

Get path to annotations file.

E.g., get_annotations_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training-annotation.txt, if it exists.

Parameters: set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
Returns: file path to annotations file
Return type: str
Raises: Exception – in case requested annotations file does not exist

get_examples_file(set_name: str = 'test') → str¶

Get path to examples file.

E.g., get_examples_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training.txt, if it exists.

Parameters: set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
Returns: file path to examples file
Return type: str
Raises: Exception – in case requested examples file does not exist

get_label_set(y: List[str]) → Set[str][source]¶

abstract get_num_params() → seqgra.schema.ModelSize ¶

TODO

get_sequence_length(file_name: str) → int¶

abstract load_model(file_name: Optional[str] = None)¶

TODO

Parameters: file_name (str, optional) – file name in output dir; default is library-dependent

parse_annotations_data(file_name: str) → seqgra.schema.AnnotationSet ¶

Method to parse annotations data file.

Checks validity of annotations.

Parameters: file_name (str) – file name
Returns: annotations (List[str]): annotations; y (List[str]): labels
Return type: AnnotationSet

abstract parse_examples_data(file_name: str) → seqgra.schema.ExampleSet ¶

Abstract method to parse examples data file.

Checks the validity of sequences; implementations specific to the sequence data type are provided for DNA and amino acid sequences.

Parameters: file_name (str) – file name
Returns: x (List[str]): sequences; y (List[str]): labels
Return type: ExampleSet

abstract predict(file_name: Optional[str] = None, x: Optional[Any] = None, encode: bool = True)¶

TODO

Parameters

x (array) – TODO
encode (bool, optional) – whether x should be encoded; defaults to True

Raises

Exception – if neither file_name nor x are specified

abstract print_model_summary() → None¶

TODO

abstract save_model(file_name: Optional[str] = None)¶

TODO

Parameters: file_name (str, optional) – file name in output dir; default is library-dependent

abstract set_seed() → None¶

TODO

train_model(file_name_train: Optional[str] = None, file_name_val: Optional[str] = None, x_train: Optional[List[str]] = None, y_train: Optional[List[str]] = None, x_val: Optional[List[str]] = None, y_val: Optional[List[str]] = None) → None¶

Train model.

Specify either file_name_train and file_name_val or x_train, y_train, x_val, and y_val.

Parameters

file_name_train (Optional[str]) – TODO
file_name_val (Optional[str]) – TODO
x_train (Optional[List[str]]) – TODO
y_train (Optional[List[str]]) – TODO
x_val (Optional[List[str]]) – TODO
y_val (Optional[List[str]]) – TODO

Raises

Exception – if the output directory is not empty
Exception – if neither file_name_train and file_name_val nor x_train, y_train, x_val, and y_val are specified

class MultipleRegressionLearner(model_definition: seqgra.model.model.modeldefinition.ModelDefinition, data_dir: str, output_dir: str, validate_data: bool = True, gpu_id: int = 0, silent: bool = False)[source]¶

Bases: seqgra.learner.learner.Learner

Abstract class for multiple regression learners.

Multiple regression learners are learners for models with multiple independent real-valued variables (\(x \in R^n\)) and one dependent real-valued variable (\(y \in R\)).
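In symbols, and only as a sketch, such a model is a function from an \(n\)-dimensional input to a single real output. The parameter symbol \(\theta\) and the linear special case with weights \(w\) and bias \(b\) are notation introduced here for illustration and do not appear in the source.

```latex
\[
  f_\theta : \mathbb{R}^n \to \mathbb{R},
  \qquad
  \hat{y} = f_\theta(x),
  \quad x \in \mathbb{R}^n,\ \hat{y} \in \mathbb{R}
\]
\[
  \text{linear special case:}\quad
  \hat{y} = w^\top x + b,
  \quad w \in \mathbb{R}^n,\ b \in \mathbb{R}
\]
```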

definition¶

contains model meta info, architecture and hyperparameters

Type: ModelDefinition

data_dir¶

directory with data files, e.g., training.txt

Type: str

output_dir¶

model output directory, {OUTPUTDIR}/models/{GRAMMAR ID}/{MODEL ID}/

Type: str

validate_data¶

whether input data should be validated (e.g., check if valid DNA or protein sequence)

Type: bool

gpu_id¶

ID of GPU used by TensorFlow and PyTorch

Type: int

model¶

PyTorch or TensorFlow model

optimizer¶

PyTorch or TensorFlow optimizer

criterion¶

PyTorch or TensorFlow criterion (loss)

metrics¶

metrics that are collected, usually loss and accuracy

Type: List[str]

Parameters

model_definition (ModelDefinition) – contains model meta info, architecture and hyperparameters
data_dir (str) – directory with data files, {OUTPUTDIR}/input/{GRAMMAR ID}
output_dir (str) – model output directory without model folder, {OUTPUTDIR}/models/{GRAMMAR ID}
validate_data (bool) – whether input data should be validated (e.g., check if valid DNA or protein sequence)
gpu_id (int) – ID of GPU used by TensorFlow and PyTorch

check_annotations(annotations: List[str]) → bool¶

check_labels(y: List[str], throw_exception: bool = True) → bool[source]¶

abstract check_sequence(x: List[str]) → bool¶

abstract create_model() → None¶

Abstract method to create library-specific model.

Library-specific implementations are provided for TensorFlow and PyTorch.

dataset_generator(file_name: str)¶

abstract decode_x(x)¶

TODO

Parameters: x (array) – TODO

abstract decode_y(y)¶

TODO

Parameters: y (array) – TODO

abstract encode_x(x)¶

TODO

Parameters: x (array) – TODO

abstract encode_y(y)¶

TODO

Parameters: y (array) – TODO

abstract evaluate_model(file_name: Optional[str] = None, x: Optional[List[str]] = None, y: Optional[List[str]] = None)¶

TODO

Parameters

file_name (Optional[str]) – TODO
x (Optional[List[str]]) – TODO
y (Optional[List[str]]) – TODO

Returns

TODO

Return type

array

Raises

Exception – if neither file_name nor (x and y) are specified

get_annotations_file(set_name: str = 'test') → str¶

Get path to annotations file.

E.g., get_annotations_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training-annotation.txt, if it exists.

Parameters: set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
Returns: file path to annotations file
Return type: str
Raises: Exception – in case requested annotations file does not exist

get_examples_file(set_name: str = 'test') → str¶

Get path to examples file.

E.g., get_examples_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training.txt, if it exists.

Parameters: set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
Returns: file path to examples file
Return type: str
Raises: Exception – in case requested examples file does not exist

get_label_set(y: List[str]) → Set[str][source]¶

abstract get_num_params() → seqgra.schema.ModelSize ¶

TODO

get_sequence_length(file_name: str) → int¶

abstract load_model(file_name: Optional[str] = None)¶

TODO

Parameters: file_name (str, optional) – file name in output dir; default is library-dependent

parse_annotations_data(file_name: str) → seqgra.schema.AnnotationSet ¶

Method to parse annotations data file.

Checks validity of annotations.

Parameters: file_name (str) – file name
Returns: annotations (List[str]): annotations; y (List[str]): labels
Return type: AnnotationSet

abstract parse_examples_data(file_name: str) → seqgra.schema.ExampleSet ¶

Abstract method to parse examples data file.

Checks the validity of sequences; implementations specific to the sequence data type are provided for DNA and amino acid sequences.

Parameters: file_name (str) – file name
Returns: x (List[str]): sequences; y (List[str]): labels
Return type: ExampleSet

abstract predict(file_name: Optional[str] = None, x: Optional[Any] = None, encode: bool = True)¶

TODO

Parameters

x (array) – TODO
encode (bool, optional) – whether x should be encoded; defaults to True

Raises

Exception – if neither file_name nor x are specified

abstract print_model_summary() → None¶

TODO

abstract save_model(file_name: Optional[str] = None)¶

TODO

Parameters: file_name (str, optional) – file name in output dir; default is library-dependent

abstract set_seed() → None¶

TODO

train_model(file_name_train: Optional[str] = None, file_name_val: Optional[str] = None, x_train: Optional[List[str]] = None, y_train: Optional[List[str]] = None, x_val: Optional[List[str]] = None, y_val: Optional[List[str]] = None) → None¶

Train model.

Specify either file_name_train and file_name_val or x_train, y_train, x_val, and y_val.

Parameters

file_name_train (Optional[str]) – TODO
file_name_val (Optional[str]) – TODO
x_train (Optional[List[str]]) – TODO
y_train (Optional[List[str]]) – TODO
x_val (Optional[List[str]]) – TODO
y_val (Optional[List[str]]) – TODO

Raises

Exception – if the output directory is not empty
Exception – if neither file_name_train and file_name_val nor x_train, y_train, x_val, and y_val are specified

class MultivariateRegressionLearner(model_definition: seqgra.model.model.modeldefinition.ModelDefinition, data_dir: str, output_dir: str, validate_data: bool = True, gpu_id: int = 0, silent: bool = False)[source]¶

Bases: seqgra.learner.learner.Learner

Abstract class for multivariate regression learners.

Multivariate regression learners are used for models with multiple independent real-valued variables (\(x \in R^n\)) and multiple dependent real-valued variables (\(y \in R^m\)).
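The input and output shapes of such a model can be made concrete with a toy linear map. The function `linear_predict` and its weights are hypothetical and stand in for whatever model a learner actually trains; the point is only that one input vector yields a vector of several real-valued outputs.

```python
from typing import List


def linear_predict(
    x: List[float], weights: List[List[float]], bias: List[float]
) -> List[float]:
    """Map an input vector to one output per weight row: y_j = w_j . x + b_j."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


# Three independent variables in, two dependent variables out.
y_hat = linear_predict([1.0, 2.0, 3.0], [[1, 0, 0], [0, 1, 1]], [0.5, -1.0])
```

With a single weight row, the same function returns a one-element list, which corresponds to the single-output regression case.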

definition¶

contains model meta info, architecture and hyperparameters

Type: ModelDefinition

data_dir¶

directory with data files, e.g., training.txt

Type: str

output_dir¶

model output directory, {OUTPUTDIR}/models/{GRAMMAR ID}/{MODEL ID}/

Type: str

validate_data¶

whether input data should be validated (e.g., check if valid DNA or protein sequence)

Type: bool

gpu_id¶

ID of GPU used by TensorFlow and PyTorch

Type: int

model¶

PyTorch or TensorFlow model

optimizer¶

PyTorch or TensorFlow optimizer

criterion¶

PyTorch or TensorFlow criterion (loss)

metrics¶

metrics that are collected, usually loss and accuracy

Type: List[str]

Parameters

model_definition (ModelDefinition) – contains model meta info, architecture and hyperparameters
data_dir (str) – directory with data files, {OUTPUTDIR}/input/{GRAMMAR ID}
output_dir (str) – model output directory without model folder, {OUTPUTDIR}/models/{GRAMMAR ID}
validate_data (bool) – whether input data should be validated (e.g., check if valid DNA or protein sequence)
gpu_id (int) – ID of GPU used by TensorFlow and PyTorch

check_annotations(annotations: List[str]) → bool¶

check_labels(y: List[str], throw_exception: bool = True) → bool[source]¶

abstract check_sequence(x: List[str]) → bool¶

abstract create_model() → None¶

Abstract method to create library-specific model.

Library-specific implementations are provided for TensorFlow and PyTorch.

dataset_generator(file_name: str)¶

abstract decode_x(x)¶

TODO

Parameters: x (array) – TODO

abstract decode_y(y)¶

TODO

Parameters: y (array) – TODO

abstract encode_x(x)¶

TODO

Parameters: x (array) – TODO

abstract encode_y(y)¶

TODO

Parameters: y (array) – TODO

abstract evaluate_model(file_name: Optional[str] = None, x: Optional[List[str]] = None, y: Optional[List[str]] = None)¶

TODO

Parameters

file_name (Optional[str]) – TODO
x (Optional[List[str]]) – TODO
y (Optional[List[str]]) – TODO

Returns

TODO

Return type

array

Raises

Exception – if neither file_name nor (x and y) are specified

get_annotations_file(set_name: str = 'test') → str¶

Get path to annotations file.

E.g., get_annotations_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training-annotation.txt, if it exists.

Parameters: set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
Returns: file path to annotations file
Return type: str
Raises: Exception – in case requested annotations file does not exist

get_examples_file(set_name: str = 'test') → str¶

Get path to examples file.

E.g., get_examples_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training.txt, if it exists.

Parameters: set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test
Returns: file path to examples file
Return type: str
Raises: Exception – in case requested examples file does not exist

get_label_set(y: List[str]) → Set[str][source]¶

abstract get_num_params() → seqgra.schema.ModelSize ¶

TODO

get_sequence_length(file_name: str) → int¶

abstract load_model(file_name: Optional[str] = None)¶

TODO

Parameters: file_name (str, optional) – file name in output dir; default is library-dependent

parse_annotations_data(file_name: str) → seqgra.schema.AnnotationSet ¶

Method to parse annotations data file.

Checks validity of annotations.

Parameters: file_name (str) – file name
Returns: annotations (List[str]): annotations; y (List[str]): labels
Return type: AnnotationSet

abstract parse_examples_data(file_name: str) → seqgra.schema.ExampleSet ¶

Abstract method to parse examples data file.

Checks the validity of sequences; implementations specific to the sequence data type are provided for DNA and amino acid sequences.

Parameters: file_name (str) – file name
Returns: x (List[str]): sequences; y (List[str]): labels
Return type: ExampleSet

abstract predict(file_name: Optional[str] = None, x: Optional[Any] = None, encode: bool = True)¶

TODO

Parameters

x (array) – TODO
encode (bool, optional) – whether x should be encoded; defaults to True

Raises

Exception – if neither file_name nor x are specified

abstract print_model_summary() → None¶

TODO

abstract save_model(file_name: Optional[str] = None)¶

TODO

Parameters: file_name (str, optional) – file name in output dir; default is library-dependent

abstract set_seed() → None¶

TODO

train_model(file_name_train: Optional[str] = None, file_name_val: Optional[str] = None, x_train: Optional[List[str]] = None, y_train: Optional[List[str]] = None, x_val: Optional[List[str]] = None, y_val: Optional[List[str]] = None) → None¶

Train model.

Specify either file_name_train and file_name_val or x_train, y_train, x_val, and y_val.

Parameters

file_name_train (Optional[str]) – TODO
file_name_val (Optional[str]) – TODO
x_train (Optional[List[str]]) – TODO
y_train (Optional[List[str]]) – TODO
x_val (Optional[List[str]]) – TODO
y_val (Optional[List[str]]) – TODO

Raises

Exception – if the output directory is not empty
Exception – if neither file_name_train and file_name_val nor x_train, y_train, x_val, and y_val are specified