seqgra.learner.learner module

Contains abstract classes for all learners.

Classes:
class Learner(model_definition: seqgra.model.model.modeldefinition.ModelDefinition, data_dir: str, output_dir: str, validate_data: bool = True, gpu_id: int = 0, silent: bool = False)

Bases: abc.ABC

Abstract base class for all learners.

definition

contains model meta info, architecture and hyperparameters

Type

ModelDefinition

data_dir

directory with data files, e.g., training.txt

Type

str

output_dir

model output directory, {OUTPUTDIR}/models/{GRAMMAR ID}/{MODEL ID}/

Type

str

validate_data

whether input data should be validated (e.g., check if valid DNA or protein sequence)

Type

bool

gpu_id

ID of GPU used by TensorFlow and PyTorch

Type

int

model

PyTorch or TensorFlow model

optimizer

PyTorch or TensorFlow optimizer

criterion

PyTorch or TensorFlow criterion (loss)

metrics

metrics that are collected, usually loss and accuracy

Type

List[str]

Parameters
  • model_definition (ModelDefinition) – contains model meta info, architecture and hyperparameters

  • data_dir (str) – directory with data files, {OUTPUTDIR}/input/{GRAMMAR ID}

  • output_dir (str) – model output directory without model folder, {OUTPUTDIR}/models/{GRAMMAR ID}

  • validate_data (bool) – whether input data should be validated (e.g., check if valid DNA or protein sequence)

  • gpu_id (int) – ID of GPU used by TensorFlow and PyTorch
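
As a rough illustration of what validating input data could involve, the sketch below checks that every sequence uses only DNA or protein letters. The helper name `is_valid_sequence` and both alphabets are assumptions made for this example, not seqgra's actual implementation.

```python
from typing import List, Set

# Assumed alphabets; a real learner may accept further symbols (e.g., N).
DNA_ALPHABET: Set[str] = set("ACGT")
PROTEIN_ALPHABET: Set[str] = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid_sequence(x: List[str], alphabet: Set[str]) -> bool:
    """Return True if every sequence is non-empty and uses only letters from alphabet."""
    return all(len(seq) > 0 and set(seq.upper()) <= alphabet for seq in x)


print(is_valid_sequence(["ACGT", "ttga"], DNA_ALPHABET))  # True
print(is_valid_sequence(["ACGU"], DNA_ALPHABET))          # False
```

A check like this would typically be skipped when validate_data is False, trading safety for speed on large data sets.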

check_annotations(annotations: List[str]) → bool
check_labels(y: List[str], throw_exception: bool = True) → bool
abstract check_sequence(x: List[str]) → bool
abstract create_model() → None

Abstract method to create library-specific model.

Machine learning library specific implementations are provided for TensorFlow and PyTorch.
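
The abstract-method pattern described above can be sketched with Python's abc module. `SketchLearner` and `DictLearner` are hypothetical stand-ins invented for this example; a real subclass would build a PyTorch or TensorFlow model instead of a dictionary.

```python
from abc import ABC, abstractmethod


class SketchLearner(ABC):
    """Hypothetical, stripped-down stand-in for an abstract learner."""

    def __init__(self) -> None:
        self.model = None

    @abstractmethod
    def create_model(self) -> None:
        """Build the library-specific model and store it in self.model."""


class DictLearner(SketchLearner):
    """Toy concrete learner that stores a plain dictionary as its model."""

    def create_model(self) -> None:
        self.model = {"layers": ["conv", "dense"]}


learner = DictLearner()
learner.create_model()
print(learner.model)

try:
    SketchLearner()  # abstract classes cannot be instantiated
except TypeError as err:
    print("TypeError:", err)
```

Because create_model is abstract, forgetting to implement it in a subclass fails at instantiation time rather than later during training.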

dataset_generator(file_name: str)
abstract decode_x(x)

TODO

TODO

Parameters

x (array) – TODO

abstract decode_y(y)

TODO

TODO

Parameters

y (array) – TODO

abstract encode_x(x)

TODO

TODO

Parameters

x (array) – TODO

abstract encode_y(y)

TODO

TODO

Parameters

y (array) – TODO

abstract evaluate_model(file_name: Optional[str] = None, x: Optional[List[str]] = None, y: Optional[List[str]] = None)

TODO

TODO

Parameters
  • file_name (Optional[str]) – TODO

  • x (Optional[List[str]]) – TODO

  • y (Optional[List[str]]) – TODO

Returns

TODO

Return type

array

Raises

Exception – if neither file_name nor (x and y) are specified

get_annotations_file(set_name: str = 'test') → str

Get path to annotations file.

E.g., get_annotations_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training-annotation.txt, if it exists.

Parameters

set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test

Returns

file path to annotations file

Return type

str

Raises

Exception – in case requested annotations file does not exist

get_examples_file(set_name: str = 'test') → str

Get path to examples file.

E.g., get_examples_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training.txt, if it exists.

Parameters

set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test

Returns

file path to examples file

Return type

str

Raises

Exception – in case requested examples file does not exist
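
The file-name conventions for examples and annotations files can be sketched as follows. These are simplified free functions that take the data directory as an argument, which is an assumption for illustration; the documented methods are instance methods and may differ in detail.

```python
import os
import tempfile


def _existing_file(data_dir: str, file_name: str) -> str:
    """Join data_dir and file_name and raise if the result is not a file."""
    path = os.path.join(data_dir, file_name)
    if not os.path.isfile(path):
        raise Exception("file does not exist: " + path)
    return path


def get_examples_file(data_dir: str, set_name: str = "test") -> str:
    # Examples live in {set_name}.txt
    return _existing_file(data_dir, set_name + ".txt")


def get_annotations_file(data_dir: str, set_name: str = "test") -> str:
    # Annotations live in {set_name}-annotation.txt
    return _existing_file(data_dir, set_name + "-annotation.txt")


with tempfile.TemporaryDirectory() as data_dir:
    open(os.path.join(data_dir, "training.txt"), "w").close()
    found = os.path.basename(get_examples_file(data_dir, "training"))
    try:
        get_examples_file(data_dir)  # test.txt was never created
        missing_raises = False
    except Exception:
        missing_raises = True

print(found, missing_raises)
```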

abstract get_label_set(y: List[str]) → Set[str]
abstract get_num_params() → seqgra.schema.ModelSize

TODO

TODO

get_sequence_length(file_name: str) → int
abstract load_model(file_name: Optional[str] = None)

TODO

TODO

Parameters

file_name (str, optional) – file name in output dir; default is library-dependent

parse_annotations_data(file_name: str) → seqgra.schema.AnnotationSet

Method to parse annotations data file.

Checks validity of annotations.

Parameters

file_name (str) – file name

Returns

annotations (List[str]): annotations; y (List[str]): labels

Return type

AnnotationSet

abstract parse_examples_data(file_name: str) → seqgra.schema.ExampleSet

Abstract method to parse examples data file.

Checks validity of sequences with sequence data type specific implementations provided for DNA and amino acid sequences.

Parameters

file_name (str) – file name

Returns

x (List[str]): sequences; y (List[str]): labels

Return type

ExampleSet

abstract predict(file_name: Optional[str] = None, x: Optional[Any] = None, encode: bool = True)

TODO

TODO

Parameters
  • x (array) – TODO

  • encode (bool, optional) – whether x should be encoded; defaults to True

Raises

Exception – if neither file_name nor x are specified

abstract print_model_summary() → None

TODO

TODO

abstract save_model(file_name: Optional[str] = None)

TODO

TODO

Parameters

file_name (str, optional) – file name in output dir; default is library-dependent

abstract set_seed() → None

TODO

TODO

train_model(file_name_train: Optional[str] = None, file_name_val: Optional[str] = None, x_train: Optional[List[str]] = None, y_train: Optional[List[str]] = None, x_val: Optional[List[str]] = None, y_val: Optional[List[str]] = None) → None

Train model.

Specify either file_name_train and file_name_val or x_train, y_train, x_val, and y_val.

Parameters
  • file_name_train (Optional[str]) – TODO

  • file_name_val (Optional[str]) – TODO

  • x_train (Optional[List[str]]) – TODO

  • y_train (Optional[List[str]]) – TODO

  • x_val (Optional[List[str]]) – TODO

  • y_val (Optional[List[str]]) – TODO

Raises
  • Exception – output directory non-empty

  • Exception – specify either file_name_train and file_name_val or x_train, y_train, x_val, y_val
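
The either/or rule for training input can be sketched as a small argument check. The function name `resolve_training_input` and its return values are hypothetical. Preferring files when both argument sets are given is also an assumption, since the documentation does not say how that case is handled.

```python
from typing import List, Optional


def resolve_training_input(
    file_name_train: Optional[str] = None,
    file_name_val: Optional[str] = None,
    x_train: Optional[List[str]] = None,
    y_train: Optional[List[str]] = None,
    x_val: Optional[List[str]] = None,
    y_val: Optional[List[str]] = None,
) -> str:
    """Return 'files' or 'memory', depending on which complete argument set was given."""
    has_files = file_name_train is not None and file_name_val is not None
    has_arrays = all(v is not None for v in (x_train, y_train, x_val, y_val))
    if has_files:
        return "files"
    if has_arrays:
        return "memory"
    raise Exception(
        "specify either file_name_train and file_name_val or "
        "x_train, y_train, x_val, y_val"
    )


print(resolve_training_input("training.txt", "validation.txt"))
print(resolve_training_input(x_train=["ACGT"], y_train=["c1"],
                             x_val=["TTGA"], y_val=["c2"]))
```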

class MultiClassClassificationLearner(model_definition: seqgra.model.model.modeldefinition.ModelDefinition, data_dir: str, output_dir: str, validate_data: bool = True, gpu_id: int = 0, silent: bool = False)

Bases: seqgra.learner.learner.Learner

Abstract class for multi-class classification learners.

Multi-class classification learners are learners for models with mutually exclusive class labels.
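
With mutually exclusive labels, each example belongs to exactly one class, so a one-hot vector is a natural target encoding. The sketch below is an illustrative assumption about what such an encoding could look like; `one_hot_labels` is a hypothetical helper, not the learner's encode_y.

```python
from typing import List


def one_hot_labels(y: List[str]) -> List[List[int]]:
    """Encode each label as a vector with a single 1 at its class index."""
    labels = sorted(set(y))  # fixed, reproducible class order
    index = {label: i for i, label in enumerate(labels)}
    return [[1 if index[label] == i else 0 for i in range(len(labels))]
            for label in y]


encoded = one_hot_labels(["c2", "c1", "c2"])
print(encoded)  # [[0, 1], [1, 0], [0, 1]]
```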

definition

contains model meta info, architecture and hyperparameters

Type

ModelDefinition

data_dir

directory with data files, e.g., training.txt

Type

str

output_dir

model output directory, {OUTPUTDIR}/models/{GRAMMAR ID}/{MODEL ID}/

Type

str

validate_data

whether input data should be validated (e.g., check if valid DNA or protein sequence)

Type

bool

gpu_id

ID of GPU used by TensorFlow and PyTorch

Type

int

model

PyTorch or TensorFlow model

optimizer

PyTorch or TensorFlow optimizer

criterion

PyTorch or TensorFlow criterion (loss)

metrics

metrics that are collected, usually loss and accuracy

Type

List[str]

Parameters
  • model_definition (ModelDefinition) – contains model meta info, architecture and hyperparameters

  • data_dir (str) – directory with data files, {OUTPUTDIR}/input/{GRAMMAR ID}

  • output_dir (str) – model output directory without model folder, {OUTPUTDIR}/models/{GRAMMAR ID}

  • validate_data (bool) – whether input data should be validated (e.g., check if valid DNA or protein sequence)

  • gpu_id (int) – ID of GPU used by TensorFlow and PyTorch

check_annotations(annotations: List[str]) → bool
check_labels(y: List[str], throw_exception: bool = True) → bool
abstract check_sequence(x: List[str]) → bool
abstract create_model() → None

Abstract method to create library-specific model.

Machine learning library specific implementations are provided for TensorFlow and PyTorch.

dataset_generator(file_name: str)
abstract decode_x(x)

TODO

TODO

Parameters

x (array) – TODO

abstract decode_y(y)

TODO

TODO

Parameters

y (array) – TODO

abstract encode_x(x)

TODO

TODO

Parameters

x (array) – TODO

abstract encode_y(y)

TODO

TODO

Parameters

y (array) – TODO

abstract evaluate_model(file_name: Optional[str] = None, x: Optional[List[str]] = None, y: Optional[List[str]] = None)

TODO

TODO

Parameters
  • file_name (Optional[str]) – TODO

  • x (Optional[List[str]]) – TODO

  • y (Optional[List[str]]) – TODO

Returns

TODO

Return type

array

Raises

Exception – if neither file_name nor (x and y) are specified

get_annotations_file(set_name: str = 'test') → str

Get path to annotations file.

E.g., get_annotations_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training-annotation.txt, if it exists.

Parameters

set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test

Returns

file path to annotations file

Return type

str

Raises

Exception – in case requested annotations file does not exist

get_examples_file(set_name: str = 'test') → str

Get path to examples file.

E.g., get_examples_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training.txt, if it exists.

Parameters

set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test

Returns

file path to examples file

Return type

str

Raises

Exception – in case requested examples file does not exist

get_label_set(y: List[str]) → Set[str]
abstract get_num_params() → seqgra.schema.ModelSize

TODO

TODO

get_sequence_length(file_name: str) → int
abstract load_model(file_name: Optional[str] = None)

TODO

TODO

Parameters

file_name (str, optional) – file name in output dir; default is library-dependent

parse_annotations_data(file_name: str) → seqgra.schema.AnnotationSet

Method to parse annotations data file.

Checks validity of annotations.

Parameters

file_name (str) – file name

Returns

annotations (List[str]): annotations; y (List[str]): labels

Return type

AnnotationSet

abstract parse_examples_data(file_name: str) → seqgra.schema.ExampleSet

Abstract method to parse examples data file.

Checks validity of sequences with sequence data type specific implementations provided for DNA and amino acid sequences.

Parameters

file_name (str) – file name

Returns

x (List[str]): sequences; y (List[str]): labels

Return type

ExampleSet

abstract predict(file_name: Optional[str] = None, x: Optional[Any] = None, encode: bool = True)

TODO

TODO

Parameters
  • x (array) – TODO

  • encode (bool, optional) – whether x should be encoded; defaults to True

Raises

Exception – if neither file_name nor x are specified

abstract print_model_summary() → None

TODO

TODO

abstract save_model(file_name: Optional[str] = None)

TODO

TODO

Parameters

file_name (str, optional) – file name in output dir; default is library-dependent

abstract set_seed() → None

TODO

TODO

train_model(file_name_train: Optional[str] = None, file_name_val: Optional[str] = None, x_train: Optional[List[str]] = None, y_train: Optional[List[str]] = None, x_val: Optional[List[str]] = None, y_val: Optional[List[str]] = None) → None

Train model.

Specify either file_name_train and file_name_val or x_train, y_train, x_val, and y_val.

Parameters
  • file_name_train (Optional[str]) – TODO

  • file_name_val (Optional[str]) – TODO

  • x_train (Optional[List[str]]) – TODO

  • y_train (Optional[List[str]]) – TODO

  • x_val (Optional[List[str]]) – TODO

  • y_val (Optional[List[str]]) – TODO

Raises
  • Exception – output directory non-empty

  • Exception – specify either file_name_train and file_name_val or x_train, y_train, x_val, y_val

class MultiLabelClassificationLearner(model_definition: seqgra.model.model.modeldefinition.ModelDefinition, data_dir: str, output_dir: str, validate_data: bool = True, gpu_id: int = 0, silent: bool = False)

Bases: seqgra.learner.learner.Learner

Abstract class for multi-label classification learners.

Multi-label classification learners are learners for models with class labels that are not mutually exclusive.
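
When labels are not mutually exclusive, an example can carry zero, one, or several labels, so a multi-hot vector is a common target encoding. In the sketch below, the `|` separator between labels and the helper name `multi_hot_labels` are assumptions for illustration, not a documented part of this module.

```python
from typing import List


def multi_hot_labels(y: List[str], separator: str = "|") -> List[List[int]]:
    """Encode each label string as a 0/1 vector over all observed labels."""
    # Split each entry into its labels; an empty string yields an empty set.
    label_sets = [set(filter(None, item.split(separator))) for item in y]
    labels = sorted(set().union(*label_sets))  # fixed, reproducible label order
    return [[1 if label in s else 0 for label in labels] for s in label_sets]


encoded = multi_hot_labels(["c1|c2", "c2", ""])
print(encoded)  # [[1, 1], [0, 1], [0, 0]]
```

Unlike a one-hot encoding, rows here may sum to any value between 0 and the number of labels.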

definition

contains model meta info, architecture and hyperparameters

Type

ModelDefinition

data_dir

directory with data files, e.g., training.txt

Type

str

output_dir

model output directory, {OUTPUTDIR}/models/{GRAMMAR ID}/{MODEL ID}/

Type

str

validate_data

whether input data should be validated (e.g., check if valid DNA or protein sequence)

Type

bool

gpu_id

ID of GPU used by TensorFlow and PyTorch

Type

int

model

PyTorch or TensorFlow model

optimizer

PyTorch or TensorFlow optimizer

criterion

PyTorch or TensorFlow criterion (loss)

metrics

metrics that are collected, usually loss and accuracy

Type

List[str]

Parameters
  • model_definition (ModelDefinition) – contains model meta info, architecture and hyperparameters

  • data_dir (str) – directory with data files, {OUTPUTDIR}/input/{GRAMMAR ID}

  • output_dir (str) – model output directory without model folder, {OUTPUTDIR}/models/{GRAMMAR ID}

  • validate_data (bool) – whether input data should be validated (e.g., check if valid DNA or protein sequence)

  • gpu_id (int) – ID of GPU used by TensorFlow and PyTorch

check_annotations(annotations: List[str]) → bool
check_labels(y: List[str], throw_exception: bool = True) → bool
abstract check_sequence(x: List[str]) → bool
abstract create_model() → None

Abstract method to create library-specific model.

Machine learning library specific implementations are provided for TensorFlow and PyTorch.

dataset_generator(file_name: str)
abstract decode_x(x)

TODO

TODO

Parameters

x (array) – TODO

abstract decode_y(y)

TODO

TODO

Parameters

y (array) – TODO

abstract encode_x(x)

TODO

TODO

Parameters

x (array) – TODO

abstract encode_y(y)

TODO

TODO

Parameters

y (array) – TODO

abstract evaluate_model(file_name: Optional[str] = None, x: Optional[List[str]] = None, y: Optional[List[str]] = None)

TODO

TODO

Parameters
  • file_name (Optional[str]) – TODO

  • x (Optional[List[str]]) – TODO

  • y (Optional[List[str]]) – TODO

Returns

TODO

Return type

array

Raises

Exception – if neither file_name nor (x and y) are specified

get_annotations_file(set_name: str = 'test') → str

Get path to annotations file.

E.g., get_annotations_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training-annotation.txt, if it exists.

Parameters

set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test

Returns

file path to annotations file

Return type

str

Raises

Exception – in case requested annotations file does not exist

get_examples_file(set_name: str = 'test') → str

Get path to examples file.

E.g., get_examples_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training.txt, if it exists.

Parameters

set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test

Returns

file path to examples file

Return type

str

Raises

Exception – in case requested examples file does not exist

get_label_set(y: List[str]) → Set[str]
abstract get_num_params() → seqgra.schema.ModelSize

TODO

TODO

get_sequence_length(file_name: str) → int
abstract load_model(file_name: Optional[str] = None)

TODO

TODO

Parameters

file_name (str, optional) – file name in output dir; default is library-dependent

parse_annotations_data(file_name: str) → seqgra.schema.AnnotationSet

Method to parse annotations data file.

Checks validity of annotations.

Parameters

file_name (str) – file name

Returns

annotations (List[str]): annotations; y (List[str]): labels

Return type

AnnotationSet

abstract parse_examples_data(file_name: str) → seqgra.schema.ExampleSet

Abstract method to parse examples data file.

Checks validity of sequences with sequence data type specific implementations provided for DNA and amino acid sequences.

Parameters

file_name (str) – file name

Returns

x (List[str]): sequences; y (List[str]): labels

Return type

ExampleSet

abstract predict(file_name: Optional[str] = None, x: Optional[Any] = None, encode: bool = True)

TODO

TODO

Parameters
  • x (array) – TODO

  • encode (bool, optional) – whether x should be encoded; defaults to True

Raises

Exception – if neither file_name nor x are specified

abstract print_model_summary() → None

TODO

TODO

abstract save_model(file_name: Optional[str] = None)

TODO

TODO

Parameters

file_name (str, optional) – file name in output dir; default is library-dependent

abstract set_seed() → None

TODO

TODO

train_model(file_name_train: Optional[str] = None, file_name_val: Optional[str] = None, x_train: Optional[List[str]] = None, y_train: Optional[List[str]] = None, x_val: Optional[List[str]] = None, y_val: Optional[List[str]] = None) → None

Train model.

Specify either file_name_train and file_name_val or x_train, y_train, x_val, and y_val.

Parameters
  • file_name_train (Optional[str]) – TODO

  • file_name_val (Optional[str]) – TODO

  • x_train (Optional[List[str]]) – TODO

  • y_train (Optional[List[str]]) – TODO

  • x_val (Optional[List[str]]) – TODO

  • y_val (Optional[List[str]]) – TODO

Raises
  • Exception – output directory non-empty

  • Exception – specify either file_name_train and file_name_val or x_train, y_train, x_val, y_val

class MultipleRegressionLearner(model_definition: seqgra.model.model.modeldefinition.ModelDefinition, data_dir: str, output_dir: str, validate_data: bool = True, gpu_id: int = 0, silent: bool = False)

Bases: seqgra.learner.learner.Learner

Abstract class for multiple regression learners.

Multiple regression learners are learners for models with multiple independent real-valued variables (\(x \in R^n\)) and one dependent real-valued variable (\(y \in R\)).
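
For orientation, a multiple regression model maps n real-valued inputs to a single real-valued output. The linear form below is only one example of such a model and is an assumption for illustration; the learner does not prescribe it, and the symbols w, b, and \hat{y} are introduced here.

```latex
\[
  \hat{y} = f(x) = w^\top x + b,
  \qquad x \in R^n,\; w \in R^n,\; b \in R,\; \hat{y} \in R
\]
```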

definition

contains model meta info, architecture and hyperparameters

Type

ModelDefinition

data_dir

directory with data files, e.g., training.txt

Type

str

output_dir

model output directory, {OUTPUTDIR}/models/{GRAMMAR ID}/{MODEL ID}/

Type

str

validate_data

whether input data should be validated (e.g., check if valid DNA or protein sequence)

Type

bool

gpu_id

ID of GPU used by TensorFlow and PyTorch

Type

int

model

PyTorch or TensorFlow model

optimizer

PyTorch or TensorFlow optimizer

criterion

PyTorch or TensorFlow criterion (loss)

metrics

metrics that are collected, usually loss and accuracy

Type

List[str]

Parameters
  • model_definition (ModelDefinition) – contains model meta info, architecture and hyperparameters

  • data_dir (str) – directory with data files, {OUTPUTDIR}/input/{GRAMMAR ID}

  • output_dir (str) – model output directory without model folder, {OUTPUTDIR}/models/{GRAMMAR ID}

  • validate_data (bool) – whether input data should be validated (e.g., check if valid DNA or protein sequence)

  • gpu_id (int) – ID of GPU used by TensorFlow and PyTorch

check_annotations(annotations: List[str]) → bool
check_labels(y: List[str], throw_exception: bool = True) → bool
abstract check_sequence(x: List[str]) → bool
abstract create_model() → None

Abstract method to create library-specific model.

Machine learning library specific implementations are provided for TensorFlow and PyTorch.

dataset_generator(file_name: str)
abstract decode_x(x)

TODO

TODO

Parameters

x (array) – TODO

abstract decode_y(y)

TODO

TODO

Parameters

y (array) – TODO

abstract encode_x(x)

TODO

TODO

Parameters

x (array) – TODO

abstract encode_y(y)

TODO

TODO

Parameters

y (array) – TODO

abstract evaluate_model(file_name: Optional[str] = None, x: Optional[List[str]] = None, y: Optional[List[str]] = None)

TODO

TODO

Parameters
  • file_name (Optional[str]) – TODO

  • x (Optional[List[str]]) – TODO

  • y (Optional[List[str]]) – TODO

Returns

TODO

Return type

array

Raises

Exception – if neither file_name nor (x and y) are specified

get_annotations_file(set_name: str = 'test') → str

Get path to annotations file.

E.g., get_annotations_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training-annotation.txt, if it exists.

Parameters

set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test

Returns

file path to annotations file

Return type

str

Raises

Exception – in case requested annotations file does not exist

get_examples_file(set_name: str = 'test') → str

Get path to examples file.

E.g., get_examples_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training.txt, if it exists.

Parameters

set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test

Returns

file path to examples file

Return type

str

Raises

Exception – in case requested examples file does not exist

get_label_set(y: List[str]) → Set[str]
abstract get_num_params() → seqgra.schema.ModelSize

TODO

TODO

get_sequence_length(file_name: str) → int
abstract load_model(file_name: Optional[str] = None)

TODO

TODO

Parameters

file_name (str, optional) – file name in output dir; default is library-dependent

parse_annotations_data(file_name: str) → seqgra.schema.AnnotationSet

Method to parse annotations data file.

Checks validity of annotations.

Parameters

file_name (str) – file name

Returns

annotations (List[str]): annotations; y (List[str]): labels

Return type

AnnotationSet

abstract parse_examples_data(file_name: str) → seqgra.schema.ExampleSet

Abstract method to parse examples data file.

Checks validity of sequences with sequence data type specific implementations provided for DNA and amino acid sequences.

Parameters

file_name (str) – file name

Returns

x (List[str]): sequences; y (List[str]): labels

Return type

ExampleSet

abstract predict(file_name: Optional[str] = None, x: Optional[Any] = None, encode: bool = True)

TODO

TODO

Parameters
  • x (array) – TODO

  • encode (bool, optional) – whether x should be encoded; defaults to True

Raises

Exception – if neither file_name nor x are specified

abstract print_model_summary() → None

TODO

TODO

abstract save_model(file_name: Optional[str] = None)

TODO

TODO

Parameters

file_name (str, optional) – file name in output dir; default is library-dependent

abstract set_seed() → None

TODO

TODO

train_model(file_name_train: Optional[str] = None, file_name_val: Optional[str] = None, x_train: Optional[List[str]] = None, y_train: Optional[List[str]] = None, x_val: Optional[List[str]] = None, y_val: Optional[List[str]] = None) → None

Train model.

Specify either file_name_train and file_name_val or x_train, y_train, x_val, and y_val.

Parameters
  • file_name_train (Optional[str]) – TODO

  • file_name_val (Optional[str]) – TODO

  • x_train (Optional[List[str]]) – TODO

  • y_train (Optional[List[str]]) – TODO

  • x_val (Optional[List[str]]) – TODO

  • y_val (Optional[List[str]]) – TODO

Raises
  • Exception – output directory non-empty

  • Exception – specify either file_name_train and file_name_val or x_train, y_train, x_val, y_val

class MultivariateRegressionLearner(model_definition: seqgra.model.model.modeldefinition.ModelDefinition, data_dir: str, output_dir: str, validate_data: bool = True, gpu_id: int = 0, silent: bool = False)

Bases: seqgra.learner.learner.Learner

Abstract class for multivariate regression learners.

Multivariate regression learners are used for models with multiple independent real-valued variables (\(x \in R^n\)) and multiple dependent real-valued variables (\(y \in R^m\)).
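
To make the vector-valued output concrete, the sketch below evaluates a linear multivariate model that maps one input vector to several outputs. The linear form, the name `predict_multivariate`, and the weights are assumptions for illustration. The number of outputs is set by the number of rows in W and need not equal the number of inputs.

```python
from typing import List


def predict_multivariate(W: List[List[float]], b: List[float],
                         x: List[float]) -> List[float]:
    """Compute y = W x + b for one input vector x; W has one row per output."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


W = [[1.0, 0.0, 2.0],   # weights for output 1
     [0.5, -1.0, 0.0]]  # weights for output 2
b = [0.0, 1.0]
y = predict_multivariate(W, b, [1.0, 2.0, 3.0])
print(y)  # [7.0, -0.5]
```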

definition

contains model meta info, architecture and hyperparameters

Type

ModelDefinition

data_dir

directory with data files, e.g., training.txt

Type

str

output_dir

model output directory, {OUTPUTDIR}/models/{GRAMMAR ID}/{MODEL ID}/

Type

str

validate_data

whether input data should be validated (e.g., check if valid DNA or protein sequence)

Type

bool

gpu_id

ID of GPU used by TensorFlow and PyTorch

Type

int

model

PyTorch or TensorFlow model

optimizer

PyTorch or TensorFlow optimizer

criterion

PyTorch or TensorFlow criterion (loss)

metrics

metrics that are collected, usually loss and accuracy

Type

List[str]

Parameters
  • model_definition (ModelDefinition) – contains model meta info, architecture and hyperparameters

  • data_dir (str) – directory with data files, {OUTPUTDIR}/input/{GRAMMAR ID}

  • output_dir (str) – model output directory without model folder, {OUTPUTDIR}/models/{GRAMMAR ID}

  • validate_data (bool) – whether input data should be validated (e.g., check if valid DNA or protein sequence)

  • gpu_id (int) – ID of GPU used by TensorFlow and PyTorch

check_annotations(annotations: List[str]) → bool
check_labels(y: List[str], throw_exception: bool = True) → bool
abstract check_sequence(x: List[str]) → bool
abstract create_model() → None

Abstract method to create library-specific model.

Machine learning library specific implementations are provided for TensorFlow and PyTorch.

dataset_generator(file_name: str)
abstract decode_x(x)

TODO

TODO

Parameters

x (array) – TODO

abstract decode_y(y)

TODO

TODO

Parameters

y (array) – TODO

abstract encode_x(x)

TODO

TODO

Parameters

x (array) – TODO

abstract encode_y(y)

TODO

TODO

Parameters

y (array) – TODO

abstract evaluate_model(file_name: Optional[str] = None, x: Optional[List[str]] = None, y: Optional[List[str]] = None)

TODO

TODO

Parameters
  • file_name (Optional[str]) – TODO

  • x (Optional[List[str]]) – TODO

  • y (Optional[List[str]]) – TODO

Returns

TODO

Return type

array

Raises

Exception – if neither file_name nor (x and y) are specified

get_annotations_file(set_name: str = 'test') → str

Get path to annotations file.

E.g., get_annotations_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training-annotation.txt, if it exists.

Parameters

set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test

Returns

file path to annotations file

Return type

str

Raises

Exception – in case requested annotations file does not exist

get_examples_file(set_name: str = 'test') → str

Get path to examples file.

E.g., get_examples_file("training") returns {OUTPUTDIR}/input/{GRAMMAR ID}/training.txt, if it exists.

Parameters

set_name (str, optional) – set name can be one of the following: training, validation, or test; defaults to test

Returns

file path to examples file

Return type

str

Raises

Exception – in case requested examples file does not exist

get_label_set(y: List[str]) → Set[str]
abstract get_num_params() → seqgra.schema.ModelSize

TODO

TODO

get_sequence_length(file_name: str) → int
abstract load_model(file_name: Optional[str] = None)

TODO

TODO

Parameters

file_name (str, optional) – file name in output dir; default is library-dependent

parse_annotations_data(file_name: str) → seqgra.schema.AnnotationSet

Method to parse annotations data file.

Checks validity of annotations.

Parameters

file_name (str) – file name

Returns

annotations (List[str]): annotations; y (List[str]): labels

Return type

AnnotationSet

abstract parse_examples_data(file_name: str) → seqgra.schema.ExampleSet

Abstract method to parse examples data file.

Checks validity of sequences with sequence data type specific implementations provided for DNA and amino acid sequences.

Parameters

file_name (str) – file name

Returns

x (List[str]): sequences; y (List[str]): labels

Return type

ExampleSet

abstract predict(file_name: Optional[str] = None, x: Optional[Any] = None, encode: bool = True)

TODO

TODO

Parameters
  • x (array) – TODO

  • encode (bool, optional) – whether x should be encoded; defaults to True

Raises

Exception – if neither file_name nor x are specified

abstract print_model_summary() → None

TODO

TODO

abstract save_model(file_name: Optional[str] = None)

TODO

TODO

Parameters

file_name (str, optional) – file name in output dir; default is library-dependent

abstract set_seed() → None

TODO

TODO

train_model(file_name_train: Optional[str] = None, file_name_val: Optional[str] = None, x_train: Optional[List[str]] = None, y_train: Optional[List[str]] = None, x_val: Optional[List[str]] = None, y_val: Optional[List[str]] = None) → None

Train model.

Specify either file_name_train and file_name_val or x_train, y_train, x_val, and y_val.

Parameters
  • file_name_train (Optional[str]) – TODO

  • file_name_val (Optional[str]) – TODO

  • x_train (Optional[List[str]]) – TODO

  • y_train (Optional[List[str]]) – TODO

  • x_val (Optional[List[str]]) – TODO

  • y_val (Optional[List[str]]) – TODO

Raises
  • Exception – output directory non-empty

  • Exception – specify either file_name_train and file_name_val or x_train, y_train, x_val, y_val