dataset_operations

Dataset-related transformations.

This module contains the base class and the concrete classes for dataset transformations, i.e. transformations that potentially act over the entire dataset.

Classes

AverageColumnsTransformation

class AverageColumnsTransformation(*, name: str = None, cols: list[str], round_to_int: bool = False, drop_source_cols: bool = True, output: bool = True):

Transformation that averages multiple columns into a single new column.

This transformation computes the mean of the specified source columns and creates a new column with the result. Optionally, the result can be rounded to the nearest integer and the source columns can be dropped.

Arguments

cols: List of column names to average. Can use column references (e.g., "c:column_name").
drop_source_cols: Whether to drop the source columns after computing the average. Defaults to True.
name: The name of the transformation. If not provided a unique name will be generated from the class name.
output: Whether this transformation should be included in the final output. Defaults to True.
round_to_int: Whether to round the result to the nearest integer. Defaults to False.

Raises

TransformationRegistryError: If the transformation name is already in use.
TransformationRegistryError: If the transformation name hasn't been provided and the transformation is not registered.
ValueError: If fewer than 2 columns are specified.
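The behaviour described above can be illustrated with a minimal plain-Python sketch. It is not the library's implementation: the dataset is modelled as a dict mapping column names to lists, and the function name `average_columns` and its `new_col` parameter are hypothetical names introduced only for this illustration.

```python
def average_columns(data, cols, new_col, round_to_int=False, drop_source_cols=True):
    """Average `cols` row by row into `new_col` (hypothetical helper)."""
    if len(cols) < 2:
        raise ValueError("At least 2 columns are required to compute an average.")
    # Walk the source columns in lockstep, one tuple per row.
    rows = zip(*(data[c] for c in cols))
    means = [sum(row) / len(cols) for row in rows]
    if round_to_int:
        # Note: Python's round() sends exact halves to the nearest even integer.
        means = [round(m) for m in means]
    # Copy every column, skipping the sources when they should be dropped.
    result = {
        name: list(values)
        for name, values in data.items()
        if not (drop_source_cols and name in cols)
    }
    result[new_col] = means
    return result


table = {"id": [1, 2], "left": [1.0, 4.0], "right": [2.4, 5.4]}
averaged = average_columns(table, ["left", "right"], "mean_lr", round_to_int=True)
print(averaged)
```

With `drop_source_cols=False` the `left` and `right` columns would be kept alongside the new column.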

Method generated by attrs for class AverageColumnsTransformation.

Ancestors

Transformation

Variables

static cols : list[str]

static drop_source_cols : bool

static output : bool

static round_to_int : bool

Static methods

schema

def schema() ‑> marshmallow.schema.Schema:

Inherited from:

Transformation.schema :

Gets an instance of the Schema associated with this Transformation.

Raises

TypeError: If the transformation doesn't have a TransformationSchema as the schema.

CleanDataTransformation

class CleanDataTransformation(*, name: str = None, output: bool = True, cols: Union[str, list[str]] = 'all'):

Dataset transformation that will "clean" the specified columns.

For continuous columns, this replaces all infinities and NaNs with 0. For categorical columns, this replaces all NaNs with the explicit string "nan".

Arguments

cols: The columns to act on as a list of strings. Defaults to "all" which acts on all columns in the dataset.
name: The name of the transformation. If not provided a unique name will be generated from the class name.
output: Whether or not this transformation should be included in the final output. This must be True for all dataset transformations. Defaults to True.

Raises

TransformationRegistryError: If the transformation name is already in use.
TransformationRegistryError: If the transformation name hasn't been provided and the transformation is not registered.
ValueError: If output is False.
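The cleaning rules described for this class (infinities and NaNs become 0 in continuous columns, NaNs become the string "nan" in categorical columns) can be sketched in plain Python. This is an illustration only, not the library's code; `clean_column` and its `categorical` flag are hypothetical names.

```python
import math


def clean_column(values, categorical=False):
    """Return a cleaned copy of one column (hypothetical helper)."""
    if categorical:
        # Categorical: replace NaN entries with the explicit string "nan".
        return [
            "nan" if isinstance(v, float) and math.isnan(v) else v
            for v in values
        ]
    # Continuous: replace +/-infinity and NaN with 0.
    return [0 if math.isinf(v) or math.isnan(v) else v for v in values]


continuous = clean_column([1.5, float("inf"), float("nan"), -float("inf")])
categorical = clean_column(["a", float("nan"), "b"], categorical=True)
print(continuous, categorical)
```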

Method generated by attrs for class CleanDataTransformation.

Ancestors

DatasetTransformation

Transformation

Static methods

schema

def schema() ‑> marshmallow.schema.Schema:

Inherited from:

DatasetTransformation.schema :

Gets an instance of the Schema associated with this Transformation.

Raises

TypeError: If the transformation doesn't have a TransformationSchema as the schema.

DatasetTransformation

class DatasetTransformation(*, name: str = None, output: bool = True, cols: Union[str, list[str]] = 'all'):

Base transformation for all dataset transformation classes.

Users can specify "all" to have the transformation act on every relevant column as defined in the schema.

Arguments

cols: The columns to act on as a list of strings. Defaults to "all" which acts on all columns in the dataset.
name: The name of the transformation. If not provided a unique name will be generated from the class name.
output: Whether or not this transformation should be included in the final output. This must be True for all dataset transformations. Defaults to True.

Raises

TransformationRegistryError: If the transformation name is already in use.
TransformationRegistryError: If the transformation name hasn't been provided and the transformation is not registered.
ValueError: If output is False.
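Two of the rules documented above, that "all" expands to every column and that `output` must be True, can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's validation logic; `resolve_dataset_args` and `dataset_columns` are hypothetical names.

```python
def resolve_dataset_args(cols, output, dataset_columns):
    """Validate `output` and expand `cols` (hypothetical helper)."""
    if not output:
        # Dataset transformations must always be part of the final output.
        raise ValueError("`output` must be True for dataset transformations.")
    if cols == "all":
        # "all" means every column known for the dataset.
        return list(dataset_columns)
    return list(cols)


every_col = resolve_dataset_args("all", True, ["age", "height", "city"])
some_cols = resolve_dataset_args(["age"], True, ["age", "height", "city"])
print(every_col, some_cols)
```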

Method generated by attrs for class DatasetTransformation.

Ancestors

Transformation

Subclasses

Variables

static cols : Union[str, list[str]]

static output : bool

Static methods

schema

def schema() ‑> marshmallow.schema.Schema:

Inherited from:

Transformation.schema :

Gets an instance of the Schema associated with this Transformation.

Raises

TypeError: If the transformation doesn't have a TransformationSchema as the schema.

DropColumnsTransformation

class DropColumnsTransformation(*, name: str = None, cols: list[str], output: bool = True):

Transformation that drops the specified columns from the dataframe.

Arguments

cols: List of column names to drop. Can use column references (e.g., "c:column_name").
name: The name of the transformation. If not provided a unique name will be generated from the class name.
output: Whether this transformation should be included in the final output. This is always True for DropColumnsTransformation as it modifies the dataframe in place.

Raises

TransformationRegistryError: If the transformation name is already in use.
TransformationRegistryError: If the transformation name hasn't been provided and the transformation is not registered.
ValueError: If no columns are specified.
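As a rough illustration of the behaviour described above, the following plain-Python sketch drops columns in place from a dict-of-lists table and rejects an empty column list. It is not the library's implementation, and `drop_columns` is a hypothetical name.

```python
def drop_columns(data, cols):
    """Remove `cols` from `data` in place (hypothetical helper)."""
    if not cols:
        raise ValueError("At least one column must be specified.")
    for name in cols:
        # Mutates the table directly, mirroring the in-place behaviour.
        del data[name]
    return data


frame = {"id": [1, 2], "tmp": [0, 0], "name": ["a", "b"]}
drop_columns(frame, ["tmp"])
print(list(frame))
```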

Method generated by attrs for class DropColumnsTransformation.

Ancestors

Transformation

Variables

static cols : list[str]

static output : bool

Static methods

schema

def schema() ‑> marshmallow.schema.Schema:

Inherited from:

Transformation.schema :

Gets an instance of the Schema associated with this Transformation.

Raises

TypeError: If the transformation doesn't have a TransformationSchema as the schema.

NormalizeDataTransformation

class NormalizeDataTransformation(*, name: str = None, output: bool = True, cols: Union[str, list[str]] = 'float'):

Dataset transformation that will normalise the specified continuous columns.

Arguments

cols: The columns to act on as a list of strings. By default, this transformation will only apply to columns of type float.
name: The name of the transformation. If not provided a unique name will be generated from the class name.
output: Whether or not this transformation should be included in the final output. This must be True for all dataset transformations. Defaults to True.

Raises

TransformationRegistryError: If the transformation name is already in use.
TransformationRegistryError: If the transformation name hasn't been provided and the transformation is not registered.
ValueError: If output is False.
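The documentation above does not state which normalisation is applied. Purely as an assumption, the sketch below uses z-score standardisation (subtract the mean, divide by the population standard deviation), one common choice, and applies it only to columns whose values are all floats, echoing the default of `cols`. The helper names are hypothetical and this is not the library's code.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardise one column to zero mean and unit variance (assumed method)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalize_float_columns(data):
    """Normalise float columns, leave all other columns untouched."""
    return {
        name: zscore(values)
        if all(isinstance(v, float) for v in values)
        else list(values)
        for name, values in data.items()
    }


normalized = normalize_float_columns({"height": [1.0, 2.0, 3.0], "count": [1, 2, 3]})
print(normalized)
```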

Method generated by attrs for class NormalizeDataTransformation.

Ancestors

DatasetTransformation

Transformation

Variables

static cols : Union[str, list[str]]

Static methods

schema

def schema() ‑> marshmallow.schema.Schema:

Inherited from:

DatasetTransformation.schema :

Gets an instance of the Schema associated with this Transformation.

Raises

TypeError: If the transformation doesn't have a TransformationSchema as the schema.

ScalarAdditionDataTransformation

class ScalarAdditionDataTransformation(*, name: str = None, output: bool = True, cols: Union[str, list[str]] = 'all', scalar: Union[int, float, Mapping[str, Union[int, float]]] = 0):

Dataset transformation that adds a scalar to the specified columns.

Transformation applied to the dataset in place. Only applies to continuous columns.

Arguments

cols: The columns to act on as a list of strings. Defaults to "all" which acts on all columns in the dataset.
name: The name of the transformation. If not provided a unique name will be generated from the class name.
output: Whether or not this transformation should be included in the final output. This must be True for all dataset transformations. Defaults to True.
scalar: The scalar to be used for addition. It can be provided as a number, in which case that number is added to all numerical columns, or as a dictionary mapping column names to the scalar to add to each column. Defaults to 0.

Raises

TransformationApplicationError: If the scalar variable is not correctly instantiated.
TransformationRegistryError: If the transformation name is already in use.
TransformationRegistryError: If the transformation name hasn't been provided and the transformation is not registered.
ValueError: If output is False.
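The two accepted forms of `scalar` (a single number, or a mapping from column name to number) can be illustrated with a plain-Python sketch that adds in place and skips non-numeric columns. It is not the library's implementation; `add_scalar` and `is_continuous` are hypothetical names, and skipping columns missing from the mapping is an assumption.

```python
from collections.abc import Mapping
from numbers import Real


def is_continuous(values):
    """True when every value is a real number (booleans excluded)."""
    return all(isinstance(v, Real) and not isinstance(v, bool) for v in values)


def add_scalar(data, scalar=0, cols="all"):
    """Add `scalar` to the selected numeric columns in place (hypothetical)."""
    targets = list(data) if cols == "all" else list(cols)
    for name in targets:
        if not is_continuous(data[name]):
            continue  # only continuous columns are affected
        if isinstance(scalar, Mapping):
            if name not in scalar:
                continue  # assumption: unmapped columns are left unchanged
            amount = scalar[name]
        else:
            amount = scalar
        data[name] = [v + amount for v in data[name]]
    return data


people = {"age": [30, 41], "score": [0.5, 1.5], "city": ["Oslo", "Lima"]}
add_scalar(people, 1)            # one number for every numeric column
add_scalar(people, {"age": 10})  # per-column scalars
print(people)
```

The default of 0 is the additive identity, so a transformation created without `scalar` leaves the data unchanged.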

Method generated by attrs for class ScalarAdditionDataTransformation.

Ancestors

DatasetTransformation

Transformation

Variables

static scalar : Union[int, float, collections.abc.Mapping[str, Union[int, float]]]

Static methods

schema

def schema() ‑> marshmallow.schema.Schema:

Inherited from:

DatasetTransformation.schema :

Gets an instance of the Schema associated with this Transformation.

Raises

TypeError: If the transformation doesn't have a TransformationSchema as the schema.

ScalarMultiplicationDataTransformation

class ScalarMultiplicationDataTransformation(*, name: str = None, output: bool = True, cols: Union[str, list[str]] = 'all', scalar: Union[int, float, Mapping[str, Union[int, float]]] = 1):

Dataset transformation that multiplies the specified columns by a scalar.

Transformation applied to the dataset in place. Only applies to continuous columns.

Arguments

cols: The columns to act on as a list of strings. Defaults to "all" which acts on all columns in the dataset.
name: The name of the transformation. If not provided a unique name will be generated from the class name.
output: Whether or not this transformation should be included in the final output. This must be True for all dataset transformations. Defaults to True.
scalar: The scalar to be used for multiplication. It can be provided as a number, in which case all numerical columns are multiplied by that number, or as a dictionary mapping column names to the scalar to multiply each column by. Defaults to 1.

Raises

TransformationApplicationError: If the scalar variable is not correctly instantiated.
TransformationRegistryError: If the transformation name is already in use.
TransformationRegistryError: If the transformation name hasn't been provided and the transformation is not registered.
ValueError: If output is False.
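A minimal plain-Python sketch of the multiplication described above, again using a dict-of-lists table. It is an illustration under stated assumptions rather than the library's code; `multiply_scalar` is a hypothetical name, and leaving columns absent from the mapping unchanged is an assumption.

```python
from collections.abc import Mapping
from numbers import Real


def multiply_scalar(data, scalar=1, cols="all"):
    """Multiply the selected numeric columns by `scalar` in place (hypothetical)."""
    targets = list(data) if cols == "all" else list(cols)
    for name in targets:
        values = data[name]
        numeric = all(
            isinstance(v, Real) and not isinstance(v, bool) for v in values
        )
        if not numeric:
            continue  # only continuous columns are affected
        if isinstance(scalar, Mapping):
            # Assumption: columns missing from the mapping use the identity 1.
            factor = scalar.get(name, 1)
        else:
            factor = scalar
        data[name] = [v * factor for v in values]
    return data


prices = {"price": [2.0, 4.0], "qty": [3, 5], "sku": ["a", "b"]}
multiply_scalar(prices, {"price": 0.5, "qty": 2})
print(prices)
```

The default of 1 is the multiplicative identity, so omitting `scalar` leaves the data unchanged.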

Method generated by attrs for class ScalarMultiplicationDataTransformation.

Ancestors

DatasetTransformation

Transformation

Variables

static scalar : Union[int, float, collections.abc.Mapping[str, Union[int, float]]]

Static methods

schema

def schema() ‑> marshmallow.schema.Schema:

Inherited from:

DatasetTransformation.schema :

Gets an instance of the Schema associated with this Transformation.

Raises

TypeError: If the transformation doesn't have a TransformationSchema as the schema.
