filtering_algorithm

Algorithm for filtering data records based on configurable strategies.

Classes

AgeRangeFilterArgs

class AgeRangeFilterArgs(*args, **kwargs):

Arguments for AGE_RANGE filter strategy.

This filtering strategy keeps only records whose age, derived from a given birth date column, falls within a specified range.

Ancestors

  • builtins.dict

Variables

  • static birth_date_column : str
  • static max_age : int
  • static min_age : int
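The AGE_RANGE arguments can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: `AgeRangeArgsSketch`, `age_in_years` and `filter_by_age` are hypothetical names, records are modelled as plain dicts holding `datetime.date` values, and treating both bounds as inclusive is an assumption.

```python
from datetime import date
from typing import TypedDict


class AgeRangeArgsSketch(TypedDict):
    """Hypothetical stand-in with the three documented keys."""

    birth_date_column: str
    min_age: int
    max_age: int


def age_in_years(born: date, today: date) -> int:
    """Return the number of whole years between `born` and `today`."""
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def filter_by_age(
    records: list[dict], args: AgeRangeArgsSketch, today: date
) -> list[dict]:
    """Keep records whose age lies in [min_age, max_age].

    Inclusive bounds are an assumption of this sketch.
    """
    column = args["birth_date_column"]
    return [
        record
        for record in records
        if args["min_age"] <= age_in_years(record[column], today) <= args["max_age"]
    ]
```

Passing `today` explicitly keeps the sketch deterministic; a real filter would more likely use the current date.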

FilterStrategy

class FilterStrategy(value, names=None, *, module=None, qualname=None, type=None, start=1):

Enumeration of available filtering strategies.

Inherits from str so that members convert easily to strings and compare equal to plain strings. The order of the base classes matters (str first, then Enum). This replicates the behaviour of StrEnum in Python 3.11+. TODO: [Python 3.11] Convert to StrEnum when Python 3.11 is the minimum version.

Ancestors

Variables

  • static AGE_RANGE
  • static FREQUENCY
  • static LATEST
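The str-plus-Enum pattern described for FilterStrategy can be sketched as follows. `StrategySketch` is a hypothetical class, and the lowercase member values are assumptions, since the reference above lists only the member names.

```python
from enum import Enum


class StrategySketch(str, Enum):
    """Hypothetical enum; member values are illustrative assumptions."""

    # The mix-in type (str) must come before Enum in the list of bases.
    AGE_RANGE = "age_range"
    FREQUENCY = "frequency"
    LATEST = "latest"


# Members are real str instances, so they compare equal to plain strings.
assert isinstance(StrategySketch.LATEST, str)
assert StrategySketch.LATEST == "latest"

# Looking a member up by value returns the existing member.
assert StrategySketch("frequency") is StrategySketch.FREQUENCY
```

Because members are strings, an API that accepts either the enum or a plain string can normalise its input by calling the enum on the value.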

FilterStrategyClass

class FilterStrategyClass(value, names=None, *, module=None, qualname=None, type=None, start=1):

Enumeration map of filter strategies to TypedDict and classes.

Ancestors

Variables

  • static AGE_RANGE
  • static FREQUENCY
  • static LATEST

FrequencyFilterArgs

class FrequencyFilterArgs(*args, **kwargs):

Arguments for FREQUENCY filter strategy.

This filtering strategy keeps only records whose ID occurs a specified number of times, bounded by a minimum and a maximum frequency.

Ancestors

  • builtins.dict

Variables

  • static id_column : Union[str, list[str]]
  • static max_frequency : int
  • static min_frequency : int
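One way to picture the FREQUENCY strategy is to count ID occurrences and keep the records whose count falls within the bounds. The sketch below is illustrative only: `id_key` and `filter_by_frequency` are hypothetical helpers, records are plain dicts, and inclusive bounds are an assumption. It accepts `id_column` as a single name or a list of names, matching the documented type.

```python
from collections import Counter
from typing import Union


def id_key(record: dict, id_column: Union[str, list[str]]) -> tuple:
    """Build a hashable ID from one column name or a list of column names."""
    columns = [id_column] if isinstance(id_column, str) else id_column
    return tuple(record[column] for column in columns)


def filter_by_frequency(records: list[dict], args: dict) -> list[dict]:
    """Keep records whose ID occurs between min_frequency and max_frequency times.

    Inclusive bounds are an assumption of this sketch.
    """
    id_column = args["id_column"]
    counts = Counter(id_key(record, id_column) for record in records)
    return [
        record
        for record in records
        if args["min_frequency"]
        <= counts[id_key(record, id_column)]
        <= args["max_frequency"]
    ]
```

With a list of columns, two records share an ID only when they agree on every listed column.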

LatestFilterArgs

class LatestFilterArgs(*args, **kwargs):

Arguments for LATEST filter strategy.

This filtering strategy keeps only the latest records per ID.

Ancestors

  • builtins.dict

Variables

  • static date_column : str
  • static id_column : Union[str, list[str]]
  • static num_latest : int
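The LATEST strategy can be sketched as grouping records by ID, sorting each group by date and keeping the newest `num_latest` entries. This is an assumption-laden illustration, not the library's code: `id_key` and `keep_latest` are hypothetical helpers, records are plain dicts, and date values only need to be mutually comparable (for example `datetime.date` objects or ISO 8601 strings).

```python
from collections import defaultdict
from typing import Union


def id_key(record: dict, id_column: Union[str, list[str]]) -> tuple:
    """Build a hashable ID from one column name or a list of column names."""
    columns = [id_column] if isinstance(id_column, str) else id_column
    return tuple(record[column] for column in columns)


def keep_latest(records: list[dict], args: dict) -> list[dict]:
    """Keep the `num_latest` most recent records for each ID."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for record in records:
        groups[id_key(record, args["id_column"])].append(record)

    kept: list[dict] = []
    for group in groups.values():
        # Sort newest first, then take the leading slice for this ID.
        newest_first = sorted(
            group, key=lambda record: record[args["date_column"]], reverse=True
        )
        kept.extend(newest_first[: args["num_latest"]])
    return kept
```

The result is ordered by first appearance of each ID, newest record first within an ID; the original record order is not preserved.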

RecordFilterAlgorithm

class RecordFilterAlgorithm(datastructure: DataStructure, strategies: Sequence[Union[FilterStrategy, str]], filter_args_list: list[FilterArgs]):

Algorithm factory for filtering records based on various strategies.

Arguments

  • **kwargs: Additional keyword arguments.
  • datastructure: The data structure to use for the algorithm.
  • filter_args_list: List of strategy-specific arguments.
  • strategies: List of filtering strategies.

Attributes

  • class_name: The name of the algorithm class.
  • datastructure: The data structure to use for the algorithm.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • filter_args_list: List of strategy-specific arguments.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • strategies: List of filtering strategies.

Raises

  • ValueError: If required parameters for a strategy are missing
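The relationship between `strategies`, `filter_args_list` and the documented ValueError can be sketched as a validation step that pairs each strategy with its arguments. Everything here is an assumption made for illustration: `REQUIRED_KEYS`, `pair_strategies`, the choice of which keys count as required, the positional pairing and the length check are all hypothetical and are not taken from the library.

```python
# Hypothetical mapping of strategy name to the keys this sketch treats as required.
REQUIRED_KEYS = {
    "AGE_RANGE": {"birth_date_column"},
    "FREQUENCY": {"id_column"},
    "LATEST": {"date_column", "id_column"},
}


def pair_strategies(strategies: list[str], filter_args_list: list[dict]) -> list[tuple]:
    """Pair each strategy with its arguments, raising ValueError on problems."""
    if len(strategies) != len(filter_args_list):
        raise ValueError("Expected one argument dict per strategy.")

    pairs = []
    for strategy, args in zip(strategies, filter_args_list):
        name = str(strategy).upper()
        if name not in REQUIRED_KEYS:
            raise ValueError(f"Unknown strategy: {strategy!r}")
        missing = REQUIRED_KEYS[name] - set(args)
        if missing:
            raise ValueError(f"{name} is missing required keys: {sorted(missing)}")
        pairs.append((name, args))
    return pairs
```

Validating up front means a misconfigured strategy fails at construction time rather than part-way through filtering.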

Ancestors

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> NoResultsModellerAlgorithm:

Modeller-side of the algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.filtering_algorithm._WorkerSide:

Worker-side of the algorithm.
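The three methods above suggest a factory in which `create` dispatches on the role to either `modeller` or `worker`. The sketch below shows that pattern in isolation. `RoleSketch` and `FactorySketch` are hypothetical, the role values are assumed, and the returned tuples merely stand in for the real modeller-side and worker-side objects.

```python
from enum import Enum
from typing import Any, Union


class RoleSketch(str, Enum):
    """Hypothetical role enum; the values are assumptions."""

    MODELLER = "modeller"
    WORKER = "worker"


class FactorySketch:
    """Hypothetical factory that builds one side of an algorithm per role."""

    def create(self, role: Union[str, RoleSketch], **kwargs: Any) -> Any:
        # Normalise plain strings to the enum; unknown roles raise ValueError.
        role = RoleSketch(role)
        if role is RoleSketch.MODELLER:
            return self.modeller(**kwargs)
        return self.worker(**kwargs)

    def modeller(self, **kwargs: Any) -> tuple:
        # Placeholder for constructing the modeller-side object.
        return ("modeller-side", kwargs)

    def worker(self, **kwargs: Any) -> tuple:
        # Placeholder for constructing the worker-side object.
        return ("worker-side", kwargs)
```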