
filtering_algorithm

Algorithm for filtering data records based on configurable strategies.

Classes

AgeRangeFilterArgs

class AgeRangeFilterArgs(*args, **kwargs):

Arguments for AGE_RANGE filter strategy.

This filtering strategy keeps only records whose age, derived from a given birth-date column, falls within a specified range.

Variables

  • static birth_date_column : str
  • static max_age : int
  • static min_age : int
  • static remote_modeller : bool
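
As an illustration of the AGE_RANGE semantics described above, here is a minimal standard-library sketch. It is not the library's implementation: `AgeRangeArgsSketch`, `age_in_years` and `filter_by_age` are hypothetical names, the records are plain dicts rather than a dataframe, and inclusive bounds on `min_age` and `max_age` are an assumption.

```python
from datetime import date
from typing import TypedDict


class AgeRangeArgsSketch(TypedDict, total=False):
    # Local stand-in mirroring the documented field names; not the library class.
    birth_date_column: str
    min_age: int
    max_age: int


def age_in_years(born: date, today: date) -> int:
    """Whole years between two dates, minus one if the birthday has not yet occurred."""
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


def filter_by_age(records, args, today):
    """Keep records whose age lies in [min_age, max_age] (inclusive bounds assumed)."""
    column = args["birth_date_column"]
    kept = []
    for record in records:
        age = age_in_years(record[column], today)
        if args["min_age"] <= age <= args["max_age"]:
            kept.append(record)
    return kept


records = [
    {"id": "a", "dob": date(1950, 6, 1)},
    {"id": "b", "dob": date(2010, 1, 15)},
    {"id": "c", "dob": date(1985, 12, 31)},
]
args: AgeRangeArgsSketch = {"birth_date_column": "dob", "min_age": 18, "max_age": 65}
kept = filter_by_age(records, args, today=date(2024, 6, 30))
print([r["id"] for r in kept])  # ['c']
```

Passing `today` explicitly keeps the sketch deterministic; how the real strategy chooses its reference date is not stated in this page.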

FilterStrategy

class FilterStrategy(*args, **kwds):

Enumeration of available filtering strategies.

Variables

  • static AGE_RANGE
  • static FREQUENCY
  • static LATEST
  • static PATIENT_ID
  • static SCAN_FREQUENCY

FilterStrategyClass

class FilterStrategyClass(*args, **kwds):

Enumeration map of filter strategies to TypedDict and classes.

Variables

  • static AGE_RANGE
  • static FREQUENCY
  • static LATEST
  • static PATIENT_ID
  • static SCAN_FREQUENCY

FrequencyFilterArgs

class FrequencyFilterArgs(*args, **kwargs):

Arguments for FREQUENCY filter strategy.

This filtering strategy keeps only records whose ID occurs with a frequency between a specified minimum and maximum.

Variables

  • static id_column : Union[str, list[str]]
  • static max_frequency : int
  • static min_frequency : int
  • static remote_modeller : bool
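
The FREQUENCY behaviour can be sketched with a simple counter. This is an illustrative approximation, not the library's code: `filter_by_frequency` is a hypothetical helper, inclusive bounds are assumed, and treating a list-valued `id_column` as a composite key is also an assumption based on its `Union[str, list[str]]` type.

```python
from collections import Counter


def filter_by_frequency(records, id_column, min_frequency, max_frequency):
    """Keep records whose ID occurs between min_frequency and max_frequency times.

    id_column may be one column name or a list of names; with a list, the
    combined values form the key (an assumption for this sketch).
    """
    columns = [id_column] if isinstance(id_column, str) else list(id_column)

    def key(record):
        return tuple(record[c] for c in columns)

    counts = Counter(key(r) for r in records)
    return [r for r in records if min_frequency <= counts[key(r)] <= max_frequency]


records = [
    {"patient": "p1", "eye": "L"},
    {"patient": "p1", "eye": "L"},
    {"patient": "p1", "eye": "R"},
    {"patient": "p2", "eye": "L"},
]
by_patient = filter_by_frequency(records, "patient", 2, 5)
by_patient_and_eye = filter_by_frequency(records, ["patient", "eye"], 2, 5)
print(len(by_patient), len(by_patient_and_eye))  # 3 2
```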

LatestFilterArgs

class LatestFilterArgs(*args, **kwargs):

Arguments for LATEST filter strategy.

This filtering strategy keeps only the latest records per ID.

See dataclass for meanings of args.

Variables

  • static date_column : str
  • static id_column : Union[str, list[str]]
  • static num_latest : int
  • static remote_modeller : bool
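
A rough sketch of "keep the latest `num_latest` records per ID", using only the standard library. `keep_latest` is a hypothetical helper, the records are plain dicts, and the tie-breaking and output ordering shown here are assumptions rather than documented behaviour.

```python
from collections import defaultdict
from datetime import date


def keep_latest(records, id_column, date_column, num_latest):
    """Return the num_latest most recent records for each ID, newest first."""
    columns = [id_column] if isinstance(id_column, str) else list(id_column)
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[c] for c in columns)].append(record)

    kept = []
    for group in groups.values():
        newest_first = sorted(group, key=lambda r: r[date_column], reverse=True)
        kept.extend(newest_first[:num_latest])
    return kept


records = [
    {"patient": "p1", "scan_date": date(2021, 1, 1)},
    {"patient": "p1", "scan_date": date(2023, 1, 1)},
    {"patient": "p1", "scan_date": date(2022, 1, 1)},
    {"patient": "p2", "scan_date": date(2020, 5, 5)},
]
latest = keep_latest(records, "patient", "scan_date", num_latest=2)
print([(r["patient"], r["scan_date"].year) for r in latest])
# [('p1', 2023), ('p1', 2022), ('p2', 2020)]
```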

PatientIDFilterArgs

class PatientIDFilterArgs(*args, **kwargs):

Arguments for PATIENT_ID filter strategy.

This strategy reads a list of patient MRNs (or patient IDs) from a CSV file and excludes records from the dataframe that match those MRNs.

Arguments

  • filename: Path to the CSV file containing patient MRNs/IDs to exclude.
  • patient_id_column: Column name in the exclusion CSV file that contains the patient MRNs/IDs to exclude. This is NOT the column in the dataframe being filtered.

Variables

  • static filename : Optional[Union[str, os.PathLike]]
  • static patient_id_column : str
  • static remote_modeller : bool
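
The exclusion logic can be sketched as follows. This is not the library's implementation: `load_excluded_ids` and `exclude_patients` are hypothetical helpers, and `record_id_column` is an invented parameter, since this page does not say how the matching column in the filtered dataframe is chosen. An in-memory CSV stands in for the file at `filename`.

```python
import csv
import io


def load_excluded_ids(csv_file, patient_id_column):
    """Read the exclusion CSV and return the set of IDs in patient_id_column."""
    reader = csv.DictReader(csv_file)
    return {row[patient_id_column].strip() for row in reader}


def exclude_patients(records, record_id_column, excluded_ids):
    """Drop records whose ID appears in the exclusion set.

    record_id_column is hypothetical: it names the ID column in the data
    being filtered, which is distinct from patient_id_column in the CSV.
    """
    return [r for r in records if str(r[record_id_column]) not in excluded_ids]


exclusion_csv = io.StringIO("mrn,reason\n1001,withdrawn\n1003,duplicate\n")
excluded = load_excluded_ids(exclusion_csv, patient_id_column="mrn")

records = [{"patient_mrn": "1001"}, {"patient_mrn": "1002"}, {"patient_mrn": "1003"}]
remaining = exclude_patients(records, "patient_mrn", excluded)
print(remaining)  # [{'patient_mrn': '1002'}]
```

Note that the column name in the CSV (`mrn`) and the column name in the records (`patient_mrn`) differ, which mirrors the distinction drawn in the argument description above.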

RecordFilterAlgorithm

class RecordFilterAlgorithm(datastructure: DataStructure, strategies: Sequence[Union[FilterStrategy, str]], filter_args_list: list[FilterArgs]):

Algorithm factory for filtering records based on various strategies.

Arguments

  • **kwargs: Additional keyword arguments.
  • datastructure: The data structure to use for the algorithm.
  • filter_args_list: List of strategy-specific arguments.
  • strategies: List of filtering strategies.
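
Because `strategies` accepts either enum members or strings and is supplied alongside `filter_args_list`, one plausible reading is that the two lists are matched by position. The sketch below illustrates that reading only. `StrategySketch`, `coerce` and `pair_strategies` are hypothetical, the enum values are invented, and both the positional pairing and the string lookup by member name are assumptions not confirmed by this page.

```python
from enum import Enum


class StrategySketch(Enum):
    # Local stand-in: member names come from the docs, the values are invented.
    LATEST = "latest"
    FREQUENCY = "frequency"


def coerce(strategy):
    """Accept an enum member or a string naming one (lookup by name is assumed)."""
    if isinstance(strategy, StrategySketch):
        return strategy
    return StrategySketch[strategy.upper()]


def pair_strategies(strategies, filter_args_list):
    """Match each strategy with the args at the same index (assumed pairing)."""
    if len(strategies) != len(filter_args_list):
        raise ValueError("strategies and filter_args_list must have the same length")
    return [(coerce(s), args) for s, args in zip(strategies, filter_args_list)]


pairs = pair_strategies(
    [StrategySketch.LATEST, "frequency"],
    [
        {"date_column": "scan_date", "id_column": "patient", "num_latest": 1},
        {"id_column": "patient", "min_frequency": 2, "max_frequency": 10},
    ],
)
for strategy, args in pairs:
    print(strategy.name, sorted(args))
```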

Attributes

  • class_name: The name of the algorithm class.
  • datastructure: The data structure to use for the algorithm.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • filter_args_list: List of strategy-specific arguments.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • strategies: List of filtering strategies.

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.
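
The role-based factory pattern that `create` suggests can be sketched generically. This is an assumption-laden illustration, not the library's code: `RoleSketch` and `FactorySketch` are hypothetical stand-ins, the role values are invented, and the real `modeller` and `worker` methods return algorithm-side objects rather than tuples.

```python
from enum import Enum


class RoleSketch(Enum):
    # Hypothetical stand-in for the Role type named in the signature.
    MODELLER = "modeller"
    WORKER = "worker"


class FactorySketch:
    def modeller(self, **kwargs):
        return ("modeller-side", kwargs)

    def worker(self, **kwargs):
        return ("worker-side", kwargs)

    def create(self, role, **kwargs):
        # Accept either the enum or its string form, as Union[str, Role] suggests.
        role = RoleSketch(role) if isinstance(role, str) else role
        builders = {RoleSketch.MODELLER: self.modeller, RoleSketch.WORKER: self.worker}
        return builders[role](**kwargs)


factory = FactorySketch()
print(factory.create("worker", context=None))  # ('worker-side', {'context': None})
```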

modeller

def modeller(self, *, context: ProtocolContext, **kwargs: Any) -> bitfount.federated.algorithms.filtering_algorithm._ModellerSide:

Inherited from:

BaseNonModelAlgorithmFactory.modeller:

Modeller-side of the algorithm.

worker

def worker(self, *, context: ProtocolContext, **kwargs: Any) -> bitfount.federated.algorithms.filtering_algorithm._WorkerSide:

Inherited from:

BaseNonModelAlgorithmFactory.worker:

Worker-side of the algorithm.

ScanFrequencyFilterArgs

class ScanFrequencyFilterArgs(*args, **kwargs):

Arguments for SCAN_FREQUENCY filter strategy.

This filtering strategy keeps only patients with a minimum specified number of scans per year over a specified number of years.

Variables

  • static date_column : str
  • static id_column : Union[str, list[str]]
  • static min_number_of_scans_per_year : int
  • static number_of_years : int
  • static remote_modeller : bool
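
The description above leaves the exact windowing open, so the following sketch picks one interpretation and labels it as such: a patient qualifies if they have at least `min_number_of_scans_per_year` scans in each of the last `number_of_years` calendar years, counted back from a reference year. `patients_meeting_scan_frequency` and the `reference_year` parameter are hypothetical, and only a single string `id_column` is handled for brevity.

```python
from collections import Counter, defaultdict
from datetime import date


def patients_meeting_scan_frequency(
    records,
    id_column,
    date_column,
    min_number_of_scans_per_year,
    number_of_years,
    reference_year,  # hypothetical: last calendar year of the window
):
    """Return IDs with enough scans in every calendar year of the window."""
    required_years = range(reference_year - number_of_years + 1, reference_year + 1)
    scans_per_year = defaultdict(Counter)
    for record in records:
        scans_per_year[record[id_column]][record[date_column].year] += 1
    return {
        patient
        for patient, per_year in scans_per_year.items()
        if all(per_year[year] >= min_number_of_scans_per_year for year in required_years)
    }


records = [
    {"patient": "p1", "scan_date": date(2022, 3, 1)},
    {"patient": "p1", "scan_date": date(2022, 9, 1)},
    {"patient": "p1", "scan_date": date(2023, 2, 1)},
    {"patient": "p1", "scan_date": date(2023, 8, 1)},
    {"patient": "p2", "scan_date": date(2022, 1, 1)},
    {"patient": "p2", "scan_date": date(2022, 6, 1)},
    {"patient": "p2", "scan_date": date(2023, 4, 1)},
]
qualifying = patients_meeting_scan_frequency(
    records, "patient", "scan_date",
    min_number_of_scans_per_year=2, number_of_years=2, reference_year=2023,
)
filtered = [r for r in records if r["patient"] in qualifying]
print(sorted(qualifying), len(filtered))  # ['p1'] 4
```

A rolling 365-day window or an average across years would be equally defensible readings; the real strategy may differ.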