filtering_algorithm

Algorithm for filtering data records based on configurable strategies.

Classes

AgeRangeFilterArgs

class AgeRangeFilterArgs(*args, **kwargs):

Arguments for AGE_RANGE filter strategy.

This filtering strategy keeps only records whose age, derived from the birth date column, falls within the specified age range.

Variables

  • static birth_date_column : str
  • static flag_column_name : Optional[str]
  • static flag_only : bool
  • static max_age : int
  • static min_age : int
  • static remote_modeller : bool
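
The age-range idea can be illustrated with a minimal sketch. This is not the library's implementation: it assumes records are plain dicts, that age is counted in whole years as of a reference date, and that both bounds are inclusive. The helper names age_in_years and filter_by_age_range are hypothetical.

```python
from datetime import date


def age_in_years(birth_date: date, today: date) -> int:
    """Whole years elapsed between birth_date and today."""
    years = today.year - birth_date.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def filter_by_age_range(records, birth_date_column, min_age, max_age, today):
    """Keep records whose age lies in [min_age, max_age] (bounds assumed inclusive)."""
    return [
        r for r in records
        if min_age <= age_in_years(r[birth_date_column], today) <= max_age
    ]


records = [
    {"patient": "a", "dob": date(1950, 6, 1)},
    {"patient": "b", "dob": date(2010, 1, 15)},
    {"patient": "c", "dob": date(1985, 12, 31)},
]
kept = filter_by_age_range(
    records, "dob", min_age=18, max_age=70, today=date(2024, 1, 1)
)
```

With this sample data only patient "c" falls inside the 18 to 70 range on the chosen reference date.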

FilterStrategy

class FilterStrategy(*args, **kwds):

Enumeration of available filtering strategies.

Variables

  • static AGE_RANGE
  • static FREQUENCY
  • static LATEST
  • static PATIENT_ID
  • static PATIENT_ID_WITH_PREVIOUS_RUNS
  • static SCAN_FREQUENCY
  • static SCAN_IN_PERIOD
  • static SERIES_DESCRIPTION_LATEST

FilterStrategyClass

class FilterStrategyClass(*args, **kwds):

Enumeration map of filter strategies to TypedDict and classes.

Variables

  • static AGE_RANGE
  • static FREQUENCY
  • static LATEST
  • static PATIENT_ID
  • static PATIENT_ID_WITH_PREVIOUS_RUNS
  • static SCAN_FREQUENCY
  • static SCAN_IN_PERIOD
  • static SERIES_DESCRIPTION_LATEST

FrequencyFilterArgs

class FrequencyFilterArgs(*args, **kwargs):

Arguments for FREQUENCY filter strategy.

This filtering strategy keeps only records whose ID occurs a specified number of times, bounded by min_frequency and max_frequency.

Variables

  • static flag_column_name : Optional[str]
  • static flag_only : bool
  • static id_column : Union[str, list[str]]
  • static max_frequency : int
  • static min_frequency : int
  • static remote_modeller : bool
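
A minimal sketch of frequency-based filtering, under stated assumptions: records are plain dicts, id_column is a single column name (the list form is not handled), and both bounds are inclusive. The function filter_by_frequency is a hypothetical helper, not part of the library.

```python
from collections import Counter


def filter_by_frequency(records, id_column, min_frequency, max_frequency):
    """Keep records whose ID occurs between min_frequency and max_frequency times."""
    # Count how often each ID appears across all records.
    counts = Counter(r[id_column] for r in records)
    return [
        r for r in records
        if min_frequency <= counts[r[id_column]] <= max_frequency
    ]


records = [
    {"pid": "a"}, {"pid": "a"},
    {"pid": "b"},
    {"pid": "c"}, {"pid": "c"}, {"pid": "c"},
]
kept = filter_by_frequency(records, "pid", min_frequency=2, max_frequency=2)
```

Here only the two records for "a" survive, because "b" occurs too rarely and "c" too often.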

LatestFilterArgs

class LatestFilterArgs(*args, **kwargs):

Arguments for LATEST filter strategy.

This filtering strategy keeps only the latest records per ID.

See dataclass for meanings of args.

Variables

  • static date_column : str
  • static flag_column_name : Optional[str]
  • static flag_only : bool
  • static id_column : Union[str, list[str]]
  • static num_latest : int
  • static remote_modeller : bool
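
The "latest records per ID" behaviour might look like the following sketch. It assumes records are plain dicts with comparable date values, that id_column is a single column name, and that num_latest is the number of most recent records kept per ID. The function keep_latest is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date


def keep_latest(records, id_column, date_column, num_latest=1):
    """Keep the num_latest most recent records for each ID."""
    groups = defaultdict(list)
    for r in records:
        groups[r[id_column]].append(r)
    kept = []
    for rows in groups.values():
        # Sort a copy newest-first and take the first num_latest rows.
        newest_first = sorted(rows, key=lambda r: r[date_column], reverse=True)
        kept.extend(newest_first[:num_latest])
    return kept


records = [
    {"pid": "a", "scan_date": date(2023, 1, 1)},
    {"pid": "a", "scan_date": date(2024, 3, 1)},
    {"pid": "a", "scan_date": date(2022, 5, 5)},
    {"pid": "b", "scan_date": date(2021, 1, 1)},
]
kept = keep_latest(records, "pid", "scan_date", num_latest=2)
```

An ID with fewer than num_latest records simply keeps all of them, as "b" does here.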

PatientIDFilterArgs

class PatientIDFilterArgs(*args, **kwargs):

Arguments for PATIENT_ID filter strategy.

This strategy reads a list of patient MRNs (or patient IDs) from a CSV file and excludes records from the dataframe that match those MRNs.

Arguments

  • filename: Path to the CSV file containing patient MRNs/IDs to exclude.
  • patient_id_column: Column name in the exclusion CSV file that contains the patient MRNs/IDs to exclude. This is NOT the column in the dataframe being filtered.

Variables

  • static filename : Optional[Union[str, os.PathLike[str]]]
  • static flag_column_name : Optional[str]
  • static flag_only : bool
  • static patient_id_column : str
  • static remote_modeller : bool
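
The exclusion logic described above can be sketched as follows. This is an illustration only: an in-memory io.StringIO stands in for the CSV file, records are plain dicts, and IDs are compared as strings. The helpers load_excluded_ids and exclude_patients, and the record_id_column parameter, are hypothetical; the source does not say which column of the filtered data is matched.

```python
import csv
import io


def load_excluded_ids(csv_file, patient_id_column):
    """Read the set of IDs to exclude from one column of the exclusion CSV."""
    reader = csv.DictReader(csv_file)
    return {row[patient_id_column].strip() for row in reader}


def exclude_patients(records, record_id_column, excluded_ids):
    """Drop records whose ID appears in excluded_ids."""
    return [r for r in records if str(r[record_id_column]) not in excluded_ids]


# Stand-in for the exclusion file; "MRN" plays the role of patient_id_column.
exclusion_csv = io.StringIO("MRN,reason\n1001,withdrawn\n1003,duplicate\n")
excluded = load_excluded_ids(exclusion_csv, patient_id_column="MRN")

records = [
    {"Patient ID": "1001"},
    {"Patient ID": "1002"},
    {"Patient ID": "1003"},
]
kept = exclude_patients(records, "Patient ID", excluded)
```

Note that the column name in the exclusion file ("MRN") and the column name in the data being filtered ("Patient ID") are independent, which matches the remark on patient_id_column above.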

RecordFilterAlgorithm

class RecordFilterAlgorithm(datastructure: DataStructure, strategies: Sequence[Union[FilterStrategy, str]], filter_args_list: list[FilterArgs]):

Algorithm factory for filtering records based on various strategies.

Arguments

  • **kwargs: Additional keyword arguments.
  • datastructure: The data structure to use for the algorithm.
  • filter_args_list: List of strategy-specific arguments.
  • strategies: List of filtering strategies.

Attributes

  • class_name: The name of the algorithm class.
  • datastructure: The data structure to use for the algorithm.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • filter_args_list: List of strategy-specific arguments.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • strategies: List of filtering strategies.

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]
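
How strategies and filter_args_list relate can be sketched with stand-in code. The sketch below does not use the library: it assumes that the two lists are paired by position and that the filters are applied one after another, neither of which is confirmed by this page. The names STAND_IN_FILTERS, apply_strategies, frequency_filter and latest_filter are hypothetical, and the argument dicts only mirror the field names listed for FrequencyFilterArgs and LatestFilterArgs.

```python
from collections import Counter, defaultdict


def frequency_filter(records, id_column, min_frequency, max_frequency):
    """Stand-in for a FREQUENCY-style filter."""
    counts = Counter(r[id_column] for r in records)
    return [
        r for r in records
        if min_frequency <= counts[r[id_column]] <= max_frequency
    ]


def latest_filter(records, id_column, date_column, num_latest):
    """Stand-in for a LATEST-style filter."""
    groups = defaultdict(list)
    for r in records:
        groups[r[id_column]].append(r)
    kept = []
    for rows in groups.values():
        newest_first = sorted(rows, key=lambda r: r[date_column], reverse=True)
        kept.extend(newest_first[:num_latest])
    return kept


# Hypothetical registry mapping strategy names to stand-in filter functions.
STAND_IN_FILTERS = {"FREQUENCY": frequency_filter, "LATEST": latest_filter}


def apply_strategies(records, strategies, filter_args_list):
    """Apply each strategy with the argument dict at the same position (assumed)."""
    if len(strategies) != len(filter_args_list):
        raise ValueError("strategies and filter_args_list must have equal length")
    for strategy, args in zip(strategies, filter_args_list):
        records = STAND_IN_FILTERS[strategy](records, **args)
    return records


records = [
    {"pid": "a", "scan_date": "2023-01-01"},
    {"pid": "a", "scan_date": "2024-02-01"},
    {"pid": "b", "scan_date": "2022-07-01"},
]
result = apply_strategies(
    records,
    strategies=["FREQUENCY", "LATEST"],
    filter_args_list=[
        {"id_column": "pid", "min_frequency": 2, "max_frequency": 5},
        {"id_column": "pid", "date_column": "scan_date", "num_latest": 1},
    ],
)
```

In this sample the first step drops "b" (only one record) and the second step keeps the newest record for "a". ISO-formatted date strings are used so that plain string comparison orders them correctly.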

Methods

create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, *, context: ProtocolContext, **kwargs: Any) -> bitfount.federated.algorithms.filtering_algorithm._ModellerSide:

Inherited from:

BaseNonModelAlgorithmFactory.modeller:

Modeller-side of the algorithm.

worker

def worker(self, *, context: ProtocolContext, **kwargs: Any) -> bitfount.federated.algorithms.filtering_algorithm._WorkerSide:

Inherited from:

BaseNonModelAlgorithmFactory.worker:

Worker-side of the algorithm.

ScanFrequencyFilterArgs

class ScanFrequencyFilterArgs(*args, **kwargs):

Arguments for SCAN_FREQUENCY filter strategy.

This filtering strategy keeps only patients with a minimum specified number of scans per year over a specified number of years.

Variables

  • static date_column : str
  • static flag_column_name : Optional[str]
  • static flag_only : bool
  • static id_column : Union[str, list[str]]
  • static min_number_of_scans_per_year : int
  • static number_of_years : int
  • static remote_modeller : bool
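
One possible reading of this strategy is sketched below. It assumes that "over a specified number of years" means the most recent number_of_years calendar years ending at a reference year, and that a patient must reach the minimum in every one of those years; the library may define the window differently. Records are plain dicts, id_column is a single column name, and filter_by_scan_frequency and current_year are hypothetical names.

```python
from collections import Counter, defaultdict
from datetime import date


def filter_by_scan_frequency(
    records,
    id_column,
    date_column,
    min_number_of_scans_per_year,
    number_of_years,
    current_year,
):
    """Keep all records of patients with enough scans in each required year."""
    required_years = range(current_year - number_of_years + 1, current_year + 1)
    # Per patient, count scans in each calendar year.
    scans_per_year = defaultdict(Counter)
    for r in records:
        scans_per_year[r[id_column]][r[date_column].year] += 1
    qualifying = {
        pid
        for pid, per_year in scans_per_year.items()
        if all(per_year[y] >= min_number_of_scans_per_year for y in required_years)
    }
    return [r for r in records if r[id_column] in qualifying]


records = [
    {"pid": "a", "scan_date": date(2023, 2, 1)},
    {"pid": "a", "scan_date": date(2023, 9, 1)},
    {"pid": "a", "scan_date": date(2024, 1, 5)},
    {"pid": "a", "scan_date": date(2024, 6, 5)},
    {"pid": "b", "scan_date": date(2023, 3, 3)},
    {"pid": "b", "scan_date": date(2024, 1, 1)},
    {"pid": "b", "scan_date": date(2024, 2, 2)},
    {"pid": "b", "scan_date": date(2024, 3, 3)},
]
kept = filter_by_scan_frequency(
    records,
    "pid",
    "scan_date",
    min_number_of_scans_per_year=2,
    number_of_years=2,
    current_year=2024,
)
```

Patient "b" has four scans in total but only one in 2023, so under this reading only the records for "a" are kept.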

ScanInPeriodFilterArgs

class ScanInPeriodFilterArgs(*args, **kwargs):

Arguments for SCAN_IN_PERIOD filter strategy.

Keeps only patients with at least one scan within a specified time period. When laterality_column is provided, both a left AND a right scan are required; otherwise any scan in the period qualifies. All records for qualifying patients are retained.

The period can be specified in one of three mutually exclusive ways:

  • last_n_months: relative period, e.g. last 12 months from today
  • last_n_years: relative period, e.g. last 1 year from today
  • start_date + end_date: absolute date range (ISO format or parseable string)

Arguments

  • date_column: Column containing scan/acquisition dates.
  • id_column: Column(s) to group patients by.
  • laterality_column: Column containing eye laterality (e.g. "L"/"R"). When omitted, the both-eyes requirement is skipped.
  • last_n_months: Number of months back from today for relative period.
  • last_n_years: Number of years back from today for relative period.
  • start_date: Start of absolute date range (inclusive).
  • end_date: End of absolute date range (inclusive).

Variables

  • static date_column : str
  • static end_date : str
  • static flag_column_name : Optional[str]
  • static flag_only : bool
  • static id_column : Union[str, list[str]]
  • static last_n_months : int
  • static last_n_years : int
  • static laterality_column : str
  • static remote_modeller : bool
  • static start_date : str
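
The period resolution and the laterality rule described above can be sketched as follows. This is an illustration under assumptions: records are plain dicts with datetime.date values, id_column is a single column name, laterality values are exactly "L" and "R", relative periods step back whole calendar months from a caller-supplied today, and absolute dates are ISO strings. The helpers months_back, resolve_period and filter_scan_in_period are hypothetical.

```python
import calendar
from datetime import date


def months_back(today, months):
    """Return the date `months` whole months before today, clamping the day."""
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))


def resolve_period(today, last_n_months=None, last_n_years=None,
                   start_date=None, end_date=None):
    """Turn exactly one of the three period options into (start, end) dates."""
    absolute = start_date is not None or end_date is not None
    chosen = [last_n_months is not None, last_n_years is not None, absolute]
    if sum(chosen) != 1:
        raise ValueError(
            "give exactly one of last_n_months, last_n_years, "
            "or start_date + end_date"
        )
    if absolute:
        if start_date is None or end_date is None:
            raise ValueError("start_date and end_date must be given together")
        return date.fromisoformat(start_date), date.fromisoformat(end_date)
    months = last_n_months if last_n_months is not None else 12 * last_n_years
    return months_back(today, months), today


def filter_scan_in_period(records, id_column, date_column, start, end,
                          laterality_column=None):
    """Keep all records of patients with a qualifying scan in [start, end]."""
    sides_seen = {}
    for r in records:
        if start <= r[date_column] <= end:
            side = r[laterality_column] if laterality_column else None
            sides_seen.setdefault(r[id_column], set()).add(side)
    if laterality_column:
        # Both a left and a right scan must fall inside the period.
        qualifying = {p for p, sides in sides_seen.items() if {"L", "R"} <= sides}
    else:
        qualifying = set(sides_seen)
    return [r for r in records if r[id_column] in qualifying]


records = [
    {"pid": "a", "eye": "L", "scan_date": date(2024, 1, 10)},
    {"pid": "a", "eye": "R", "scan_date": date(2024, 2, 1)},
    {"pid": "a", "eye": "L", "scan_date": date(2020, 1, 1)},
    {"pid": "b", "eye": "L", "scan_date": date(2024, 3, 3)},
    {"pid": "c", "eye": "R", "scan_date": date(2022, 1, 1)},
]
start, end = resolve_period(date(2024, 6, 15), last_n_months=12)
both_eyes = filter_scan_in_period(records, "pid", "scan_date", start, end, "eye")
any_scan = filter_scan_in_period(records, "pid", "scan_date", start, end)
```

With the laterality column only patient "a" qualifies, and all three of that patient's records are retained, including the 2020 scan outside the period. Without it, "b" qualifies as well.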

SeriesDescriptionLatestFilterArgs

class SeriesDescriptionLatestFilterArgs(*args, **kwargs):

Arguments for SERIES_DESCRIPTION_LATEST filter strategy.

This filtering strategy keeps only the latest records per ID and scan type, inferred from the series description.

Variables

  • static columns_column : str
  • static date_column : str
  • static flag_column_name : Optional[str]
  • static flag_only : bool
  • static id_column : Union[str, list[str]]
  • static manufacturer_column : str
  • static number_of_frames_column : str
  • static remote_modeller : bool
  • static rows_column : str
  • static series_description_column : str