filtering_algorithm
Algorithm for filtering data records based on configurable strategies.
Classes
AgeRangeFilterArgs
class AgeRangeFilterArgs(*args, **kwargs):
Arguments for the AGE_RANGE filter strategy.
This filtering strategy keeps only records whose age, as determined from the given birth-date column, falls within the specified range.
Variables
- static birth_date_column : str
- static max_age : int
- static min_age : int
- static remote_modeller : bool
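The age-range behaviour can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: the helper names, the inclusive bounds, the explicit `today` parameter, and the sample records are all assumptions.

```python
from datetime import date


def age_in_years(birth: date, today: date) -> int:
    """Whole years elapsed between birth and today."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


def filter_age_range(records, birth_date_column, min_age, max_age, today):
    """Keep records whose age lies in [min_age, max_age].

    Assumption: both bounds are inclusive; the source does not say.
    """
    return [
        r
        for r in records
        if min_age <= age_in_years(r[birth_date_column], today) <= max_age
    ]


# Hypothetical records; "dob" stands in for birth_date_column.
records = [
    {"id": "a", "dob": date(1950, 6, 1)},
    {"id": "b", "dob": date(1990, 6, 1)},
    {"id": "c", "dob": date(2015, 6, 1)},
]
kept = filter_age_range(
    records, "dob", min_age=18, max_age=65, today=date(2024, 1, 1)
)
print([r["id"] for r in kept])
```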
FilterStrategy
class FilterStrategy(*args, **kwds):
Enumeration of available filtering strategies.
Ancestors
FilterStrategyClass
class FilterStrategyClass(*args, **kwds):
Enumeration map of filter strategies to TypedDict and classes.
FrequencyFilterArgs
class FrequencyFilterArgs(*args, **kwargs):
Arguments for the FREQUENCY filter strategy.
This filtering strategy keeps only records whose ID occurs with a frequency within the specified bounds.
Variables
- static id_column : Union[str, list[str]]
- static max_frequency : int
- static min_frequency : int
- static remote_modeller : bool
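As a rough illustration of frequency filtering, the sketch below counts how often each ID occurs and keeps the records whose count falls between the two bounds. The function names, the inclusive bounds, and the treatment of a list-valued `id_column` as a composite key are assumptions, not documented behaviour.

```python
from collections import Counter


def _key(record, id_column):
    """Build a hashable ID key from one column name or a list of them."""
    cols = [id_column] if isinstance(id_column, str) else list(id_column)
    return tuple(record[c] for c in cols)


def filter_by_frequency(records, id_column, min_frequency, max_frequency):
    """Keep records whose ID occurs between min and max times (inclusive).

    Assumption: bounds are inclusive and a list of columns forms one
    composite ID.
    """
    counts = Counter(_key(r, id_column) for r in records)
    return [
        r
        for r in records
        if min_frequency <= counts[_key(r, id_column)] <= max_frequency
    ]


# Hypothetical records: p1 occurs twice, p2 once, p3 three times.
records = [
    {"pid": "p1"},
    {"pid": "p1"},
    {"pid": "p2"},
    {"pid": "p3"},
    {"pid": "p3"},
    {"pid": "p3"},
]
kept = filter_by_frequency(records, "pid", min_frequency=2, max_frequency=2)
print(kept)
```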
LatestFilterArgs
class LatestFilterArgs(*args, **kwargs):
Arguments for the LATEST filter strategy.
This filtering strategy keeps only the latest records per ID.
See dataclass for meanings of args.
Variables
- static date_column : str
- static id_column : Union[str, list[str]]
- static num_latest : int
- static remote_modeller : bool
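Keeping the latest records per ID amounts to grouping by ID, sorting each group by date in descending order, and taking the first `num_latest` entries. The sketch below shows this under simplifying assumptions: `id_column` is a single column name, dates are `datetime.date` values, and the helper name and sample data are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def filter_latest(records, id_column, date_column, num_latest):
    """Keep the num_latest most recent records for each ID."""
    by_id = defaultdict(list)
    for r in records:
        by_id[r[id_column]].append(r)

    kept = []
    for group in by_id.values():
        # Newest first, then truncate to the requested number.
        group.sort(key=lambda r: r[date_column], reverse=True)
        kept.extend(group[:num_latest])
    return kept


# Hypothetical records: three scans for p1, one for p2.
records = [
    {"pid": "p1", "scan_date": date(2021, 1, 1)},
    {"pid": "p1", "scan_date": date(2023, 1, 1)},
    {"pid": "p1", "scan_date": date(2022, 1, 1)},
    {"pid": "p2", "scan_date": date(2020, 5, 5)},
]
kept = filter_latest(records, "pid", "scan_date", num_latest=2)
print([(r["pid"], r["scan_date"].year) for r in kept])
```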
PatientIDFilterArgs
class PatientIDFilterArgs(*args, **kwargs):
Arguments for the PATIENT_ID filter strategy.
This strategy reads a list of patient MRNs (or patient IDs) from a CSV file and excludes records from the dataframe that match those MRNs.
Arguments
filename: Path to the CSV file containing patient MRNs/IDs to exclude.
patient_id_column: Column name in the exclusion CSV file that contains the patient MRNs/IDs to exclude. This is NOT the column in the dataframe being filtered.
Variables
- static filename : Union[str, os.PathLike, None]
- static patient_id_column : str
- static remote_modeller : bool
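The exclusion logic can be sketched with the standard `csv` module. This is an illustration only: the helper names are invented, an in-memory string stands in for the file at `filename`, and `record_id_column` is a hypothetical parameter, since the source does not say how the matching column in the filtered data is chosen.

```python
import csv
import io


def load_excluded_ids(csv_file, patient_id_column):
    """Read the set of IDs to exclude from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return {row[patient_id_column] for row in reader}


def exclude_patients(records, record_id_column, excluded_ids):
    """Drop records whose ID appears in excluded_ids.

    record_id_column is hypothetical; it names the ID column in the data
    being filtered, which is distinct from patient_id_column in the CSV.
    """
    return [r for r in records if r[record_id_column] not in excluded_ids]


# In-memory stand-in for the exclusion CSV; "mrn" plays patient_id_column.
exclusion_csv = io.StringIO("mrn,reason\n1001,withdrawn\n1003,duplicate\n")
excluded = load_excluded_ids(exclusion_csv, "mrn")

records = [
    {"patient_mrn": "1001"},
    {"patient_mrn": "1002"},
    {"patient_mrn": "1003"},
]
kept = exclude_patients(records, "patient_mrn", excluded)
print(kept)
```

Note that `csv.DictReader` yields strings, so IDs stored as integers in the data would need converting before comparison.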
RecordFilterAlgorithm
class RecordFilterAlgorithm(datastructure: DataStructure, strategies: Sequence[Union[FilterStrategy, str]], filter_args_list: list[FilterArgs]):
Algorithm factory for filtering records based on various strategies.
Arguments
**kwargs: Additional keyword arguments.
datastructure: The data structure to use for the algorithm.
filter_args_list: List of strategy-specific arguments.
strategies: List of filtering strategies.
Attributes
class_name: The name of the algorithm class.
datastructure: The data structure to use for the algorithm.
fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
filter_args_list: List of strategy-specific arguments.
nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
strategies: List of filtering strategies.
Variables
- static fields_dict : ClassVar[T_FIELDS_DICT]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller(self, *, context: ProtocolContext, **kwargs: Any) ‑> bitfount.federated.algorithms.filtering_algorithm._ModellerSide:
Inherited from: BaseNonModelAlgorithmFactory.modeller
Modeller-side of the algorithm.
worker
def worker(self, *, context: ProtocolContext, **kwargs: Any) ‑> bitfount.federated.algorithms.filtering_algorithm._WorkerSide:
Inherited from: BaseNonModelAlgorithmFactory.worker
Worker-side of the algorithm.
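The reference does not spell out how `strategies` and `filter_args_list` relate. One natural reading is that they pair up positionally, one arguments entry per strategy. The sketch below prepares such a pair of lists without importing the library. It assumes positional pairing, assumes the arguments can be given as plain dicts keyed by the fields listed under each Args class, and uses the strategy names from the docstrings as strings (the string values actually accepted may differ). Column names are illustrative, and `remote_modeller` is left out.

```python
# Assumption: strategy i is configured by filter_args_list[i].
strategies = ["AGE_RANGE", "LATEST"]
filter_args_list = [
    # Keys mirror the fields documented for AgeRangeFilterArgs.
    {"birth_date_column": "dob", "min_age": 18, "max_age": 65},
    # Keys mirror the fields documented for LatestFilterArgs.
    {"id_column": "patient_id", "date_column": "scan_date", "num_latest": 1},
]

# Guard against mismatched lengths before handing the lists on.
if len(strategies) != len(filter_args_list):
    raise ValueError("Expected one arguments entry per strategy")

plan = list(zip(strategies, filter_args_list))
for strategy, args in plan:
    print(strategy, sorted(args))
```

Under these assumptions the two lists would then be passed as the `strategies` and `filter_args_list` arguments of `RecordFilterAlgorithm`, together with a `datastructure`.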
ScanFrequencyFilterArgs
class ScanFrequencyFilterArgs(*args, **kwargs):
Arguments for the SCAN_FREQUENCY filter strategy.
This filtering strategy keeps only patients with a minimum specified number of scans per year over a specified number of years.
Variables
- static date_column : str
- static id_column : Union[str, list[str]]
- static min_number_of_scans_per_year : int
- static number_of_years : int
- static remote_modeller : bool
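The description leaves room for interpretation about how years are counted. The sketch below takes one possible reading: a patient is kept if at least `number_of_years` distinct calendar years each contain at least `min_number_of_scans_per_year` scans. That reading, the helper name, the single-column `id_column`, and the sample data are all assumptions; the library may instead use consecutive years or a rolling window.

```python
from collections import Counter
from datetime import date


def filter_scan_frequency(
    records, id_column, date_column, min_number_of_scans_per_year, number_of_years
):
    """Keep records of patients with enough scans in enough calendar years."""
    # Count scans per (patient, calendar year).
    scans = Counter((r[id_column], r[date_column].year) for r in records)

    # Count, per patient, the years that reach the minimum scan count.
    qualifying_years = Counter()
    for (pid, _year), n in scans.items():
        if n >= min_number_of_scans_per_year:
            qualifying_years[pid] += 1

    keep = {
        pid for pid, years in qualifying_years.items() if years >= number_of_years
    }
    return [r for r in records if r[id_column] in keep]


# Hypothetical records: p1 has 2 scans in each of 2022 and 2023,
# p2 has 3 scans but all in 2023.
records = [
    {"pid": "p1", "scan_date": date(2022, 2, 1)},
    {"pid": "p1", "scan_date": date(2022, 9, 1)},
    {"pid": "p1", "scan_date": date(2023, 3, 1)},
    {"pid": "p1", "scan_date": date(2023, 10, 1)},
    {"pid": "p2", "scan_date": date(2023, 1, 1)},
    {"pid": "p2", "scan_date": date(2023, 4, 1)},
    {"pid": "p2", "scan_date": date(2023, 8, 1)},
]
kept = filter_scan_frequency(
    records,
    "pid",
    "scan_date",
    min_number_of_scans_per_year=2,
    number_of_years=2,
)
print(sorted({r["pid"] for r in kept}), len(kept))
```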