filters
Task-level filter types for runtime data filtering.
This module provides the data structures for specifying filters at task runtime
that work alongside existing datasource-level filters. Task filters are defined
in DataStructure, serialized to workers, and merged with datasource config
using intersection logic (most restrictive bounds win).
Example:
>>> from bitfount.data.filters import TaskFilter
>>> filter = TaskFilter(
...     filter_type="min-frames",
...     value=49,
... )
Module
Functions
meets_range_criteria
def meets_range_criteria(
    value: RangeBoundInput,
    min_value: Optional[RangeBoundInput] = None,
    max_value: Optional[RangeBoundInput] = None,
) -> bool:
Check if a value falls within an optional min/max range.
This helper is used by filter methods to determine if a data value meets the configured filter criteria. Both bounds are inclusive.
Arguments
- value: The value to check. Must not be NaN.
- min_value: Optional minimum bound (inclusive). If None, no lower bound.
- max_value: Optional maximum bound (inclusive). If None, no upper bound.
Returns True if value is within the range, False otherwise.
Raises
- TypeError: If value and bounds have incompatible types (e.g., int vs date).
- ValueError: If value is NaN.
Note: datetime values (for value or bounds) are automatically converted to date (time component is discarded) for consistent comparison.
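The behaviour described above (inclusive bounds, NaN rejection, and datetime-to-date normalisation) can be illustrated with a minimal standalone sketch. It does not call the library; `in_range_sketch` and the `Bound` alias are hypothetical stand-ins for `meets_range_criteria` and `RangeBoundInput`, whose real definitions may differ.

```python
import math
from datetime import date, datetime
from typing import Optional, Union

# Hypothetical stand-in for RangeBoundInput; the real alias may differ.
Bound = Union[int, float, date, datetime]


def _normalise(v: Bound) -> Bound:
    """Drop the time component of datetimes so they compare as plain dates."""
    return v.date() if isinstance(v, datetime) else v


def in_range_sketch(
    value: Bound,
    min_value: Optional[Bound] = None,
    max_value: Optional[Bound] = None,
) -> bool:
    """Return True if value lies within the inclusive [min_value, max_value] range."""
    if isinstance(value, float) and math.isnan(value):
        raise ValueError("value must not be NaN")
    value = _normalise(value)
    # Comparing incompatible types (e.g. int vs date) raises TypeError in Python.
    if min_value is not None and value < _normalise(min_value):
        return False
    if max_value is not None and value > _normalise(max_value):
        return False
    return True
```

In this sketch, a value equal to a bound passes, and a datetime late on 1 May 2024 still satisfies a maximum of `date(2024, 5, 1)` because its time component is discarded before comparison.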
resolve_bool_filter
def resolve_bool_filter(
    ds_value: bool,
    task_value: Optional[TaskFilterValue],
) -> bool:
Resolve a boolean filter. The task can enable the filter even if the datasource has it disabled.
Arguments
- ds_value: Datasource filter value (defaults to False if not set)
- task_value: Task filter value (should be bool if not None)
Returns True if either datasource or task enables the filter, False otherwise
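Unlike the range filters, a boolean filter is combined with a logical OR: enabling the check on either side makes it stricter. A minimal sketch, assuming a missing task value is treated as "not enabled" (`resolve_bool_sketch` is a hypothetical name, not the library function):

```python
from typing import Optional


def resolve_bool_sketch(ds_value: bool, task_value: Optional[bool]) -> bool:
    """Return True if the datasource or the task enables the filter."""
    # Assumption: task_value of None means the task did not set the filter.
    return ds_value or bool(task_value)
```

With this rule a task can switch a check on, but it cannot switch off a check that the datasource has already enabled.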
resolve_date_filter
def resolve_date_filter(
    ds_value: Union[Date, DateTD, None],
    task_value: Optional[TaskFilterValue],
    is_min: bool,
) -> Optional[datetime.date]:
Resolve a date filter value using intersection logic.
Converts Date/DateTD objects to datetime.date objects.
Arguments
- ds_value: Datasource filter value (Date or DateTD)
- task_value: Task filter value (should be Date or DateTD if not None)
- is_min: True for minimum filters (take max), False for maximum (take min)
Returns Resolved date value or None
resolve_float_filter
def resolve_float_filter(
    ds_value: Optional[float],
    task_value: Optional[TaskFilterValue],
    is_min: bool,
) -> Optional[float]:
Resolve a float filter value using intersection logic.
Arguments
- ds_value: Datasource filter value
- task_value: Task filter value (should be float/int if not None)
- is_min: True for minimum filters (take max), False for maximum (take min)
Returns Resolved float value or None
resolve_int_filter
def resolve_int_filter(
    ds_value: Optional[int],
    task_value: Optional[TaskFilterValue],
    is_min: bool,
) -> Optional[int]:
Resolve an integer filter value using intersection logic.
Arguments
- ds_value: Datasource filter value
- task_value: Task filter value (should be int if not None)
- is_min: True for minimum filters (take max), False for maximum (take min)
Returns Resolved integer value or None
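The intersection rule shared by the date, float, and integer resolvers ("take max" for minimum bounds, "take min" for maximum bounds) can be sketched as below. This is an illustration only: `resolve_bound_sketch` is a hypothetical name, and the handling of a single missing value (return whichever side is set) is an assumption based on the "or None" return description.

```python
from typing import Optional


def resolve_bound_sketch(
    ds_value: Optional[int],
    task_value: Optional[int],
    is_min: bool,
) -> Optional[int]:
    """Merge two optional bounds so that the most restrictive one wins."""
    # Assumption: if only one side sets a bound, that bound is used as-is.
    if ds_value is None:
        return task_value
    if task_value is None:
        return ds_value
    # A higher minimum or a lower maximum narrows the accepted range.
    return max(ds_value, task_value) if is_min else min(ds_value, task_value)
```

For example, a datasource minimum of 20 frames and a task minimum of 49 frames resolve to 49, while maximums of 200 and 100 resolve to 100, so the merged range is never wider than either input.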
resolve_list_filter
def resolve_list_filter(
    ds_value: Optional[list[str]],
    task_value: Optional[TaskFilterValue],
) -> Optional[list[str]]:
Resolve a list filter value by taking the union of both lists.
Arguments
- ds_value: Datasource filter value
- task_value: Task filter value (should be list[str] if not None)
Returns Union of the two lists or None
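A union of the two lists can be sketched as follows. The ordering and de-duplication behaviour shown here (datasource entries first, duplicates dropped, first occurrence kept) is an assumption for illustration; the source only states that the result is the union or None. `resolve_list_sketch` is a hypothetical name.

```python
from typing import Optional


def resolve_list_sketch(
    ds_value: Optional[list[str]],
    task_value: Optional[list[str]],
) -> Optional[list[str]]:
    """Return the union of both lists, or None if neither is set."""
    if ds_value is None and task_value is None:
        return None
    # dict.fromkeys keeps the first occurrence of each name, in order.
    return list(dict.fromkeys((ds_value or []) + (task_value or [])))
```

For a filter such as required field names, a union is still the more restrictive outcome, since a file must then provide every field named by either side.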
resolve_modality_filter
def resolve_modality_filter(
    ds_value: Optional[Literal['OCT', 'SLO']],
    task_value: Optional[TaskFilterValue],
) -> Optional[Literal['OCT', 'SLO']]:
Resolve a modality filter value. The two values must match if both are present.
Arguments
- ds_value: Datasource filter value
- task_value: Task filter value (should be str if not None)
Returns Resolved modality value or None
Raises
ValueError: If both values present but don't match
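The "must match if both present" rule can be sketched as below. `resolve_modality_sketch` is a hypothetical name, and returning whichever side is set when only one is present is an assumption consistent with the documented return value.

```python
from typing import Optional


def resolve_modality_sketch(
    ds_value: Optional[str],
    task_value: Optional[str],
) -> Optional[str]:
    """Return the agreed modality, or raise if the two sides conflict."""
    if ds_value is not None and task_value is not None and ds_value != task_value:
        raise ValueError(
            f"Conflicting modality filters: datasource={ds_value!r}, task={task_value!r}"
        )
    # Assumption: when only one side is set, that value is used.
    return ds_value if ds_value is not None else task_value
```

Raising on a mismatch, rather than silently picking one side, reflects that two different modalities have an empty intersection.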
resolve_str_filter
def resolve_str_filter(
    filter_name: str,
    ds_value: Optional[str],
    task_value: Optional[TaskFilterValue],
) -> Optional[str]:
Resolve a string filter value. The two values must match if both are present.
should_persist_skip_to_cache
def should_persist_skip_to_cache(
    *,
    value: Optional[RangeBoundInput] = None,
    datasource_min: Optional[RangeBoundInput] = None,
    datasource_max: Optional[RangeBoundInput] = None,
    missing_fields: Optional[set[str]] = None,
    datasource_required_fields: Optional[set[str]] = None,
    file_modality: Optional[OphthalmologyModalityType] = None,
    datasource_modality: Optional[OphthalmologyModalityType] = None,
    file_series_description: Optional[str] = None,
    datasource_series_description: Optional[str] = None,
) -> bool:
Determine if a skipped file should be persisted to the skip cache.
Checks whether a file that failed the merged filter would also fail the datasource-only filter. If yes, cache the skip. If no, don't cache.
Assumes the file has ALREADY failed the merged filter check.
Arguments
- value: File's value for range filters (e.g., num_frames, file_size).
- datasource_min: Minimum bound from datasource config.
- datasource_max: Maximum bound from datasource config.
- missing_fields: Field names the file is missing.
- datasource_required_fields: Fields required by datasource config.
- file_modality: The file's actual modality (e.g., "OCT", "SLO").
- datasource_modality: Modality required by datasource config.
- file_series_description: The file's actual series description.
- datasource_series_description: Series description required by datasource config.
Raises
ValueError: If multiple filter categories provided or none provided.
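For the range-filter category, the decision described above reduces to re-checking the file against the datasource bounds alone. The sketch below covers only that category and uses hypothetical names (`should_cache_range_skip_sketch`, `ds_min`, `ds_max`); the real function also handles required fields, modality, and series description, which are not modelled here.

```python
from typing import Optional


def should_cache_range_skip_sketch(
    value: float,
    ds_min: Optional[float] = None,
    ds_max: Optional[float] = None,
) -> bool:
    """Return True if the file would be skipped by the datasource bounds alone.

    Assumes the file has already failed the merged (task + datasource) filter.
    """
    below = ds_min is not None and value < ds_min
    above = ds_max is not None and value > ds_max
    return below or above
```

As an illustration, take a datasource minimum of 20 frames and a task minimum of 49 frames. A 30-frame file fails the merged filter but passes the datasource-only filter, so the skip is not cached; a 10-frame file fails both, so the skip is cached.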
to_date
def to_date(
    value: Union[Date, DateTD, dict[str, int], None],
) -> Optional[datetime.date]:
Convert a Date or DateTD object to a datetime.date object.
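The dict form accepted by the signature can be illustrated with a small sketch. It assumes the dict carries a required `year` and optional `month` and `day` keys, and that missing parts default to 1; that default is a guess for illustration, not documented behaviour. `dict_to_date_sketch` is a hypothetical name.

```python
from datetime import date
from typing import Optional


def dict_to_date_sketch(value: Optional[dict[str, int]]) -> Optional[date]:
    """Convert a {'year', 'month', 'day'} style dict to a datetime.date."""
    if value is None:
        return None
    # Assumption: a missing month or day falls back to 1.
    return date(value["year"], value.get("month", 1), value.get("day", 1))
```

Under that assumption, `{"year": 2024}` becomes 1 January 2024 and `{"year": 2024, "month": 6, "day": 15}` becomes 15 June 2024.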
Classes
MergedFilterConfig
class MergedFilterConfig(
    min_num_frames: Optional[int] = None,
    max_num_frames: Optional[int] = None,
    minimum_dob: Optional[date] = None,
    maximum_dob: Optional[date] = None,
    check_required_fields: bool = False,
    required_field_names: Optional[list[str]] = None,
    file_creation_min_date: Optional[date] = None,
    file_creation_max_date: Optional[date] = None,
    file_modification_min_date: Optional[date] = None,
    file_modification_max_date: Optional[date] = None,
    min_file_size: Optional[float] = None,
    max_file_size: Optional[float] = None,
    modality: Optional[Literal['OCT', 'SLO']] = None,
    min_acquisition_date: Optional[date] = None,
    max_acquisition_date: Optional[date] = None,
    series_description: Optional[str] = None,
):
Effective filter configuration after merging task and datasource filters.
This represents the final, resolved filter values that should be applied when loading data. It is produced by merging task-level filters with datasource-level filters using intersection logic (most restrictive wins).
All fields are optional as different datasource types support different filter capabilities.
Variables
- static check_required_fields : bool
- static file_creation_max_date : Optional[datetime.date]
- static file_creation_min_date : Optional[datetime.date]
- static file_modification_max_date : Optional[datetime.date]
- static file_modification_min_date : Optional[datetime.date]
- static max_acquisition_date : Optional[datetime.date]
- static max_file_size : Optional[float]
- static max_num_frames : Optional[int]
- static maximum_dob : Optional[datetime.date]
- static min_acquisition_date : Optional[datetime.date]
- static min_file_size : Optional[float]
- static min_num_frames : Optional[int]
- static minimum_dob : Optional[datetime.date]
- static modality : Optional[Literal['OCT', 'SLO']]
- static required_field_names : Optional[list[str]]
- static series_description : Optional[str]
TaskFilter
class TaskFilter(
    filter_type: str,
    value: TaskFilterValue,
):
A single task-level filter to apply at runtime.
Task-level filters work with datasource-level filters using intersection logic. They can only further restrict data selection, not expand it.
Arguments
- filter_type: The type of filter (kebab-case string matching TaskFilterType).
- value: The filter value. Type depends on filter_type:
  - Date filters: dict with year (required), month (optional), day (optional)
  - Size filters: float (MB)
  - Frame filters: int
  - check-required-fields: bool to enable required field checking
  - required-field-names: list[str] of required field names to check
  - modality: str, either "OCT" or "SLO"
Variables
- static fields_dict : ClassVar[dict[str, marshmallow.fields.Field]]
- static filter_type : str
- static nested_fields : ClassVar[dict[str, collections.abc.Mapping[str, Any]]]
TaskFilterType
class TaskFilterType(*args, **kwds):
Types of task-level filters available at runtime.
Each filter type maps to an existing filter capability in the codebase. Values use kebab-case for FE compatibility.
Variables
- static CHECK_REQUIRED_FIELDS
- static FILE_CREATION_MAX_DATE
- static FILE_CREATION_MIN_DATE
- static FILE_MODIFICATION_MAX_DATE
- static FILE_MODIFICATION_MIN_DATE
- static MAX_DOB
- static MAX_FILE_SIZE
- static MAX_FRAMES
- static MIN_DOB
- static MIN_FILE_SIZE
- static MIN_FRAMES
- static MODALITY
- static REQUIRED_FIELD_NAMES
- static SCAN_ACQUISITION_MAX_DATE
- static SCAN_ACQUISITION_MIN_DATE
- static SERIES_DESCRIPTION