utils

Utility functions concerning data sources.

Module

Functions

get_datetime

def get_datetime(date: Optional[Union[Date, DateTD, datetime_date]]) -> Optional[datetime.date]:

Convert a Date or DateTD object to a datetime.date object.

Arguments

  • date: The Date, DateTD or datetime.date object to convert, or None.

Returns: The datetime.date object if date is a Date object, otherwise None.
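Here is a minimal sketch of the conversion. It assumes `Date` is a dataclass with `year`, `month` and `day` fields and `DateTD` is a dict with the same keys. Neither definition appears on this page, so both classes below are hypothetical stand-ins. Defaulting a missing month or day to 1 is also an assumption.

```python
import datetime
from dataclasses import dataclass
from typing import Optional, TypedDict, Union


@dataclass
class Date:
    """Hypothetical stand-in for the library's Date type."""
    year: int
    month: int = 1
    day: int = 1


class DateTD(TypedDict, total=False):
    """Hypothetical stand-in for the library's DateTD typed dict."""
    year: int
    month: int
    day: int


def get_datetime_sketch(
    date: Optional[Union[Date, DateTD, datetime.date]],
) -> Optional[datetime.date]:
    """Convert a Date, DateTD or datetime.date to datetime.date (sketch)."""
    if date is None:
        return None
    if isinstance(date, datetime.date):
        # Already the target type; return it unchanged.
        return date
    if isinstance(date, Date):
        return datetime.date(date.year, date.month, date.day)
    # Otherwise treat it as a DateTD mapping (TypedDicts cannot be used with
    # isinstance); a missing month or day defaults to 1 (an assumption).
    return datetime.date(date["year"], date.get("month", 1), date.get("day", 1))
```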

get_file_size_str

def get_file_size_str(filename: str) -> str:

Get file size as a human-readable string.

Arguments

  • filename: Path to the file.

Returns: A human-readable file size string, or 'unknown' if the file doesn't exist.
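A helper like this could work as sketched below. The sketch assumes 1024-based units and one decimal place. The real function's exact output format is not documented here, and `get_file_size_str_sketch` is a hypothetical name.

```python
import os


def get_file_size_str_sketch(filename: str) -> str:
    """Return a human-readable size for filename, or 'unknown' (sketch)."""
    try:
        size = float(os.path.getsize(filename))
    except OSError:
        # Missing or unreadable file.
        return "unknown"
    # Step through 1024-based units until the value drops below 1024.
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"
```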

get_skip_reason_description

def get_skip_reason_description(reason: FileSkipReason) -> str:

Get human-readable description for a skip reason.

Arguments

  • reason: The skip reason enum value.

Returns: A human-readable description of the skip reason.
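One way to implement the description lookup is a dictionary keyed by enum member. The enum below is a hypothetical three-member subset of `FileSkipReason`. Both the description strings and the name-based fallback are assumptions, not the library's wording.

```python
from enum import Enum, auto


class FileSkipReason(Enum):
    """Hypothetical subset of the documented members, for illustration only."""
    NOT_A_FILE = auto()
    EXTENSION_NOT_ALLOWED = auto()
    SIZE_OUT_OF_RANGE = auto()


# Hypothetical wording; the library's actual descriptions may differ.
_DESCRIPTIONS = {
    FileSkipReason.NOT_A_FILE: "Path is not a regular file",
    FileSkipReason.EXTENSION_NOT_ALLOWED: "File extension is not allowed",
    FileSkipReason.SIZE_OUT_OF_RANGE: "File size is outside the configured range",
}


def get_skip_reason_description_sketch(reason: FileSkipReason) -> str:
    # Fall back to a prettified member name if no description is registered.
    return _DESCRIPTIONS.get(reason, reason.name.replace("_", " ").capitalize())
```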

load_data_in_memory

def load_data_in_memory(datasource: base_source.BaseSource, **kwargs: Any) -> pandas.core.frame.DataFrame:

Load all data from a datasource into memory and return it as a single DataFrame.

Arguments

  • datasource: the datasource to load from.
  • kwargs: kwargs to pass through to the underlying yield_data() call.
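A minimal sketch of the idea follows. It assumes `yield_data()` yields `pandas` DataFrame batches that can simply be concatenated. `ToyDataSource` is a hypothetical stand-in for `BaseSource`.

```python
from collections.abc import Iterator
from typing import Any

import pandas as pd


class ToyDataSource:
    """Hypothetical datasource whose yield_data() yields DataFrame batches."""

    def __init__(self, df: pd.DataFrame, batch_size: int = 2) -> None:
        self._df = df
        self._batch_size = batch_size

    def yield_data(self, **kwargs: Any) -> Iterator[pd.DataFrame]:
        for start in range(0, len(self._df), self._batch_size):
            yield self._df.iloc[start : start + self._batch_size]


def load_data_in_memory_sketch(datasource: ToyDataSource, **kwargs: Any) -> pd.DataFrame:
    # Materialise every batch, forwarding kwargs to yield_data().
    batches = list(datasource.yield_data(**kwargs))
    if not batches:
        return pd.DataFrame()
    # Concatenate into one DataFrame with a fresh 0..n-1 index.
    return pd.concat(batches, ignore_index=True)
```

Because every batch is held in memory at once, this approach only suits datasets that fit comfortably in RAM.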

task_running_context_manager

def task_running_context_manager(datasource: base_source.BaseSource) -> collections.abc.Generator[BaseSource, None, None]:

A context manager to temporarily set a datasource in a "task running" context.
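A context manager of this kind can be written with `contextlib.contextmanager`. It sets a flag on entry and restores the previous value on exit, even if the body raises. The `is_task_running` attribute and the `FlaggedSource` class are hypothetical, because this page does not say what the library actually toggles.

```python
from collections.abc import Generator
from contextlib import contextmanager


class FlaggedSource:
    """Hypothetical datasource with an is_task_running flag."""

    def __init__(self) -> None:
        self.is_task_running = False


@contextmanager
def task_running_sketch(datasource: FlaggedSource) -> Generator[FlaggedSource, None, None]:
    previous = datasource.is_task_running
    datasource.is_task_running = True
    try:
        yield datasource
    finally:
        # Restore the previous state even if the with-body raised.
        datasource.is_task_running = previous
```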

Classes

FileSkipReason

class FileSkipReason(value, names=None, *, module=None, qualname=None, type=None, start=1):

Enumeration of all possible reasons why a file might be skipped.

Ancestors

Variables

  • static ALREADY_SKIPPED
  • static DATASOURCE_FILTER_FAILED
  • static DATE_OUT_OF_RANGE
  • static DICOM_EMPTY_FILE
  • static DICOM_LOAD_FAILED
  • static DICOM_NO_IMAGES_SOP_CLASS
  • static DICOM_NO_IMAGE_DATA
  • static DICOM_NO_PIXEL_DATA
  • static DICOM_PIXEL_EXTRACTION_FAILED
  • static DICOM_UNEXPECTED_ERROR
  • static DICOM_UNSUPPORTED_ZEISS_MODALITY
  • static EXTENSION_NOT_ALLOWED
  • static IMAGE_EMPTY_DATA
  • static IMAGE_PROCESSING_FAILED
  • static MAX_FILES_EXCEEDED
  • static NOT_A_FILE
  • static OPHTH_BSCAN_COUNT_OUT_OF_RANGE
  • static OPHTH_BSCAN_COUNT_UNAVAILABLE
  • static OPHTH_DOB_OUT_OF_RANGE
  • static OPHTH_DOB_UNAVAILABLE
  • static OPHTH_MODALITY_MISMATCH
  • static OPHTH_PROPERTY_EXTRACTION_FAILED
  • static PRIVATE_EYE_EMPTY_RESULT
  • static PRIVATE_EYE_NO_PARSER
  • static PRIVATE_EYE_PROCESSING_FAILED
  • static PROCESSING_ERROR
  • static SIZE_OUT_OF_RANGE
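Because the reasons are enum members, skipped files can be tallied by reason with `collections.Counter`. The sketch below uses a hypothetical three-member subset of the enum and made-up file names.

```python
from collections import Counter
from enum import Enum, auto


class FileSkipReason(Enum):
    """Hypothetical subset of the documented members; values are illustrative."""
    EXTENSION_NOT_ALLOWED = auto()
    DATE_OUT_OF_RANGE = auto()
    SIZE_OUT_OF_RANGE = auto()


def count_skip_reasons(skipped: list[tuple[str, FileSkipReason]]) -> dict[str, int]:
    """Count how many files were skipped for each reason, keyed by member name."""
    return dict(Counter(reason.name for _, reason in skipped))


# Hypothetical (path, reason) pairs, as a filtering pass might record them.
skipped = [
    ("a.txt", FileSkipReason.EXTENSION_NOT_ALLOWED),
    ("b.dcm", FileSkipReason.SIZE_OUT_OF_RANGE),
    ("c.txt", FileSkipReason.EXTENSION_NOT_ALLOWED),
]
```

Members can also be looked up by name, e.g. `FileSkipReason["DATE_OUT_OF_RANGE"]`, which is handy when reading reasons back from logs.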

FileSystemFilter

class FileSystemFilter(
    file_extension: Optional[SingleOrMulti[str]] = None,
    strict_file_extension: bool = False,
    file_creation_min_date: Optional[Union[Date, DateTD]] = None,
    file_modification_min_date: Optional[Union[Date, DateTD]] = None,
    file_creation_max_date: Optional[Union[Date, DateTD]] = None,
    file_modification_max_date: Optional[Union[Date, DateTD]] = None,
    min_file_size: Optional[float] = None,
    max_file_size: Optional[float] = None,
):

Filter files based on various criteria.

Arguments

  • file_extension: File extension(s) of the data files. If None, all files will be searched. Can either be a single file extension or a list of file extensions. Case-insensitive. Defaults to None.
  • strict_file_extension: Whether file loading should be restricted to files with the explicitly provided file extension. If True, only those files are loaded from the dataset; otherwise, the given path is scanned for files of the same type as the provided file extension. Only relevant if file_extension is provided. Defaults to False.
  • file_creation_min_date: The oldest possible date to consider for file creation. If None, this filter will not be applied. Defaults to None.
  • file_modification_min_date: The oldest possible date to consider for file modification. If None, this filter will not be applied. Defaults to None.
  • file_creation_max_date: The newest possible date to consider for file creation. If None, this filter will not be applied. Defaults to None.
  • file_modification_max_date: The newest possible date to consider for file modification. If None, this filter will not be applied. Defaults to None.
  • min_file_size: The minimum file size in megabytes to consider. If None, all files will be considered. Defaults to None.
  • max_file_size: The maximum file size in megabytes to consider. If None, all files will be considered. Defaults to None.
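Two of these arguments lend themselves to a quick sketch. The first helper below normalises a single extension or a list of extensions into a case-insensitive set. The second checks a file size against megabyte bounds. Both helpers are hypothetical. The reading of `SingleOrMulti[str]` as "a string or a list of strings", the 1024 * 1024 bytes-per-megabyte convention and the inclusive bounds are all assumptions.

```python
from typing import Optional, Union

# Assumed meaning of the library's SingleOrMulti[str] alias.
SingleOrMulti = Union[str, list[str]]


def normalise_extensions(file_extension: Optional[SingleOrMulti]) -> Optional[set[str]]:
    """Return lower-cased extensions with a leading dot, or None for 'all files'."""
    if file_extension is None:
        return None
    exts = [file_extension] if isinstance(file_extension, str) else list(file_extension)
    return {e.lower() if e.startswith(".") else f".{e.lower()}" for e in exts}


def size_in_range(size_bytes: int, min_mb: Optional[float], max_mb: Optional[float]) -> bool:
    # Assumes 1 MB = 1024 * 1024 bytes and inclusive bounds; the library may differ.
    size_mb = size_bytes / (1024 * 1024)
    if min_mb is not None and size_mb < min_mb:
        return False
    if max_mb is not None and size_mb > max_mb:
        return False
    return True
```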

Methods


check_skip_file

def check_skip_file(
    self,
    entry: Optional[os.DirEntry] = None,
    path: Optional[str | os.PathLike] = None,
    stat: Optional[os.stat_result] = None,
) -> tuple[bool, typing.Optional[FileSkipReason]]:

Filter files based on the criteria provided.

Check the following things in order:

  • is this a file?
  • is this an allowed type of file?
  • does this file meet the date criteria?
  • does this file meet the file size criteria?

Either entry or path should be supplied. If path is supplied, stat may optionally be provided; if it is not, it will be read from the file system.

If both entry and path are provided, then entry will take precedence.

Arguments

  • entry: The file to check as an os.DirEntry object, as from os.scandir(). Mutually exclusive with path.
  • path: The file path to check. Mutually exclusive with entry.
  • stat: The os.stat() details associated with path. Optional, will be read directly if not provided.

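Returns: A tuple whose first element is True if the file should be skipped and False otherwise, and whose second element is the associated FileSkipReason, or None.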
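The ordered checks and the precedence of `entry` over `path` can be sketched as follows. This is a simplified, hypothetical version. Only a minimum modification date and a maximum size in bytes are shown. The enum is a stand-in subset of `FileSkipReason`, and `check_skip_file_sketch` is not the library's method.

```python
import datetime
import os
from enum import Enum, auto
from typing import Optional, Union


class FileSkipReason(Enum):
    """Hypothetical subset of the documented members."""
    NOT_A_FILE = auto()
    EXTENSION_NOT_ALLOWED = auto()
    DATE_OUT_OF_RANGE = auto()
    SIZE_OUT_OF_RANGE = auto()


def check_skip_file_sketch(
    entry: Optional[os.DirEntry] = None,
    path: Optional[Union[str, os.PathLike]] = None,
    stat: Optional[os.stat_result] = None,
    allowed_exts: Optional[set[str]] = None,
    min_modified: Optional[datetime.date] = None,
    max_bytes: Optional[int] = None,
) -> tuple[bool, Optional[FileSkipReason]]:
    if entry is not None:
        # entry takes precedence over path (and over any supplied stat).
        file_path, is_file = entry.path, entry.is_file()
    elif path is not None:
        file_path = os.fspath(path)
        is_file = os.path.isfile(file_path)
    else:
        raise ValueError("Either entry or path must be supplied")

    # 1. Is this a file?
    if not is_file:
        return True, FileSkipReason.NOT_A_FILE
    # 2. Is the extension allowed (compared case-insensitively)?
    if allowed_exts is not None and os.path.splitext(file_path)[1].lower() not in allowed_exts:
        return True, FileSkipReason.EXTENSION_NOT_ALLOWED
    # Stat details come from the entry, else the caller, else a fresh os.stat().
    if entry is not None:
        stat = entry.stat()
    elif stat is None:
        stat = os.stat(file_path)
    # 3. Date criteria (only a minimum modification date in this sketch).
    modified = datetime.date.fromtimestamp(stat.st_mtime)
    if min_modified is not None and modified < min_modified:
        return True, FileSkipReason.DATE_OUT_OF_RANGE
    # 4. Size criteria (only a maximum, in bytes, in this sketch).
    if max_bytes is not None and stat.st_size > max_bytes:
        return True, FileSkipReason.SIZE_OUT_OF_RANGE
    return False, None
```

Checking the cheap criteria (file type, extension) before reading stat details avoids a system call for files that would be rejected anyway.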

log_files_found_with_extension

def log_files_found_with_extension(self, num_found_files: int, interim: bool = True) -> None:

Log the files found with the given extension.