utils
Utility functions concerning data sources.
Module
Functions
get_datetime
def get_datetime(date: Optional[Union[Date, DateTD, datetime_date]]) -> Optional[datetime.date]:
Convert a Date or DateTD object to a datetime.date object.
Arguments
date: The Date or DateTD object to convert.
Returns
The datetime.date object if date is a Date object, otherwise None.
get_file_size_str
def get_file_size_str(filename: str) -> str:
Get file size as a human-readable string.
Arguments
filename: Path to the file.
Returns
Human-readable file size string, or 'unknown' if the file doesn't exist.
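The behaviour described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name file_size_str, the 1024-based units, and the one-decimal formatting are all assumptions made for the example.

```python
import os


def file_size_str(filename: str) -> str:
    """Hypothetical sketch: human-readable size, or 'unknown' if unreadable."""
    try:
        size = float(os.path.getsize(filename))
    except OSError:
        # Missing or inaccessible file.
        return "unknown"
    # Assumed convention: 1024-based steps, one decimal place.
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"
```

Under these assumptions a 2048-byte file is reported as "2.0 KB", and a path that does not exist yields "unknown".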
get_skip_reason_description
def get_skip_reason_description(reason: FileSkipReason) -> str:
Get human-readable description for a skip reason.
Arguments
reason: The skip reason enum value.
Returns
Human-readable description of the skip reason.
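One common way to map enum values to descriptions is a lookup table with a fallback. The sketch below assumes that approach; SkipReason is a hypothetical stand-in that borrows three member names from FileSkipReason, and the description strings are invented for illustration rather than taken from the library.

```python
from enum import Enum, auto


class SkipReason(Enum):
    """Hypothetical stand-in; member names borrowed from FileSkipReason."""

    NOT_A_FILE = auto()
    EXTENSION_NOT_ALLOWED = auto()
    SIZE_OUT_OF_RANGE = auto()


# Illustrative wording only; the library's real descriptions may differ.
_DESCRIPTIONS = {
    SkipReason.NOT_A_FILE: "Path is not a regular file",
    SkipReason.EXTENSION_NOT_ALLOWED: "File extension is not allowed",
}


def describe(reason: SkipReason) -> str:
    """Look up a description, falling back to a readable member name."""
    fallback = reason.name.replace("_", " ").capitalize()
    return _DESCRIPTIONS.get(reason, fallback)
```

The fallback keeps the function total: a member without an entry, such as SIZE_OUT_OF_RANGE here, is rendered as "Size out of range".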
load_data_in_memory
def load_data_in_memory(datasource: base_source.BaseSource, **kwargs: Any) -> pandas.core.frame.DataFrame:
Load all data from a datasource into memory and return a single DataFrame.
Arguments
datasource: The datasource to load from.
kwargs: Keyword arguments to pass through to the underlying yield_data() call.
task_running_context_manager
def task_running_context_manager(datasource: base_source.BaseSource) -> collections.abc.Generator[BaseSource, None, None]:
A context manager to temporarily set a datasource in a "task running" context.
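The general pattern of temporarily placing an object in a state and restoring it afterwards can be sketched with contextlib. Everything here is an assumption for illustration: DummySource stands in for a datasource, and the is_task_running flag is a hypothetical attribute, not a documented part of BaseSource.

```python
from contextlib import contextmanager
from typing import Iterator


class DummySource:
    """Hypothetical stand-in for a datasource."""

    def __init__(self) -> None:
        self.is_task_running = False  # assumed flag name


@contextmanager
def task_running(source: DummySource) -> Iterator[DummySource]:
    """Mark the source as task-running for the duration of the block."""
    previous = source.is_task_running
    source.is_task_running = True
    try:
        yield source
    finally:
        # Restore the earlier state even if the block raised.
        source.is_task_running = previous
```

Restoring the previous value in a finally clause, rather than resetting to False, keeps nested uses and failing blocks from leaving the flag in the wrong state.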
Classes
FileSkipReason
class FileSkipReason(value, names=None, *, module=None, qualname=None, type=None, start=1):
Enumeration of all possible reasons why a file might be skipped.
Variables
- static ALREADY_SKIPPED
- static DATASOURCE_FILTER_FAILED
- static DATE_OUT_OF_RANGE
- static DICOM_EMPTY_FILE
- static DICOM_LOAD_FAILED
- static DICOM_NO_IMAGES_SOP_CLASS
- static DICOM_NO_IMAGE_DATA
- static DICOM_NO_PIXEL_DATA
- static DICOM_PIXEL_EXTRACTION_FAILED
- static DICOM_UNEXPECTED_ERROR
- static DICOM_UNSUPPORTED_ZEISS_MODALITY
- static EXTENSION_NOT_ALLOWED
- static IMAGE_EMPTY_DATA
- static IMAGE_PROCESSING_FAILED
- static MAX_FILES_EXCEEDED
- static NOT_A_FILE
- static OPHTH_BSCAN_COUNT_OUT_OF_RANGE
- static OPHTH_BSCAN_COUNT_UNAVAILABLE
- static OPHTH_DOB_OUT_OF_RANGE
- static OPHTH_DOB_UNAVAILABLE
- static OPHTH_MODALITY_MISMATCH
- static OPHTH_PROPERTY_EXTRACTION_FAILED
- static PRIVATE_EYE_EMPTY_RESULT
- static PRIVATE_EYE_NO_PARSER
- static PRIVATE_EYE_PROCESSING_FAILED
- static PROCESSING_ERROR
- static SIZE_OUT_OF_RANGE
FileSystemFilter
class FileSystemFilter(file_extension: Optional[SingleOrMulti[str]] = None, strict_file_extension: bool = False, file_creation_min_date: Optional[Union[Date, DateTD]] = None, file_modification_min_date: Optional[Union[Date, DateTD]] = None, file_creation_max_date: Optional[Union[Date, DateTD]] = None, file_modification_max_date: Optional[Union[Date, DateTD]] = None, min_file_size: Optional[float] = None, max_file_size: Optional[float] = None):
Filter files based on various criteria.
Arguments
file_extension: File extension(s) of the data files. If None, all files will be searched. Can either be a single file extension or a list of file extensions. Case-insensitive. Defaults to None.
strict_file_extension: Whether file loading should be restricted to files with the explicit file extension provided. If True, only files with that extension are loaded. Otherwise, the given path is scanned for files of the same type as the provided file extension. Only relevant if file_extension is provided. Defaults to False.
file_creation_min_date: The oldest possible date to consider for file creation. If None, this filter will not be applied. Defaults to None.
file_modification_min_date: The oldest possible date to consider for file modification. If None, this filter will not be applied. Defaults to None.
file_creation_max_date: The newest possible date to consider for file creation. If None, this filter will not be applied. Defaults to None.
file_modification_max_date: The newest possible date to consider for file modification. If None, this filter will not be applied. Defaults to None.
min_file_size: The minimum file size in megabytes to consider. If None, all files will be considered. Defaults to None.
max_file_size: The maximum file size in megabytes to consider. If None, all files will be considered. Defaults to None.
Methods
check_skip_file
def check_skip_file(self, entry: Optional[os.DirEntry] = None, path: Optional[str | os.PathLike] = None, stat: Optional[os.stat_result] = None) -> tuple[bool, typing.Optional[FileSkipReason]]:
Filter files based on the criteria provided.
Check the following things in order:
- is this a file?
- is this an allowed type of file?
- does this file meet the date criteria?
- does this file meet the file size criteria?
Either entry OR path should be supplied. If path is supplied, stat may
be optionally provided, but will be newly read if not.
If both entry and path are provided, then entry will take precedence.
Arguments
entry: The file to check as an os.DirEntry object, as from os.scandir(). Mutually exclusive with path.
path: The file path to check. Mutually exclusive with entry.
stat: The os.stat() details associated with path. Optional, will be read directly if not provided.
Returns
A tuple whose first element is True if the file should be skipped and False otherwise. Per the signature, the second element is an optional FileSkipReason.
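The ordered checks listed above can be sketched as a standalone function that returns on the first failing check. This is a simplified illustration, not the method's implementation: check_skip, Reason, and the parameter names are hypothetical, the date check is omitted, and the megabyte conversion of 1024 * 1024 bytes is an assumption.

```python
import os
from enum import Enum, auto
from typing import Optional


class Reason(Enum):
    """Hypothetical stand-in borrowing member names from FileSkipReason."""

    NOT_A_FILE = auto()
    EXTENSION_NOT_ALLOWED = auto()
    SIZE_OUT_OF_RANGE = auto()


def check_skip(
    path: str,
    allowed_ext: Optional[set[str]] = None,
    max_size_mb: Optional[float] = None,
) -> tuple[bool, Optional[Reason]]:
    """Return (should_skip, reason), stopping at the first failing check."""
    # 1. Is this a file?
    if not os.path.isfile(path):
        return True, Reason.NOT_A_FILE
    # 2. Is this an allowed type of file? (case-insensitive comparison)
    if allowed_ext is not None:
        ext = os.path.splitext(path)[1].lower()
        if ext not in allowed_ext:
            return True, Reason.EXTENSION_NOT_ALLOWED
    # 3. Date criteria would be checked here; omitted in this sketch.
    # 4. Does the file meet the size criteria? (assumes 1 MB = 1024 * 1024 bytes)
    if max_size_mb is not None:
        size_mb = os.stat(path).st_size / (1024 * 1024)
        if size_mb > max_size_mb:
            return True, Reason.SIZE_OUT_OF_RANGE
    return False, None
```

Putting the cheap checks first means an os.stat() call is only made for entries that are already known to be files of an allowed type.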
log_files_found_with_extension
def log_files_found_with_extension(self, num_found_files: int, interim: bool = True) -> None:
Log the files found with the given extension.