base

Base interface for data persistence implementations.

Classes

DataPersister

class DataPersister():

Abstract interface for data persistence/caching implementations.

Static methods


prep_data_for_caching

def prep_data_for_caching(data: pd.DataFrame, image_cols: Optional[Collection[str]] = None) -> pd.DataFrame:

Prepares data for caching.

This involves removing or replacing values that should not be cached or that it makes no sense to cache, such as image data or file paths that are only relevant while the files are actually in use.

Does not mutate input dataframe.
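
The behaviour described above can be sketched roughly as follows. This is an illustration only, not the library's implementation: the function name prep_data_for_caching_sketch is hypothetical, and it assumes image columns are simply dropped, whereas the real method may replace their values instead.

```python
from typing import Collection, Optional

import pandas as pd


def prep_data_for_caching_sketch(
    data: pd.DataFrame,
    image_cols: Optional[Collection[str]] = None,
) -> pd.DataFrame:
    """Hypothetical sketch: return a cache-ready copy of the data."""
    # Work on a copy so the caller's dataframe is never mutated.
    prepared = data.copy()
    if image_cols:
        # Assumption: image columns are dropped outright. Only columns
        # that actually exist are dropped, to avoid a KeyError.
        present = [col for col in image_cols if col in prepared.columns]
        prepared = prepared.drop(columns=present)
    return prepared


df = pd.DataFrame({"label": [0, 1], "scan": ["a.png", "b.png"]})
cached = prep_data_for_caching_sketch(df, image_cols=["scan"])
```

After the call, cached holds only the label column, while df still has both of its original columns.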

Methods


bulk_set

def bulk_set(self, data: pd.DataFrame, original_file_col: str = '_original_filename') -> None:

Sets multiple cache entries at once from a dataframe.

The dataframe must indicate the original file that each row is associated with. This is the _original_filename column by default.
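
One plausible way to implement this is to group the rows by the file column and store each group separately. The sketch below assumes exactly that; InMemoryPersister is a hypothetical stand-in that keeps entries in a dict, and the ValueError for a missing column is an assumption rather than documented behaviour.

```python
from pathlib import Path
from typing import Dict, Union

import pandas as pd


class InMemoryPersister:
    """Hypothetical stand-in; not part of the documented library."""

    def __init__(self) -> None:
        self._store: Dict[str, pd.DataFrame] = {}

    def set(self, file: Union[str, Path], data: pd.DataFrame) -> None:
        # Store a copy so later changes to `data` do not leak into the cache.
        self._store[str(file)] = data.copy()

    def bulk_set(
        self,
        data: pd.DataFrame,
        original_file_col: str = "_original_filename",
    ) -> None:
        if original_file_col not in data.columns:
            # Assumption: a missing file column is treated as an error.
            raise ValueError(f"Column {original_file_col!r} not found")
        # One cache entry per source file, holding only that file's rows.
        for file, rows in data.groupby(original_file_col):
            self.set(file, rows)


persister = InMemoryPersister()
persister.bulk_set(
    pd.DataFrame(
        {
            "value": [1, 2, 3],
            "_original_filename": ["a.csv", "a.csv", "b.csv"],
        }
    )
)
```

Here the three input rows end up as two entries: one for a.csv with two rows and one for b.csv with one row.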

get

def get(self, file: Union[str, Path]) -> Optional[pd.DataFrame]:

Get the persisted data for a given file.

Returns None if no data has been persisted, if the persisted data is out of date, or if an error was encountered.
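
The three None cases can be illustrated with a small sketch. The source does not say how "out of date" is determined, so this example assumes a comparison against the file's modification time. MtimeCheckedCache is a hypothetical class, not the library's implementation.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, Optional, Tuple, Union

import pandas as pd


class MtimeCheckedCache:
    """Hypothetical cache that judges staleness by file modification time."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, pd.DataFrame]] = {}

    def set(self, file: Union[str, Path], data: pd.DataFrame) -> None:
        path = Path(file)
        # Remember the file's mtime at the moment the data was cached.
        self._store[str(path)] = (path.stat().st_mtime, data.copy())

    def get(self, file: Union[str, Path]) -> Optional[pd.DataFrame]:
        path = Path(file)
        try:
            entry = self._store.get(str(path))
            if entry is None:
                return None  # nothing has been persisted
            cached_mtime, data = entry
            if path.stat().st_mtime > cached_mtime:
                return None  # file changed since caching: out of date
            return data.copy()
        except OSError:
            return None  # e.g. the file no longer exists


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "records.csv"
    source.write_text("a\n1\n")
    cache = MtimeCheckedCache()
    cache.set(source, pd.DataFrame({"a": [1]}))
    fresh = cache.get(source)  # the cached dataframe
    # Simulate a later edit by moving the file's mtime forward.
    later = source.stat().st_mtime + 10
    os.utime(source, (later, later))
    stale = cache.get(source)  # None: out of date
    missing = cache.get(Path(tmp) / "other.csv")  # None: never persisted
```

Because get never raises in this sketch, callers only need a single None check before falling back to reloading the file.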

set

def set(self, file: Union[str, Path], data: pd.DataFrame) -> None:

Set the persisted data for a given file.

If existing data is already set, it will be overwritten.

The data should contain only the rows related to that file.

unset

def unset(self, file: Union[str, Path]) -> None:

Deletes the persisted data for a given file.
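
To show how set, get, and unset fit together, here is a minimal sketch of a full round trip. DictPersister is a hypothetical in-memory class written for illustration. Treating unset on a file with no entry as a silent no-op is an assumption, not documented behaviour.

```python
from pathlib import Path
from typing import Dict, Optional, Union

import pandas as pd


class DictPersister:
    """Hypothetical in-memory persister used only for illustration."""

    def __init__(self) -> None:
        self._store: Dict[str, pd.DataFrame] = {}

    def set(self, file: Union[str, Path], data: pd.DataFrame) -> None:
        # Any existing entry for this file is overwritten.
        self._store[str(file)] = data.copy()

    def get(self, file: Union[str, Path]) -> Optional[pd.DataFrame]:
        data = self._store.get(str(file))
        return None if data is None else data.copy()

    def unset(self, file: Union[str, Path]) -> None:
        # Assumption: removing a missing entry is a silent no-op.
        self._store.pop(str(file), None)


persister = DictPersister()
persister.set("scan_001.dcm", pd.DataFrame({"age": [41]}))
persister.set("scan_001.dcm", pd.DataFrame({"age": [42]}))  # overwrites
after_overwrite = persister.get("scan_001.dcm")

persister.unset("scan_001.dcm")
after_unset = persister.get("scan_001.dcm")
```

The second set call replaces the first entry, so after_overwrite contains the age 42 row, and after_unset is None once the entry has been deleted.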