
sqlite

A data persistence implementation backed by an SQLite database.

Classes

CacheInfoTableBase

class CacheInfoTableBase():

Cache information entry ORM.

Represents the database table that holds cache validity information. In particular, it stores the primary key of the cache, file (the canonical path of the file in question), and the time the cache was last updated for that file.

This is a mix-in designed to be used with the EntityName pattern: https://github.com/sqlalchemy/sqlalchemy/wiki/EntityName
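As a rough illustration of what this table holds, here is a sketch that uses Python's built-in sqlite3 module instead of the SQLAlchemy ORM. The table name cache_info, the REAL timestamp type and the record_cache_update helper are assumptions made for the example, not part of this module's API.

```python
import sqlite3
from pathlib import Path

# Hypothetical DDL mirroring the documented columns; the real ORM may use
# different table names and column types.
CACHE_INFO_DDL = """
CREATE TABLE IF NOT EXISTS cache_info (
    file TEXT PRIMARY KEY,          -- canonical path of the cached file
    cache_updated_at REAL NOT NULL  -- last cache update, as a POSIX timestamp
)
"""


def record_cache_update(conn: sqlite3.Connection, file: str, updated_at: float) -> None:
    """Insert or refresh the validity entry for one file (hypothetical helper)."""
    # Canonicalise the path so different spellings of the same file share one key.
    canonical = str(Path(file).resolve())
    conn.execute(
        "INSERT OR REPLACE INTO cache_info (file, cache_updated_at) VALUES (?, ?)",
        (canonical, updated_at),
    )
```

Because the key is the canonical path, recording "a.csv" and then "./a.csv" updates a single row rather than creating two.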

Variables

  • static cache_updated_at
  • static data
  • static file

DataTableBase

class DataTableBase():

Cached data entry ORM.

The specific structure of this table depends on the data being stored in it (which is why deferred reflection is used); the table is initialised at the first set() call, and its schema is determined at that point.

Some things are consistent, though; the data must have:

  • an integer primary key column (data_cache_id)
  • a column of text called _source_canonical_path (which stores a canonical file path) that has a foreign key constraint on the cache info table

This is a mix-in designed to be used with the EntityName pattern: https://github.com/sqlalchemy/sqlalchemy/wiki/EntityName
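The fixed part of the data table's schema, and its link to the cache info table, might look like the following sqlite3 sketch. The table names cache_info and data and the helpers open_cache and create_data_table are hypothetical; the real class relies on SQLAlchemy deferred reflection rather than hand-written DDL.

```python
import sqlite3
from typing import Iterable


def open_cache(sqlite_path: str) -> sqlite3.Connection:
    """Open a connection with foreign-key enforcement and a (hypothetical) cache_info table."""
    conn = sqlite3.connect(sqlite_path)
    # SQLite only enforces foreign keys when this pragma is enabled on the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache_info "
        "(file TEXT PRIMARY KEY, cache_updated_at REAL NOT NULL)"
    )
    return conn


def create_data_table(conn: sqlite3.Connection, data_columns: Iterable[str]) -> None:
    """Create the (hypothetical) data table on first write, adding columns from the data."""
    # The two fixed columns every data table must have.
    fixed = (
        "data_cache_id INTEGER PRIMARY KEY, "
        "_source_canonical_path TEXT NOT NULL REFERENCES cache_info(file)"
    )
    # Remaining columns come from the first batch written; they are left untyped
    # here (SQLite permits this) to keep the sketch short.
    extra = "".join(f', "{name}"' for name in data_columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS data ({fixed}{extra})")
```

With the pragma enabled, inserting a data row whose _source_canonical_path has no matching cache_info entry raises sqlite3.IntegrityError.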

Variables

  • static cache_info
  • static data_cache_id

SQLiteDataPersister

class SQLiteDataPersister(sqlite_path: Path):

A data caching implementation that uses an SQLite database.

Static methods


prep_data_for_caching

def prep_data_for_caching(data: pd.DataFrame, image_cols: Optional[Collection[str]] = None) -> pd.DataFrame:

Inherited from:

DataPersister.prep_data_for_caching :

Prepares data ready for caching.

This involves removing/replacing things that aren't supposed to be cached or that it makes no sense to cache, such as image data or file paths that won't be relevant except for when the files are actually being used.

Does not mutate the input dataframe.
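A minimal sketch of the non-mutating behaviour, assuming that preparation simply drops the image columns. It uses a list of dicts as a pandas-free stand-in for the dataframe, and prep_rows_for_caching is a hypothetical name; the inherited method may also remove or replace other values, such as file paths.

```python
from typing import Any, Collection, Dict, List, Optional

Row = Dict[str, Any]


def prep_rows_for_caching(
    rows: List[Row], image_cols: Optional[Collection[str]] = None
) -> List[Row]:
    """Return copies of rows without image columns (hypothetical stand-in)."""
    excluded = set(image_cols or ())
    # Build new dicts instead of deleting keys, so the caller's rows are untouched.
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]
```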

Methods


bulk_set

def bulk_set(self, data: pd.DataFrame, original_file_col: str = '_original_filename') -> None:

Inherited from:

DataPersister.bulk_set :

Bulk set multiple cache entries from a dataframe.

The dataframe must indicate the original file that each row is associated with, via the column named by original_file_col (_original_filename by default).
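The grouping step can be sketched as follows, assuming bulk setting amounts to one per-file set per original file. Rows are plain dicts standing in for dataframe rows, and bulk_set_rows and its set_for_file callback are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def bulk_set_rows(
    rows: List[Row],
    set_for_file: Callable[[str, List[Row]], None],
    original_file_col: str = "_original_filename",
) -> None:
    """Group rows by original file, then store each group with one call (hypothetical)."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        # Every row must name its source file; a missing column raises KeyError.
        groups[row[original_file_col]].append(row)
    for file, file_rows in groups.items():
        set_for_file(file, file_rows)
```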

get

def get(self, file: Union[str, Path]) -> Optional[pd.DataFrame]:

Inherited from:

DataPersister.get :

Get the persisted data for a given file.

Returns None if no data has been persisted, if the persisted data is out of date, or if an error is otherwise encountered.
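One plausible freshness check is sketched below, assuming the cache counts as out of date when the file's modification time is later than cache_updated_at. The table names, the get_rows helper and the mtime comparison are all assumptions for illustration.

```python
import os
import sqlite3
from pathlib import Path
from typing import List, Optional, Tuple, Union


def get_rows(conn: sqlite3.Connection, file: Union[str, Path]) -> Optional[List[Tuple]]:
    """Return cached rows for a file, or None if absent, stale or unreadable (hypothetical)."""
    canonical = str(Path(file).resolve())
    try:
        info = conn.execute(
            "SELECT cache_updated_at FROM cache_info WHERE file = ?", (canonical,)
        ).fetchone()
        if info is None:
            return None  # nothing persisted for this file
        if os.path.getmtime(canonical) > info[0]:
            return None  # file modified after the cache was written
        return conn.execute(
            "SELECT * FROM data WHERE _source_canonical_path = ? ORDER BY data_cache_id",
            (canonical,),
        ).fetchall()
    except (sqlite3.Error, OSError):
        # Missing tables, a deleted file, etc. are all treated as a cache miss.
        return None
```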

set

def set(self, file: Union[str, Path], data: pd.DataFrame) -> None:

Inherited from:

DataPersister.set :

Set the persisted data for a given file.

If existing data is already set, it will be overwritten.

The data should contain only the data related to that file.
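Overwriting can be sketched as a delete-then-insert inside one transaction. This uses sqlite3 and hypothetical cache_info and data tables; set_rows is an illustrative name, not part of this module's API.

```python
import sqlite3
import time
from pathlib import Path
from typing import Any, Dict, List, Union


def set_rows(
    conn: sqlite3.Connection, file: Union[str, Path], rows: List[Dict[str, Any]]
) -> None:
    """Overwrite the cached rows for one file (hypothetical helper)."""
    canonical = str(Path(file).resolve())
    with conn:  # single transaction: commit on success, roll back on error
        # Remove previously cached rows so the new data replaces them.
        conn.execute("DELETE FROM data WHERE _source_canonical_path = ?", (canonical,))
        # Refresh the validity entry before inserting rows that reference it.
        conn.execute(
            "INSERT OR REPLACE INTO cache_info (file, cache_updated_at) VALUES (?, ?)",
            (canonical, time.time()),
        )
        for row in rows:
            columns = ["_source_canonical_path", *row]
            column_sql = ", ".join(f'"{c}"' for c in columns)
            placeholders = ", ".join("?" for _ in columns)
            conn.execute(
                f"INSERT INTO data ({column_sql}) VALUES ({placeholders})",
                (canonical, *row.values()),
            )
```

Doing the delete and insert in one transaction means a failure part-way through leaves the previously cached rows intact.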

unset

def unset(self, file: Union[str, Path]) -> None:

Deletes the persisted data for the given file.
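A matching removal sketch, again assuming hypothetical cache_info and data tables with a foreign key from data to cache_info; unset_rows is an illustrative name only.

```python
import sqlite3
from pathlib import Path
from typing import Union


def unset_rows(conn: sqlite3.Connection, file: Union[str, Path]) -> None:
    """Remove cached rows and the validity entry for one file (hypothetical helper)."""
    canonical = str(Path(file).resolve())
    with conn:  # single transaction: commit on success, roll back on error
        # Delete child rows first so the foreign key on the data table is never violated.
        conn.execute("DELETE FROM data WHERE _source_canonical_path = ?", (canonical,))
        conn.execute("DELETE FROM cache_info WHERE file = ?", (canonical,))
```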