
schema_cache_manager

Cache manager for schema and data points.

Classes

DatasetsTable

class DatasetsTable(dataset_name):

ORM for storing dataset names.

Ancestors

  • sqlalchemy.orm.decl_api.Base

Variables

  • dataset_name : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]
  • id : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]

FilePathsTable

class FilePathsTable(dataset_id, file_path):

ORM for storing file paths associated with datasets.

Ancestors

  • sqlalchemy.orm.decl_api.Base

Variables

  • dataset_id : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]
  • file_path : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]
  • id : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]

MainSchemaCacheTable

class MainSchemaCacheTable(dataset_id, partial_schema):

ORM for linking datasets to their partial schemas.

Ancestors

  • sqlalchemy.orm.decl_api.Base

Variables

  • cache_updated_at : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]
  • dataset_id : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]
  • id : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]
  • partial_schema : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]

SQLiteSchemaCacheManager

class SQLiteSchemaCacheManager(sqlite_path: pathlib.Path):

A schema caching implementation that uses an SQLite database.

Methods


add_dataset

def add_dataset(self, dataset_name: str) -> int:

Add a dataset to the Datasets table if it doesn't already exist.
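The "add only if missing" behaviour can be sketched as follows. This is a minimal illustration, not the library's implementation: it swaps in the standard-library sqlite3 module for the SQLAlchemy ORM, the helper name get_or_create_dataset_id and the table layout are hypothetical, and it assumes the returned int is the dataset's row id.

```python
import sqlite3


def get_or_create_dataset_id(conn: sqlite3.Connection, dataset_name: str) -> int:
    """Return the row id for dataset_name, inserting a row only if none exists."""
    # Hypothetical table layout; the real DatasetsTable columns may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS datasets ("
        "id INTEGER PRIMARY KEY, dataset_name TEXT UNIQUE NOT NULL)"
    )
    # INSERT OR IGNORE does nothing when the unique name is already present.
    conn.execute(
        "INSERT OR IGNORE INTO datasets (dataset_name) VALUES (?)",
        (dataset_name,),
    )
    conn.commit()
    row = conn.execute(
        "SELECT id FROM datasets WHERE dataset_name = ?", (dataset_name,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
first_id = get_or_create_dataset_id(conn, "my_dataset")
second_id = get_or_create_dataset_id(conn, "my_dataset")  # no duplicate row
```

Calling the helper twice with the same name yields the same id, which is the property the description above implies.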

cache_file_paths_with_partial_schema

def cache_file_paths_with_partial_schema(self, dataset_name: str, file_paths: List[str], partial_schema: dict[str, typing.Any]) -> None:

Cache file paths and the latest partial schema for a specific dataset.

  • Adds file paths to the file paths table.
  • Updates the 'number_of_records' field in the partial_schema based on new files added.
  • Saves the updated schema to the cache.

Arguments

  • dataset_name: The name of the dataset.
  • file_paths: A list of file paths to cache.
  • partial_schema: The partial schema dictionary to update and cache.
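The three steps listed for cache_file_paths_with_partial_schema can be sketched with plain in-memory structures. This is an illustration under stated assumptions, not the actual method: the function name merge_paths_into_cache is hypothetical, a set stands in for the file paths table, and it assumes each newly added file contributes exactly one record to 'number_of_records'. The real counting rule is not documented here.

```python
from typing import Any, Dict, List, Set


def merge_paths_into_cache(
    cached_paths: Set[str],
    file_paths: List[str],
    partial_schema: Dict[str, Any],
) -> Dict[str, Any]:
    """Add new paths to the cache and return the schema with an updated count."""
    # Step 1: keep only paths that are not cached yet (duplicates dropped).
    new_paths = [p for p in dict.fromkeys(file_paths) if p not in cached_paths]
    cached_paths.update(new_paths)

    # Step 2 (assumption): one record per newly added file.
    updated = dict(partial_schema)
    updated["number_of_records"] = updated.get("number_of_records", 0) + len(new_paths)

    # Step 3: a real cache would persist `updated`; this sketch just returns it.
    return updated


cached = {"a.csv"}
schema = merge_paths_into_cache(
    cached, ["a.csv", "b.csv", "c.csv"], {"number_of_records": 1}
)
```

Here "a.csv" is already cached, so only two paths count as new and the record count moves from 1 to 3.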

clear_dataset

def clear_dataset(self, dataset_name: str) -> None:

Clear all cached data for a dataset.

get_file_paths

def get_file_paths(self, dataset_name: str) -> List[str]:

Retrieve all file paths associated with a dataset.

get_partial_schema

def get_partial_schema(self, dataset_name: str) -> Optional[dict[str, typing.Any]]:

Retrieve the partial schema for a dataset.

get_partial_schema_and_file_paths

def get_partial_schema_and_file_paths(self, dataset_name: str) -> tuple[typing.Optional[dict[str, typing.Any]], typing.List[str]]:

Retrieve the partial schema and file paths for a dataset.

update_partial_schema_field

def update_partial_schema_field(self, dataset_name: str, field_path: List[str], value: Any) -> None:

Update a specific field in the partial schema for a given dataset.

Arguments

  • dataset_name: The name of the dataset.
  • field_path: A list of keys representing the nested field path to update. E.g., ["metadata", "schema_type"].
  • value: The new value to set for the field.
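How a field_path such as ["metadata", "schema_type"] addresses a nested key can be sketched on a plain dictionary. The helper set_nested_field is hypothetical and only illustrates the path traversal; it assumes missing intermediate keys are created as empty dictionaries, which the real method may or may not do, and it does not touch the SQLite cache.

```python
from typing import Any, Dict, List


def set_nested_field(schema: Dict[str, Any], field_path: List[str], value: Any) -> None:
    """Set schema[k1][k2]...[kn] = value in place for field_path [k1, ..., kn]."""
    if not field_path:
        raise ValueError("field_path must contain at least one key")
    node = schema
    for key in field_path[:-1]:
        # Assumption: missing intermediate keys become empty dicts.
        node = node.setdefault(key, {})
    node[field_path[-1]] = value


partial_schema = {"metadata": {"schema_type": "tabular"}}
set_nested_field(partial_schema, ["metadata", "schema_type"], "image")
```

After the call, partial_schema["metadata"]["schema_type"] holds the new value; other keys under "metadata" would be left unchanged.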