schema_cache_manager
Cache manager for schema and data points.
Classes
DatasetsTable
class DatasetsTable(dataset_name):
ORM for storing dataset names.
Ancestors
- sqlalchemy.orm.decl_api.Base
Variables
- dataset_name : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]
- id : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]
FilePathsTable
class FilePathsTable(dataset_id, file_path):
ORM for storing file paths associated with datasets.
Ancestors
- sqlalchemy.orm.decl_api.Base
Variables
- dataset_id : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]
- file_path : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]
- id : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]
MainSchemaCacheTable
class MainSchemaCacheTable(dataset_id, partial_schema):
ORM for linking datasets to their partial schemas.
Ancestors
- sqlalchemy.orm.decl_api.Base
Variables
- cache_updated_at : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]
- dataset_id : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]
- id : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]
- partial_schema : Union[sqlalchemy.orm.attributes.InstrumentedAttribute[+_T_co], +_T_co]
SQLiteSchemaCacheManager
class SQLiteSchemaCacheManager(sqlite_path: pathlib.Path):
A schema caching implementation that uses an SQLite database.
Methods
add_dataset
def add_dataset(self, dataset_name: str) ‑> int:
Add a dataset to the Datasets table if it doesn't already exist.
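The "add if it doesn't already exist" behaviour can be illustrated with a minimal get-or-create sketch. It uses the standard-library sqlite3 module rather than the SQLAlchemy ORM classes documented here, and the table name `datasets`, the UNIQUE constraint on `dataset_name`, and the idea that the returned int is the row id are all assumptions, not details confirmed by the source.

```python
import sqlite3


def add_dataset(conn: sqlite3.Connection, dataset_name: str) -> int:
    """Return the id of dataset_name, inserting a row first if it is missing."""
    # INSERT OR IGNORE does nothing when the UNIQUE constraint is already
    # satisfied, so calling this twice never creates a duplicate row.
    conn.execute(
        "INSERT OR IGNORE INTO datasets (dataset_name) VALUES (?)",
        (dataset_name,),
    )
    row = conn.execute(
        "SELECT id FROM datasets WHERE dataset_name = ?",
        (dataset_name,),
    ).fetchone()
    conn.commit()
    return row[0]


# Hypothetical in-memory table standing in for the real Datasets table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasets ("
    "id INTEGER PRIMARY KEY, dataset_name TEXT UNIQUE NOT NULL)"
)
first_id = add_dataset(conn, "example_dataset")
second_id = add_dataset(conn, "example_dataset")
```

Under these assumptions, both calls return the same id and the table holds a single row for `example_dataset`.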
cache_file_paths_with_partial_schema
def cache_file_paths_with_partial_schema(self, dataset_name: str, file_paths: List[str], partial_schema: dict[str, typing.Any]) ‑> None:
Cache file paths and the latest partial schema for a specific dataset.
- Adds file paths to the file paths table.
- Updates the 'number_of_records' field in the partial_schema based on new files added.
- Saves the updated schema to the cache.
Arguments
dataset_name
: The name of the dataset.
file_paths
: A list of file paths to cache.
partial_schema
: The partial schema dictionary to update and cache.
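The three steps listed for this method can be sketched roughly as follows. This is not the module's implementation: it uses plain sqlite3 instead of the ORM, the table names `file_paths` and `schema_cache` are hypothetical, the schema is assumed to be stored as JSON text, and 'number_of_records' is assumed to grow by one per newly added file path, which the source does not confirm.

```python
import json
import sqlite3
from typing import Any, Dict, List


def cache_file_paths_with_partial_schema(
    conn: sqlite3.Connection,
    dataset_id: int,
    file_paths: List[str],
    partial_schema: Dict[str, Any],
) -> None:
    # Step 1: add only the file paths that are not cached yet.
    known = {
        row[0]
        for row in conn.execute(
            "SELECT file_path FROM file_paths WHERE dataset_id = ?",
            (dataset_id,),
        )
    }
    new_paths = [p for p in dict.fromkeys(file_paths) if p not in known]
    conn.executemany(
        "INSERT INTO file_paths (dataset_id, file_path) VALUES (?, ?)",
        [(dataset_id, p) for p in new_paths],
    )

    # Step 2 (assumption): one record per newly added file.
    updated = dict(partial_schema)
    updated["number_of_records"] = (
        updated.get("number_of_records", 0) + len(new_paths)
    )

    # Step 3: replace the cached schema for this dataset.
    conn.execute("DELETE FROM schema_cache WHERE dataset_id = ?", (dataset_id,))
    conn.execute(
        "INSERT INTO schema_cache (dataset_id, partial_schema) VALUES (?, ?)",
        (dataset_id, json.dumps(updated)),
    )
    conn.commit()


# Hypothetical tables standing in for FilePathsTable and MainSchemaCacheTable.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE file_paths (
        id INTEGER PRIMARY KEY,
        dataset_id INTEGER NOT NULL,
        file_path TEXT NOT NULL
    );
    CREATE TABLE schema_cache (
        id INTEGER PRIMARY KEY,
        dataset_id INTEGER UNIQUE NOT NULL,
        partial_schema TEXT NOT NULL
    );
    """
)
cache_file_paths_with_partial_schema(
    conn, 1, ["a.json", "b.json"], {"number_of_records": 0}
)
cache_file_paths_with_partial_schema(
    conn, 1, ["b.json", "c.json"], {"number_of_records": 2}
)
stored = json.loads(
    conn.execute(
        "SELECT partial_schema FROM schema_cache WHERE dataset_id = 1"
    ).fetchone()[0]
)
```

In this sketch the second call adds only `c.json`, because `b.json` is already cached, so the stored count advances by one rather than two.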
clear_dataset
def clear_dataset(self, dataset_name: str) ‑> None:
Clear all cached data for a dataset.
get_file_paths
def get_file_paths(self, dataset_name: str) ‑> List[str]:
Retrieve all file paths associated with a dataset.
get_partial_schema
def get_partial_schema(self, dataset_name: str) ‑> Optional[dict[str, typing.Any]]:
Retrieve the partial schema for a dataset.
get_partial_schema_and_file_paths
def get_partial_schema_and_file_paths(self, dataset_name: str) ‑> tuple[typing.Optional[dict[str, typing.Any]], typing.List[str]]:
Retrieve the partial schema and file paths for a dataset.
update_partial_schema_field
def update_partial_schema_field(self, dataset_name: str, field_path: List[str], value: Any) ‑> None:
Update a specific field in the partial schema for a given dataset.
Arguments
dataset_name
: The name of the dataset.
field_path
: A list of keys representing the nested field path to update. E.g., ["metadata", "schema_type"].
value
: The new value to set for the field.
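How a field_path such as ["metadata", "schema_type"] addresses a nested field can be shown with a small dictionary walk. The helper name `set_nested_field` is hypothetical, and creating missing intermediate dictionaries is an assumption; the real method may instead raise an error for an unknown path, and it also persists the result to the cache, which this sketch leaves out.

```python
from typing import Any, Dict, List


def set_nested_field(
    schema: Dict[str, Any], field_path: List[str], value: Any
) -> Dict[str, Any]:
    """Set schema[k1][k2]...[kn] = value for field_path [k1, ..., kn]."""
    if not field_path:
        raise ValueError("field_path must contain at least one key")
    node = schema
    # Walk down to the parent of the target key, creating empty dicts
    # for missing intermediate keys (an assumption of this sketch).
    for key in field_path[:-1]:
        node = node.setdefault(key, {})
    node[field_path[-1]] = value
    return schema


schema = {"metadata": {"schema_type": "draft"}, "number_of_records": 3}
set_nested_field(schema, ["metadata", "schema_type"], "final")
```

Only the addressed field changes; sibling keys such as `number_of_records` are left untouched.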