Module pod_db_utils
Utilities for the Pod results database.
Functions
map_task_to_hash_add_to_db
def map_task_to_hash_add_to_db(serialized_protocol: SerializedProtocol, task_hash: str, project_db_con: sqlite3.Connection) -> None:
Maps the task hash to the protocol and algorithm used.
Adds the task to the task database if it is not already present.
Arguments
serialized_protocol
: The serialized protocol used for the task.
task_hash
: The hash of the task.
project_db_con
: The connection to the database.
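A minimal sketch of the insert-if-absent behaviour described above, assuming the mapping lives in a SQLite table keyed by the task hash and that the serialized protocol is JSON-serializable; the table name and schema here are illustrative assumptions, not the library's actual layout:

    import json
    import sqlite3
    from typing import Any, Mapping

    def map_task_to_hash_sketch(
        serialized_protocol: Mapping[str, Any],
        task_hash: str,
        project_db_con: sqlite3.Connection,
    ) -> None:
        # Assumed schema: one row per task definition, keyed by its hash.
        project_db_con.execute(
            "CREATE TABLE IF NOT EXISTS task_definitions ("
            "task_hash TEXT PRIMARY KEY, protocol TEXT NOT NULL)"
        )
        # INSERT OR IGNORE gives the "only add if not already present" behaviour.
        project_db_con.execute(
            "INSERT OR IGNORE INTO task_definitions (task_hash, protocol) "
            "VALUES (?, ?)",
            (task_hash, json.dumps(serialized_protocol)),
        )
        project_db_con.commit()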
save_processed_datapoint_to_project_db
def save_processed_datapoint_to_project_db(connector: PodDbConnector, project_db_con: sqlite3.Connection, datasource: BaseSource, run_on_new_data_only: bool, pod_identifier: str, task_hash: str, table: Optional[str] = None) -> None:
Saves the result of a task run to the database.
Arguments
connector
: The PodDbConnector object for database connection.
project_db_con
: The connection to the project database.
datasource
: The datasource used for the task.
run_on_new_data_only
: Whether the task was run on new data only. This is used to determine which rows of the data should be saved to the database.
pod_identifier
: The identifier of the pod.
task_hash
: The hash of the task, a unique identifier shared by all results produced from the same task definition, regardless of whether they come from the same run.
table
: The table to get pod data from. Defaults to None.
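A sketch of the run_on_new_data_only filtering described above, assuming the processed datapoints arrive as a pandas DataFrame carrying a datapoint_hash column and that results are stored in one table per task hash; both assumptions are illustrative rather than the library's actual storage layout:

    import sqlite3

    import pandas as pd

    def save_results_sketch(
        results: pd.DataFrame,
        run_on_new_data_only: bool,
        task_hash: str,
        project_db_con: sqlite3.Connection,
    ) -> None:
        # Assumed naming scheme: one results table per task hash.
        table_name = f"results_{task_hash}"
        if run_on_new_data_only:
            exists = project_db_con.execute(
                "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
                (table_name,),
            ).fetchone()
            if exists:
                # Keep only datapoints not already recorded for this task.
                seen = pd.read_sql_query(
                    f'SELECT datapoint_hash FROM "{table_name}"', project_db_con
                )
                results = results[
                    ~results["datapoint_hash"].isin(seen["datapoint_hash"])
                ]
        results.to_sql(table_name, project_db_con, if_exists="append", index=False)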
update_pod_db
def update_pod_db(pod_name: str, connector: PodDbConnector, datasource_name: str, datasource: BaseSource, pod_init_call: bool = False) -> None:
Creates and updates the pod database.
This is a static database on the pod that stores the datapoint hashes so they only need to be computed once. For each datapoint row in the datasource, a hash value is computed; the data from each table of the datasource, together with the hash values, is then written to the database.
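A sketch of that hash-then-store step, assuming the pod database is a SQLite file on disk and using pandas' hash_pandas_object for the per-row hashes; the library's real hashing scheme, file location, and column names are internal details and may differ:

    import sqlite3

    import pandas as pd

    def update_pod_db_sketch(
        datasource_name: str,
        data: pd.DataFrame,
        pod_db_path: str = "pod.sqlite",  # assumed on-disk location of the pod DB
    ) -> None:
        data = data.copy()
        # One stable hash per row; computed once here and reused by later
        # task runs to recognise already-seen datapoints.
        data["datapoint_hash"] = pd.util.hash_pandas_object(
            data, index=False
        ).astype(str)
        with sqlite3.connect(pod_db_path) as con:
            # Rewrite this datasource's table with the data plus its hashes.
            data.to_sql(datasource_name, con, if_exists="replace", index=False)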