pod_db_utils

Utilities for the Pod results database.

Functions

map_task_to_hash_add_to_db

def map_task_to_hash_add_to_db(
    serialized_protocol: SerializedProtocol,
    task_hash: str,
    project_db_con: sqlite3.Connection,
) -> None:

Maps the task hash to the protocol and algorithm used.

Adds the task to the task database if it is not already present.

Arguments

  • serialized_protocol: The serialized protocol used for the task.
  • task_hash: The hash of the task.
  • project_db_con: The connection to the database.
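
For orientation, a minimal usage sketch. The import path and the hashing scheme shown are assumptions for illustration only; real serialized protocols and task hashes are produced by the library itself.

    import hashlib
    import json
    import sqlite3

    # Assumed import path; adjust to wherever the module lives in your install.
    from bitfount.federated.pod_db_utils import map_task_to_hash_add_to_db

    # Illustrative serialized protocol contents; real ones are produced by
    # the library when a task is serialized.
    serialized_protocol = {
        "class_name": "bitfount.FederatedAveraging",
        "algorithm": {"class_name": "bitfount.FederatedModelTraining"},
    }

    # A stable hash of the task definition, so identical task definitions map
    # to the same entry. (This hashing scheme is illustrative, not the library's.)
    task_hash = hashlib.sha256(
        json.dumps(serialized_protocol, sort_keys=True).encode()
    ).hexdigest()

    with sqlite3.connect("project.sqlite") as project_db_con:
        # Adds the task to the task table if it is not already present.
        map_task_to_hash_add_to_db(serialized_protocol, task_hash, project_db_con)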

save_processed_datapoint_to_project_db

def save_processed_datapoint_to_project_db(
    connector: PodDbConnector,
    project_db_con: sqlite3.Connection,
    datasource: BaseSource,
    run_on_new_data_only: bool,
    pod_identifier: str,
    task_hash: str,
    table: Optional[str] = None,
) -> None:

Saves the result of a task run to the database.

Arguments

  • connector: The PodDbConnector object for database connection.
  • project_db_con: The connection to the project database.
  • datasource: The datasource used for the task.
  • run_on_new_data_only: Whether the task was run on new data only. This is used to determine which rows of the data should be saved to the database.
  • pod_identifier: The identifier of the pod.
  • task_hash: The hash of the task: a unique identifier shared by all results produced from the same task definition, regardless of which run produced them.
  • table: The table to get pod data from. Defaults to None.
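
A hedged call sketch showing where each argument comes from. The import path is an assumption, and the connector and datasource are taken as given, since constructing a PodDbConnector and a BaseSource is handled elsewhere in the pod runtime.

    import sqlite3

    # Assumed import path.
    from bitfount.federated.pod_db_utils import save_processed_datapoint_to_project_db

    def record_task_results(connector, datasource, task_hash: str) -> None:
        """Persist the results of a finished task run for one pod.

        `connector` is a PodDbConnector and `datasource` a BaseSource; both
        are provided by the surrounding pod runtime rather than built here.
        """
        with sqlite3.connect("project.sqlite") as project_db_con:
            save_processed_datapoint_to_project_db(
                connector=connector,
                project_db_con=project_db_con,
                datasource=datasource,
                # Only persist rows that were not part of a previous run.
                run_on_new_data_only=True,
                pod_identifier="my-username/my-pod",  # illustrative identifier
                task_hash=task_hash,
                # `table` left as None: use the datasource's default table.
            )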

update_pod_db

def update_pod_db(
    pod_name: str,
    connector: PodDbConnector,
    datasource_name: str,
    datasource: BaseSource,
    pod_init_call: bool = False,
) -> None:

Creates and updates the pod database.

This is a static database on the pod that stores datapoint hashes so they only need to be computed once. For each datapoint row in the datasource, a hash value is computed; the data from (each table of) the datasource, together with these hash values, is then written to the database.
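
To make the mechanism concrete, here is an illustrative sketch of the behaviour described above: a hash is computed once per datapoint row and stored alongside the data. It assumes pandas-backed table data and a toy hashing scheme; it is not the library's actual implementation.

    import hashlib
    import sqlite3

    import pandas as pd

    # Toy stand-in for one table of a datasource.
    df = pd.DataFrame({"patient_id": [1, 2], "value": [0.4, 0.7]})

    def row_hash(row: pd.Series) -> str:
        # Hash the row's contents; any stable serialization works for the sketch.
        return hashlib.sha256(
            ",".join(str(v) for v in row.tolist()).encode()
        ).hexdigest()

    with sqlite3.connect("pod.sqlite") as con:
        # Store the data together with its per-row hash so the hashes never
        # need to be recomputed on later runs.
        df_with_hashes = df.assign(datapoint_hash=df.apply(row_hash, axis=1))
        df_with_hashes.to_sql("my_datasource", con, if_exists="replace", index=False)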