SQL
A SQL task lets you run a SQL query against a dataset and, optionally, return the results, which makes it useful for data analysis and manipulation. The example task below saves the query results on both the modeller side and the Pod side.
info
If you run your SQL query against a non-SQL-based dataset (e.g. a CSVSource dataset), the table name is the dataset identifier without the username, enclosed in backticks (` `). Make sure your query operates on that table so that it is parsed correctly, e.g. SELECT MAX(G) AS MAX_OF_G FROM `my-dataset-identifier`.
If you run a SQL task against a SQL-based dataset (i.e. an OMOPSource dataset), you can write your query as normal.
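The table-naming rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, the `username/dataset` form of the full identifier is assumed rather than taken from this page, and an in-memory SQLite database stands in for the dataset (SQLite happens to accept backtick-quoted identifiers). It does not show how Bitfount itself executes the query.

```python
import sqlite3


def table_name_from_identifier(dataset_identifier: str) -> str:
    """Hypothetical helper: drop an assumed 'username/' prefix, if present."""
    return dataset_identifier.split("/", 1)[-1]


def build_query(dataset_identifier: str) -> str:
    """Build the example query against the backtick-quoted table name."""
    table = table_name_from_identifier(dataset_identifier)
    return f"SELECT MAX(G) AS MAX_OF_G FROM `{table}`"


# Assumed full identifier; only the part after the username names the table.
identifier = "some-user/my-dataset-identifier"
table = table_name_from_identifier(identifier)

# Stand-in dataset: an in-memory SQLite table with a single column G.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE `{table}` (G INTEGER)")
conn.executemany(f"INSERT INTO `{table}` (G) VALUES (?)", [(3,), (7,), (5,)])

query = build_query(identifier)
max_of_g = conn.execute(query).fetchone()[0]
conn.close()
```

The backticks matter here because the identifier contains hyphens, which would otherwise be read as subtraction.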
Example
modeller:
  identity_verification_method: key-based
pods:
  identifiers:
    - <replace-with-dataset-identifier>
batched_execution: false
test_run: false
run_on_new_data_only: false
task:
  protocol:
    name: bitfount.ResultsOnly
    arguments:
      save_location: "{{ save_location }}"
  algorithm:
    - name: bitfount.SqlQuery
      arguments:
        query: "{{ query }}"
  data_structure:
    # Schema is not required for this task since we are returning all columns regardless
    schema_requirements: empty
    compatible_datasources:
      - CSVSource
template:
  query:
    type: string
    default: "SELECT * FROM `table` LIMIT 100"
    label: Query
    tooltip: The SQL query to execute.
  save_location:
    label: "Save Location"
    tooltip: "Specify where to save the results."
    type: "array"
    items:
      type: "string"
    minItems: 1
    default:
      - Modeller
      - Worker
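
The relationship between the `{{ query }}` and `{{ save_location }}` placeholders and the `template` defaults in the example can be illustrated roughly as follows. This is a sketch, not Bitfount's own rendering logic, which this page does not describe: the `resolve` helper and its behaviour are hypothetical, and the only values taken from the example are the two defaults and the `minItems: 1` constraint.

```python
import re

# Defaults copied from the `template` section of the example.
TEMPLATE_DEFAULTS = {
    "query": "SELECT * FROM `table` LIMIT 100",
    "save_location": ["Modeller", "Worker"],
}

# Matches a value that consists solely of one '{{ name }}' placeholder.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve(value, overrides=None):
    """Hypothetical helper: swap a '{{ name }}' placeholder for its value.

    User-supplied overrides win over the template defaults; any value
    that is not a placeholder is returned unchanged.
    """
    params = {**TEMPLATE_DEFAULTS, **(overrides or {})}
    match = PLACEHOLDER.fullmatch(value)
    if match is None:
        return value
    name = match.group(1)
    resolved = params[name]
    # Mirrors `minItems: 1` on save_location in the example template.
    if name == "save_location" and len(resolved) < 1:
        raise ValueError("save_location needs at least one entry")
    return resolved


custom_query = "SELECT MAX(G) AS MAX_OF_G FROM `my-dataset-identifier`"
query = resolve("{{ query }}", {"query": custom_query})
save_location = resolve("{{ save_location }}")  # falls back to the default
algorithm_name = resolve("bitfount.SqlQuery")   # not a placeholder
```

In this sketch, supplying only `query` leaves `save_location` at its default of both `Modeller` and `Worker`, which matches the example's stated behaviour of saving results on both sides.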