Connecting datasets
This page covers how to connect datasets to a Pod using the Bitfount SDK. Any datasets connected using the SDK will also be visible in the Bitfount Desktop application and Hub, but they won't be configurable there. Under the hood, a dataset is powered by a datasource, which is the object that represents the type of data being connected to a Pod and encapsulates the specific logic required for loading and processing that kind of data.
Recall that datasets are part of a Pod, which is the entity that contains the datasets and enables them to be used in tasks.
Available Datasources
Bitfount supports connecting various types of datasets to a Pod, organised by domain. For detailed API documentation on all datasource classes, see the Datasources API reference.
General Datasets
- CSV files (CSVSource): structured tabular data
- Image folders (ImageSource): collections of image files
Healthcare Datasets
- DICOM files (DICOMSource): medical imaging data in DICOM format
- OMOP databases (OMOPSource): data in the Observational Medical Outcomes Partnership (OMOP) common data model
- InterMine databases (InterMineSource): biological data warehouses
Ophthalmic Datasets
- Heidelberg Eye Explorer data (HeidelbergSource): retinal imaging data from Heidelberg devices
- Topcon data (TopconSource): ophthalmic imaging from Topcon equipment
- DICOM Ophthalmology data (DICOMOphthalmologySource): general ophthalmic datasets in DICOM format (including Zeiss)
For specific API documentation on ophthalmic datasources, see the Ophthalmology Datasources API reference.
Connecting a dataset using the SDK
See the tutorials on Running a Pod for examples of how to connect CSV and Image folder datasets using the SDK.
A DICOM dataset can be connected to a Pod in much the same way, using the DICOMSource class instead.
Multiple datasets can be connected to a single Pod using the SDK by passing a list of DatasourceContainerConfig objects to the datasources argument of the Pod class.
Pod configuration objects
- PodDetailsConfig provides human-readable metadata for a dataset (for example display_name and description) for display in the Bitfount Desktop application and Hub.
- PodDataConfig carries the operational options required to load data, such as datasource_args (for example path, connection strings, or ophthalmology flags), an optional force_stypes mapping that gives control over column semantic types, and file_system_filters to filter files based on various criteria.
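As an illustration of force_stypes, an entry such as {"image_prefix": ["Pixel Data"]} is intended to mark columns whose names begin with a given prefix as image columns. The snippet below is a minimal, stdlib-only sketch of that prefix matching, assuming a plain "starts with" comparison. The helper image_columns and the example column names are hypothetical; this is not Bitfount's implementation.

```python
from __future__ import annotations

from typing import Iterable, Mapping, Sequence


def image_columns(
    column_names: Iterable[str],
    force_stypes: Mapping[str, Sequence[str]],
) -> list[str]:
    """Return column names that start with any configured image prefix.

    Hypothetical helper: it mimics the intent of an "image_prefix" entry in
    force_stypes and is not part of the Bitfount SDK.
    """
    # str.startswith accepts a tuple of prefixes and returns False for an
    # empty tuple, so a missing "image_prefix" entry selects no columns.
    prefixes = tuple(force_stypes.get("image_prefix", ()))
    return [name for name in column_names if name.startswith(prefixes)]


# Hypothetical column names, e.g. one "Pixel Data" column per DICOM frame.
columns = ["Patient's Name", "Pixel Data 0", "Pixel Data 1", "Study Date"]
print(image_columns(columns, {"image_prefix": ["Pixel Data"]}))
# ['Pixel Data 0', 'Pixel Data 1']
```

Using a list of prefixes rather than explicit column names means the configuration does not need to know in advance how many frames each file contains.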
Example: Connecting a DICOM dataset using the SDK
This example shows how to connect a DICOM dataset to a Pod using the SDK. It also demonstrates how to filter files based on various criteria, such as file extension, file creation date, and file size.
import logging

from bitfount import (
    DICOMSource,
    Pod,
    setup_loggers,
)
from bitfount.data.datasources.types import Date
from bitfount.runners.config_schemas import (
    DatasourceContainerConfig,
    FileSystemFilterConfig,
    PodDataConfig,
    PodDetailsConfig,
)

# Configure Bitfount's loggers
loggers = setup_loggers([logging.getLogger("bitfount")])

if __name__ == "__main__":
    # Human-readable metadata shown in the Bitfount Desktop application and Hub
    datasource_details = PodDetailsConfig(
        display_name="My DICOM Dataset",
        description="This Pod contains data from my DICOM dataset",
    )
    # Arguments used to build the datasource; also passed to PodDataConfig
    datasource_args = {"path": "/path/to/dicom/dataset"}
    datasource = DICOMSource(**datasource_args)
    data_config = PodDataConfig(
        datasource_args=datasource_args,
        # DICOM frames are identified by the prefix "Pixel Data"
        force_stypes={"image_prefix": ["Pixel Data"]},
        file_system_filters=FileSystemFilterConfig(
            file_extension="dcm",  # only files with the .dcm extension
            file_creation_min_date=Date(2025, 1, 1),  # minimum creation date
            min_file_size=1.0,  # 1MB
        ),
    )
    pod = Pod(
        name="my-pod",
        datasources=[
            DatasourceContainerConfig(
                name="my-dicom-dataset",
                datasource=datasource,
                datasource_details=datasource_details,
                data_config=data_config,
            )
        ],
    )
    # Start the Pod so the dataset can be used in tasks
    pod.start()
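Before starting the Pod, it can help to check roughly which files the filters in the example would keep. The sketch below is a stdlib-only approximation, not Bitfount's filtering logic. It assumes the extension is compared case-insensitively, that min_file_size is measured in megabytes of 1024 * 1024 bytes, and that a file's creation date can be approximated by st_birthtime where the platform provides it and by the modification time otherwise. The functions passes_filters and preview_matches are hypothetical.

```python
from __future__ import annotations

import datetime as dt
from pathlib import Path

# Assumption: "MB" means 1024 * 1024 bytes; Bitfount may define it differently.
BYTES_PER_MB = 1024 * 1024


def passes_filters(
    file_name: str,
    size_bytes: int,
    created: dt.date,
    *,
    file_extension: str | None = None,
    min_date: dt.date | None = None,
    min_size_mb: float | None = None,
) -> bool:
    """Hypothetical check mirroring the three filters used in the example."""
    if file_extension is not None:
        # Compare suffixes case-insensitively, with or without a leading dot.
        wanted = "." + file_extension.lstrip(".").lower()
        if Path(file_name).suffix.lower() != wanted:
            return False
    if min_date is not None and created < min_date:
        return False
    if min_size_mb is not None and size_bytes < min_size_mb * BYTES_PER_MB:
        return False
    return True


def preview_matches(root: str, **filters) -> list[Path]:
    """List files under root that pass passes_filters (a rough preview)."""
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        st = path.stat()
        # st_birthtime only exists on some platforms; fall back to mtime.
        created_ts = getattr(st, "st_birthtime", st.st_mtime)
        created = dt.date.fromtimestamp(created_ts)
        if passes_filters(path.name, st.st_size, created, **filters):
            matches.append(path)
    return matches


if __name__ == "__main__":
    # Same criteria as the FileSystemFilterConfig in the example above.
    for match in preview_matches(
        "/path/to/dicom/dataset",
        file_extension="dcm",
        min_date=dt.date(2025, 1, 1),
        min_size_mb=1.0,
    ):
        print(match)
```

Keeping the per-file check as a pure function separates the filtering rules from filesystem access, so the rules can be tested without creating files.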