Datasets

Datasets in Bitfount act as references to your data, storing only metadata and schema—not the raw data itself. Your datasets always remain on your system and are never transferred or stored by Bitfount.

This guide covers how to connect a dataset to Bitfount, link it to a project, and manage dataset access.

Connecting datasets

Before using a dataset in a project, you must first connect it to Bitfount using Bitfount Desktop. Connecting a dataset is like registering it: only its metadata (name, description, and schema) is stored, never the raw data.

Format

It's important to ensure your dataset is formatted correctly to be compatible with the task used in the project. If you are joining an existing project, please check with the project contact to ensure your dataset meets the requirements for the task.

Selecting a data source

To connect a dataset, click Connect dataset, either on the Datasets page or within a project when you link a dataset, then choose from the data sources Bitfount supports.

Tip: If your dataset contains DICOM files and you intend to run Ophthalmic tasks, we recommend selecting the DICOM (Ophthalmology) data source for optimal compatibility.

After selecting a data source, enter a dataset name and, optionally, a description, then click Connect dataset. Bitfount then processes the connection and makes the dataset available.

Once connected, the dataset's status should show as Online.

Note: Can't find the data source you need? Please reach out to the Bitfount support team. We're happy to help you connect your dataset to Bitfount.

Schema

When you connect a dataset, Bitfount automatically generates a schema that defines the column names and data types within your dataset. This schema is used to verify compatibility with the task used in a project, and does not contain any actual data (such as patient records), only structural information about the dataset.

If you are working with data scientists, they may also reference the schema to design analyses and tasks that align with your dataset's structure.
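The idea of a schema that carries structure but no records can be illustrated with a small, standalone Python sketch. It does not use Bitfount's API or reproduce Bitfount's actual schema format; the function names (`infer_schema`, `missing_columns`), the type labels, and the made-up sample columns are all hypothetical and chosen only for illustration.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type label that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for label, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text):
    """Map each column name to an inferred type label.

    Only names and type labels are returned; no cell values are kept.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in columns:
            columns[name].append(row[name] or "")
    return {name: infer_type(values) for name, values in columns.items()}


def missing_columns(schema, required):
    """List required columns that are absent or have a different type."""
    return sorted(
        name for name, label in required.items() if schema.get(name) != label
    )


# Made-up sample rows, used only to demonstrate the sketch.
SAMPLE = """patient_id,age,iop_mmhg,eye
1001,64,15.5,left
1002,71,18.0,right
"""

schema = infer_schema(SAMPLE)
print(schema)

# Hypothetical task requirements: column name -> expected type label.
print(missing_columns(schema, {"age": "integer", "laterality": "string"}))
```

The resulting dictionary describes the shape of the table without holding any of its rows, which is the property that lets a schema be shared for compatibility checks while the records stay local.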

Managing datasets

Status

When you start Bitfount Desktop, the system automatically attempts to establish a connection with all connected datasets, whether they are online or offline.

If needed, you can manually take a dataset offline from the Settings tab, which will temporarily disable task execution for that dataset.

Info: Tasks cannot run until Bitfount has finished connecting all datasets at startup.

History

A full audit trail is available for datasets via the Activity history tab. To view project-specific activity, navigate to the same tab in the relevant project.

Archiving

From the Settings tab you can archive your dataset. Archiving does not delete the raw data source connected to Bitfount. Archived datasets can be unarchived and reused in projects when appropriate.

Access

You can view all projects the dataset is currently linked to via the Linked projects tab on the dataset's detail page. You can unlink a dataset from a project at any time by clicking the Unlink dataset button within the project's Datasets tab.

If linking your dataset to projects does not fit your use case and you are working directly with a data scientist, please see Managing Pod Access. That guide explains how to manage direct access to datasets outside the context of a project via the Assigned roles tab.