
Using Custom Data Sources

When connecting data and running Pods, you may wish to use a custom DataSource plugin that Bitfount does not support by default. Bitfount's built-in plugin manager lets you do this by placing the plugin files in the appropriate subdirectory of $HOME/.bitfount/_plugins (e.g. _plugins/datasources/ for datasource plugins) on the Bitfount-connected device or server.

This extensibility lets you build on top of Bitfount to suit your own needs, and you can keep your custom components private if you wish. If you would like to share your plugins with collaborators, they only need to copy the files into the same plugins directory on their own machine.
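Installing or sharing a plugin only requires getting the file into the right directory. As a minimal sketch, the standard-library helper below copies a plugin module into $HOME/.bitfount/_plugins/datasources/, creating the directory if needed. The function name install_datasource_plugin is hypothetical and not part of Bitfount; the optional home argument exists only so the helper can be tried against a temporary directory.

```python
import shutil
from pathlib import Path
from typing import Optional


def install_datasource_plugin(plugin_file: Path, home: Optional[Path] = None) -> Path:
    """Copy a plugin module into <home>/.bitfount/_plugins/datasources/.

    Hypothetical helper: `home` defaults to the current user's home directory.
    """
    home = home if home is not None else Path.home()
    target_dir = home / ".bitfount" / "_plugins" / "datasources"
    # Create the datasources subdirectory (and any missing parents) if absent.
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / plugin_file.name
    # copy2 also preserves file metadata such as modification time.
    shutil.copy2(plugin_file, destination)
    return destination
```

For example, install_datasource_plugin(Path("excel_source.py")) would install the Excel plugin module shown below; a collaborator can run the same helper on their own machine.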

You can also leverage other libraries within your plugins by simply installing the relevant libraries in your virtual environment and importing them in your plugin modules.

DataSource plugins

For a datasource plugin to be compatible with Bitfount, it must inherit from BaseSource. The example plugin module below, excel_source.py, extends BaseSource to read Excel files:

```python
import logging
import os
from typing import Any, Dict, Iterable, List, Optional, Union

import numpy as np
import pandas as pd
from pydantic import AnyUrl

from bitfount.data.datasources.base_source import BaseSource
from bitfount.types import _Dtypes

logger = logging.getLogger(__name__)


class MyExcelSource(BaseSource):
    """Data source for loading Excel files.

    Args:
        path: The path or URL to the Excel file.
        read_excel_kwargs: Additional arguments to be passed to `pandas.read_excel`.
        **kwargs: Additional arguments to be passed to `BaseSource`.
    """

    def __init__(
        self,
        path: Union[os.PathLike, AnyUrl, str],
        read_excel_kwargs: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ):
        # Let BaseSource perform its own initialisation with the remaining arguments.
        super().__init__(**kwargs)
        # Only accept files with an Excel extension.
        if not str(path).endswith((".xls", ".xlsx")):
            raise TypeError("Please provide a Path or URL to an Excel file.")
        self.path = str(path)
        # Fall back to no extra pandas arguments if none were given.
        self.read_excel_kwargs = read_excel_kwargs or {}

    def get_data(self, **kwargs: Any) -> pd.DataFrame:
        """Loads and returns data from the Excel dataset.

        Returns:
            A DataFrame-type object which contains the data.
        """
        df: pd.DataFrame = pd.read_excel(self.path, **self.read_excel_kwargs)
        return df

    def get_values(
        self, col_names: List[str], **kwargs: Any
    ) -> Dict[str, Iterable[Any]]:
        """Get distinct values from columns in the Excel dataset.

        Args:
            col_names: The list of the columns whose distinct values should be
                returned.

        Returns:
            The distinct values of the requested columns as a mapping from column
            name to a series of distinct values.
        """
        # Read the file once instead of once per requested column.
        df: pd.DataFrame = self.get_data()
        return {col: df[col].unique() for col in col_names}

    def get_column(self, col_name: str, **kwargs: Any) -> Union[np.ndarray, pd.Series]:
        """Loads and returns a single column from the Excel dataset.

        Args:
            col_name: The name of the column which should be loaded.

        Returns:
            The requested column as a series.
        """
        df: pd.DataFrame = self.get_data()
        return df[col_name]

    def get_dtypes(self, **kwargs: Any) -> _Dtypes:
        """Loads and returns the columns and column types of the Excel dataset.

        Returns:
            A mapping from column names to column types.
        """
        df: pd.DataFrame = self.get_data()
        return self._get_data_dtypes(df)

    def __len__(self) -> int:
        # Number of rows in the dataset.
        return len(self.get_data())

    @property
    def multi_table(self) -> bool:
        """Attribute to specify whether the datasource is multi table."""
        return False
```
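The extension check in MyExcelSource.__init__ is case-sensitive, so a file named REPORT.XLSX would be rejected. If that matters for your data, a small helper that lowercases the path before comparing could replace it. This is a sketch; is_excel_path and EXCEL_EXTENSIONS are hypothetical names, not part of Bitfount.

```python
import os
from typing import Union

# Extensions accepted by the example plugin, compared case-insensitively below.
EXCEL_EXTENSIONS = (".xls", ".xlsx")


def is_excel_path(path: Union[os.PathLike, str]) -> bool:
    """Return True if `path` ends with an Excel extension, ignoring case."""
    # str() mirrors the original check, so Path objects and URL strings both work.
    return str(path).lower().endswith(EXCEL_EXTENSIONS)
```

Inside __init__ you would then write: if not is_excel_path(path): raise TypeError("Please provide a Path or URL to an Excel file.").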

API Example

Once excel_source.py is placed inside $HOME/.bitfount/_plugins/datasources/, its contents are automatically importable from bitfount, just like any other Bitfount component. For example:

```python
from bitfount import MyExcelSource, Pod

pod = Pod(
    name="my-excel-datasource",
    datasource=MyExcelSource("/path/to/my/excel/file.xlsx"),
)
pod.start()
```
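How Bitfount's plugin manager loads these files internally is not described here. Purely to illustrate the general idea of "drop a file in a directory and its classes become available", the sketch below uses the standard library's importlib to execute every .py file in a directory and collect the classes each one defines. load_plugin_classes is a hypothetical name, and this is not necessarily how Bitfount implements its loader.

```python
import importlib.util
import inspect
from pathlib import Path
from typing import Dict


def load_plugin_classes(plugin_dir: Path) -> Dict[str, type]:
    """Import every .py module in `plugin_dir` and return the classes it defines."""
    classes: Dict[str, type] = {}
    for module_path in sorted(plugin_dir.glob("*.py")):
        # Build a module spec directly from the file location.
        spec = importlib.util.spec_from_file_location(module_path.stem, module_path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Keep only classes defined in this module, not ones it merely imports.
        for name, obj in inspect.getmembers(module, inspect.isclass):
            if obj.__module__ == module.__name__:
                classes[name] = obj
    return classes
```

A loader like this could additionally filter with issubclass(obj, BaseSource) to enforce the inheritance requirement described above.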

YAML Example

Alternatively, the plugin can be referenced in the Pod YAML config file as follows:

```yaml
name: my-excel-datasource
datasource: MyExcelSource
data_config:
  datasource_args:
    path: /path/to/my/excel/file.xlsx
```

Docker Example

Datasource plugins can also be used with the Bitfount Pod Docker container. Mount your existing plugins directory from your machine into the container, after which the plugins can be referenced in the YAML config file as usual. Inside the container, the plugins must be mounted at /root/.bitfount/_plugins, so your docker run command might look like this:

```shell
docker run -d \
  -v /path/to/config/directory:/mount/config \
  -v $HOME/.bitfount/_plugins:/root/.bitfount/_plugins \
  ghcr.io/bitfount/pod:stable
```

If your plugins depend on extra libraries, you will need to build your own Docker image based on our image and install those dependencies in it.
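If you launch Pods from a script, the docker run command above can also be assembled in Python and passed to subprocess.run. This sketch only builds the argument list; pod_docker_command is a hypothetical helper, and the image tag and container mount points are the ones given above. Host paths are resolved to absolute paths because Docker's -v flag expects an absolute host path for bind mounts.

```python
from pathlib import Path
from typing import List, Optional


def pod_docker_command(config_dir: Path, plugins_dir: Optional[Path] = None) -> List[str]:
    """Build the `docker run` argument list for the Bitfount Pod container."""
    if plugins_dir is None:
        plugins_dir = Path.home() / ".bitfount" / "_plugins"
    return [
        "docker", "run", "-d",
        # Host directory holding the Pod YAML config.
        "-v", f"{config_dir.resolve()}:/mount/config",
        # Host plugins directory, mounted where the container looks for plugins.
        "-v", f"{plugins_dir.resolve()}:/root/.bitfount/_plugins",
        "ghcr.io/bitfount/pod:stable",
    ]
```

For example: subprocess.run(pod_docker_command(Path("/path/to/config/directory")), check=True).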