
dataloaders

HuggingFace-compatible dataloaders.

Classes

HuggingFaceBitfountDataLoader

class HuggingFaceBitfountDataLoader(dataset: Union[_HuggingFaceDataset, _IterableHuggingFaceDataset], batch_size: int = 1, shuffle: bool = False):

Wraps a PyTorch DataLoader with bitfount functions.

Arguments

  • batch_size: The batch size for the dataloader. Defaults to 1.
  • dataset: A PyTorch-compatible dataset.
  • shuffle: A boolean value indicating whether the values in the dataset should be shuffled. Defaults to False.
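As a rough illustration of what `batch_size` and `shuffle` control, the sketch below batches a plain Python list the way PyTorch's `DataLoader` does by default: the final batch may be smaller, because `drop_last` defaults to `False`. The helper `make_batches` and its `seed` parameter are hypothetical and not part of Bitfount; the real class delegates batching to the PyTorch DataLoader it wraps.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def make_batches(
    data: Sequence[T],
    batch_size: int = 1,
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> List[List[T]]:
    """Split `data` into batches (hypothetical helper, not Bitfount's API)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    indices = list(range(len(data)))
    if shuffle:
        # Shuffle the order of indices; a fixed seed makes the order reproducible.
        random.Random(seed).shuffle(indices)
    # Keep the final, possibly smaller, batch (like drop_last=False in PyTorch).
    return [
        [data[i] for i in indices[start : start + batch_size]]
        for start in range(0, len(indices), batch_size)
    ]


print(make_batches(list(range(5)), batch_size=2))  # [[0, 1], [2, 3], [4]]
```

With `shuffle=True` every element still appears exactly once; only the order in which elements are grouped into batches changes.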

Attributes

  • batch_size: The batch size for the dataloader. Defaults to 1.
  • shuffle: A boolean value indicating whether the values in the dataset should be shuffled. Defaults to False.

Ancestors

  • bitfount.data.huggingface.dataloaders._BaseHuggingFaceBitfountDataLoader
  • BitfountDataLoader

Methods


get_pytorch_dataloader

def get_pytorch_dataloader(self, **kwargs: Any) -> torch.utils.data.dataloader.DataLoader:

Return a PyTorch DataLoader for self.dataset.

Keyword arguments are passed to PyTorch's DataLoader constructor and take precedence over the values set when this dataloader was constructed.
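The precedence rule amounts to a dictionary merge in which call-time keyword arguments overwrite the stored constructor values. The sketch below shows only that merge; `merge_dataloader_kwargs` is a hypothetical helper, not Bitfount's implementation.

```python
from typing import Any, Dict


def merge_dataloader_kwargs(
    constructor_values: Dict[str, Any], **overrides: Any
) -> Dict[str, Any]:
    """Combine stored constructor values with call-time overrides (hypothetical)."""
    # Later entries win, so call-time keyword arguments take precedence.
    return {**constructor_values, **overrides}


defaults = {"batch_size": 16, "shuffle": True}
print(merge_dataloader_kwargs(defaults, batch_size=64, num_workers=2))
# {'batch_size': 64, 'shuffle': True, 'num_workers': 2}
```

With the real class, the corresponding call would look like `dataloader.get_pytorch_dataloader(batch_size=64, num_workers=2)`, where `num_workers` is a standard PyTorch `DataLoader` argument and `dataloader` is an existing `HuggingFaceBitfountDataLoader` instance.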

get_x_dataframe

def get_x_dataframe(self) -> Union[pandas.core.frame.DataFrame, Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]]:

Inherited from:

BitfountDataLoader.get_x_dataframe :

Gets the x-dataframe of the data, i.e. the features.

For models that are incompatible with the iterable approach.

get_y_dataframe

def get_y_dataframe(self) -> pandas.core.frame.DataFrame:

Inherited from:

BitfountDataLoader.get_y_dataframe :

Gets the y-dataframe of the data, i.e. the target.

For models that are incompatible with the iterable approach.
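For intuition about what the x- and y-dataframes contain, the sketch below splits a single pandas DataFrame into feature columns and a one-column target DataFrame. It assumes a single target column, here hypothetically named `label`, and uses toy data; `split_features_target` is a hypothetical helper and not how Bitfount builds these dataframes internally.

```python
from typing import Tuple

import pandas as pd


def split_features_target(
    df: pd.DataFrame, target: str
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Return (x_df, y_df): feature columns and the target column (hypothetical)."""
    x_df = df.drop(columns=[target])
    # Double brackets keep the target as a one-column DataFrame, not a Series.
    y_df = df[[target]]
    return x_df, y_df


data = pd.DataFrame({"age": [34, 51], "bmi": [22.1, 27.4], "label": [0, 1]})
x_df, y_df = split_features_target(data, target="label")
print(list(x_df.columns), list(y_df.columns))
# ['age', 'bmi'] ['label']
```

Returning the target as a DataFrame rather than a Series matches the `pandas.core.frame.DataFrame` return type documented for `get_y_dataframe`.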

HuggingFaceIterableBitfountDataLoader

class HuggingFaceIterableBitfountDataLoader(dataset: _IterableBitfountDataset, batch_size: int = 1, shuffle: bool = False, secure_rng: bool = False):

Wraps a PyTorch DataLoader with bitfount functions.

Arguments

  • batch_size: The batch size for the dataloader. Defaults to 1.
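  • dataset: A HuggingFace-compatible dataset.
  • shuffle: A boolean value indicating whether the values in the dataset should be shuffled. Defaults to False.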

Variables

  • static dataset : bitfount.data.huggingface.datasets._IterableHuggingFaceDataset
  • buffer_size : int - Number of elements to buffer.

    The buffer size is the larger of the batch size and the default buffer size, unless the dataset is smaller than the default buffer size, in which case the dataset size is used instead. PyTorch already ensures under the hood that the batch size is not greater than the dataset size.
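The buffer-size rule in the `buffer_size` entry can be written out as a small function. This is a sketch of the rule as described, not Bitfount's implementation; the function name and the `default_buffer_size` parameter are hypothetical, and the actual default value Bitfount uses is not stated on this page.

```python
def compute_buffer_size(
    batch_size: int, dataset_size: int, default_buffer_size: int
) -> int:
    """Apply the documented buffer-size rule (hypothetical helper)."""
    if dataset_size < default_buffer_size:
        # Small dataset: buffer the whole dataset.
        return dataset_size
    # Otherwise use whichever is larger: the batch size or the default buffer.
    return max(batch_size, default_buffer_size)


print(compute_buffer_size(batch_size=32, dataset_size=10_000, default_buffer_size=1_000))  # 1000
print(compute_buffer_size(batch_size=2_048, dataset_size=10_000, default_buffer_size=1_000))  # 2048
print(compute_buffer_size(batch_size=8, dataset_size=500, default_buffer_size=1_000))  # 500
```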

Static methods


convert_input_target

def convert_input_target(batch: _DataBatchAllowingText) -> List[Union[torch.Tensor, numpy.ndarray, Sequence[Union[torch.Tensor, numpy.ndarray]]]]:

Convert the input and target to match the inputs that HuggingFace expects.

Methods


get_x_dataframe

def get_x_dataframe(self) -> Union[pandas.core.frame.DataFrame, Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]]:

Inherited from:

BitfountDataLoader.get_x_dataframe :

Gets the x-dataframe of the data, i.e. the features.

For models that are incompatible with the iterable approach.

get_y_dataframe

def get_y_dataframe(self) -> pandas.core.frame.DataFrame:

Inherited from:

BitfountDataLoader.get_y_dataframe :

Gets the y-dataframe of the data, i.e. the target.

For models that are incompatible with the iterable approach.