dataloaders
HuggingFace compatible dataloaders.
Classes
HuggingFaceBitfountDataLoader
class HuggingFaceBitfountDataLoader( dataset: Union[_HuggingFaceDataset, _IterableHuggingFaceDataset], batch_size: int = 1, shuffle: bool = False,):
Wraps a PyTorch DataLoader with bitfount functions.
Arguments
batch_size
: The batch size for the dataloader. Defaults to 1.
dataset
: A PyTorch-compatible dataset.
shuffle
: A boolean value indicating whether the values in the dataset should be shuffled. Defaults to False.
Attributes
batch_size
: The batch size for the dataloader. Defaults to 1.
shuffle
: A boolean value indicating whether the values in the dataset should be shuffled. Defaults to False.
Ancestors
- bitfount.data.huggingface.dataloaders._BaseHuggingFaceBitfountDataLoader
- BitfountDataLoader
Methods
get_pytorch_dataloader
def get_pytorch_dataloader(self, **kwargs: Any) ‑> torch.utils.data.dataloader.DataLoader:
Return a PyTorch DataLoader for self.dataset.
Keyword arguments are passed to PyTorch's DataLoader constructor and take precedence over the values set in the constructor.
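The precedence rule above can be illustrated with a minimal sketch. It does not use Bitfount or PyTorch; `resolve_dataloader_kwargs` is a hypothetical helper, and the way the real method merges its arguments is an assumption here. It only shows the general pattern of call-time keyword arguments overriding values stored at construction time.

```python
from typing import Any, Dict


def resolve_dataloader_kwargs(
    constructor_values: Dict[str, Any], **kwargs: Any
) -> Dict[str, Any]:
    """Hypothetical helper: merge constructor values with call-time overrides."""
    # Start from a copy of the values stored at construction time.
    settings: Dict[str, Any] = dict(constructor_values)
    # Call-time keyword arguments replace any stored value with the same name
    # and add any extra options (for example, num_workers).
    settings.update(kwargs)
    return settings


# Values as they might be set in the constructor (batch_size=1, shuffle=False).
stored = {"batch_size": 1, "shuffle": False}
resolved = resolve_dataloader_kwargs(stored, batch_size=16, num_workers=2)
```

In this sketch `resolved` carries `batch_size=16` from the call, keeps `shuffle=False` from the constructor, and gains `num_workers=2`; the stored values themselves are left unchanged.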
get_x_dataframe
def get_x_dataframe( self,) ‑> Union[pandas.core.frame.DataFrame, Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]]:
Inherited from:
BitfountDataLoader.get_x_dataframe :
Gets the x-dataframe of the data i.e. features.
For models incompatible with the iter approach.
get_y_dataframe
def get_y_dataframe(self) ‑> pandas.core.frame.DataFrame:
Inherited from:
BitfountDataLoader.get_y_dataframe :
Gets the y-dataframe of the data i.e. target.
For models incompatible with the iter approach.
HuggingFaceIterableBitfountDataLoader
class HuggingFaceIterableBitfountDataLoader( dataset: _IterableBitfountDataset, batch_size: int = 1, shuffle: bool = False, secure_rng: bool = False,):
Wraps a PyTorch DataLoader with bitfount functions.
Arguments
batch_size
: The batch size for the dataloader. Defaults to 1.
dataset
: A HuggingFace-compatible dataset.
Ancestors
Variables
- static dataset : bitfount.data.huggingface.datasets._IterableHuggingFaceDataset
- buffer_size : int - Number of elements to buffer. The size of the buffer is the greater of the batch size and the default buffer size, unless the dataset is smaller than the default buffer, in which case the dataset size is used. PyTorch already ensures that the batch size is not greater than the dataset size under the hood.
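The buffer sizing rule described for buffer_size can be sketched as a small standalone function. This is an illustration of the stated rule only, not the library's code: `compute_buffer_size` is a hypothetical name, and the default buffer size of 1000 is an assumed value chosen for the example.

```python
def compute_buffer_size(
    batch_size: int, dataset_len: int, default_buffer_size: int = 1000
) -> int:
    """Hypothetical sketch of the buffer sizing rule; 1000 is an assumed default."""
    # A dataset smaller than the default buffer caps the buffer at its own size.
    if dataset_len < default_buffer_size:
        return dataset_len
    # Otherwise take the greater of the batch size and the default buffer size.
    return max(batch_size, default_buffer_size)


small_dataset = compute_buffer_size(batch_size=32, dataset_len=500)
typical = compute_buffer_size(batch_size=32, dataset_len=5000)
large_batch = compute_buffer_size(batch_size=2048, dataset_len=5000)
```

Under these assumptions the three calls cover each branch of the rule: the dataset size wins when the dataset is smaller than the default buffer, the default wins for a small batch, and the batch size wins when it exceeds the default.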
Static methods
convert_input_target
def convert_input_target( batch: _DataBatchAllowingText,) ‑> List[Union[torch.Tensor, numpy.ndarray, Sequence[Union[torch.Tensor, numpy.ndarray]]]]:
Convert the input and target to match the inputs expected by Hugging Face.
Methods
get_x_dataframe
def get_x_dataframe( self,) ‑> Union[pandas.core.frame.DataFrame, Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]]:
Inherited from:
BitfountDataLoader.get_x_dataframe :
Gets the x-dataframe of the data i.e. features.
For models incompatible with the iter approach.
get_y_dataframe
def get_y_dataframe(self) ‑> pandas.core.frame.DataFrame:
Inherited from:
BitfountDataLoader.get_y_dataframe :
Gets the y-dataframe of the data i.e. target.
For models incompatible with the iter approach.