dataframe_source

Module containing DataFrameSource class.

DataFrameSource class handles loading data stored in memory in a pandas dataframe.

Classes

DataFrameSource

class DataFrameSource(data: pd.DataFrame, data_splitter: Optional[DatasetSplitter] = None, seed: Optional[int] = None, modifiers: Optional[Dict[str, DataPathModifiers]] = None, ignore_cols: Optional[Union[str, Sequence[str]]] = None):

Data source for loading dataframes.

Arguments

data: The dataframe to be loaded.
data_splitter: Approach used for splitting the data into training, validation, and test sets. Defaults to None.
ignore_cols: Column or list of columns to exclude from the data. Defaults to None.
modifiers: Dictionary used for modifying paths/extensions in the dataframe. Defaults to None.
seed: Random number seed. Used for setting random seed for all libraries. Defaults to None.
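
The `ignore_cols` argument accepts either a single column name or a sequence of names. The sketch below illustrates that idea with plain pandas: it normalises both forms to a list and drops those columns. The helper `drop_ignored` is hypothetical and not part of DataFrameSource; how the class applies `ignore_cols` internally is not described here, so treat this as an assumption.

```python
from typing import List, Optional, Sequence, Union

import pandas as pd


def drop_ignored(
    df: pd.DataFrame,
    ignore_cols: Optional[Union[str, Sequence[str]]] = None,
) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df without the ignored columns."""
    if ignore_cols is None:
        return df.copy()
    # A bare string is one column name, not a sequence of characters.
    cols: List[str] = [ignore_cols] if isinstance(ignore_cols, str) else list(ignore_cols)
    # errors="ignore" skips names that are absent (a choice made for this sketch).
    return df.drop(columns=cols, errors="ignore")


df = pd.DataFrame({"id": [1, 2], "age": [30, 41], "name": ["a", "b"]})
single = drop_ignored(df, "id")
several = drop_ignored(df, ["id", "name"])
```

Checking for `str` first matters because a string is itself a sequence, and `list("id")` would otherwise yield `["i", "d"]`.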

Attributes

data_splitter: Approach used for splitting the data into training, validation, and test sets.
seed: Random number seed. Used for setting random seed for all libraries.

Ancestors

Variables

data : pandas.core.frame.DataFrame - A property containing the underlying dataframe if the data has been loaded.

Raises: DataNotLoadedError: If the data has not been loaded yet.
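
The guarded-property behaviour described above can be sketched as follows. Both `LazySource` and the `DataNotLoadedError` class defined here are local, hypothetical stand-ins written for illustration; they are not imported from the library and do not reflect its actual implementation.

```python
from typing import Optional

import pandas as pd


class DataNotLoadedError(Exception):
    """Local stand-in for the library's exception of the same name."""


class LazySource:
    """Hypothetical stand-in showing a guarded `data` property."""

    def __init__(self, df: pd.DataFrame) -> None:
        self._source_df = df
        self._data: Optional[pd.DataFrame] = None  # nothing loaded yet

    def load_data(self) -> None:
        # "Loading" an in-memory frame is just keeping a reference to it.
        self._data = self._source_df

    @property
    def data(self) -> pd.DataFrame:
        if self._data is None:
            raise DataNotLoadedError("Call load_data() before accessing data.")
        return self._data


source = LazySource(pd.DataFrame({"x": [1, 2, 3]}))
try:
    source.data
    raised = False
except DataNotLoadedError:
    raised = True

source.load_data()
n_rows = len(source.data)
```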

hash : str - The hash associated with this BaseSource.

This is the hash of the static information regarding the underlying DataFrame, primarily column names and content types but NOT anything content-related itself. It should be consistent across invocations, even if additional data is added, as long as the DataFrame is still compatible in its format.

Returns: The hexdigest of the DataFrame hash.
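
To make the idea of a content-independent hash concrete, here is a minimal sketch that hashes only column names and dtypes. The helper `schema_hash`, the choice of SHA-256, and the JSON serialisation are all assumptions for illustration; the algorithm the library actually uses is not specified here.

```python
import hashlib
import json

import pandas as pd


def schema_hash(df: pd.DataFrame) -> str:
    """Hypothetical helper: hash column names and dtypes, never cell values."""
    schema = {str(col): str(dtype) for col, dtype in df.dtypes.items()}
    # sort_keys gives a stable serialisation regardless of column order
    # (ignoring column order is a choice made for this sketch).
    payload = json.dumps(schema, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


a = pd.DataFrame({"age": [30, 41], "name": ["x", "y"]})
b = pd.DataFrame({"age": [5, 6, 7], "name": ["p", "q", "r"]})  # more rows, same schema
c = pd.DataFrame({"age": [30.5, 41.0], "name": ["x", "y"]})    # age is now float
```

In this sketch `a` and `b` hash identically because only their rows differ, while `c` hashes differently because the dtype of `age` changed.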

is_initialised : bool - Checks if BaseSource was initialised.

is_task_running : bool - Returns True if a task is running.

iterable : bool - Returns False if the DataSource does not subclass IterableSource.

However, this property must be re-implemented in IterableSource, so it is not necessarily True even if the DataSource inherits from IterableSource.

multi_table : bool - Returns False if the DataSource does not subclass MultiTableSource.

However, this property must be re-implemented in MultiTableSource, so it is not necessarily True even if the DataSource inherits from MultiTableSource.

Methods

get_column

def get_column(self: BaseSource, col_name: str, *args: Any, **kwargs: Any) ‑> Union[numpy.ndarray, pandas.core.series.Series]:

Inherited from:

BaseSource.get_column :

Get a single column from the dataset.

Used in the ColumnAverage algorithm as well as to iterate over image columns for the purposes of schema generation.

get_column_names

def get_column_names(self, **kwargs: Any) ‑> Iterable[str]:

Inherited from:

BaseSource.get_column_names :

Get the column names as an iterable.

get_data

def get_data(self, **kwargs: Any) ‑> pandas.core.frame.DataFrame:

Loads and returns data from the DataFrame dataset.

Returns: A DataFrame-type object which contains the data.

get_dtypes

def get_dtypes(self: BaseSource, *args: Any, **kwargs: Any) ‑> _Dtypes:

Inherited from:

BaseSource.get_dtypes :

Implement this method to get the columns and column types from the dataset.

get_values

def get_values(self, col_names: List[str], **kwargs: Any) ‑> Dict[str, Iterable[Any]]:

Get distinct values from columns in the DataFrame dataset.

Arguments

col_names: The list of the columns whose distinct values should be returned.

Returns: The distinct values of the requested columns as a mapping from column name to a series of distinct values.
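
The documented return shape, a mapping from column name to that column's distinct values, can be sketched with plain pandas. The helper `distinct_values` is hypothetical, and using `Series.unique()` is an assumption; it is one straightforward way to produce such a mapping, not necessarily what `get_values` does internally.

```python
from typing import Any, Dict, Iterable, List

import pandas as pd


def distinct_values(df: pd.DataFrame, col_names: List[str]) -> Dict[str, Iterable[Any]]:
    """Hypothetical helper mirroring the documented return shape."""
    # Series.unique() keeps values in order of first appearance.
    return {col: df[col].unique() for col in col_names}


df = pd.DataFrame(
    {"colour": ["red", "blue", "red"], "size": [1, 1, 2], "id": [10, 11, 12]}
)
values = distinct_values(df, ["colour", "size"])
```

Only the requested columns appear as keys, so `id` is absent from `values` here.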

load_data

def load_data(self, **kwargs: Any) ‑> None:

Inherited from:

BaseSource.load_data :

Load the data for the datasource.

Raises

TypeError: If data format is not supported.
