pandas_utils

Utility functions for interacting with pandas.

Module

Functions

conditional_dataframe_yielder

def conditional_dataframe_yielder(
    dfs: Iterable[pandas.core.frame.DataFrame],
    condition: Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame],
    reset_index: bool = True,
) -> Generator[pandas.core.frame.DataFrame, None, None]:

Create a generator that conditionally yields rows from a set of dataframes.

This replicates the standard .loc conditional indexing used on a whole dataframe, but applies it to an iterable of dataframes, such as the one returned when reading a CSV file in chunks.

Arguments

  • dfs: An iterable of dataframes to conditionally yield rows from.
  • condition: A callable that takes in a dataframe, applies a condition, and returns the edited/filtered dataframe.
  • reset_index: Whether the index of the yielded dataframes should be reset. If True, the yielded dataframes get a standard integer index that continues from one yielded dataframe to the next (e.g. if yielded dataframe 10 ends with index 42, yielded dataframe 11 will start with index 43).
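The behaviour described above can be illustrated with a minimal sketch. conditional_yielder_sketch is a hypothetical stand-in written for illustration, not this module's implementation. The sketch makes three assumptions. First, condition is applied to each chunk independently. Second, the index is renumbered after filtering. Third, chunks that end up empty are still yielded, although the real function may skip them.

```python
import io
from typing import Callable, Generator, Iterable

import pandas as pd


def conditional_yielder_sketch(
    dfs: Iterable[pd.DataFrame],
    condition: Callable[[pd.DataFrame], pd.DataFrame],
    reset_index: bool = True,
) -> Generator[pd.DataFrame, None, None]:
    """Hypothetical sketch of conditional_dataframe_yielder's described behaviour."""
    offset = 0  # first index value for the next yielded frame
    for df in dfs:
        filtered = condition(df)  # e.g. lambda df: df.loc[df["a"] > 0]
        if reset_index:
            # Number rows continuously across yielded frames: if the previous
            # frame ended at 42, this one starts at 43.
            filtered = filtered.set_axis(
                pd.RangeIndex(offset, offset + len(filtered)), axis=0
            )
            offset += len(filtered)
        # Assumption: empty results are yielded rather than skipped.
        yield filtered


# Usage: keep only rows with an odd "a" value while reading a CSV in chunks.
csv_text = "a,b\n1,x\n2,y\n3,z\n4,w\n5,v\n"
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)
odd_rows = pd.concat(
    conditional_yielder_sketch(chunks, lambda df: df.loc[df["a"] % 2 == 1])
)
```

Concatenating the yielded chunks gives the same rows as filtering the whole file with .loc and calling reset_index(drop=True).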

dataframe_iterable_join

def dataframe_iterable_join(
    joiners: Iterable[pandas.core.frame.DataFrame],
    joinee: pandas.core.frame.DataFrame,
    reset_joiners_index: bool = False,
) -> Generator[pandas.core.frame.DataFrame, None, None]:

Join each dataframe in an iterable against a single dataframe, yielding the joined results.

This replicates the standard .join() method used on a whole dataframe, but applies it to an iterable of dataframes, such as the one returned when reading a CSV file in chunks.

For each joiner, this is equivalent to:

joiner.join(joinee)
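The equivalence above can be checked with pandas alone. In the example below, each chunk of a chunked read_csv is joined against the joinee and the results are concatenated. This gives the same frame as a single join on the full data, because joins align on the index and pandas' chunked reader continues the row index across chunks. The CSV contents and the "score" column are made-up illustration data.

```python
import io

import pandas as pd

# Illustrative data: rows to be chunked, plus a lookup frame whose index
# lines up with the CSV's default integer row index.
csv_text = "name\nann\nbob\ncat\ndan\neve\n"
joinee = pd.DataFrame({"score": [0.5, 1.5, 2.5, 3.5, 4.5]})

# Join the whole file at once.
whole = pd.read_csv(io.StringIO(csv_text)).join(joinee)

# Join chunk by chunk. Chunk indexes run 0-1, 2-3, 4, so each chunk picks up
# the matching joinee rows.
chunked = pd.concat(
    chunk.join(joinee)
    for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2)
)
```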

Arguments

  • joiners: The collection of dataframes that should be joined against the joinee.
  • joinee: The single dataframe that the others should be joined against.
  • reset_joiners_index: Whether the index of each joiner dataframe should be reset as it is processed. If True, the joiners get a standard integer index that continues from one dataframe to the next (e.g. if yielded dataframe 10 ends with index 42, yielded dataframe 11 will start with index 43).
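To show why reset_joiners_index matters, here is a minimal sketch. iterable_join_sketch is a hypothetical illustration, not this module's code. It assumes the joiner's index is renumbered before the join, so renumbering changes which joinee rows each chunk lines up with. The hand-built joiners each start at index 0, unlike read_csv chunks.

```python
from typing import Generator, Iterable

import pandas as pd


def iterable_join_sketch(
    joiners: Iterable[pd.DataFrame],
    joinee: pd.DataFrame,
    reset_joiners_index: bool = False,
) -> Generator[pd.DataFrame, None, None]:
    """Hypothetical sketch of dataframe_iterable_join's described behaviour."""
    offset = 0  # first index value for the next joiner when resetting
    for joiner in joiners:
        if reset_joiners_index:
            # Assumption: renumber before joining so this joiner's index
            # continues from where the previous one ended.
            joiner = joiner.set_axis(
                pd.RangeIndex(offset, offset + len(joiner)), axis=0
            )
            offset += len(joiner)
        # Left join on the index, as in joiner.join(joinee).
        yield joiner.join(joinee)


# Two hand-built chunks that each start at index 0.
joinee = pd.DataFrame({"label": ["zero", "one", "two", "three"]})
joiners = [pd.DataFrame({"v": [10, 11]}), pd.DataFrame({"v": [12, 13]})]

as_is = pd.concat(iterable_join_sketch(joiners, joinee))
renumbered = pd.concat(
    iterable_join_sketch(joiners, joinee, reset_joiners_index=True)
)
```

Without the reset, both chunks match joinee rows 0 and 1. With it, the second chunk is renumbered to 2 and 3 and picks up "two" and "three".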