

Utility functions for interacting with pandas.




def conditional_dataframe_yielder(
    dfs: Iterable[pandas.core.frame.DataFrame],
    condition: Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame],
    reset_index: bool = True,
) -> Generator[pandas.core.frame.DataFrame, None, None]:

Create a generator that conditionally yields rows from a set of dataframes.

This replicates the standard .loc conditional indexing available on a single dataframe, but applies it to an iterable of dataframes, such as the one returned when reading a CSV file in chunks.


  • dfs: An iterable of dataframes to conditionally yield rows from.
  • condition: A callable that takes in a dataframe, applies a condition, and returns the edited/filtered dataframe.
  • reset_index: Whether the index of the yielded dataframes should be reset. If True, a standard integer index is used that is consistent between the yielded dataframes (e.g. if yielded dataframe 10 ends with index 42, yielded dataframe 11 will start with index 43).
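
The behaviour described above can be illustrated with a minimal sketch. This is not the library's implementation: the name conditional_yielder_sketch is hypothetical, and the sketch assumes that chunks left empty by the filter are still yielded and that the running index starts at 0.

```python
import io
from typing import Callable, Iterable, Iterator

import pandas as pd


def conditional_yielder_sketch(
    dfs: Iterable[pd.DataFrame],
    condition: Callable[[pd.DataFrame], pd.DataFrame],
    reset_index: bool = True,
) -> Iterator[pd.DataFrame]:
    """Hypothetical stand-in for conditional_dataframe_yielder."""
    next_start = 0  # index value the next yielded chunk should start at
    for df in dfs:
        filtered = condition(df)
        if reset_index:
            # Drop the old index and continue numbering from the previous chunk.
            filtered = filtered.reset_index(drop=True)
            filtered.index = pd.RangeIndex(next_start, next_start + len(filtered))
            next_start += len(filtered)
        yield filtered


# Read a small in-memory CSV in chunks of three rows.
csv_text = "x\n1\n5\n9\n2\n8\n7\n"
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=3)

# Keep only rows where x > 4, exactly as df.loc[df["x"] > 4] would on a whole dataframe.
kept = list(conditional_yielder_sketch(chunks, lambda df: df.loc[df["x"] > 4]))
result = pd.concat(kept)
```

Because the index continues from one yielded dataframe to the next, the filtered chunks can be concatenated without producing duplicate index values.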


def dataframe_iterable_join(
    joiners: Iterable[pandas.core.frame.DataFrame],
    joinee: pandas.core.frame.DataFrame,
    reset_joiners_index: bool = False,
) -> Generator[pandas.core.frame.DataFrame, None, None]:

Performs a dataframe join against a collection of dataframes.

This replicates the standard .join() method available on a single dataframe, but applies it to an iterable of dataframes, such as the one returned when reading a CSV file in chunks.

This is equivalent to:



  • joiners: The collection of dataframes that should be joined against the joinee.
  • joinee: The single dataframe that the others should be joined against.
  • reset_joiners_index: Whether the index of each joiner dataframe should be reset as it is processed. If True, a standard integer index is used that is consistent between the yielded dataframes (e.g. if yielded dataframe 10 ends with index 42, yielded dataframe 11 will start with index 43).
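
A rough sketch of how such a join over chunks might behave is given below. It is an illustration under stated assumptions, not the library's code: the name iterable_join_sketch is hypothetical, each joiner is assumed to be joined as joiner.join(joinee) with pandas' default left join on the index, and the running index is assumed to start at 0.

```python
from typing import Iterable, Iterator

import pandas as pd


def iterable_join_sketch(
    joiners: Iterable[pd.DataFrame],
    joinee: pd.DataFrame,
    reset_joiners_index: bool = False,
) -> Iterator[pd.DataFrame]:
    """Hypothetical stand-in for dataframe_iterable_join."""
    next_start = 0  # index value the next joiner should start at
    for joiner in joiners:
        if reset_joiners_index:
            # Renumber the joiner so its index continues from the previous one.
            joiner = joiner.reset_index(drop=True)
            joiner.index = pd.RangeIndex(next_start, next_start + len(joiner))
            next_start += len(joiner)
        # Assumption: default DataFrame.join, i.e. a left join on the index.
        yield joiner.join(joinee)


# Two chunks that each carry their own 0-based index.
joiners = [
    pd.DataFrame({"v": [10, 20]}),
    pd.DataFrame({"v": [30, 40]}),
]
# A single lookup table indexed 0..3.
joinee = pd.DataFrame({"label": ["a", "b", "c", "d"]})

joined = pd.concat(iterable_join_sketch(joiners, joinee, reset_joiners_index=True))
```

With reset_joiners_index=True the second chunk is renumbered to start at 2, so its rows pick up labels "c" and "d". With the default of False, each joiner keeps its own index, and in this example both chunks would match labels "a" and "b".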