dataframe_generation_extensions
Additional functionality for DataFrame processing.
Provides functions that can be used for additional column generation.
Module
Functions
extract_hypertransmission
def extract_hypertransmission(df: pd.DataFrame) ‑> pandas.core.frame.DataFrame:
Extension function for extracting hypertransmission area.
extract_is_os_disruption
def extract_is_os_disruption(df: pd.DataFrame) ‑> pandas.core.frame.DataFrame:
Extension function for extracting IS/OS disruption area.
extract_json_value
def extract_json_value( df: pd.DataFrame, json_column: str, key: str, new_column_name: str,) ‑> pandas.core.frame.DataFrame:
Extracts a specific value from a JSON string or dictionary column.
Arguments
df
: The DataFrame to process.
json_column
: The name of the column containing JSON strings or dictionaries.
key
: The key to extract from the JSON or dictionary.
new_column_name
: The name for the new column containing the extracted values.
Returns The DataFrame with the new column added.
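The behaviour described above can be sketched as follows. This is an illustration, not the library's implementation: the name `extract_json_value_sketch` is hypothetical, and returning `None` for unparseable strings or missing keys is an assumption, since this reference does not say how those cases are handled.

```python
import json
from typing import Any

import pandas as pd


def extract_json_value_sketch(
    df: pd.DataFrame, json_column: str, key: str, new_column_name: str
) -> pd.DataFrame:
    """Illustrative only: pull `key` out of each JSON string or dict in a column."""

    def _get(cell: Any) -> Any:
        # Strings are parsed as JSON first; dictionaries are used directly.
        if isinstance(cell, str):
            try:
                cell = json.loads(cell)
            except json.JSONDecodeError:
                return None  # assumption: unparseable strings yield None
        if isinstance(cell, dict):
            return cell.get(key)  # assumption: missing keys yield None
        return None

    df[new_column_name] = df[json_column].apply(_get)
    return df


df = pd.DataFrame({"meta": ['{"eye": "L"}', {"eye": "R"}, "not json"]})
result = extract_json_value_sketch(df, "meta", "eye", "eye")
```

Here the new `eye` column holds the value for the first two rows and `None` for the row that is not valid JSON.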
extract_neurosensory_retina_atrophy
def extract_neurosensory_retina_atrophy(df: pd.DataFrame) ‑> pandas.core.frame.DataFrame:
Extension function for extracting neurosensory retina atrophy area.
extract_rpe_atrophy
def extract_rpe_atrophy(df: pd.DataFrame) ‑> pandas.core.frame.DataFrame:
Extension function for extracting RPE atrophy area.
extract_rpe_disruption
def extract_rpe_disruption(df: pd.DataFrame) ‑> pandas.core.frame.DataFrame:
Extension function for extracting RPE disruption area.
generate_bitfount_patient_id
def generate_bitfount_patient_id( df: pd.DataFrame, name_col: str = "Patient's Name", dob_col: str = "Patient's Birth Date",) ‑> pandas.core.frame.DataFrame:
Adds a BitfountPatientID column to the provided DataFrame.
This mutates the input DataFrame by adding the new column.
The generated IDs are the hash of the concatenated string of a Bitfount-specific key, full name, and date of birth.
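A minimal sketch of the hashing idea is shown below. The key value, the choice of SHA-256, the concatenation order without separators, and the helper name `patient_id_sketch` are all assumptions made for illustration; this reference does not state which hash or key the real function uses.

```python
import hashlib

import pandas as pd

# Hypothetical stand-in: the real Bitfount-specific key is not given in this reference.
EXAMPLE_KEY = "example-key"


def patient_id_sketch(name: str, dob: str, key: str = EXAMPLE_KEY) -> str:
    # Concatenate key, full name and date of birth, then hash the result.
    # SHA-256 and plain concatenation are assumptions.
    return hashlib.sha256(f"{key}{name}{dob}".encode("utf-8")).hexdigest()


df = pd.DataFrame(
    {
        "Patient's Name": ["Jane Doe", "Jane Doe", "John Roe"],
        "Patient's Birth Date": ["1980-01-01", "1980-01-01", "1975-06-30"],
    }
)
# Add the ID column in place, one hash per row.
df["BitfountPatientID"] = [
    patient_id_sketch(name, dob)
    for name, dob in zip(df["Patient's Name"], df["Patient's Birth Date"])
]
```

Because the ID depends only on the key, name, and date of birth, rows describing the same patient receive the same ID.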
generate_subfoveal_indicator
def generate_subfoveal_indicator( df: pd.DataFrame, distance_from_fovea_col: str = 'distance_from_fovea_centre', max_distance: float = 0.1,) ‑> pandas.core.frame.DataFrame:
Adds a 'Subfoveal?' column to the provided DataFrame.
This mutates the input DataFrame by adding the new column.
The column will contain 'Y' if the distance from fovea is less than the specified maximum distance, 'N' if it's greater, and 'Fovea not detected' if the distance value is not available.
Arguments
df
: The DataFrame to add the column to.
distance_from_fovea_col
: The name of the column containing the distance from fovea. Defaults to DISTANCE_FROM_FOVEA_CENTRE_COL.
max_distance
: The maximum distance to consider as subfoveal. Defaults to 0.1.
Returns The modified DataFrame with the new column.
Raises
DataFrameExtensionError
: If the distance from fovea column is not available in the DataFrame.
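The labelling rule described above can be sketched as follows. This is a hedged illustration rather than the library's code: `subfoveal_indicator_sketch` is a hypothetical name, `KeyError` stands in for `DataFrameExtensionError`, and treating a distance exactly equal to `max_distance` as 'N' is an assumption, since the description only covers "less than" and "greater".

```python
import pandas as pd


def subfoveal_indicator_sketch(
    df: pd.DataFrame,
    distance_from_fovea_col: str = "distance_from_fovea_centre",
    max_distance: float = 0.1,
) -> pd.DataFrame:
    if distance_from_fovea_col not in df.columns:
        # The documented function raises DataFrameExtensionError; KeyError stands in here.
        raise KeyError(distance_from_fovea_col)

    def _label(distance: float) -> str:
        if pd.isna(distance):
            return "Fovea not detected"
        # Assumption: a distance equal to max_distance is labelled 'N'.
        return "Y" if distance < max_distance else "N"

    # Mutates the input DataFrame by adding the indicator column.
    df["Subfoveal?"] = df[distance_from_fovea_col].apply(_label)
    return df


df = pd.DataFrame({"distance_from_fovea_centre": [0.05, 0.5, None]})
subfoveal_indicator_sketch(df)
```

With the default threshold, the three rows are labelled 'Y', 'N', and 'Fovea not detected' respectively.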
generate_subfoveal_indicator_extension
def generate_subfoveal_indicator_extension( df: pd.DataFrame,) ‑> pandas.core.frame.DataFrame:
Extension function for generating the subfoveal indicator column.
Note that this is a wrapper function since extensions do not support parameters yet. Once they do, we can remove this wrapper function.
id_safe_string
def id_safe_string(s: str) ‑> str:
Converts a string to a normalised version safe for use in IDs.
In particular, converts accented/diacritic characters to their closest ASCII representation, ensures lowercase, and replaces any non-word characters with underscores.
This allows us to map potentially different spellings (e.g. Francois John-Smith vs François John Smith) to the same string (francois_john_smith).
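One way to get this normalisation with the standard library is sketched below. The name `id_safe_string_sketch` is hypothetical, and collapsing each run of non-word characters into a single underscore is an assumption; the actual function may replace them one by one.

```python
import re
import unicodedata


def id_safe_string_sketch(s: str) -> str:
    # Decompose accented characters, then drop the combining marks
    # by encoding to ASCII and ignoring what cannot be represented.
    ascii_only = (
        unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    )
    # Lowercase, then replace each run of non-word characters with an underscore.
    return re.sub(r"\W+", "_", ascii_only.lower())


first = id_safe_string_sketch("Francois John-Smith")
second = id_safe_string_sketch("François John Smith")
```

Both spellings from the example above normalise to `francois_john_smith` under this sketch.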
safe_format_date
def safe_format_date(value: Any) ‑> Any:
Safely format a date string.
Arguments
value
: The input value, which can be a date string, integer, or NaN.
Returns Formatted date string or the original value as a string if formatting fails.
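The fall-back behaviour can be illustrated with a small sketch. The accepted input formats, the output format, and the name `safe_format_date_sketch` are all assumptions; this reference does not specify which formats the real function recognises or emits.

```python
from datetime import datetime
from typing import Any

# Hypothetical input and output formats; the reference does not specify them.
INPUT_FORMATS = ("%Y%m%d", "%Y-%m-%d")
OUTPUT_FORMAT = "%Y-%m-%d"


def safe_format_date_sketch(value: Any) -> Any:
    text = str(value)
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime(OUTPUT_FORMAT)
        except ValueError:
            continue  # try the next candidate format
    # Nothing parsed: fall back to the original value as a string.
    return text


from_int = safe_format_date_sketch(19800131)
from_nan = safe_format_date_sketch(float("nan"))
from_text = safe_format_date_sketch("not a date")
```

The point of the pattern is that a bad value never raises: anything that cannot be parsed is returned unchanged as a string.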
Classes
DataFrameExtensionError
class DataFrameExtensionError(*args, **kwargs):
Indicates an error whilst trying to apply an extension function.
Ancestors
- BitfountError
- builtins.Exception
- builtins.BaseException
DataFrameExtensionFunction
class DataFrameExtensionFunction(*args, **kwargs):
Callback protocol for DataFrame extension functions.
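A callback protocol of this kind can be sketched with `typing.Protocol`. The class name `ExtensionFunctionSketch`, the single-argument call signature (inferred from the extension functions listed above), and the helpers `add_row_count` and `apply_extensions` are hypothetical; the real protocol may differ.

```python
from typing import List, Protocol

import pandas as pd


class ExtensionFunctionSketch(Protocol):
    """Hypothetical callback protocol: takes a DataFrame, returns a DataFrame."""

    def __call__(self, df: pd.DataFrame) -> pd.DataFrame: ...


def add_row_count(df: pd.DataFrame) -> pd.DataFrame:
    # A toy extension with the single-argument shape the protocol describes.
    df["row_count"] = len(df)
    return df


def apply_extensions(
    df: pd.DataFrame, extensions: List[ExtensionFunctionSketch]
) -> pd.DataFrame:
    # Run each extension in order, passing the result along.
    for extension in extensions:
        df = extension(df)
    return df


out = apply_extensions(pd.DataFrame({"a": [1, 2, 3]}), [add_row_count])
```

Any plain function with a matching signature satisfies the protocol structurally, which is why a parameterised function needs a parameter-free wrapper before it can be used as an extension.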