name_parser

Name parser for parsing human names into components.

This module parses Western-style names (including Latin-style names common in the USA) into their component parts: title, first name, middle name(s), last name, suffix, and nickname.

Architecture:

The parser is organized into logical sections:

  1. Configuration Constants: Sets of known titles, suffixes, prefixes, etc.
  2. Utility Functions: Helper functions for classification (is_title, is_suffix, etc.)
  3. Name Component Extraction: Functions to extract titles, suffixes, nicknames
  4. Format-Specific Parsing:
    • Comma format: "Last [Suffix], Title First Middle [Suffix]"
    • Standard format: "Title First Middle Last Suffix"
  5. Public API: Main parse_name() function
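As an illustration of sections 1 to 3, the sketch below shows one way the classification helpers and the title/suffix extraction could fit together. The `TITLES` and `SUFFIXES` sets, the `_normalize` helper, and `split_titles_and_suffixes` are hypothetical names chosen for this example; only `is_title` and `is_suffix` are mentioned above, and their real implementations may differ.

```python
# Hypothetical constants; the module's real sets are not shown on this page.
TITLES = {"dr", "mr", "mrs", "ms", "prof"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "phd"}


def _normalize(token: str) -> str:
    """Lowercase a token and drop periods and commas so 'Ph.D.' matches 'phd'."""
    return token.replace(".", "").replace(",", "").lower()


def is_title(token: str) -> bool:
    """Return True if the token looks like a known title (assumed rule)."""
    return _normalize(token) in TITLES


def is_suffix(token: str) -> bool:
    """Return True if the token looks like a known suffix (assumed rule)."""
    return _normalize(token) in SUFFIXES


def split_titles_and_suffixes(tokens):
    """Peel titles off the front and suffixes off the back of a token list."""
    tokens = list(tokens)
    titles, suffixes = [], []
    while tokens and is_title(tokens[0]):
        titles.append(tokens.pop(0))
    # Keep at least one token so a lone "Jr" is not consumed as a suffix.
    while len(tokens) > 1 and is_suffix(tokens[-1]):
        suffixes.insert(0, tokens.pop())
    return titles, tokens, suffixes


titles, core, suffixes = split_titles_and_suffixes("Dr. John M. Smith Jr".split())
print(titles, core, suffixes)
```

Normalizing before the set lookup keeps the constant sets small, since "Dr", "Dr." and "DR" all map to the same entry.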

Parsing Flow:

  1. Extract nickname from quotes/parentheses
  2. Normalize whitespace and clean input
  3. Detect format (comma-separated vs standard)
  4. Parse according to format:
    • Extract titles from start
    • Extract suffixes from end
    • Identify first, middle, and last names
    • Handle prefixes (e.g., "de la", "van der")
  5. Apply post-processing rules (e.g., handle_firstnames logic)
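Steps 1 to 3 of the flow above can be sketched as follows. This is a minimal illustration under assumptions, not the module's actual code: `extract_nickname`, `normalize_whitespace`, and `detect_format` are hypothetical names, and the regular expression is one plausible way to find a quoted or parenthesized nickname without tripping over apostrophes inside names such as "O'Brien".

```python
import re

# A nickname in single quotes, double quotes, or parentheses.
# The lookarounds keep apostrophes inside words (O'Brien) from matching.
_NICKNAME_RE = re.compile(r"""(?<!\w)'([^']+)'(?!\w)|"([^"]+)"|\(([^)]+)\)""")


def extract_nickname(raw: str):
    """Return (text_without_nickname, nickname); nickname is '' if absent."""
    match = _NICKNAME_RE.search(raw)
    if not match:
        return raw, ""
    # Exactly one of the three groups matched; pick it.
    nickname = next(g for g in match.groups() if g)
    remainder = raw[:match.start()] + " " + raw[match.end():]
    return remainder, nickname.strip()


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(text.split())


def detect_format(text: str) -> str:
    """Classify the input as 'comma' or 'standard' (assumed rule: any comma)."""
    return "comma" if "," in text else "standard"


remainder, nickname = extract_nickname("John 'Johnny' Smith")
cleaned = normalize_whitespace(remainder)
print(cleaned, nickname, detect_format(cleaned))
```

Extracting the nickname first means the later steps never see the quotes or parentheses, which keeps format detection and tokenization simple.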

Module

Functions

parse_name

def parse_name(full_name: str) -> ParsedName:

Parse a full name string into its components.

Supports formats like:

  • "John Smith"
  • "Smith, John"
  • "Dr. John M. Smith"
  • "Smith, John M., JR"
  • "John 'Johnny' Smith"
  • "De la Vega, Juan"
  • "McDonald, James"

The parsing process:

  1. Extract nickname from quotes/parentheses
  2. Normalize whitespace and clean input
  3. Detect format (comma-separated vs standard)
  4. Parse according to format
  5. Apply post-processing rules

Arguments

  • full_name: The full name string to parse

Returns

  • A ParsedName object holding the parsed components
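To make the comma-separated inputs listed above more concrete, here is a rough sketch of how a "Last, First Middle, Suffix" string could be split. The function `split_comma_format` is a hypothetical stand-in, not part of the documented API; it ignores titles and a suffix placed directly after the last name, both of which the real parser is described as handling.

```python
def split_comma_format(text: str) -> dict:
    """Split 'Last, First Middle[, Suffix]' into rough components."""
    parts = [p.strip() for p in text.split(",") if p.strip()]
    if not parts:
        return {"first": "", "middle": "", "last": "", "suffix": ""}
    # Everything before the first comma is treated as the last name,
    # so multi-word surnames such as "De la Vega" stay intact.
    last = parts[0]
    given = parts[1].split() if len(parts) > 1 else []
    return {
        "first": given[0] if given else "",
        "middle": " ".join(given[1:]),
        "last": last,
        # Any further comma-separated pieces are kept as suffixes.
        "suffix": ", ".join(parts[2:]),
    }


print(split_comma_format("Smith, John M., JR"))
print(split_comma_format("De la Vega, Juan"))
```

One advantage of the comma format is that the last name is delimited explicitly, so no prefix detection is needed to keep "De la Vega" together.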

Classes

ParsedName

class ParsedName(
    title: str = '',
    first: str = '',
    middle: str = '',
    last: str = '',
    suffix: str = '',
    nickname: str = '',
):

Represents a parsed human name with its components.

This class holds the result of parsing a full name string into its constituent parts. All fields are strings and may be empty if that component was not found in the input.

Attributes

  • title: Title or honorific (e.g., "Dr.", "Mr.", "Prof.")
  • first: First name (given name)
  • middle: Middle name(s) or initial(s), space-separated if multiple
  • last: Last name (family name, may include prefixes like "de la", "van", "Mc")
  • suffix: Suffix (e.g., "JR", "III", "Ph.D."), comma-separated if multiple
  • nickname: Nickname extracted from quotes or parentheses
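The note that the last name may include prefixes matters most in the standard format, where nothing delimits the surname. The sketch below shows one possible way to assign first, middle, and last once titles and suffixes have been removed. The `LAST_NAME_PREFIXES` set and `split_standard` are hypothetical, and the treatment of a single token as a first name is an assumption rather than documented behavior.

```python
# Hypothetical prefix set; the module's real list is not shown here.
LAST_NAME_PREFIXES = {"de", "la", "del", "van", "der", "von", "di"}


def split_standard(tokens):
    """Return (first, middle, last) for tokens with titles/suffixes removed."""
    if not tokens:
        return "", "", ""
    if len(tokens) == 1:
        # Assumption: a lone token is treated as the first name.
        return tokens[0], "", ""
    first = tokens[0]
    # Start with the final token as the last name, then walk left while the
    # preceding token is a known prefix, never consuming the first name.
    last_start = len(tokens) - 1
    while last_start > 1 and tokens[last_start - 1].lower() in LAST_NAME_PREFIXES:
        last_start -= 1
    middle = " ".join(tokens[1:last_start])
    last = " ".join(tokens[last_start:])
    return first, middle, last


print(split_standard("Juan de la Vega".split()))
print(split_standard("John Michael Smith".split()))
```

Walking leftward from the final token lets multi-word prefixes such as "de la" and "van der" attach to the surname without a separate lookup for each combination.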

Methods


as_dict

def as_dict(self) -> Dict[str, str]:

Return the parsed name as a dictionary.

Any field that is None is normalized to an empty string, so the output format stays consistent for serialization and comparison.

Returns

  • A dictionary with the keys title, first, middle, last, suffix, and nickname
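The behavior described for as_dict can be illustrated with a small stand-in class. `ParsedNameSketch` is a hypothetical dataclass written for this example; whether the real ParsedName is a dataclass is not stated here, and only the field names and the None-to-empty-string normalization are taken from the description above.

```python
from dataclasses import dataclass, fields
from typing import Dict


@dataclass
class ParsedNameSketch:
    """Illustrative stand-in for ParsedName; not the module's actual class."""

    title: str = ""
    first: str = ""
    middle: str = ""
    last: str = ""
    suffix: str = ""
    nickname: str = ""

    def as_dict(self) -> Dict[str, str]:
        """Return all fields as a dict, replacing None with an empty string."""
        result = {}
        for f in fields(self):
            value = getattr(self, f.name)
            result[f.name] = value if value is not None else ""
        return result


name = ParsedNameSketch(title="Dr.", first="John", middle=None, last="Smith")
print(name.as_dict())
```

Because every key is always present and always a string, two results can be compared with plain `==` or passed straight to `json.dumps` without special-casing missing components.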