
pii_redaction

Redact PII in query parameters and log messages.

Use this module to log request details (e.g. FHIR/EHR query params) without exposing identifiers, names, or dates. To escape non-printable characters, use safe_str_for_log in bitfount.data.datasources.utils.

Rules: Only alphanumeric characters are replaced with '*'; punctuation and whitespace are kept, so length is preserved. The default strategy is "partial"; use strategy="full" to mask every alphanumeric character. Partial keeps up to 3 characters at each end, with a minimum of 1 (except for length ≤ 2, where 0 are kept), scaled so that at most 25% is visible from length 8 onward.
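The rules above can be sketched in a few lines. This is an illustrative reimplementation, not the module's own code: `partial_mask_sketch` is a hypothetical name, the formula k = max(1, min(3, n // 8)) is the one this page quotes for redact_identifier, and counting n over alphanumeric characters only (rather than over the whole string) is an assumption.

```python
def partial_mask_sketch(value: str, strategy: str = "partial") -> str:
    """Mask alphanumeric characters with '*', keeping k at each end (sketch)."""
    # Positions of the characters that are eligible for masking.
    alnum_positions = [i for i, ch in enumerate(value) if ch.isalnum()]
    n = len(alnum_positions)  # assumption: n counts alphanumerics only

    if strategy == "full" or n <= 2:
        keep = 0  # full strategy, or too short to show anything
    else:
        keep = max(1, min(3, n // 8))  # formula quoted for redact_identifier

    visible = set()
    if keep:
        visible.update(alnum_positions[:keep])
        visible.update(alnum_positions[n - keep:])

    # Punctuation and whitespace pass through, so length is preserved.
    return "".join(
        ch if (not ch.isalnum() or i in visible) else "*"
        for i, ch in enumerate(value)
    )


print(partial_mask_sketch("Smith"))          # S***h
print(partial_mask_sketch("MRN-123"))        # M**-**3
print(partial_mask_sketch("O'Brien", "full"))  # *'*****
```

Keeping the mask the same length as the input makes it easy to spot truncated or padded values in a log line without revealing their content.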

Param key → redaction type (for redact_value / redact_query_params):

  • name: name, given, family
  • identifier: identifier, _id, patient
  • date: birthdate
  • datetime: date, date__gt, date__lt, onset-date, performed-date
  • fallback: any other key in pii_keys, or any key not in pii_keys when pass_through_unknown_keys is False (default: unknown keys are redacted).

If a type-specific redactor leaves the string unchanged, fallback redaction is applied.
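A minimal sketch of the key-to-type lookup and the "fall back if unchanged" step described above. The names `KEY_TO_TYPE`, `redaction_type_for_key`, and `apply_with_fallback` are hypothetical, and exact (case-sensitive) key matching is an assumption; the module's internals may differ.

```python
from typing import Callable

# Mapping transcribed from the list above; anything else is "fallback".
KEY_TO_TYPE = {
    "name": "name", "given": "name", "family": "name",
    "identifier": "identifier", "_id": "identifier", "patient": "identifier",
    "birthdate": "date",
    "date": "datetime", "date__gt": "datetime", "date__lt": "datetime",
    "onset-date": "datetime", "performed-date": "datetime",
}


def redaction_type_for_key(key: str) -> str:
    """Return the redaction type for a param key (assumes exact match)."""
    return KEY_TO_TYPE.get(key, "fallback")


def apply_with_fallback(
    value: str,
    specific: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Run the type-specific redactor; if it changed nothing, use fallback."""
    redacted = specific(value)
    return fallback(value) if redacted == value else redacted


print(redaction_type_for_key("birthdate"))  # date
print(redaction_type_for_key("date__gt"))   # datetime
print(redaction_type_for_key("postcode"))   # fallback
```

The unchanged check guards against a value that does not match the expected shape (for example a year-only birthdate) slipping through unredacted.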

Examples (partial unless noted)

Name (1–2 chars: 0 kept; 3+ scaled by n)::

    "X" → "*" (n=1)
    "ab" → "**" (n=2)
    "Joe" → "J*e"
    "John" → "J**n"
    "Smith" → "S***h"
    "O'Brien" → "O'****n" (apostrophe kept)
    "Mary-Jane" → "M***-***e" (hyphen kept)
    "Christopher" → "C*********r"

Name (full)::

    "John" → "****"
    "O'Brien" → "*'*****"

Identifier (same partial rule; punctuation kept)::

    "a" → "*" (n=1)
    "12" → "**"
    "MRN-123" → "M**-**3"
    "MRN-12345678" → "M**-*******8"
    "patient-42" → "p******-*2"
    "id_abc_123" → "i*_***_**3"

Date (year and month kept; day redacted; else fallback)::

    "1990-01-15" → "1990-01-**"
    "19900101" → "199001**"
    "2005" → "****" (no month/day → fallback)
    "1990-01" → "1***-*1" (no day → fallback)

Datetime (year and month kept; day and time redacted)::

    "2024-03-15T14:30:00" → "2024-03-**T**:**:**"
    "2024-03-15" → "2024-03-**"

Fallback (unknown key; partial / full)::

    "" → "" (empty kept as empty)
    "a" → "*" (n=1)
    "ab" → "**"
    "xyz" → "x*z"
    "unknown_key" → "u******_**y"
    "xyz" (full) → "***"

Module

Functions

anonymise_patient_id

def anonymise_patient_id(patient_id: str) -> str:

Anonymise a patient ID for logging by masking middle alphanumeric chars.

Use this when you need anonymised IDs to remain distinguishable in logs (e.g. during patient ID exchange) and when prefix/suffix (BOM, spaces) should stay visible for debugging. For general PII in request/error logs, use redact_identifier instead.

Difference from redact_identifier:

  • Strictness: This function keeps more characters visible (k = min(3, max(1, n//3))), so e.g. "aa12345" → "aa***45". redact_identifier uses k = max(1, min(3, n//8)), so the same string → "a*****5" (stricter; ≤25% visible from length 8 onward; lengths 3–7 show >25% due to floor). Use redact_identifier wherever minimal exposure is enough.
  • Prefix/suffix: This function preserves leading/trailing content that is not part of the "core" ID (BOM mojibake, spaces), so encoding and formatting issues remain visible in logs. redact_identifier redacts the whole string and does not special-case prefix/suffix.
  • When to use which: Use redact_identifier for query params, error messages, and any log line where you only need to show that an ID was present. Use this function when you need to tell multiple anonymised IDs apart (e.g. "aa***45" vs "ab***45") or when debugging ID format/encoding.

Preserves hyphens and underscores inside the core; only alphanumeric characters in the middle are replaced with *.
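To make the strictness difference concrete, the two formulas quoted above can be compared side by side. This is only arithmetic on those formulas: the helper names are hypothetical, and the sketch ignores prefix/suffix handling and the "length ≤ 2 → 0" exception.

```python
def keep_anonymise(n: int) -> int:
    """Characters kept at each end by anonymise_patient_id (formula as quoted)."""
    return min(3, max(1, n // 3))


def keep_redact_identifier(n: int) -> int:
    """Characters kept at each end by redact_identifier (formula as quoted)."""
    return max(1, min(3, n // 8))


def visible_fraction(n: int, k: int) -> float:
    """Share of an n-character ID left visible when k are kept at each end."""
    return 2 * k / n


for n in (3, 7, 8, 16, 24):
    k_anon = keep_anonymise(n)
    k_strict = keep_redact_identifier(n)
    print(
        f"n={n:2d}  anonymise k={k_anon} ({visible_fraction(n, k_anon):.0%})"
        f"  redact_identifier k={k_strict} ({visible_fraction(n, k_strict):.0%})"
    )
```

For n = 7 this gives k = 2 versus k = 1, matching the "aa***45" versus "a*****5" example, and from n = 8 onward the redact_identifier column stays at or below 25%.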

redact_date

def redact_date(value: Any, strategy: RedactionStrategy = 'partial') -> str:

Redact date: keep year and month, redact day onward. Falls back if unchanged.

redact_datetime

def redact_datetime(value: Any, strategy: RedactionStrategy = 'partial') -> str:

Redact datetime: keep year, month; redact day, time. Fallback if unchanged.
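The "keep year and month, redact day onward" behaviour of redact_date and redact_datetime can be approximated with one regular expression. This is a hedged sketch rather than the module's implementation: `redact_day_onward_sketch` is a hypothetical name, only the YYYY-MM-DD and YYYYMMDD shapes are recognised, and leaving non-digit separators such as "T" and ":" visible is inferred from the datetime example on this page.

```python
import re

# Group 1: year and month (kept); group 2: day; group 3: anything after the day.
_DATE_SHAPE = re.compile(r"^(\d{4}-\d{2}-|\d{6})(\d{2})(.*)$")


def redact_day_onward_sketch(value: str) -> str:
    """Keep year and month; mask the day and any digits that follow it."""
    match = _DATE_SHAPE.match(value)
    if match is None:
        # Unrecognised shape: return unchanged so the caller can fall back.
        return value
    kept, day, rest = match.groups()
    masked_rest = "".join("*" if ch.isdigit() else ch for ch in rest)
    return kept + "*" * len(day) + masked_rest


print(redact_day_onward_sketch("1990-01-15"))           # 1990-01-**
print(redact_day_onward_sketch("19900101"))             # 199001**
print(redact_day_onward_sketch("2024-03-15T14:30:00"))  # 2024-03-**T**:**:**
print(redact_day_onward_sketch("2005"))                 # 2005 (unchanged)
```

Returning the input unchanged for unrecognised shapes is what lets a caller detect the miss and apply fallback redaction instead.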

redact_fallback

def redact_fallback(value: Any, strategy: RedactionStrategy = 'partial') -> str:

Redact unknown PII: partial keeps scaled ends, full masks all alphanumeric.

redact_identifier

def redact_identifier(value: Any, strategy: RedactionStrategy = 'partial') -> str:

Redact identifier (MRN, patient id). Falls back if unchanged.

redact_name

def redact_name(value: Any, strategy: RedactionStrategy = 'partial') -> str:

Redact name-like value. Falls back to fallback redaction if unchanged.

redact_query_params

def redact_query_params(
    params: Mapping[str, Any] | None,
    *,
    pii_keys: frozenset[str] | None = None,
    strategy: RedactionStrategy = 'partial',
    pass_through_unknown_keys: bool = False,
) -> dict[str, typing.Any]:

Return a copy of params with PII values redacted.

Keys in pii_keys get type-specific redaction (name, identifier, date, etc.). Keys not in pii_keys are redacted with fallback by default; set pass_through_unknown_keys=True to leave them unchanged.
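The effect of pass_through_unknown_keys can be illustrated with a simplified stand-in. Everything here is hypothetical: `redact_params_sketch` and `mask_all_sketch` are not part of the module, every redacted value is fully masked instead of using type-specific redaction, the default pii_keys set is invented for the example, and returning an empty dict for None is an assumption.

```python
from typing import Any, Mapping, Optional


def mask_all_sketch(value: Any) -> str:
    """Stand-in redactor: mask every alphanumeric character."""
    return "".join("*" if ch.isalnum() else ch for ch in str(value))


def redact_params_sketch(
    params: Optional[Mapping[str, Any]],
    *,
    pii_keys: frozenset = frozenset({"family", "identifier", "birthdate"}),
    pass_through_unknown_keys: bool = False,
) -> dict:
    """Return a redacted copy of params; the input mapping is not modified."""
    if params is None:
        return {}  # assumption: None is treated as "no params"
    redacted = {}
    for key, value in params.items():
        if key not in pii_keys and pass_through_unknown_keys:
            redacted[key] = value  # unknown key left unchanged
        else:
            redacted[key] = mask_all_sketch(value)
    return redacted


params = {"family": "Smith", "_count": "50"}
print(redact_params_sketch(params))
print(redact_params_sketch(params, pass_through_unknown_keys=True))
```

Redacting unknown keys by default is the safer choice for logging: a new query parameter that carries PII is masked until someone explicitly decides it is safe to show.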

redact_value

def redact_value(key: str, value: Any, strategy: RedactionStrategy = 'partial') -> str:

Redact a single value by key (name, identifier, date, datetime, or fallback).