pii_redaction
Redact PII in query parameters and log messages.
Use this module to log request details (e.g. FHIR/EHR query params) without
exposing identifiers, names, or dates. To escape non-printable characters, use
safe_str_for_log in bitfount.data.datasources.utils.
Rules: Only alphanumeric characters are replaced with '*'; punctuation and
whitespace are kept, so the length is preserved. The default strategy is
"partial"; use strategy="full" to mask all alphanumeric characters. Partial
keeps up to 3 characters at each end, with a minimum of 1 (except for
length ≤ 2, where 0 are kept), scaled so that at most 25% of the characters
are visible from length 8 onward.
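The rules above can be sketched as a small standalone function. This is an illustrative sketch, not the module's implementation: the name `mask_alnum` is hypothetical, the per-end count uses the formula k = max(1, min(3, n//8)) quoted for redact_identifier in this page, and it assumes that n counts alphanumeric characters and that the kept characters are the first and last k alphanumeric ones.

```python
def mask_alnum(value: str, strategy: str = "partial") -> str:
    """Hypothetical sketch of the masking rule; not the library function."""
    # Positions of the characters that are eligible for masking.
    alnum_positions = [i for i, ch in enumerate(value) if ch.isalnum()]
    n = len(alnum_positions)  # assumption: n counts alphanumeric chars only

    if strategy == "full" or n <= 2:
        keep = 0  # nothing stays visible
    else:
        keep = max(1, min(3, n // 8))  # 1 up to n=15, 2 up to n=23, then 3

    # First and last `keep` alphanumeric positions stay visible.
    visible = set(alnum_positions[:keep]) | set(alnum_positions[n - keep:])

    # Punctuation and whitespace pass through, so the length is preserved.
    return "".join(
        "*" if ch.isalnum() and i not in visible else ch
        for i, ch in enumerate(value)
    )


print(mask_alnum("John"))                   # J**n
print(mask_alnum("O'Brien"))                # O'****n
print(mask_alnum("xyz", strategy="full"))   # ***
```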
Param key → redaction type (for redact_value / redact_query_params):
- name: name, given, family
- identifier: identifier, _id, patient
- date: birthdate
- datetime: date, date__gt, date__lt, onset-date, performed-date
- fallback: any other key in pii_keys, or any key not in pii_keys when pass_through_unknown_keys is False (default: unknown keys are redacted).
If a type-specific redactor leaves the string unchanged, fallback redaction is applied.
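The key-to-type mapping above amounts to a lookup with a fallback default. The sketch below only illustrates that idea: `KEY_TO_TYPE` and `redaction_type` are hypothetical names, and whether the real lookup normalises key case is not stated here, so exact matching is an assumption.

```python
# Hypothetical lookup table built from the mapping listed above.
KEY_TO_TYPE = {
    "name": "name",
    "given": "name",
    "family": "name",
    "identifier": "identifier",
    "_id": "identifier",
    "patient": "identifier",
    "birthdate": "date",
    "date": "datetime",
    "date__gt": "datetime",
    "date__lt": "datetime",
    "onset-date": "datetime",
    "performed-date": "datetime",
}


def redaction_type(key: str) -> str:
    """Return the redaction type for a param key (exact match assumed)."""
    # Any key without a specific type is handled by fallback redaction.
    return KEY_TO_TYPE.get(key, "fallback")


print(redaction_type("given"))      # name
print(redaction_type("date__gt"))   # datetime
print(redaction_type("city"))       # fallback
```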
Examples (partial unless noted)
Name (1–2 chars: 0 kept; 3+ scaled by n)::
    "X" → "*" (n=1)
    "ab" → "**" (n=2)
    "Joe" → "J*e"
    "John" → "J**n"
    "Smith" → "S***h"
    "O'Brien" → "O'****n" (apostrophe kept)
    "Mary-Jane" → "M***-***e" (hyphen kept)
    "Christopher" → "C*********r"
Name (full)::
    "John" → "****"
    "O'Brien" → "*'*****"
Identifier (same partial rule; punctuation kept)::
    "a" → "*" (n=1)
    "12" → "**"
    "MRN-123" → "M**-**3"
    "MRN-12345678" → "M**-*******8"
    "patient-42" → "p******-*2"
    "id_abc_123" → "i*_***_**3"
Date (year and month kept; day redacted; else fallback)::
    "1990-01-15" → "1990-01-**"
    "19900101" → "199001**"
    "2005" → "****" (no month/day → fallback)
    "1990-01" → "1***-*1" (no day → fallback)
Datetime (year and month kept; day and time redacted)::
    "2024-03-15T14:30:00" → "2024-03-**T**:**:**"
    "2024-03-15" → "2024-03-**"
Fallback (unknown key; partial / full)::
    "" → "" (empty kept as empty)
    "a" → "*" (n=1)
    "ab" → "**"
    "xyz" → "x*z"
    "unknown_key" → "u******_**y"
    "xyz" (full) → "***"
Module
Functions
anonymise_patient_id
def anonymise_patient_id(patient_id: str) -> str:
Anonymise a patient ID for logging by masking middle alphanumeric chars.
Use this when you need anonymised IDs to remain distinguishable in logs
(e.g. during patient ID exchange) and when prefix/suffix (BOM, spaces)
should stay visible for debugging. For general PII in request/error logs,
use redact_identifier instead.
Difference from redact_identifier:
- Strictness: This function keeps more characters visible (k = min(3, max(1, n//3))), so e.g. "aa12345" → "aa***45". redact_identifier uses k = max(1, min(3, n//8)), so the same string → "a*****5" (stricter; ≤25% visible from length 8 onward; lengths 3–7 show >25% because at least 1 character is kept at each end). Use redact_identifier wherever minimal exposure is enough.
- Prefix/suffix: This function preserves leading/trailing content that is not part of the "core" ID (BOM mojibake, spaces), so encoding and formatting issues remain visible in logs. redact_identifier redacts the whole string and does not special-case prefix/suffix.
- When to use which: Use redact_identifier for query params, error messages, and any log line where you only need to show that an ID was present. Use this function when you need to tell multiple anonymised IDs apart (e.g. "aa***45" vs "ab***45") or when debugging ID format/encoding.
Preserves hyphens and underscores inside the core; only alphanumeric characters
in the middle are replaced with *.
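The two per-end formulas quoted above can be compared directly. This is a small sketch with hypothetical helper names; it evaluates only the formulas as written here (for n ≥ 3) and does not call the library.

```python
def kept_per_end_anonymise(n: int) -> int:
    # Formula quoted for anonymise_patient_id.
    return min(3, max(1, n // 3))


def kept_per_end_redact(n: int) -> int:
    # Formula quoted for redact_identifier.
    return max(1, min(3, n // 8))


for n in (7, 8, 16, 24):
    k_anon = kept_per_end_anonymise(n)
    k_red = kept_per_end_redact(n)
    # Visible share under redact_identifier: k characters at each end.
    print(n, k_anon, k_red, f"{2 * k_red / n:.0%}")
# 7 2 1 29%
# 8 2 1 25%
# 16 3 2 25%
# 24 3 3 25%
```

For n=7 this gives 2 visible characters per end versus 1, which matches "aa***45" versus "a*****5" in the example above.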
redact_date
def redact_date(value: Any, strategy: RedactionStrategy = 'partial') -> str:
Redact date: keep year and month, redact day onward. Falls back to fallback redaction if unchanged.
redact_datetime
def redact_datetime(value: Any, strategy: RedactionStrategy = 'partial') -> str:
Redact datetime: keep year and month; redact day and time. Falls back to fallback redaction if unchanged.
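The "keep year and month, mask day and time" behaviour of the date and datetime redactors can be sketched with a regular expression. This is an assumption-laden illustration, not the library code: `mask_day_and_time` is a hypothetical name, only the YYYY-MM-DD and YYYYMMDD layouts from the examples are handled, only digits after the month are masked, and an unchanged return value stands in for the point where the real functions apply fallback redaction.

```python
import re

# Year and month (with optional separators) are captured and kept;
# the two day digits and everything after them are candidates for masking.
_DATE_PREFIX = re.compile(r"(\d{4}-?\d{2}-?)\d{2}(.*)", re.DOTALL)


def mask_day_and_time(value: str) -> str:
    """Hypothetical sketch: mask day and time digits, keep year and month."""
    match = _DATE_PREFIX.fullmatch(value)
    if match is None:
        # No recognisable day component: return unchanged. In the documented
        # behaviour this is where fallback redaction would take over.
        return value
    year_month, rest = match.groups()
    # Mask the day, then any remaining digits (time); separators and the
    # "T" marker are left in place so the length is preserved.
    return year_month + "**" + re.sub(r"\d", "*", rest)


print(mask_day_and_time("1990-01-15"))           # 1990-01-**
print(mask_day_and_time("19900101"))             # 199001**
print(mask_day_and_time("2024-03-15T14:30:00"))  # 2024-03-**T**:**:**
```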
redact_fallback
def redact_fallback(value: Any, strategy: RedactionStrategy = 'partial') -> str:
Redact unknown PII: partial keeps scaled ends, full masks all alphanumeric characters.
redact_identifier
def redact_identifier(value: Any, strategy: RedactionStrategy = 'partial') -> str:
Redact identifier (MRN, patient id). Falls back to fallback redaction if unchanged.
redact_name
def redact_name(value: Any, strategy: RedactionStrategy = 'partial') -> str:
Redact name-like value. Falls back to fallback redaction if unchanged.
redact_query_params
def redact_query_params(
    params: Mapping[str, Any] | None,
    *,
    pii_keys: frozenset[str] | None = None,
    strategy: RedactionStrategy = 'partial',
    pass_through_unknown_keys: bool = False,
) -> dict[str, typing.Any]:
Return a copy of params with PII values redacted.
Keys in pii_keys get type-specific redaction (name, identifier, date, etc.). Keys not in pii_keys are redacted with fallback by default; set pass_through_unknown_keys=True to leave them unchanged.
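The copy-and-redact flow described above can be sketched as follows. Everything here is a stand-in for illustration: `redact_params_sketch` and `mask_everything` are hypothetical, the stand-in redactor simply masks every alphanumeric character, and returning an empty dict for `params=None` is an assumption, since the return value for None is not documented here.

```python
from collections.abc import Mapping
from typing import Any, Callable, Optional


def mask_everything(key: str, value: Any) -> str:
    # Stand-in redactor: masks all alphanumeric chars regardless of key.
    return "".join("*" if ch.isalnum() else ch for ch in str(value))


def redact_params_sketch(
    params: Optional[Mapping[str, Any]],
    *,
    pii_keys: frozenset,
    redact: Callable[[str, Any], str] = mask_everything,
    pass_through_unknown_keys: bool = False,
) -> dict:
    """Hypothetical sketch of the documented flow; not the library function."""
    if params is None:
        return {}  # assumption: behaviour for None is not documented here
    redacted = {}
    for key, value in params.items():
        if key not in pii_keys and pass_through_unknown_keys:
            redacted[key] = value  # unknown key explicitly passed through
        else:
            redacted[key] = redact(key, value)  # known PII or unknown key
    return redacted  # a new dict; the input mapping is not modified


params = {"family": "Smith", "_count": 50}
keys = frozenset({"family"})
print(redact_params_sketch(params, pii_keys=keys))
# {'family': '*****', '_count': '**'}
print(redact_params_sketch(params, pii_keys=keys, pass_through_unknown_keys=True))
# {'family': '*****', '_count': 50}
```

Redacting unknown keys by default is the safer choice for logs: a parameter that nobody classified as PII is masked unless the caller opts out.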
redact_value
def redact_value(key: str, value: Any, strategy: RedactionStrategy = 'partial') -> str:
Redact a single value by key (name, identifier, date, datetime, or fallback).