May 27, 2026

HIPAA-Aware Workflows for Redacting Medical Records

Clinical teams share medical records every day — with researchers, second-opinion specialists, insurance reviewers, quality auditors, residents and students. Every one of those flows involves Protected Health Information (PHI) under HIPAA, and every one has a different appropriate level of de-identification.

This guide gives you a practical, HIPAA-aware workflow for redacting medical records at scale.

The 18 HIPAA Safe Harbor identifiers (quick reference)

The HIPAA Safe Harbor method requires removing 18 specific categories of identifiers to consider PHI de-identified:

  1. Names
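  2. All geographic subdivisions smaller than a state (street, city, county, ZIP); the first three ZIP digits may be kept only when that three-digit area contains more than 20,000 people
  3. All elements of dates (except year) directly related to an individual, plus all ages over 89 (which may be aggregated as "90 or older")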
  4. Telephone numbers
  5. Fax numbers
  6. Email addresses
  7. Social Security numbers
  8. Medical record numbers
  9. Health plan beneficiary numbers
  10. Account numbers
  11. Certificate / license numbers
  12. Vehicle identifiers (VIN, license plates)
  13. Device identifiers (serial numbers)
  14. URLs
  15. IP addresses
  16. Biometric identifiers (including finger and voice prints)
  17. Full-face photographs and any comparable images
  18. Any other unique identifying number, characteristic, or code

Removing all 18, with no actual knowledge that what remains could still identify the individual, takes a record from "PHI" to "de-identified data" under HIPAA. De-identified data can be used for research, teaching, or analytics without the constraints that apply to PHI.
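Categories 2 and 3 are generalization rules rather than simple deletions, so they are worth sketching in code. The helpers below are a minimal illustration in Python; the function names are hypothetical, and the set of low-population ZIP prefixes is an assumption that the caller must build from current census data.

```python
from datetime import date


def generalize_date(d: date) -> str:
    """Keep only the year of a date tied to an individual.

    Note: a birth year that would reveal an age over 89 also has to be
    removed; this sketch does not handle that case.
    """
    return str(d.year)


def generalize_age(age: int) -> str:
    """Aggregate every age over 89 into a single '90+' bucket."""
    return "90+" if age > 89 else str(age)


def generalize_zip(zip_code: str, low_population_prefixes: set[str]) -> str:
    """Reduce a ZIP code to its first three digits.

    Prefixes covering 20,000 or fewer people are replaced with '000'.
    low_population_prefixes is supplied by the caller (assumption: built
    from current census data; no such list ships with this sketch).
    """
    prefix = zip_code[:3]
    return "000" if prefix in low_population_prefixes else prefix
```

Keeping these rules in small, separately testable functions makes it easier to audit each one against the regulation text.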

Three common workflows

Workflow A: Research dataset preparation

You need a corpus of clinical notes for NLP training or a retrospective study.

Standard: Safe Harbor de-identification (all 18 categories).

Approach: Bulk processing. Use AI to detect names, dates, addresses, phone numbers, email addresses, and organization names in a single pass. For institution-specific identifiers (MRNs, accession numbers), add their formats to an "Always Redact" terms list so they are caught deterministically across the entire corpus.
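To make the idea of deterministic, format-based redaction concrete, here is a minimal Python sketch using regular expressions. The two formats (a 7-digit MRN and an `ACC-YYYY-NNNNNN` accession number) are hypothetical placeholders, and the sketch does not describe how any particular product stores its terms list.

```python
import re

# Hypothetical formats: replace with your institution's real patterns.
ALWAYS_REDACT_PATTERNS = {
    "MRN": re.compile(r"\bMRN[:\s#]*\d{7}\b"),
    "ACCESSION": re.compile(r"\bACC-\d{4}-\d{6}\b"),
}


def redact_always(text: str) -> str:
    """Replace every match of every pattern with a bracketed label."""
    for label, pattern in ALWAYS_REDACT_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact_always("Patient MRN: 1234567 seen; study ACC-2024-000123."))
# prints: Patient [MRN] seen; study [ACCESSION].
```

Because the patterns are fixed, the same input always yields the same redactions, which is what makes this layer auditable alongside a statistical detector.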

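Quality check: Audit a stratified sample manually to estimate the residual PHI rate. A sample of 10–20 records will catch gross failures, but it is too small to demonstrate a rate below 1%, so size the sample to the target. For research-grade quality, aim for <1% residual.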
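The link between sample size and the rate you can claim is easy to compute. The sketch below assumes a simple random sample, a residual rate measured per record (a record counts as a miss if any identifier survives), and an audit that finds zero misses; stratified designs need different arithmetic. The function names are illustrative.

```python
import math


def upper_bound_zero_misses(n: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the miss rate when n audited records
    contain zero misses (exact binomial bound)."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


def min_sample_for_target(target_rate: float, confidence: float = 0.95) -> int:
    """Smallest all-clean sample that supports a miss rate below
    target_rate at the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - target_rate))


print(round(upper_bound_zero_misses(20), 3))
print(min_sample_for_target(0.01))
```

Under these assumptions, 20 clean records only bound the miss rate at roughly 14%, while supporting a claim of under 1% at 95% confidence takes about 299 clean records.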

Workflow B: Second-opinion consult

A specialist outside your institution needs a single patient's chart.

Standard: Minimum-necessary rule — only the data needed for the consult.

Approach: Studio review per case. Decide what the consulting specialist actually needs (e.g., imaging + diagnosis but not full demographics) and redact the rest. Takes 2–5 minutes per case.
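If charts are already split into named sections, the minimum-necessary decision can be recorded as an allowlist per consult type. This is a hypothetical Python sketch; the section names and the profile name are invented for illustration, and a reviewer still makes the final call in the Studio.

```python
# Hypothetical consult profiles: which chart sections each consult may receive.
CONSULT_PROFILES = {
    "radiology_second_opinion": {"imaging", "diagnosis", "relevant_history"},
}


def select_sections(chart: dict[str, str], profile: str) -> dict[str, str]:
    """Return only the chart sections allowed for the given consult profile."""
    allowed = CONSULT_PROFILES[profile]
    return {name: body for name, body in chart.items() if name in allowed}


chart = {
    "demographics": "name, address, insurance details",
    "imaging": "CT abdomen with contrast",
    "diagnosis": "suspected hepatic lesion",
}
print(sorted(select_sections(chart, "radiology_second_opinion")))
# prints: ['diagnosis', 'imaging']
```

An allowlist fails closed: a section nobody thought about is withheld by default rather than shared by default.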

Workflow C: Teaching case for residents

You're presenting an interesting case at morning rounds or a tumor board.

Standard: De-identified per Safe Harbor.

Approach: Bulk-process all cases for the conference at once, then audit one or two in the Studio for quality. Faces in any embedded photos need manual masks, because the AI detection works on text and does not catch image content.

What modern AI catches (and doesn't)

A tool like Redact PDF AI detects roughly Safe Harbor categories 1, 2, 3, 4, and 6 (names, addresses, dates, phone numbers, email addresses). It also detects most organization names, including hospital names; these are not a numbered Safe Harbor category of their own, but they can narrow down a patient's location. It uses OCR to handle scanned charts.

What it doesn't catch automatically:

  • MRN, health plan beneficiary, and account numbers (#8, #9, #10): institution-specific formats. Add them to the "Always Redact" terms.
  • License plate / VIN / device serial (#12, #13): rare in clinical text; mask manually if present.
  • Biometric identifiers and photos (#16, #17): images, not text. Apply manual masks in the Studio.
  • Implicit identifiers (e.g., "the only liver transplant in Geneva in 2024"): these require human judgment.

The combination of AI detection + Always-Redact terms + Studio review covers the 18 categories with practical efficiency.
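One way to keep track of which mechanism is responsible for each category is a small coverage map. The Python sketch below copies only the assignments stated in this section; the dictionary keys are illustrative names, and any category the section does not mention is reported as unassigned so that your team decides explicitly how to handle it.

```python
ALL_CATEGORIES = set(range(1, 19))  # Safe Harbor categories 1..18

# Assignments as described in this section (key names are illustrative).
COVERAGE = {
    "ai_detection": {1, 2, 3, 4, 6},
    "always_redact_terms": {8, 9, 10},
    "manual_studio_mask": {12, 13, 16, 17},
}


def unassigned_categories(coverage: dict[str, set[int]]) -> list[int]:
    """List the categories that no mechanism in the map claims."""
    covered = set().union(*coverage.values())
    return sorted(ALL_CATEGORIES - covered)


print(unassigned_categories(COVERAGE))
# prints: [5, 7, 11, 14, 15, 18]
```

Each number the check prints should end up with an owner, whether that is a detector category, an "Always Redact" pattern, or a manual review step.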

Infrastructure considerations

For HIPAA compliance, the underlying infrastructure matters as much as the workflow:

  • Business Associate Agreement (BAA) with the platform vendor
  • Encryption at rest (AES-256) and in transit (TLS 1.2+)
  • Audit logging of access to PHI
  • Retention controls — delete originals after processing if not needed

Redact PDF AI runs on Microsoft Azure infrastructure that is HIPAA-eligible under Microsoft's BAA. Redact PDF AI itself is not independently HIPAA-audited; full HIPAA compliance requires a BAA arrangement, your own internal controls, and a compliant overall workflow. Contact us to discuss BAA arrangements for healthcare customers.

A practical 4-step team workflow

  1. Stage: Drop the records into a single batch in your team's shared workspace.
  2. Run: Click "Analyze with AI" using your team's saved PII categories.
  3. Audit: Have a designated reviewer spot-check ~5% of records via the Studio Highlight mode.
  4. Release: Download the ZIP and deliver to the requester. Delete the originals from the workspace if no longer needed.

For a 100-record batch, the whole workflow typically takes 15–30 minutes from upload to release.
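The spot-check in step 3 is easier to defend if the sample is drawn at random and can be reproduced later. A minimal Python sketch, assuming records are identified by string IDs (the function name and ID format are hypothetical):

```python
import math
import random


def pick_audit_sample(record_ids, fraction=0.05, seed=None):
    """Pick a reproducible random subset of records for manual review.

    Always returns at least one record for a non-empty batch.
    """
    ids = list(record_ids)
    if not ids:
        return []
    # round() guards against floating-point noise before taking the ceiling
    k = max(1, math.ceil(round(len(ids) * fraction, 9)))
    rng = random.Random(seed)
    return sorted(rng.sample(ids, k))


batch = [f"rec-{i:03d}" for i in range(100)]
sample = pick_audit_sample(batch, seed=2026)
print(len(sample))
# prints: 5
```

Logging the seed alongside the batch lets a second reviewer regenerate exactly the same sample.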

Get started

Read our medical records redaction guide or try the free demo on redact-pdf.ai.