Build Redaction Instruction Set
Skill: Convert PII findings + court rules into a per-document redaction instruction set
Region: United States
Category: Legal / eDiscovery
Does: Takes PII/PHI detection results plus the applicable court redaction rules and produces a structured redaction instruction set: per-document, per-location instructions (coordinates or text patterns, redaction basis, and code) importable into Relativity Redact, Nuix, Everlaw, or DISCO for automated application.
Authority: FRCP 5.2 · FRAP 25(a)(5) · local-court ESI/protective orders · Relativity Redact (and equivalent) import schema
This skill produces instructions only: a redaction tool applies them, and a human QCs the result before production. Redactions must be burned in (flattened); never rely on a visual overlay that leaves the text layer intact, the classic leak in which the hidden text can still be copied or searched. Confirm the controlling court's redaction rules and the protective order before generating.
When this applies
- Turning a PII/PHI detection report into actionable redactions ahead of production.
- Applying FRCP 5.2 personal-identifier redactions (SSN → last 4, financial account → last 4, DOB → year, minor → initials) and any case-specific categories (trade secrets, privileged passages, third-party confidential data).
FRCP 5.2 default redaction rules
| Identifier | Permitted form |
|---|---|
| Social Security / taxpayer ID | last 4 digits only |
| Financial-account number | last 4 digits only |
| Date of birth | year only |
| Name of a minor | initials only |
| Home address (criminal cases only, under Fed. R. Crim. P. 49.1) | city and state only |
Protective orders and local rules may add categories (medical, trade secret, source code) and stricter forms.
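The partial forms in the table can be sketched as small string transforms. This is a minimal illustration, not any platform's API: the function names (`last_four`, `birth_year`, `minor_initials`), the `X` mask character, and the assumption that dates of birth arrive in ISO `YYYY-MM-DD` form are all hypothetical choices.

```python
import re
from datetime import date

MASK = "X"  # assumed mask character; the court or platform may require another


def last_four(identifier: str) -> str:
    """Keep only the last four digits of an SSN, taxpayer ID, or account number."""
    digits = re.sub(r"\D", "", identifier)
    if len(digits) < 4:
        raise ValueError("identifier has fewer than four digits")
    return MASK * (len(digits) - 4) + digits[-4:]


def birth_year(iso_dob: str) -> str:
    """Reduce a date of birth (assumed ISO YYYY-MM-DD) to the year only."""
    return str(date.fromisoformat(iso_dob).year)


def minor_initials(full_name: str) -> str:
    """Reduce a minor's name to initials, one per whitespace-separated part."""
    return "".join(
        part[0].upper() + "." for part in full_name.split() if part[0].isalpha()
    )
```

These helpers only show the permitted form of each identifier; stricter forms required by a protective order or local rule would replace them.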
Output structure (JSON/XML per redaction)
{
"document_id": "DOC000123",
"bates": "ABC000451",
"page": 1,
"target": {
"mode": "text_pattern", // or "region_coordinates"
"text_pattern": "\\b\\d{3}-\\d{2}-(\\d{4})\\b",
"keep_groups": [1], // keep last 4 (FRCP 5.2 partial)
"region": null // {x, y, width, height} when mode=region_coordinates
},
"redaction_type": "partial", // full | partial
"redaction_basis": "FRCP 5.2 - SSN",
"redaction_code": "PII-SSN",
"burn_in": true,
"scrub_text_layer": true,
"scrub_metadata_fields": ["from","to","subject","ocr_text"]
}
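One way to build such a record programmatically is sketched below, assuming Python dataclasses. The field names mirror the example above; the class names `Target` and `RedactionInstruction` and the validation rules are hypothetical, and a real import schema for Relativity Redact, Nuix, Everlaw, or DISCO may use different names. Note that `json.dumps` emits strict JSON, so the `//` annotations shown in the example would not appear in the serialized output.

```python
import json
import re
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Target:
    mode: str                                  # "text_pattern" or "region_coordinates"
    text_pattern: Optional[str] = None
    keep_groups: List[int] = field(default_factory=list)
    region: Optional[dict] = None              # {x, y, width, height}

    def __post_init__(self):
        if self.mode == "text_pattern":
            if not self.text_pattern:
                raise ValueError("text_pattern mode needs a pattern")
            re.compile(self.text_pattern)      # fail early on an invalid regex
        elif self.mode == "region_coordinates":
            if not self.region or set(self.region) != {"x", "y", "width", "height"}:
                raise ValueError("region needs x, y, width, height")
        else:
            raise ValueError(f"unknown target mode: {self.mode}")


@dataclass
class RedactionInstruction:
    document_id: str
    bates: str
    page: int
    target: Target
    redaction_type: str                        # "full" or "partial"
    redaction_basis: str
    redaction_code: str
    burn_in: bool = True
    scrub_text_layer: bool = True
    scrub_metadata_fields: List[str] = field(default_factory=list)

    def __post_init__(self):
        if self.redaction_type not in ("full", "partial"):
            raise ValueError(f"unknown redaction_type: {self.redaction_type}")

    def to_json(self) -> str:
        # asdict() recurses into the nested Target dataclass.
        return json.dumps(asdict(self), indent=2)
```

Defaulting `burn_in` and `scrub_text_layer` to `True` means an instruction has to opt out explicitly, which makes an accidental overlay-only redaction harder to produce.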
Build rules
- Map each detection to a rule: detection `entity_type` + `jurisdiction_flags` → the redaction form (full vs partial, keep-last-4, year-only). FRCP 5.2 types are usually partial; trade-secret/privilege passages are full.
- Two targeting modes: `text_pattern` (preferred for born-digital text, since it is robust to pagination) and `region_coordinates` (for images/scanned pages where text isn't reliably positioned). OCR scanned docs first so patterns can match.
- Burn-in + scrub: every instruction must set `burn_in=true` (flatten the mark) and `scrub_text_layer=true`, and list metadata fields to scrub. Redacting the image but leaving the extracted text or load-file field is the most common production leak.
- Carry a `redaction_basis` and `redaction_code` per redaction: these populate the production's redaction log and let opposing counsel see the justification category (not the content).
- Family/consistency: apply the same redaction to all near-duplicates and family copies of a document so the same SSN isn't redacted on one copy and produced on another.
- Always QC a sampled set after the tool applies the instructions; verify no underlying text/metadata survives.
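The mapping and propagation rules above might be sketched as follows. Everything here is illustrative: the `RULES` table, the `TRADE_SECRET` basis and `CONF-TS` code, the `full_redaction_required` flag, and the shape of the detection dict are assumptions, not the output format of any particular detection tool.

```python
from copy import deepcopy

# Hypothetical rule table keyed by detection entity_type.
RULES = {
    "SSN": ("partial", "FRCP 5.2 - SSN", "PII-SSN"),
    "FINANCIAL_ACCOUNT": ("partial", "FRCP 5.2 - financial account", "PII-ACCT"),
    "TRADE_SECRET": ("full", "Protective order - trade secret", "CONF-TS"),
}


def build_instruction(detection: dict) -> dict:
    """Turn one detection into one redaction instruction."""
    # A KeyError here surfaces an entity type that has no rule yet.
    redaction_type, basis, code = RULES[detection["entity_type"]]
    target = deepcopy(detection["target"])
    # Assumed flag: a protective order or local rule demands full redaction.
    if "full_redaction_required" in detection.get("jurisdiction_flags", ()):
        redaction_type = "full"
    if redaction_type == "full" and "keep_groups" in target:
        target["keep_groups"] = []  # nothing may survive a full redaction
    return {
        "document_id": detection["document_id"],
        "page": detection["page"],
        "target": target,
        "redaction_type": redaction_type,
        "redaction_basis": basis,
        "redaction_code": code,
        "burn_in": True,
        "scrub_text_layer": True,
        "scrub_metadata_fields": ["from", "to", "subject", "ocr_text"],
    }


def propagate_to_family(instruction: dict, family_ids: list) -> list:
    """Copy an instruction to each near-duplicate or family member."""
    copies = [instruction]
    for doc_id in family_ids:
        clone = deepcopy(instruction)
        clone["document_id"] = doc_id
        copies.append(clone)
    return copies
```

Copying works most safely for `text_pattern` targets. A `region_coordinates` target, or a page number, may not line up on a near-duplicate whose layout differs, so those copies likely need re-detection or manual review rather than a blind clone.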
Worked example (one document, two redactions)
DOC000123 / ABC000451:
R1 page 1 text_pattern SSN \d{3}-\d{2}-(\d{4}) → partial, keep last 4
basis "FRCP 5.2 - SSN" code PII-SSN burn_in+scrub
R2 page 2 region_coordinates {x:120,y:300,w:240,h:18} (scanned acct no.)
→ full, basis "FRCP 5.2 - financial account" code PII-ACCT burn_in+scrub
Apply identically to DOC000123's 2 near-duplicates (DOC000130, DOC000401).
Exported as a Relativity Redact / Nuix / Everlaw import set; tool applies, then human QC before the document joins the production volume.
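To make R1 concrete, the sketch below previews what a partial `text_pattern` redaction would leave in the text layer. It assumes `keep_groups` means "capture groups whose characters survive" and that separators such as hyphens are left visible; the source does not define those semantics, and `preview_partial` is a hypothetical helper for QC previews only. The actual burn-in is performed by the redaction tool.

```python
import re


def preview_partial(text: str, pattern: str, keep_groups: list, mask: str = "X") -> str:
    """Mask every match of `pattern`, keeping only the listed capture groups."""

    def mask_match(match):
        # Mask letters and digits; leave separators such as "-" in place.
        chars = [mask if ch.isalnum() else ch for ch in match.group(0)]
        offset = match.start(0)
        for group in keep_groups:
            start, end = match.span(group)
            if start == -1:          # group did not participate in the match
                continue
            chars[start - offset:end - offset] = match.group(group)
        return "".join(chars)

    return re.sub(pattern, mask_match, text)


sample = "Claimant SSN: 123-45-6789 on file."
print(preview_partial(sample, r"\b\d{3}-\d{2}-(\d{4})\b", keep_groups=[1]))
# prints: Claimant SSN: XXX-XX-6789 on file.
```

In the JSON instruction the same pattern appears with doubled backslashes because of JSON string escaping; in a Python raw string the backslashes are single.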
Validation checklist
- Controlling court redaction rules + protective order confirmed; FRCP 5.2 partial forms applied (last-4 / year / initials)
- Each redaction has a target (`text_pattern` or `region_coordinates`), type (full/partial), basis, and code
- Scanned/image docs OCR'd before pattern targeting; coordinate mode used where text isn't reliable
- `burn_in=true` and `scrub_text_layer=true` on every redaction; metadata fields to scrub listed
- Same redactions propagated to near-duplicates and family members
- Output format matches the target tool's import schema (Relativity Redact/Nuix/Everlaw/DISCO)
- Redaction log (basis/code per doc) generated for the production
- Post-application QC: sampled docs verified — no underlying text or metadata survives
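Several checklist items are mechanical and could be checked before export. The sketch below is one possible pre-flight check over instruction dicts shaped like the output structure; `check_instruction` and `pattern_survives` are hypothetical names, and passing these checks does not replace the human QC step.

```python
import re
from typing import List


def check_instruction(instr: dict) -> List[str]:
    """Return a list of checklist problems; an empty list means none found."""
    problems = []
    target = instr.get("target") or {}
    mode = target.get("mode")
    if mode == "text_pattern":
        if not target.get("text_pattern"):
            problems.append("text_pattern mode without a pattern")
    elif mode == "region_coordinates":
        if not target.get("region"):
            problems.append("region_coordinates mode without a region")
    else:
        problems.append("target mode missing or unknown")

    redaction_type = instr.get("redaction_type")
    if redaction_type not in ("full", "partial"):
        problems.append("redaction_type must be full or partial")
    if redaction_type == "partial" and mode == "text_pattern" and not target.get("keep_groups"):
        problems.append("partial text redaction without keep_groups")

    for key in ("redaction_basis", "redaction_code"):
        if not instr.get(key):
            problems.append(f"missing {key}")
    if instr.get("burn_in") is not True:
        problems.append("burn_in must be true")
    if instr.get("scrub_text_layer") is not True:
        problems.append("scrub_text_layer must be true")
    if not instr.get("scrub_metadata_fields"):
        problems.append("no metadata fields listed for scrubbing")
    return problems


def pattern_survives(extracted_text: str, pattern: str) -> bool:
    """Post-application spot check: does the full identifier still match?"""
    return re.search(pattern, extracted_text) is not None
```

`pattern_survives` is meant to run against text re-extracted from the produced file, and against load-file fields, for the sampled QC set.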
Last updated: 2026-05-31 — instructions are applied by a redaction tool and require human QC; confirm permitted redaction forms and metadata handling against current FRCP 5.2, the case protective order, and the target platform's redaction-import schema before production.