PII Detection
Rime includes an automatic PII detection scanner that analyses column data against a library of patterns to identify personally identifiable information. Detected PII is flagged for human review — the scanner never auto-classifies columns or changes masking policies on its own.
PII detection works alongside the masked-by-default model. Because all columns start masked, detected PII does not represent an immediate exposure risk. The scanner’s role is to accelerate classification by surfacing columns that likely contain personal data so governance administrators can prioritise their review.
How scanning works
When a scan runs, Rime samples rows from each unclassified column and evaluates the sample data against its pattern library. The process:
- Column selection — the scanner identifies columns that have not been classified or that have been modified since their last scan
- Row sampling — Rime reads a sample of rows from each column (typically 100-1000 rows, depending on table size). Sampling uses the Rime service account’s Snowflake role, which has access to unmasked data
- Pattern matching — each sampled value is tested against the full pattern library. Multiple patterns can match a single column
- Confidence scoring — based on the match rate across sampled rows, the scanner assigns a confidence level
- Flagging — columns that exceed the minimum confidence threshold are flagged for review in the data classification column browser
Scans run automatically when a new data source is connected, when schema changes are detected, and on a periodic schedule. You can also trigger a manual scan at any time.
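The sampling, matching, and flagging steps above can be sketched in a few lines of Python. This is a minimal illustration, not Rime's implementation: the two-entry `PATTERNS` dictionary, the function names, and the choice to count nulls as non-matching rows are all assumptions made for the example.

```python
import random
import re
from typing import Dict, List, Optional, Sequence

# Hypothetical two-entry pattern library (assumption; the real library is larger).
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "phone_international": re.compile(r"\+\d{1,3}[ \d-]{7,14}"),
}


def sample_values(values: Sequence[Optional[str]], max_rows: int = 1000,
                  seed: int = 0) -> List[Optional[str]]:
    """Return at most max_rows values chosen at random (the whole column if smaller)."""
    if len(values) <= max_rows:
        return list(values)
    return random.Random(seed).sample(list(values), max_rows)


def match_rates(sample: Sequence[Optional[str]]) -> Dict[str, float]:
    """Fraction of sampled rows that fully match each pattern.

    Nulls stay in the denominator, so they lower the match rate (assumption).
    """
    if not sample:
        return {name: 0.0 for name in PATTERNS}
    rates = {}
    for name, pattern in PATTERNS.items():
        hits = sum(1 for v in sample
                   if v is not None and pattern.fullmatch(v.strip()))
        rates[name] = hits / len(sample)
    return rates


def scan_column(values: Sequence[Optional[str]],
                min_rate: float = 0.10) -> Dict[str, float]:
    """Return the patterns whose match rate reaches the flagging threshold."""
    sample = sample_values(values)
    return {name: rate for name, rate in match_rates(sample).items()
            if rate >= min_rate}
```

Because every pattern is tested against every sampled value, a single column can come back with more than one entry, which mirrors the "multiple patterns can match a single column" behaviour described above.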
Detection patterns
The pattern library includes general patterns as well as New Zealand-specific and Australian patterns:
General patterns
| Pattern | Description | Example matches |
|---|---|---|
| Email address | Standard email format matching | jane.doe@example.com, support@example.co.nz |
| Phone (international) | International format with country code | +64 21 123 4567, +61 400 123 456 |
| Phone (NZ local) | New Zealand local formats | 021 123 4567, 09 123 4567, (09) 123-4567 |
| Physical address | Street address patterns with number, street type, and locality | 123 Queen Street, Auckland |
| Date of birth | Date patterns in common formats, filtered by plausible age range | 1985-03-15, 15/03/1985 |
| Passport number | Alphanumeric patterns matching passport formats | LN123456, AB1234567 |
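As an illustration of how a pattern can combine format matching with a plausibility filter, the sketch below checks the date-of-birth case. It assumes only the two formats shown in the table and an upper age limit of 120 years; the function names and the limit are hypothetical, and the actual scanner may accept more formats or use a different range.

```python
from datetime import date, datetime
from typing import Optional

# Only the two formats shown in the table (assumption: others may be supported).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(value: str) -> Optional[date]:
    """Try each known format and return the first successful parse."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


def looks_like_dob(value: str, today: Optional[date] = None,
                   max_age: int = 120) -> bool:
    """True if value parses as a date and implies an age between 0 and max_age."""
    today = today or date.today()
    born = parse_date(value)
    if born is None or born > today:
        return False
    # Subtract one year if this year's birthday has not happened yet.
    age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    return age <= max_age
```

The age filter is what separates a likely date of birth from an arbitrary date column such as an order date in the future or a historical record from the 1800s.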
New Zealand-specific patterns
| Pattern | Description | Validation |
|---|---|---|
| IRD number | Inland Revenue Department number (8-9 digits) | Mod-11 check digit validation. Only values that pass the check digit algorithm are flagged, significantly reducing false positives |
| NHI number | National Health Index number | Format: 3 alpha characters followed by 4 digits (e.g., ABC1234). Alphabetic prefix is validated against known NHI character ranges |
| NZ bank account | Bank account number | Format: BB-bbbb-AAAAAAA-SSS (bank, branch, account, suffix). Bank and branch codes are validated against known ranges |
| NZ driver licence | Driver licence number | Alphanumeric format matching NZTA licence patterns |
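The IRD check digit validation can be sketched as follows. The weights follow the modulus-11 check that Inland Revenue documents publicly; the function names and the range bounds (`MIN_IRD`, `MAX_IRD`) are assumptions for this example, and the code is not taken from Rime.

```python
import re
from typing import Sequence

PRIMARY_WEIGHTS = (3, 2, 7, 6, 5, 4, 3, 2)
SECONDARY_WEIGHTS = (7, 4, 3, 2, 5, 2, 7, 6)
MIN_IRD = 10_000_000
MAX_IRD = 200_000_000  # assumption: the published upper bound has changed over time


def _check_digit(base_digits: Sequence[int], weights: Sequence[int]) -> int:
    """Weighted sum modulo 11, mapped to a candidate check digit (may be 10)."""
    remainder = sum(d * w for d, w in zip(base_digits, weights)) % 11
    return 0 if remainder == 0 else 11 - remainder


def is_valid_ird(value: str) -> bool:
    """Validate an 8 or 9 digit IRD number, ignoring spaces and hyphens."""
    digits = re.sub(r"[\s-]", "", value)
    if not re.fullmatch(r"[0-9]{8,9}", digits):
        return False
    if not MIN_IRD <= int(digits) <= MAX_IRD:
        return False
    # Drop the trailing check digit and left-pad the base to eight digits.
    base = [int(c) for c in digits[:-1].zfill(8)]
    expected = _check_digit(base, PRIMARY_WEIGHTS)
    if expected == 10:
        # A result of 10 means the secondary weights must be used instead.
        expected = _check_digit(base, SECONDARY_WEIGHTS)
    if expected == 10:
        return False
    return expected == int(digits[-1])
```

Only roughly one in eleven random 8 or 9 digit values passes this check, which is why check digit validation cuts false positives so sharply compared with matching on digit count alone.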
Australian patterns
| Pattern | Description | Validation |
|---|---|---|
| Phone (AU) | Australian phone formats | +61 prefix, valid area codes |
| TFN | Tax File Number (9 digits) | Check digit validation |
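For the TFN, a commonly documented check is a weighted sum over the nine digits that must be divisible by 11. The sketch below uses that widely cited weighting; it is an assumption that the scanner uses the same scheme, and the function name is hypothetical.

```python
import re

# Widely cited weights for 9-digit TFNs (assumption: not confirmed by this page).
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)


def is_valid_tfn(value: str) -> bool:
    """True if value is nine digits whose weighted sum is divisible by 11."""
    digits = re.sub(r"\s", "", value)
    if not re.fullmatch(r"[0-9]{9}", digits):
        return False
    total = sum(int(c) * w for c, w in zip(digits, TFN_WEIGHTS))
    return total % 11 == 0
```

For example, `is_valid_tfn("123 456 782")` returns `True`, while `is_valid_tfn("123456789")` returns `False` because its weighted sum leaves a remainder.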
Confidence levels
Each flagged column receives a confidence level based on what percentage of sampled rows matched the pattern:
| Confidence | Match rate | Meaning |
|---|---|---|
| High | 70% or more of sampled rows match | Very likely contains this PII type. The pattern consistently matches across the sample |
| Medium | 30-69% of sampled rows match | Probably contains this PII type, but some rows do not match, possibly due to mixed data, nulls, or formatting variations |
| Low | 10-29% of sampled rows match | Possibly contains this PII type. The match rate is low enough that the column may contain coincidental matches rather than actual PII |
Columns with a match rate below 10% are not flagged. This threshold reduces noise from columns where a few values happen to match a pattern (for example, a product code column where some codes coincidentally look like phone numbers).
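The thresholds in the table translate directly into a small lookup. The sketch below is one way to express them; the function name is hypothetical, and treating the bands as continuous (so 69.5% counts as Medium) is an assumption, since the table lists whole percentages only.

```python
from typing import Optional


def confidence_level(matched: int, sampled: int) -> Optional[str]:
    """Map a match count to High, Medium, Low, or None (not flagged).

    Integer arithmetic avoids floating-point edge cases at the boundaries.
    """
    if sampled <= 0:
        return None
    if matched * 100 >= 70 * sampled:
        return "High"
    if matched * 100 >= 30 * sampled:
        return "Medium"
    if matched * 100 >= 10 * sampled:
        return "Low"
    return None  # below the 10% flagging threshold
```

With a 500-row sample, 360 matching rows (72%) gives `"High"`, 100 rows (20%) gives `"Low"`, and 40 rows (8%) returns `None`, so the column is not flagged.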
Review workflow
Flagged columns appear in the data classification column browser with a PII detection indicator showing the detected PII type and confidence level.
To review flagged columns:
- Navigate to Governance > Classifications and filter by “Flagged by PII detection”
- Select a flagged column to see the detection details: which pattern matched, the confidence level, and sample matched values
- Decide whether the detection is accurate:
  - If accurate, classify the column with the appropriate privacy level and PII type. Select Accept Detection to pre-fill the classification from the detection result
  - If inaccurate, select Dismiss to mark the detection as a false positive. The column will not be re-flagged for the same pattern unless you re-scan manually
- Repeat for remaining flagged columns
You can also review in bulk: select multiple flagged columns, then either accept all of their detections or dismiss them all as false positives.
Re-scanning
Automatic scans run in these situations:
- New connector — when a new data source is connected and its first extraction completes
- Schema change — when Rime detects that a table has new or renamed columns
- Scheduled — a periodic scan runs daily (configurable) to catch changes that may not trigger event-based scans
To trigger a manual scan:
- Navigate to Governance > PII Detection
- Select Run Scan
- Choose the scope: full account, specific database, schema, or table
- Select Start
Manual scans are useful after bulk data loads, migrations, or when you want to re-evaluate columns that were previously dismissed as false positives.
False positive handling
False positives are inevitable in pattern-based detection. Rime provides several mechanisms to manage them:
- Dismiss — mark a specific detection on a specific column as a false positive. The column will not be re-flagged for the same pattern
- Exclude column — permanently exclude a column from PII scanning. Useful for columns that consistently trigger false matches (e.g., product codes that resemble phone numbers)
- Exclude table — exclude an entire table from scanning. Useful for reference tables or lookup tables that contain no customer data
- Adjust threshold — raise the minimum confidence level required for flagging. Setting it to “High only” eliminates most false positives at the cost of potentially missing some true positives
Dismissed false positives are logged in the audit log for compliance purposes.
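To show how these four mechanisms could interact, here is a minimal sketch of a suppression check applied before a detection is flagged. The `SuppressionRules` class, its field names, and the order in which rules are evaluated are all hypothetical; the sketch only models the behaviour described above, including that a manual re-scan re-evaluates previously dismissed detections.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

LEVELS = {"Low": 1, "Medium": 2, "High": 3}


@dataclass
class SuppressionRules:
    """Hypothetical container for the false positive settings."""
    dismissed: Set[Tuple[str, str, str]] = field(default_factory=set)  # (table, column, pattern)
    excluded_columns: Set[Tuple[str, str]] = field(default_factory=set)  # (table, column)
    excluded_tables: Set[str] = field(default_factory=set)
    min_level: str = "Low"  # "High" corresponds to the "High only" setting

    def should_flag(self, table: str, column: str, pattern: str,
                    level: str, manual_rescan: bool = False) -> bool:
        # Exclusions always win, whatever triggered the scan.
        if table in self.excluded_tables:
            return False
        if (table, column) in self.excluded_columns:
            return False
        # Detections below the configured minimum confidence are dropped.
        if LEVELS[level] < LEVELS[self.min_level]:
            return False
        # Dismissals apply per pattern and are bypassed by a manual re-scan.
        if (table, column, pattern) in self.dismissed and not manual_rescan:
            return False
        return True
```

Keying dismissals on the pattern as well as the column means a column dismissed for one pattern can still be flagged if a different pattern matches later.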
Tier availability
PII detection is available at Business tier and above. Free/Trial and Small Business tiers do not include automatic PII scanning. See Masked by Default for full tier details.
Next steps
- Review the Data Classification workflow for classifying flagged columns
- Configure Masking Policies for detected PII types
- Check the Compliance Reporting dashboard for classification coverage metrics