PII Detection

Rime includes an automatic PII detection scanner that analyses column data against a library of patterns to identify personally identifiable information. Detected PII is flagged for human review — the scanner never auto-classifies columns or changes masking policies on its own.

PII detection works alongside the masked-by-default model. Because all columns start masked, detected PII does not represent an immediate exposure risk. The scanner’s role is to accelerate classification by surfacing columns that likely contain personal data so governance administrators can prioritise their review.

How scanning works

When a scan runs, Rime samples rows from each unclassified column and evaluates the sample data against its pattern library. The process:

1. Column selection: the scanner identifies columns that have not been classified or that have been modified since their last scan
2. Row sampling: Rime reads a sample of rows from each column (typically 100 to 1,000 rows, depending on table size). Sampling uses the Rime service account’s Snowflake role, which has access to unmasked data
3. Pattern matching: each sampled value is tested against the full pattern library. Multiple patterns can match a single column
4. Confidence scoring: based on the match rate across sampled rows, the scanner assigns a confidence level
5. Flagging: columns that exceed the minimum confidence threshold are flagged for review in the data classification column browser
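
The sampling, matching, and flagging steps above can be sketched roughly as follows. This is an illustrative sketch only, not Rime's implementation: the two regular expressions, the `PATTERNS` dictionary, and the `sample_rows` and `scan_column` helpers are all hypothetical names, and the 1,000-row cap simply reuses the upper end of the typical sample size. Nulls stay in the sample and count as non-matching rows.

```python
import random
import re

# Hypothetical two-entry pattern library; simplified regexes for illustration.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "phone_international": re.compile(r"\+\d{1,3}[ -]?\d{1,4}(?:[ -]?\d{2,4}){1,3}"),
}

MIN_MATCH_RATE = 0.10  # columns below a 10% match rate are not flagged


def sample_rows(values, max_rows=1000, seed=0):
    """Return up to max_rows values chosen at random (nulls are kept)."""
    values = list(values)
    if len(values) <= max_rows:
        return values
    return random.Random(seed).sample(values, max_rows)


def scan_column(values):
    """Return {pattern_name: match_rate} for each pattern at or above the threshold."""
    sample = sample_rows(values)
    if not sample:
        return {}
    flagged = {}
    for name, regex in PATTERNS.items():
        # A null never matches, so it lowers the match rate.
        matches = sum(
            1 for v in sample
            if v is not None and regex.fullmatch(str(v).strip())
        )
        rate = matches / len(sample)
        if rate >= MIN_MATCH_RATE:
            flagged[name] = rate
    return flagged
```

For a column where 8 of 10 sampled values are email addresses, `scan_column` reports a match rate of 0.8 for the `email` pattern and nothing for the phone pattern. Because every pattern is evaluated independently, a single column can be flagged for more than one PII type.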

Scans run automatically when a new data source is connected, when a schema change is detected, and on a daily schedule (see Re-scanning). You can also trigger a manual scan at any time.

Detection patterns

The pattern library includes general patterns as well as patterns specific to New Zealand and Australia:

General patterns

Pattern	Description	Example matches
Email address	Standard email format matching	`user@example.com`, `first.last@example.co.nz`
Phone (international)	International format with country code	`+64 21 123 4567`, `+61 400 123 456`
Phone (NZ local)	New Zealand local formats	`021 123 4567`, `09 123 4567`, `(09) 123-4567`
Physical address	Street address patterns with number, street type, and locality	`123 Queen Street, Auckland`
Date of birth	Date patterns in common formats, filtered by plausible age range	`1985-03-15`, `15/03/1985`
Passport number	Alphanumeric patterns matching passport formats	`LN123456`, `AB1234567`
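
As an illustration of the "plausible age range" filter on the date of birth pattern, the sketch below accepts only dates that parse in one of the two formats shown in the table, are not in the future, and imply an age of at most 120 years. The 120-year limit, the `DATE_FORMATS` tuple, and the function names are assumptions made for this example; the actual range Rime uses is not documented here.

```python
from datetime import date, datetime

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # the two formats shown in the table
MAX_AGE_YEARS = 120  # assumed upper bound, not a documented Rime setting


def parse_date(value):
    """Try each known format and return a date, or None if nothing parses."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None


def looks_like_date_of_birth(value, today=None):
    """True if value is a past date implying an age of 0 to MAX_AGE_YEARS."""
    today = today or date.today()
    parsed = parse_date(str(value).strip())
    if parsed is None or parsed > today:
        return False
    # Subtract one year if the birthday has not yet occurred this year.
    had_birthday = (today.month, today.day) >= (parsed.month, parsed.day)
    age = today.year - parsed.year - (0 if had_birthday else 1)
    return age <= MAX_AGE_YEARS
```

A filter like this keeps columns of order dates or future expiry dates from matching as often as they otherwise would, although recent past dates still pass.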

New Zealand-specific patterns

Pattern	Description	Validation
IRD number	Inland Revenue Department number (8-9 digits)	Mod-11 check digit validation. Only values that pass the check digit algorithm are flagged, significantly reducing false positives
NHI number	National Health Index number	Format: 3 alpha characters followed by 4 digits (e.g., `ABC1234`). Alphabetic prefix is validated against known NHI character ranges
NZ bank account	Bank account number	Format: BB-bbbb-AAAAAAA-SSS (bank, branch, account, suffix). Bank and branch codes are validated against known ranges
NZ driver licence	Driver licence number	Alphanumeric format matching NZTA licence patterns
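
To show why check digit validation removes most coincidental matches, here is a sketch of the modulus-11 check for IRD numbers as publicly documented by Inland Revenue: weight the first eight digits, derive the check digit, and fall back to a second set of weights when the first pass yields 10. The function names are hypothetical, the valid-number range check from the published specification is omitted for brevity, and nothing here is confirmed to be Rime's own code.

```python
PRIMARY_WEIGHTS = (3, 2, 7, 6, 5, 4, 3, 2)
SECONDARY_WEIGHTS = (7, 4, 3, 2, 5, 2, 7, 6)


def _check_digit(base_digits, weights):
    """Return the modulus-11 check digit (0-10) for the eight base digits."""
    remainder = sum(d * w for d, w in zip(base_digits, weights)) % 11
    return 0 if remainder == 0 else 11 - remainder


def is_valid_ird(value):
    """True if value is an 8 or 9 digit number whose last digit is a valid check digit."""
    digits = str(value).replace("-", "").replace(" ", "")
    if not (digits.isascii() and digits.isdigit()) or len(digits) not in (8, 9):
        return False
    # Drop the check digit and left-pad the base to eight digits.
    base = [int(ch) for ch in digits[:-1].zfill(8)]
    expected = _check_digit(base, PRIMARY_WEIGHTS)
    if expected == 10:
        # A result of 10 cannot be a digit, so retry with the secondary weights.
        expected = _check_digit(base, SECONDARY_WEIGHTS)
        if expected == 10:
            return False
    return expected == int(digits[-1])
```

Only about one in eleven random 8 or 9 digit values passes a check like this, which is why a column of arbitrary numeric identifiers rarely reaches the flagging threshold for the IRD pattern.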

Australian patterns

Pattern	Description	Validation
Phone (AU)	Australian phone formats	`+61` prefix, valid area codes
TFN	Tax File Number (9 digits)	Check digit validation
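
A comparable sketch for the TFN check digit is shown below. It uses the widely cited weighted-sum scheme for 9-digit TFNs, in which the weighted sum of all nine digits must be divisible by 11. The weights and the `is_valid_tfn` name are assumptions for illustration; this page does not state which algorithm Rime applies.

```python
# Commonly cited weights for 9-digit TFNs (an assumption, not taken from Rime).
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)


def is_valid_tfn(value):
    """True if value has nine digits and its weighted sum is divisible by 11."""
    digits = str(value).replace(" ", "")
    if not (digits.isascii() and digits.isdigit()) or len(digits) != 9:
        return False
    total = sum(int(ch) * w for ch, w in zip(digits, TFN_WEIGHTS))
    return total % 11 == 0
```

For example, the frequently used test value `123 456 782` gives a weighted sum of 253, which is 23 × 11, so it passes; `123 456 789` gives 323 and fails.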

Confidence levels

Each flagged column receives a confidence level based on what percentage of sampled rows matched the pattern:

Confidence	Match rate	Meaning
High	70% or more of sampled rows match	Very likely contains this PII type. The pattern consistently matches across the sample
Medium	30-69% of sampled rows match	Probably contains this PII type, but some rows do not match, possibly due to mixed data, nulls, or formatting variations
Low	10-29% of sampled rows match	Possibly contains this PII type. The match rate is low enough that the column may contain coincidental matches rather than actual PII

Columns with a match rate below 10% are not flagged. This threshold reduces noise from columns where a few values happen to match a pattern (for example, a product code column where some codes coincidentally look like phone numbers).
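
The mapping from match rate to confidence level in the table above can be written as a small function. This is a sketch under two assumptions: the band boundaries are inclusive at 70%, 30%, and 10%, and fractional rates such as 69.5% fall into the lower band. The `confidence_level` name is hypothetical.

```python
def confidence_level(matched, sampled):
    """Map a match count to "High", "Medium", "Low", or None (not flagged)."""
    if sampled <= 0:
        return None
    # Compare with integers to avoid floating-point edge cases at the boundaries.
    if matched * 100 >= 70 * sampled:
        return "High"
    if matched * 100 >= 30 * sampled:
        return "Medium"
    if matched * 100 >= 10 * sampled:
        return "Low"
    return None
```

With a 1,000-row sample, 700 matches gives `"High"`, 300 gives `"Medium"`, 100 gives `"Low"`, and 99 returns `None`, so the column is not flagged.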

Review workflow

Flagged columns appear in the data classification column browser with a PII detection indicator showing the detected PII type and confidence level.

To review flagged columns:

1. Navigate to Governance > Classifications and filter by “Flagged by PII detection”
2. Select a flagged column to see the detection details: which pattern matched, the confidence level, and sample matched values
3. Decide whether the detection is accurate:
   - If accurate, classify the column with the appropriate privacy level and PII type. Selecting Accept Detection pre-fills the classification from the detection result
   - If inaccurate, select Dismiss to mark the detection as a false positive. The column will not be re-flagged for the same pattern unless you re-scan manually
4. Repeat for the remaining flagged columns

You can also review in bulk: select multiple flagged columns, then either accept all detections or dismiss them all as false positives.

Re-scanning

Automatic scans run in these situations:

- New connector: when a new data source is connected and its first extraction completes
- Schema change: when Rime detects that a table has new or renamed columns
- Scheduled: a periodic scan runs daily (configurable) to catch changes that may not trigger event-based scans

To trigger a manual scan:

1. Navigate to Governance > PII Detection
2. Select Run Scan
3. Choose the scope: full account, specific database, schema, or table
4. Select Start

Manual scans are useful after bulk data loads, migrations, or when you want to re-evaluate columns that were previously dismissed as false positives.

False positive handling

False positives are inevitable in pattern-based detection. Rime provides several mechanisms to manage them:

- Dismiss: mark a specific detection on a specific column as a false positive. The column will not be re-flagged for the same pattern unless you trigger a manual re-scan
- Exclude column: permanently exclude a column from PII scanning. Useful for columns that consistently trigger false matches (e.g., product codes that resemble phone numbers)
- Exclude table: exclude an entire table from scanning. Useful for reference tables or lookup tables that contain no customer data
- Adjust threshold: raise the minimum confidence level required for flagging. Setting it to “High only” eliminates most false positives at the cost of potentially missing some true positives
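
One way to picture how these four mechanisms combine is as a set of suppression rules checked before a detection is flagged. The sketch below is purely illustrative: `SuppressionRules`, its fields, and `should_flag` are hypothetical names, not part of Rime's API or configuration format.

```python
from dataclasses import dataclass, field

CONFIDENCE_RANK = {"Low": 1, "Medium": 2, "High": 3}


@dataclass
class SuppressionRules:
    dismissed: set = field(default_factory=set)         # (table, column, pattern)
    excluded_columns: set = field(default_factory=set)  # (table, column)
    excluded_tables: set = field(default_factory=set)   # table
    min_confidence: str = "Low"                          # "High" means "High only"

    def should_flag(self, table, column, pattern, confidence):
        """Return True only if no exclusion, dismissal, or threshold suppresses it."""
        if table in self.excluded_tables:
            return False
        if (table, column) in self.excluded_columns:
            return False
        if (table, column, pattern) in self.dismissed:
            return False
        return CONFIDENCE_RANK[confidence] >= CONFIDENCE_RANK[self.min_confidence]
```

In this model, a dismissal is scoped to one pattern on one column, so the same column can still be flagged for a different PII type, whereas a column or table exclusion suppresses every pattern.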

Dismissed false positives are logged in the audit log for compliance purposes.

Tier availability

PII detection is available at Business tier and above. Free/Trial and Small Business tiers do not include automatic PII scanning. See Masked by Default for full tier details.

Next steps

- Review the Data Classification workflow for classifying flagged columns
- Configure Masking Policies for detected PII types
- Check the Compliance Reporting dashboard for classification coverage metrics