Schema Discovery
Schema discovery is the process of detecting the tables, columns, and data types available in a source system. Rime runs schema discovery during connector setup (step 4 of the configuration flow) and can re-run it on demand to detect changes.
How discovery works
When you trigger schema discovery, Rime queries the source system’s metadata:
- Databases (PostgreSQL, MySQL, MSSQL, Oracle): Rime queries `information_schema` or the equivalent system catalogue to list tables, columns, data types, and nullability constraints.
- MongoDB: Rime samples documents from each collection to infer a tabular schema, since MongoDB does not enforce a fixed schema.
- SaaS connectors: Rime queries the application’s API metadata endpoints (e.g., Salesforce’s `describe` API, Xero’s entity definitions).
- REST APIs: Rime sends a test request and infers the schema from the JSON response structure.
- Files: Rime reads the file headers (CSV) or samples records (JSON) to determine column names and types.
The result is a list of tables (or endpoints, collections, objects — depending on the source type) with their columns and detected types.
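The grouping step above can be sketched in a few lines. This is an illustrative helper, not Rime's actual code: the row shape mirrors `information_schema.columns` output, and the function name is an assumption.

```python
from collections import defaultdict

def group_discovered_columns(rows):
    """Group (table, column, type, nullable) metadata rows into a
    per-table schema dict, as discovery would after querying the
    source catalogue (illustrative sketch, not Rime's API)."""
    schema = defaultdict(list)
    for table, column, data_type, is_nullable in rows:
        schema[table].append({
            "column": column,
            "type": data_type,
            "nullable": is_nullable == "YES",
        })
    return dict(schema)

# Example rows as they might come back from information_schema.columns
rows = [
    ("orders", "id", "bigint", "NO"),
    ("orders", "total", "numeric", "YES"),
    ("users", "email", "text", "NO"),
]
print(group_discovered_columns(rows))
```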
Column selection
After discovery, you choose which tables and columns to extract:
- Table selection: check or uncheck entire tables. Only checked tables are extracted during syncs.
- Column inclusion/exclusion: within each selected table, you can exclude individual columns. This is useful for skipping large binary columns, sensitive data you do not need in your warehouse, or computed columns that would be redundant.
Excluded columns are never read from the source system. They are not extracted, not written to Parquet, and not loaded into Snowflake. This reduces extraction time and storage for tables with many columns you do not need.
Changes to column selection take effect on the next sync. Currently running syncs are not affected.
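Because excluded columns are never read from the source, the exclusion effectively shapes the query sent during extraction. A minimal sketch, assuming a hypothetical helper (not Rime's API):

```python
def select_columns(columns, excluded):
    """Return only the columns that should be extracted. Excluded
    columns are dropped before extraction, so they are never read
    from the source (illustrative helper)."""
    return [c for c in columns if c not in excluded]

cols = ["id", "email", "avatar_blob", "ssn"]
kept = select_columns(cols, excluded={"avatar_blob", "ssn"})
print(kept)
# The extraction query would then reference only the kept columns:
print("SELECT " + ", ".join(kept) + " FROM users")
```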
Type mapping
Rime maps source data types to Snowflake types through the Arrow type system. The general mapping is:
Database type mappings
| Source type | Arrow type | Snowflake type |
|---|---|---|
| INTEGER, INT, INT4 | Int32 / Int64 | NUMBER |
| BIGINT, INT8 | Int64 | NUMBER |
| SMALLINT, INT2 | Int16 | NUMBER |
| DECIMAL, NUMERIC | Decimal128 | NUMBER(p,s) |
| REAL, FLOAT4 | Float32 | FLOAT |
| DOUBLE PRECISION, FLOAT8 | Float64 | FLOAT |
| VARCHAR, TEXT, CHAR | Utf8 | VARCHAR |
| BOOLEAN, BOOL | Boolean | BOOLEAN |
| DATE | Date32 | DATE |
| TIMESTAMP, DATETIME | Timestamp | TIMESTAMP_NTZ |
| TIMESTAMPTZ | Timestamp (with tz) | TIMESTAMP_TZ |
| TIME | Time64 | TIME |
| BYTEA, BLOB, BINARY | Binary | BINARY |
| JSON, JSONB | Utf8 | VARIANT |
| UUID | Utf8 | VARCHAR |
| ARRAY | List | VARIANT |
| INTERVAL | Utf8 (serialized) | VARCHAR |
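The table above can be read as a lookup from source type to Snowflake type. The sketch below encodes that lookup; the fallback to VARCHAR for unrecognized types is purely an assumption for the sketch, not documented behavior.

```python
# Source-type to Snowflake-type lookup, following the mapping table
# above (illustrative; not Rime's internal representation).
SNOWFLAKE_TYPE = {
    "INTEGER": "NUMBER", "INT": "NUMBER", "BIGINT": "NUMBER",
    "SMALLINT": "NUMBER",
    "DECIMAL": "NUMBER(p,s)", "NUMERIC": "NUMBER(p,s)",
    "REAL": "FLOAT", "DOUBLE PRECISION": "FLOAT",
    "VARCHAR": "VARCHAR", "TEXT": "VARCHAR", "CHAR": "VARCHAR",
    "BOOLEAN": "BOOLEAN", "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP_NTZ", "DATETIME": "TIMESTAMP_NTZ",
    "TIMESTAMPTZ": "TIMESTAMP_TZ", "TIME": "TIME",
    "BYTEA": "BINARY", "BLOB": "BINARY", "BINARY": "BINARY",
    "JSON": "VARIANT", "JSONB": "VARIANT",
    "UUID": "VARCHAR", "ARRAY": "VARIANT", "INTERVAL": "VARCHAR",
}

def map_type(source_type: str) -> str:
    # Fallback to VARCHAR is an assumption for this sketch.
    return SNOWFLAKE_TYPE.get(source_type.upper(), "VARCHAR")

print(map_type("jsonb"))  # VARIANT
```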
SaaS and API type mappings
SaaS connectors and REST APIs produce JSON data, which Rime maps as follows:
| JSON type | Arrow type | Snowflake type |
|---|---|---|
| string | Utf8 | VARCHAR |
| number (integer) | Int64 | NUMBER |
| number (decimal) | Float64 | FLOAT |
| boolean | Boolean | BOOLEAN |
| null | (inferred from non-null values) | (column’s non-null type) |
| object | Utf8 (JSON string) | VARIANT |
| array | Utf8 (JSON string) | VARIANT |
| ISO 8601 date string | Timestamp | TIMESTAMP_TZ |
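The JSON mapping above amounts to a small type-inference routine. A minimal sketch of one plausible implementation (the ISO 8601 detection via `datetime.fromisoformat` is an assumption; the real parser is not specified):

```python
from datetime import datetime

def infer_json_type(value):
    """Infer a Snowflake type for one JSON value, following the
    mapping table above (illustrative sketch, not Rime's code)."""
    if isinstance(value, bool):      # must check bool before int
        return "BOOLEAN"
    if isinstance(value, int):
        return "NUMBER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # ISO 8601 date string
            return "TIMESTAMP_TZ"
        except ValueError:
            return "VARCHAR"
    if isinstance(value, (dict, list)):
        return "VARIANT"
    return None  # null: type is taken from non-null values instead

print(infer_json_type(True), infer_json_type("2024-01-15T09:30:00+00:00"))
```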
File type mappings
CSV files have no inherent types, so Rime infers types by scanning the data. See the CSV type detection section for details. JSON files carry explicit types that map directly.
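To illustrate the idea of CSV inference by scanning (the actual detection rules live in the CSV type detection section), here is a hedged sketch that tries progressively wider types over sampled string values:

```python
def infer_csv_type(values):
    """Infer a column type from sampled CSV cell strings by trying
    narrow types first (illustrative only; see the CSV type
    detection section for the real rules)."""
    def all_parse(fn):
        try:
            for v in values:
                fn(v)
            return True
        except ValueError:
            return False

    if all(v in ("true", "false") for v in values):
        return "BOOLEAN"
    if all_parse(int):
        return "NUMBER"
    if all_parse(float):
        return "FLOAT"
    return "VARCHAR"

print(infer_csv_type(["1", "2", "3"]), infer_csv_type(["1.5", "2"]))
```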
Schema changes
Source systems change over time. Tables gain new columns, columns are removed, and data types change. Rime handles each case:
New columns
When Rime detects a column that did not exist during the previous schema discovery:
- The column appears as “new” in the schema discovery UI with a suggested Snowflake type.
- New columns are excluded by default. You must explicitly include them if you want to extract them. This prevents unexpected columns from appearing in downstream transformations.
- Including the new column and running a sync adds the column to the Snowflake table via `ALTER TABLE ADD COLUMN`.
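The resulting DDL is straightforward to picture. A sketch of the statement Rime would issue for a newly included column (table and column names are hypothetical, and identifier quoting is omitted):

```python
def add_column_ddl(table: str, column: str, snowflake_type: str) -> str:
    """Build the ALTER TABLE statement issued when a newly included
    column is synced (illustrative; quoting/escaping omitted)."""
    return f"ALTER TABLE {table} ADD COLUMN {column} {snowflake_type}"

print(add_column_ddl("ANALYTICS.ORDERS", "DISCOUNT_CODE", "VARCHAR"))
```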
Removed columns
When a previously-discovered column no longer exists in the source:
- The column is marked as “removed” in the schema discovery UI.
- If the column was included in extraction, it remains in the Snowflake table but is no longer populated. New rows will have `NULL` for that column.
- Rime does not drop columns from Snowflake tables. Dropping a column would destroy historical data. If you want to remove the column, do so manually in Snowflake or through your infrastructure configuration.
Type changes
When a column’s data type changes in the source:
- The schema discovery UI highlights the type change and shows the old and new types.
- If the new type is compatible (e.g., `INT` widened to `BIGINT`, or `VARCHAR(50)` changed to `VARCHAR(255)`), Rime applies the type change to the Snowflake table automatically.
- If the new type is incompatible (e.g., `INTEGER` changed to `VARCHAR`), Rime flags the column as needing manual resolution. You can choose to keep the old type (values that do not fit will be cast or nulled) or update the type (which may require a Snowflake `ALTER COLUMN` that could fail if existing data is not compatible).
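The compatible/incompatible distinction can be sketched as a widening check. The specific widening pairs below come from the examples in the text; the full rule set Rime applies is an assumption here:

```python
# Example widening pairs (old -> new); the complete set is not
# documented, so treat this as a sketch of the concept.
WIDENINGS = {
    ("SMALLINT", "INTEGER"), ("SMALLINT", "BIGINT"),
    ("INT", "BIGINT"), ("INTEGER", "BIGINT"),
    ("REAL", "DOUBLE PRECISION"),
}

def is_compatible_change(old: str, new: str) -> bool:
    """Return True when the type change is a widening that can be
    applied automatically (illustrative, not Rime's rules)."""
    old, new = old.upper(), new.upper()
    if old == new or (old, new) in WIDENINGS:
        return True
    # VARCHAR(n) -> VARCHAR(m) is compatible when m >= n
    if old.startswith("VARCHAR(") and new.startswith("VARCHAR("):
        return int(new[8:-1]) >= int(old[8:-1])
    return False

print(is_compatible_change("INT", "BIGINT"), is_compatible_change("INTEGER", "VARCHAR"))
```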
Table renames and removals
- Renamed tables: Rime treats a renamed table as a removed table plus a new table. The old table stops receiving data, and the new table appears in discovery. Rime does not attempt to detect renames.
- Removed tables: tables that no longer exist in the source are marked as “removed” in the schema discovery UI. The Snowflake table is retained with its historical data.
Refreshing schema discovery
You can re-run schema discovery at any time from the connector detail page by clicking Refresh Schema. This queries the source system’s metadata again and compares the results against the previously discovered schema.
Reasons to refresh:
- You added new tables or columns to the source database
- You granted the connector user access to additional schemas
- The source system changed a column’s data type
- You want to verify the current schema matches what Rime has stored
Schema refresh does not trigger an extraction. It only updates the metadata about what is available. After reviewing changes, you can adjust column selection and then run a sync to extract data under the updated schema.
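The comparison between the previous and newly discovered schemas is essentially a dictionary diff. A sketch of how that classification might look, using a hypothetical `{table: {column: type}}` snapshot shape:

```python
def diff_schemas(previous, current):
    """Compare two {table: {column: type}} snapshots and classify
    columns as new, removed, or type-changed, as the Refresh Schema
    comparison does (illustrative sketch)."""
    changes = {"new": [], "removed": [], "type_changed": []}
    for table, cols in current.items():
        prev_cols = previous.get(table, {})
        for col, typ in cols.items():
            if col not in prev_cols:
                changes["new"].append((table, col))
            elif prev_cols[col] != typ:
                changes["type_changed"].append((table, col, prev_cols[col], typ))
    for table, cols in previous.items():
        for col in cols:
            if col not in current.get(table, {}):
                changes["removed"].append((table, col))
    return changes

prev = {"orders": {"id": "BIGINT", "total": "INTEGER"}}
curr = {"orders": {"id": "BIGINT", "total": "VARCHAR", "note": "TEXT"}}
print(diff_schemas(prev, curr))
```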
Schema in the UI
The schema discovery panel shows:
- Table list: all discovered tables with their row count estimate (for databases) and column count
- Column details: for each table, the column name, source type, mapped Snowflake type, nullability, and inclusion status
- Change indicators: “new”, “removed”, or “type changed” badges on columns that differ from the previous discovery
- Search and filter: filter tables by name, filter columns by type or status
Next steps
- Configure column selection during connector setup
- Understand the full extraction pipeline
- Monitor schema-related issues in the run history