
Schema Discovery

Schema discovery is the process of detecting the tables, columns, and data types available in a source system. Rime runs schema discovery during connector setup (step 4 of the configuration flow) and can re-run it on demand to detect changes.

How discovery works

When you trigger schema discovery, Rime reads the source system’s metadata or, where no metadata is available, samples the data itself:

  • Databases (PostgreSQL, MySQL, MSSQL, Oracle): Rime queries the information_schema or equivalent system catalogue to list tables, columns, data types, and nullability constraints.
  • MongoDB: Rime samples documents from each collection to infer a tabular schema, since MongoDB does not enforce a fixed schema.
  • SaaS connectors: Rime queries the application’s API metadata endpoints (e.g., Salesforce’s describe API, Xero’s entity definitions).
  • REST APIs: Rime sends a test request and infers the schema from the JSON response structure.
  • Files: Rime reads the file headers (CSV) or samples records (JSON) to determine column names and types.

The result is a list of tables (or endpoints, collections, objects — depending on the source type) with their columns and detected types.
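Sources without an enforced schema (MongoDB collections, JSON files, REST responses) need a sampling step. The sketch below is a minimal illustration of that step, not Rime's implementation. It assumes the sampled documents are already decoded into Python dicts and considers only top-level keys. The helper names `json_type` and `infer_schema` are hypothetical. A column is any key seen in at least one sample. It is treated as nullable if any sample omits it or holds null.

```python
def json_type(value):
    """Classify a decoded JSON value using labels from the JSON type mapping table."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "number (integer)"
    if isinstance(value, float):
        return "number (decimal)"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a decoded JSON value: {value!r}")


def infer_schema(samples):
    """Infer {column: (json_type, nullable)} from sampled documents (hypothetical helper).

    Columns are the union of top-level keys across all samples, in first-seen order.
    """
    observed = {}  # column -> set of non-null JSON types seen for it
    nullable = set()
    for doc in samples:
        for key, value in doc.items():
            kinds = observed.setdefault(key, set())
            kind = json_type(value)
            if kind == "null":
                nullable.add(key)  # an explicit null makes the column nullable
            else:
                kinds.add(kind)
    for key in observed:
        if any(key not in doc for doc in samples):
            nullable.add(key)  # a key missing from any sample is nullable too

    schema = {}
    for key, kinds in observed.items():
        if kinds == {"number (integer)", "number (decimal)"}:
            col_type = "number (decimal)"  # assumption: mixed numbers widen to decimal
        elif len(kinds) == 1:
            col_type = next(iter(kinds))
        else:
            col_type = "string"  # assumption: all-null or conflicting columns fall back to string
        schema[key] = (col_type, key in nullable)
    return schema
```

Consider a sample of `{"id": 1, "score": 9}` and `{"id": 2, "score": 9.5, "tags": ["a"]}`. In this sketch, `score` becomes a non-nullable decimal. `tags` becomes a nullable array because it is missing from the first document.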

Column selection

After discovery, you choose which tables and columns to extract:

  • Table selection: check or uncheck entire tables. Only checked tables are extracted during syncs.
  • Column inclusion/exclusion: within each selected table, you can exclude individual columns. This is useful for skipping large binary columns, sensitive data you do not need in your warehouse, or computed columns that would be redundant.

Excluded columns are never read from the source system. They are not extracted, not written to Parquet, and not loaded into Snowflake. This reduces extraction time and storage for tables with many columns you do not need.

Changes to column selection take effect on the next sync. Currently running syncs are not affected.
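For a database source, column selection amounts to naming only the included columns in the extraction query. The sketch below illustrates that point under stated assumptions. It uses PostgreSQL/ANSI double-quoted identifiers. `build_select` and `quote_ident` are hypothetical helpers, not part of Rime.

```python
def quote_ident(name):
    """Quote an identifier with double quotes, doubling embedded quotes (PostgreSQL/ANSI style)."""
    return '"' + name.replace('"', '""') + '"'


def build_select(schema, table, columns, excluded):
    """Build a SELECT that reads only the columns not in `excluded` (hypothetical helper)."""
    included = [c for c in columns if c not in excluded]
    if not included:
        raise ValueError(f"no columns selected for {schema}.{table}")
    col_list = ", ".join(quote_ident(c) for c in included)
    return f"SELECT {col_list} FROM {quote_ident(schema)}.{quote_ident(table)}"
```

For example, `build_select("public", "customers", ["id", "email", "avatar", "full_name"], {"avatar"})` returns `SELECT "id", "email", "full_name" FROM "public"."customers"`. The excluded `avatar` column never appears in the query, so the source never sends it.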

Type mapping

Rime maps source data types to Snowflake types through the Arrow type system. The general mapping is:

Database type mappings

| Source type | Arrow type | Snowflake type |
|---|---|---|
| INTEGER, INT, INT4 | Int32 / Int64 | NUMBER |
| BIGINT, INT8 | Int64 | NUMBER |
| SMALLINT, INT2 | Int16 | NUMBER |
| DECIMAL, NUMERIC | Decimal128 | NUMBER(p,s) |
| REAL, FLOAT4 | Float32 | FLOAT |
| DOUBLE PRECISION, FLOAT8 | Float64 | FLOAT |
| VARCHAR, TEXT, CHAR | Utf8 | VARCHAR |
| BOOLEAN, BOOL | Boolean | BOOLEAN |
| DATE | Date32 | DATE |
| TIMESTAMP, DATETIME | Timestamp | TIMESTAMP_NTZ |
| TIMESTAMPTZ | Timestamp (with tz) | TIMESTAMP_TZ |
| TIME | Time64 | TIME |
| BYTEA, BLOB, BINARY | Binary | BINARY |
| JSON, JSONB | Utf8 | VARIANT |
| UUID | Utf8 | VARCHAR |
| ARRAY | List | VARIANT |
| INTERVAL | Utf8 (serialized) | VARCHAR |
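The database table above works as a lookup keyed on the base type name. The sketch below is illustrative rather than Rime's implementation. It assumes catalogue type names arrive as strings such as `numeric(12,2)` or `varchar(255)`.

The parameter handling is also an assumption:

- precision and scale are carried into NUMBER(p,s);
- a declared VARCHAR or CHAR length is kept;
- a trailing `[]` is treated as an ARRAY.

The table lists the INTEGER row as Int32 / Int64; this sketch assumes Int32. `map_db_type` and `DB_TYPE_MAP` are hypothetical names.

```python
import re

# Mirrors the database type mapping table: base source type -> (Arrow type, Snowflake type).
DB_TYPE_MAP = {
    "INTEGER": ("Int32", "NUMBER"),  # table lists Int32 / Int64; this sketch assumes 32-bit
    "INT": ("Int32", "NUMBER"),
    "INT4": ("Int32", "NUMBER"),
    "BIGINT": ("Int64", "NUMBER"),
    "INT8": ("Int64", "NUMBER"),
    "SMALLINT": ("Int16", "NUMBER"),
    "INT2": ("Int16", "NUMBER"),
    "DECIMAL": ("Decimal128", "NUMBER"),
    "NUMERIC": ("Decimal128", "NUMBER"),
    "REAL": ("Float32", "FLOAT"),
    "FLOAT4": ("Float32", "FLOAT"),
    "DOUBLE PRECISION": ("Float64", "FLOAT"),
    "FLOAT8": ("Float64", "FLOAT"),
    "VARCHAR": ("Utf8", "VARCHAR"),
    "TEXT": ("Utf8", "VARCHAR"),
    "CHAR": ("Utf8", "VARCHAR"),
    "BOOLEAN": ("Boolean", "BOOLEAN"),
    "BOOL": ("Boolean", "BOOLEAN"),
    "DATE": ("Date32", "DATE"),
    "TIMESTAMP": ("Timestamp", "TIMESTAMP_NTZ"),
    "DATETIME": ("Timestamp", "TIMESTAMP_NTZ"),
    "TIMESTAMPTZ": ("Timestamp (with tz)", "TIMESTAMP_TZ"),
    "TIME": ("Time64", "TIME"),
    "BYTEA": ("Binary", "BINARY"),
    "BLOB": ("Binary", "BINARY"),
    "BINARY": ("Binary", "BINARY"),
    "JSON": ("Utf8", "VARIANT"),
    "JSONB": ("Utf8", "VARIANT"),
    "UUID": ("Utf8", "VARCHAR"),
    "ARRAY": ("List", "VARIANT"),
    "INTERVAL": ("Utf8 (serialized)", "VARCHAR"),
}

# Base name (letters, digits, spaces) plus an optional "(...)" parameter list.
_TYPE_WITH_PARAMS = re.compile(r"^\s*([A-Za-z0-9 ]+?)\s*(?:\(\s*([^)]*)\))?\s*$")


def map_db_type(source_type):
    """Return (arrow_type, snowflake_type) for a catalogue type such as 'numeric(12,2)'."""
    text = source_type.strip()
    if text.endswith("[]"):  # assumption: PostgreSQL-style array suffix, e.g. 'integer[]'
        return DB_TYPE_MAP["ARRAY"]
    match = _TYPE_WITH_PARAMS.match(text)
    if not match:
        raise ValueError(f"unrecognised type: {source_type!r}")
    base, params = match.group(1).upper(), match.group(2)
    arrow, snowflake = DB_TYPE_MAP[base]  # raises KeyError for types outside the table
    if params and arrow == "Decimal128":
        parts = [p.strip() for p in params.split(",")]
        scale = parts[1] if len(parts) > 1 else "0"
        snowflake = f"NUMBER({parts[0]},{scale})"
    elif params and base in ("VARCHAR", "CHAR"):
        snowflake = f"VARCHAR({params.strip()})"  # assumption: keep the declared length
    return arrow, snowflake
```

With these rules, `map_db_type("numeric(12,2)")` returns `("Decimal128", "NUMBER(12,2)")` and `map_db_type("double precision")` returns `("Float64", "FLOAT")`.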

SaaS and API type mappings

SaaS connectors and REST APIs produce JSON data, which Rime maps as follows:

| JSON type | Arrow type | Snowflake type |
|---|---|---|
| string | Utf8 | VARCHAR |
| number (integer) | Int64 | NUMBER |
| number (decimal) | Float64 | FLOAT |
| boolean | Boolean | BOOLEAN |
| null | (inferred from non-null values) | (column’s non-null type) |
| object | Utf8 (JSON string) | VARIANT |
| array | Utf8 (JSON string) | VARIANT |
| ISO 8601 date string | Timestamp | TIMESTAMP_TZ |
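The JSON mapping can be applied per column over its sampled values. The sketch below illustrates one way to do it under stated assumptions; it is not Rime's code. It skips nulls so that a column takes its non-null type. It recognises ISO 8601 strings with a regular expression followed by `datetime.fromisoformat`.

It also resolves conflicts that the table does not cover:

- integer and decimal mixes widen to FLOAT;
- any other mix, or an all-null column, becomes VARCHAR.

`map_json_column`, `classify`, and `is_iso8601` are hypothetical names.

```python
import re
from datetime import datetime

# JSON type -> (Arrow type, Snowflake type), mirroring the SaaS and API mapping table.
JSON_TYPE_MAP = {
    "string": ("Utf8", "VARCHAR"),
    "number (integer)": ("Int64", "NUMBER"),
    "number (decimal)": ("Float64", "FLOAT"),
    "boolean": ("Boolean", "BOOLEAN"),
    "object": ("Utf8 (JSON string)", "VARIANT"),
    "array": ("Utf8 (JSON string)", "VARIANT"),
    "ISO 8601 date string": ("Timestamp", "TIMESTAMP_TZ"),
}

# Shape check for dates and timestamps such as 2024-01-15 or 2024-01-15T10:30:00Z.
_ISO_8601 = re.compile(
    r"^\d{4}-\d{2}-\d{2}"
    r"(?:[T ]\d{2}:\d{2}(?::\d{2}(?:\.\d{3}|\.\d{6})?)?(?:Z|[+-]\d{2}:\d{2})?)?$"
)


def is_iso8601(text):
    """True if text has an ISO 8601 shape and also parses as a real date."""
    if not _ISO_8601.match(text):
        return False
    try:
        # Pythons before 3.11 reject a trailing 'Z', so rewrite it as +00:00.
        datetime.fromisoformat(text.replace("Z", "+00:00"))
    except ValueError:
        return False  # right shape but impossible date, e.g. month 13
    return True


def classify(value):
    """Label a non-null decoded JSON value with its row in the mapping table."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "number (integer)"
    if isinstance(value, float):
        return "number (decimal)"
    if isinstance(value, str):
        return "ISO 8601 date string" if is_iso8601(value) else "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a decoded JSON value: {value!r}")


def map_json_column(values):
    """Return (Arrow type, Snowflake type) for one column's sampled values."""
    kinds = {classify(v) for v in values if v is not None}  # nulls take the non-null type
    if kinds == {"number (integer)", "number (decimal)"}:
        kinds = {"number (decimal)"}  # assumption: widen mixed numbers to FLOAT
    if len(kinds) != 1:
        return JSON_TYPE_MAP["string"]  # assumption: all-null or mixed columns become VARCHAR
    return JSON_TYPE_MAP[kinds.pop()]
```

Under these assumptions, a column whose samples are `["2024-01-15T10:30:00Z", None]` maps to `("Timestamp", "TIMESTAMP_TZ")`. A column mixing date strings and ordinary text falls back to VARCHAR.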

File type mappings

CSV files have no inherent types, so Rime infers types by scanning the data. See the CSV type detection section for details. JSON files carry explicit types that map directly.

Schema changes

Source systems change over time. Tables gain new columns, columns are removed, and data types change. Rime handles each case:

New columns

When Rime detects a column that did not exist during the previous schema discovery:

  • The column appears as “new” in the schema discovery UI with a suggested Snowflake type.
  • New columns are excluded by default. You must explicitly include them if you want to extract them. This prevents unexpected columns from appearing in downstream transformations.
  • Including the new column and running a sync adds the column to the Snowflake table via ALTER TABLE ADD COLUMN.
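Including a new column ultimately results in a Snowflake `ALTER TABLE ... ADD COLUMN`. The sketch below makes the following assumptions:

- discovery returns (name, suggested Snowflake type) pairs for new columns;
- the UI returns the set of names the user opted into.

It emits statements only for explicitly included columns, so any column left at its default stays out of Snowflake. `add_column_statements` is hypothetical. Identifiers are embedded unquoted, which Snowflake stores and resolves in upper case.

```python
def add_column_statements(table, new_columns, included):
    """Build Snowflake ADD COLUMN statements for opted-in new columns (hypothetical helper).

    new_columns: (column_name, suggested_snowflake_type) pairs flagged "new" by discovery.
    included: names the user explicitly included; all other new columns stay excluded.
    """
    statements = []
    for name, snowflake_type in new_columns:
        if name not in included:
            continue  # new columns are excluded by default
        # Unquoted identifiers: Snowflake stores and resolves them in upper case.
        statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {snowflake_type}")
    return statements
```

For example, consider `add_column_statements("RAW.PUBLIC.CUSTOMERS", [("LOYALTY_TIER", "VARCHAR"), ("SIGNUP_SOURCE", "VARCHAR")], {"LOYALTY_TIER"})`. It returns the single statement `ALTER TABLE RAW.PUBLIC.CUSTOMERS ADD COLUMN LOYALTY_TIER VARCHAR` and leaves `SIGNUP_SOURCE` alone.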

Removed columns

When a previously discovered column no longer exists in the source:

  • The column is marked as “removed” in the schema discovery UI.
  • If the column was included in extraction, it remains in the Snowflake table but is no longer populated. New rows will have NULL for that column.
  • Rime does not drop columns from Snowflake tables. Dropping a column would destroy historical data. If you want to remove the column, do so manually in Snowflake or through your infrastructure configuration.

Type changes

When a column’s data type changes in the source:

  • The schema discovery UI highlights the type change and shows the old and new types.
  • If the new type is compatible (e.g., INT widened to BIGINT, or VARCHAR(50) changed to VARCHAR(255)), Rime applies the type change to the Snowflake table automatically.
  • If the new type is incompatible (e.g., INTEGER changed to VARCHAR), Rime flags the column as needing manual resolution. You can choose to keep the old type (values that do not fit will be cast or nulled) or update the type (which may require a Snowflake ALTER COLUMN that could fail if existing data is not compatible).
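The split between compatible and incompatible changes could be expressed as a small rule set. The sketch below covers only the two widening examples named above: integer widening and VARCHAR length growth. Everything else goes to manual resolution. The rank values, the regex, and `classify_type_change` are assumptions for illustration, not Rime's actual rules.

For a VARCHAR widening, the sketch also builds a Snowflake `ALTER TABLE ... ALTER COLUMN ... SET DATA TYPE` statement; Snowflake permits this when the length increases. Integer widening emits no DDL, because both sides map to NUMBER in the type table.

```python
import re

# Assumed widening order for integer types; compatible if the rank does not drop.
INT_RANK = {"SMALLINT": 1, "INT2": 1, "INT": 2, "INTEGER": 2, "INT4": 2, "BIGINT": 3, "INT8": 3}
_VARCHAR = re.compile(r"^VARCHAR\((\d+)\)$")


def classify_type_change(table, column, old, new):
    """Return ('compatible', ddl_or_None) or ('manual', None) for a source type change."""
    old_u = old.upper().replace(" ", "")
    new_u = new.upper().replace(" ", "")
    if old_u in INT_RANK and new_u in INT_RANK and INT_RANK[new_u] >= INT_RANK[old_u]:
        # Both sides map to NUMBER in the type table, so no Snowflake DDL is needed here.
        return "compatible", None
    old_m, new_m = _VARCHAR.match(old_u), _VARCHAR.match(new_u)
    if old_m and new_m and int(new_m.group(1)) >= int(old_m.group(1)):
        ddl = f"ALTER TABLE {table} ALTER COLUMN {column} SET DATA TYPE VARCHAR({new_m.group(1)})"
        return "compatible", ddl
    return "manual", None  # e.g. INTEGER -> VARCHAR, or any narrowing
```

In this sketch, INTEGER changed to VARCHAR(20) returns `("manual", None)`. Narrowing, such as BIGINT to INT or VARCHAR(255) to VARCHAR(50), also returns `("manual", None)`.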

Table renames and removals

  • Renamed tables: Rime treats a renamed table as a removed table plus a new table. The old table stops receiving data, and the new table appears in discovery. Rime does not attempt to detect renames.
  • Removed tables: tables that no longer exist in the source are marked as “removed” in the schema discovery UI. The Snowflake table is retained with its historical data.

Refreshing schema discovery

You can re-run schema discovery at any time from the connector detail page by clicking Refresh Schema. This queries the source system’s metadata again and compares the results against the previously discovered schema.
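The comparison step can be sketched as a diff between two schema snapshots. This sketch assumes each snapshot is a dict mapping a table name to a dict of column name to source type, and it compares type strings verbatim. `diff_schemas` and its result shape are illustrative, not Rime's data model. Like the behaviour described under Table renames and removals, it makes no attempt to detect renames: a renamed table shows up as one removed table plus one new table.

```python
def diff_schemas(previous, current):
    """Compare two {table: {column: source_type}} snapshots (hypothetical helper).

    Returns {(table, column): change}; column is None for whole-table changes.
    A change is "new", "removed", or ("type changed", old_type, new_type).
    """
    changes = {}
    for table in current.keys() - previous.keys():
        changes[(table, None)] = "new"
    for table in previous.keys() - current.keys():
        changes[(table, None)] = "removed"
    for table in previous.keys() & current.keys():
        old_cols, new_cols = previous[table], current[table]
        for col in new_cols.keys() - old_cols.keys():
            changes[(table, col)] = "new"
        for col in old_cols.keys() - new_cols.keys():
            changes[(table, col)] = "removed"
        for col in old_cols.keys() & new_cols.keys():
            if old_cols[col] != new_cols[col]:  # verbatim string comparison
                changes[(table, col)] = ("type changed", old_cols[col], new_cols[col])
    return changes
```

Each key in the result corresponds to one change badge in the UI.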

Reasons to refresh:

  • You added new tables or columns to the source database
  • You granted the connector user access to additional schemas
  • The source system changed a column’s data type
  • You want to verify the current schema matches what Rime has stored

Schema refresh does not trigger an extraction. It only updates the metadata about what is available. After reviewing changes, you can adjust column selection and then run a sync to extract the updated schema.

Schema in the UI

The schema discovery panel shows:

  • Table list: all discovered tables with their row count estimate (for databases) and column count
  • Column details: for each table, the column name, source type, mapped Snowflake type, nullability, and inclusion status
  • Change indicators: “new”, “removed”, or “type changed” badges on columns that differ from the previous discovery
  • Search and filter: filter tables by name, filter columns by type or status

Next steps