
dbt integration: Validate entity column data type is appropriate

Context

PR #5827 added dbt integration that creates Entity objects from dbt model columns.

Problem

The importer does not validate that the entity column's data type is suitable for use as an entity key. Entity keys should typically be:

  • STRING / VARCHAR
  • INT / INT64 / BIGINT
  • UUID (if supported)

The current code, however, accepts any column type, including:

  • FLOAT / DOUBLE (inexact equality makes joins unreliable)
  • BYTES (not suitable for entity keys)
  • TIMESTAMP (rarely appropriate)
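
To illustrate the FLOAT concern, here is a minimal sketch in plain Python (not taken from the PR; the dictionary and key names are made up for the example). A key computed through arithmetic fails an exact-match lookup against the stored key, which is the same comparison an equality join performs:

```python
# Hypothetical feature rows keyed by a float entity value.
features_by_key = {0.3: "feature row"}

# The same logical key, produced by arithmetic upstream.
lookup_key = 0.1 + 0.2

# Exact comparison fails because of binary rounding, so the "join" misses.
found = lookup_key in features_by_key
print(found)  # False
```

String or integer keys do not have this problem, which is why they are the recommended types.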

Current Behavior

# In dbt_import.py:191-197
if entity_column not in column_names:
    click.echo(warning)
    continue
# No type checking!

Proposed Solution

Add validation and warning:

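entity_col = next((c for c in model.columns if c.name == entity_column), None)
# Skip the check when data_type is missing (assumed possible if the dbt model omits it).
if entity_col and entity_col.data_type:
    # Compare the base type name only (requires `import re`), so VARCHAR(255)
    # matches VARCHAR while ARRAY<STRING> or INTERVAL no longer match by substring.
    base_type = re.split(r"[(<\s]", entity_col.data_type.strip().upper(), maxsplit=1)[0]
    valid_entity_types = {'STRING', 'TEXT', 'VARCHAR', 'INT', 'INT32', 'INT64', 'INTEGER', 'BIGINT', 'UUID'}

    if base_type not in valid_entity_types:
        click.echo(
            f"{Fore.YELLOW}Warning: Entity column '{entity_column}' has type "
            f"'{entity_col.data_type}' which may not be suitable for entity keys."
            f" Recommended types: STRING, INT64{Style.RESET_ALL}"
        )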

Edge Cases to Handle

  • FLOAT columns (should warn strongly)
  • ARRAY columns (invalid for entities)
  • Complex/nested types (invalid)
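
One possible way to cover these edge cases is to classify the type before deciding how loudly to warn. The sketch below is self-contained and hypothetical: the function names, the severity labels, and the FLOAT_TYPES / COMPLEX_TYPES groupings are assumptions for illustration, not part of dbt_import.py, and the type lists would need adjusting per warehouse.

```python
import re
from typing import Optional

# Assumed groupings for illustration; extend per warehouse dialect.
VALID_ENTITY_TYPES = {
    "STRING", "TEXT", "VARCHAR", "INT", "INT32", "INT64",
    "INTEGER", "BIGINT", "UUID",
}
FLOAT_TYPES = {"FLOAT", "FLOAT32", "FLOAT64", "DOUBLE", "REAL"}
COMPLEX_TYPES = {"ARRAY", "STRUCT", "MAP", "RECORD", "JSON"}


def base_type(data_type: str) -> str:
    """Return the upper-cased type name without parameters, e.g. VARCHAR(255) -> VARCHAR."""
    return re.split(r"[(<\s]", data_type.strip().upper(), maxsplit=1)[0]


def classify_entity_type(data_type: Optional[str]) -> str:
    """Return 'ok', 'warn', 'warn_strongly', 'invalid', or 'unknown' for a column type."""
    if not data_type:
        return "unknown"  # no declared type, nothing to check
    if data_type.strip().endswith("[]"):
        return "invalid"  # array suffix syntax such as INT[]
    name = base_type(data_type)
    if name in COMPLEX_TYPES:
        return "invalid"
    if name in FLOAT_TYPES:
        return "warn_strongly"
    if name in VALID_ENTITY_TYPES:
        return "ok"
    return "warn"  # e.g. TIMESTAMP, BYTES, INTERVAL


for example in ["VARCHAR(255)", "FLOAT64", "ARRAY<STRING>", "TIMESTAMP"]:
    print(example, "->", classify_entity_type(example))
```

The importer could then skip the model on 'invalid', emit a stronger message on 'warn_strongly', and keep the existing yellow warning for 'warn'.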

Related