Introduction to the Cassis data intelligence layer
Cassis reads your warehouse, your dbt project, your docs, your query logs, and your dashboards. It cross-references everything and builds a structured ontology: entities, metrics, dimensions, relationships. Every definition tracks where it came from, what it is based on, and how it has evolved.
Anyone, or any AI agent, can then ask a question in natural language. Cassis maps it to the ontology, generates SQL, and returns the answer with full provenance.
This documentation walks through every capability in Cassis. You can use Cassis through its own conversational interface or through the Model Context Protocol (MCP).
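This documentation does not describe Cassis's internal data model. As a rough mental model only, the Python sketch below shows how an ontology object might carry its provenance. The class and field names (`OntologyObject`, `Provenance`, `source_types`) are hypothetical, not a Cassis API. The example values come from the revenue metric shown later in this guide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Provenance:
    """One piece of evidence behind a definition (hypothetical shape)."""
    source: str  # e.g. "dbt_definitions", "context_docs", "query_logs", "user_edit"
    detail: str  # e.g. the dbt model or edit date the evidence came from


@dataclass
class OntologyObject:
    """An entity, metric, dimension, relationship, or business term."""
    name: str
    kind: str
    description: str
    sql_logic: str = ""
    sources: List[Provenance] = field(default_factory=list)

    def source_types(self) -> List[str]:
        """Distinct source types backing this object, in first-seen order."""
        seen: List[str] = []
        for evidence in self.sources:
            if evidence.source not in seen:
                seen.append(evidence.source)
        return seen


revenue = OntologyObject(
    name="revenue",
    kind="metric",
    description="Sum of all order amounts, excluding refunds and chargebacks",
    sql_logic="SUM(orders.amount) WHERE orders.status NOT IN ('refunded', 'chargeback')",
    sources=[
        Provenance("dbt_definitions", "marts_revenue"),
        Provenance("user_edit", "2026-03-05"),
    ],
)
print(revenue.source_types())  # ['dbt_definitions', 'user_edit']
```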
Overview
How it works
Connect
Connect your data
Cassis reads from multiple sources to build the richest possible ontology. Start by connecting your warehouse, then add whatever context you have.
Tables found: 347
Columns: 4,218
Introspected table names, column names, column types, primary keys, foreign keys, and warehouse comments.
Metrics found: 41
Dimensions found: 112
Entities found: 18
⚠ 1 warning: models/staging/stg_events.yml, column 'user_id' has no description
Unique query patterns: 28
Transformation logic extracted: 31
Cross-referenced with dbt project. 14 queries reference models already ingested. 17 use raw tables directly.
active_customer has 2 definitions
Notion glossary: "logged in within 30 days"
dbt model: "has_active_subscription = true"
All flagged for review during bootstrap.
Most queried tables: orders, customers, events, subscriptions
Common aggregations found: 23
Common joins found: 15
Metrics not yet in ontology: 9
Unmapped:
Chart 'MRR trend' uses: monthly_recurring_revenue (not in ontology)
Chart 'Cohort retention' uses: retention_rate (not in ontology)
... and 7 more
Bootstrap
Generate the ontology
Bootstrap combines everything Cassis has ingested and produces a structured ontology. Each definition carries a confidence level based on how much evidence supports it.
24 entities · 47 metrics · 138 dimensions · 31 relationships · 52 business terms
Confidence breakdown:
High: 168 (multiple sources agree)
Medium: 96 (single strong source or partial agreement)
Low: 28 (inferred from weak signal)
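The exact scoring rules are not documented. The following is a minimal Python sketch of how the three levels could be derived from per-source definitions. It assumes hypothetical source-type names and treats fully conflicting sources as low confidence. It illustrates the rules listed above and does not reproduce Cassis's algorithm.

```python
from collections import Counter
from typing import Dict

# Assumption: which source types count as "strong" evidence.
STRONG_SOURCES = {"dbt_definitions", "context_docs"}


def confidence(candidates: Dict[str, str]) -> str:
    """Map {source_type: normalized_definition} to high / medium / low."""
    if not candidates:
        return "low"
    agreeing = Counter(candidates.values()).most_common(1)[0][1]
    if len(candidates) >= 2 and agreeing == len(candidates):
        return "high"    # multiple sources, all agree
    if agreeing >= 2:
        return "medium"  # partial agreement
    if len(candidates) == 1 and next(iter(candidates)) in STRONG_SOURCES:
        return "medium"  # single strong source
    return "low"         # weak signal only, or sources conflict


print(confidence({"dbt_definitions": "a", "context_docs": "a"}))  # high
```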
Each definition tracks its sources and rationale:
Where it came from (dbt, docs, query patterns, schema)
Which sources agree or disagree
When it was last reviewed or edited
11 conflicts need your review. 6 entities have no description yet.
Review and validate
The bootstrap summary shows what looks solid and what needs attention. Review objects one at a time, or run benchmarks to validate the ontology against known answers.
Entities
Total: 24 | With descriptions: 18 | Missing: 6
Metrics
Total: 47
From dbt: 32 | From query patterns: 10 | From docs: 5
Top issues (showing 3 of 11)
1. Ambiguous definition: active_customer
Context doc says "logged in within 30 days"
dbt model uses "has_active_subscription = true"
2. Missing join: campaign_attribution
No foreign key or join path between 'campaigns' and 'orders'
3. Conflicting aggregation: expansion_revenue
dbt model sums upgrade deltas; context doc uses full new plan amount
Accepted: "logged in within 30 days" (Notion glossary)
Archived: "has_active_subscription = true" (dbt model)
The subscription-based definition is still in the edit history if you need it later.
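Accepting one definition and archiving the other can be modeled as a status change that keeps every candidate in the history. The sketch below is hypothetical: `ConflictedTerm` and `accept` are illustrative names, and `@reviewer` is a placeholder user. It shows why the archived definition stays recoverable.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConflictedTerm:
    name: str
    candidates: List[str]          # competing definitions found at bootstrap
    accepted: Optional[str] = None
    history: List[str] = field(default_factory=list)

    def accept(self, choice: str, reviewer: str) -> None:
        """Accept one candidate; archive the others instead of deleting them."""
        if choice not in self.candidates:
            raise ValueError(f"{choice!r} is not a candidate for {self.name}")
        self.accepted = choice
        for candidate in self.candidates:
            status = "accepted" if candidate == choice else "archived"
            self.history.append(f"{status} by {reviewer}: {candidate}")


term = ConflictedTerm(
    "active_customer",
    ["logged in within 30 days", "has_active_subscription = true"],
)
term.accept("logged in within 30 days", "@reviewer")
print(term.history[-1])  # archived by @reviewer: has_active_subscription = true
```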
✓ "What was total revenue in Q4 2025?"
Expected: $2,450,000 | Actual: $2,450,000
SQL: SELECT SUM(amount) FROM orders
WHERE order_date BETWEEN '2025-10-01' AND '2025-12-31'
✗ "What was expansion revenue last quarter?"
Expected: $184,000 | Actual: $211,500
SQL: SELECT SUM(upgrade_amount) FROM subscriptions
WHERE upgrade_date BETWEEN '2025-10-01' AND '2025-12-31'
Likely cause: the ontology definition includes add-on purchases alongside plan tier upgrades. The expected answer may only count tier upgrades.
Previous: SUM(upgrade_amount) — all subscription changes
Updated: SUM(upgrade_amount) WHERE change_type = 'tier_upgrade'
Re-running benchmark...
✓ "What was expansion revenue last quarter?"
Expected: $184,000 | Actual: $184,000
Benchmark results: 2/2 passed
Query
Ask a question
Ask a business question in natural language. Cassis maps it to the ontology, generates SQL, and returns the answer with full provenance. If something is ambiguous, missing, or wrong, Cassis surfaces it and the ontology improves as a side effect.
Source: dbt model marts_revenue
SQL: SELECT (SUM(beginning_period_revenue) + SUM(expansion_revenue) - SUM(contraction_revenue) - SUM(churned_revenue)) / SUM(beginning_period_revenue) * 100 AS nrr FROM marts_revenue WHERE period = '2025-Q4'
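The SQL above computes net revenue retention as (beginning + expansion - contraction - churned) / beginning × 100. The following small Python function works through that arithmetic using illustrative inputs, which are not real figures. It also guards against zero beginning-period revenue, which the SQL does not check for. The function name is hypothetical.

```python
def net_revenue_retention(beginning: float, expansion: float,
                          contraction: float, churned: float) -> float:
    """NRR in percent: (beginning + expansion - contraction - churned) / beginning * 100."""
    if beginning == 0:
        raise ValueError("NRR is undefined when beginning-period revenue is zero")
    return 100 * (beginning + expansion - contraction - churned) / beginning


# Illustrative inputs, not figures from any Cassis dataset.
print(net_revenue_retention(100_000, 20_000, 5_000, 10_000))  # 105.0
```

An NRR above 100% means expansion outweighed contraction and churn in the period.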
Cassis maintains conversation context. Follow-up questions reference the same concepts without restating them.
When the question is ambiguous
A question maps to multiple definitions. Cassis surfaces the options and asks you to choose. Your choice gets saved as a permanent disambiguation rule in the ontology.
This is how the ontology self-heals. Every disambiguation choice is captured immediately for the user and proposed as a team-wide default. Once approved, anyone (or any agent) asking about "churn" gets the right metric without guessing.
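The documentation does not say how these rules are stored. The following minimal Python sketch makes three assumptions: a personal choice applies immediately, the first choice becomes the team-wide proposal, and an approved default applies to everyone else. The metric names `logo_churn_rate` and `revenue_churn_rate` and the users are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class Disambiguator:
    """Personal choices apply at once; team defaults apply after approval."""

    def __init__(self, candidates: Dict[str, List[str]]) -> None:
        self.candidates = candidates                      # term -> candidate metrics
        self.user_rules: Dict[Tuple[str, str], str] = {}  # (user, term) -> metric
        self.proposed: Dict[str, str] = {}                # term -> proposed default
        self.team_rules: Dict[str, str] = {}              # term -> approved default

    def choose(self, user: str, term: str, metric: str) -> None:
        if metric not in self.candidates.get(term, []):
            raise ValueError(f"{metric!r} is not a candidate for {term!r}")
        self.user_rules[(user, term)] = metric
        self.proposed.setdefault(term, metric)  # first choice becomes the proposal

    def approve(self, term: str) -> None:
        self.team_rules[term] = self.proposed.pop(term)

    def resolve(self, user: str, term: str) -> Optional[str]:
        """Return the metric to use, or None when Cassis has to ask."""
        options = self.candidates.get(term, [])
        if len(options) == 1:
            return options[0]
        return self.user_rules.get((user, term)) or self.team_rules.get(term)


rules = Disambiguator({"churn": ["logo_churn_rate", "revenue_churn_rate"]})
rules.choose("ana", "churn", "logo_churn_rate")
print(rules.resolve("ana", "churn"), rules.resolve("ben", "churn"))  # logo_churn_rate None
rules.approve("churn")
print(rules.resolve("ben", "churn"))  # logo_churn_rate
```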
When something is missing
A question requires a concept that does not exist in the ontology. Cassis explains what is missing, shows what it can partially answer, and suggests the concrete addition needed.
"The marketing channel through which a customer was acquired"
Likely source: campaigns table or UTM tracking data
Description: "The marketing channel through which a customer was acquired"
Mapped to: customers.utm_source
Validation:
✓ Column exists in warehouse schema
✓ 4 distinct values: organic, paid_search, referral, social
Re-running your question...
CAC by channel:
organic: $0 (482 customers)
referral: $38.20 (294 customers)
paid_search: $187.40 (312 customers)
social: $241.60 (145 customers)
The gap is recorded even if you skip the fix. Next time someone asks about CAC by channel, Cassis remembers what is missing and does not start from scratch.
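One way to picture this memory is a log keyed by the missing concept, which accumulates every question that hit the gap. The sketch below is hypothetical. `GapLog`, its methods, and the concept name `acquisition_channel` are illustrative names that Cassis does not expose.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Gap:
    suggestion: str
    questions: List[str] = field(default_factory=list)


class GapLog:
    """Keeps missing concepts across conversations until someone fills them."""

    def __init__(self) -> None:
        self._gaps: Dict[str, Gap] = {}

    def record(self, concept: str, question: str, suggestion: str) -> Gap:
        gap = self._gaps.setdefault(concept, Gap(suggestion))
        gap.questions.append(question)
        return gap

    def lookup(self, concept: str) -> Optional[Gap]:
        return self._gaps.get(concept)

    def close(self, concept: str) -> None:
        self._gaps.pop(concept, None)


log = GapLog()
log.record("acquisition_channel", "What's our CAC by channel?",
           "campaigns table or UTM tracking data")
gap = log.record("acquisition_channel", "CAC by channel for Q4?",
                 "campaigns table or UTM tracking data")
print(len(gap.questions))  # 2
```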
When the answer is wrong
You flag a wrong answer. Cassis diagnoses the issue, proposes a fix to the underlying definition, re-runs the query, and records the correction with full provenance.
SQL: SELECT SUM(amount) FROM orders WHERE order_date BETWEEN '2025-10-01' AND '2025-12-31' AND status = 'completed'
Proposed: SUM(orders.amount) WHERE status NOT IN ('refunded', 'chargeback')
Re-running with corrected definition...
Previous result: $2.3M
Corrected result: $2.1M
✓ Definition updated.
Edit history updated:
2026-03-05 Updated by @marc.
Change: "Added exclusion for refunds and chargebacks"
Trigger: user feedback on conversation
The correction is permanent. Every future query using the revenue metric applies the updated definition. The edit history records what changed, who changed it, and why.
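The edit history records what changed, who changed it, and why. Below is a hypothetical Python sketch of that record, where `Metric`, `Edit`, and `update` are illustrative names. The pre-correction logic is inferred from the `status = 'completed'` filter in the SQL above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass(frozen=True)
class Edit:
    day: date
    author: str
    change: str        # what changed
    trigger: str       # why it changed
    previous_sql: str  # the definition before this edit


@dataclass
class Metric:
    name: str
    sql_logic: str
    history: List[Edit] = field(default_factory=list)

    def update(self, new_sql: str, author: str, change: str,
               trigger: str, day: date) -> None:
        """Swap in the corrected logic and append an audit record."""
        self.history.append(Edit(day, author, change, trigger, self.sql_logic))
        self.sql_logic = new_sql


revenue = Metric("revenue", "SUM(orders.amount) WHERE status = 'completed'")
revenue.update(
    "SUM(orders.amount) WHERE status NOT IN ('refunded', 'chargeback')",
    author="@marc",
    change="Added exclusion for refunds and chargebacks",
    trigger="user feedback on conversation",
    day=date(2026, 3, 5),
)
print(revenue.history[0].previous_sql)  # SUM(orders.amount) WHERE status = 'completed'
```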
Evaluate
Manage test cases
Build and maintain a dataset of question/expected-answer pairs. These are the ground truth Cassis is measured against.
1. "What was total revenue in Q4 2025?" → $2,100,000
2. "How many active customers do we have?" → 1,823
3. "What's our logo churn rate this quarter?" → 3.2%
4. "What was MRR last month?" → $412,000
5. "What's the average order value for enterprise customers?" → $8,450
Total test cases in dataset: 12
Run evaluations
Launch an evaluation against the current ontology. Cassis runs every test case, compares the result to the expected answer, and reports a summary. Nothing is modified.
Overall score: 83% (10/12 passed)
Avg. latency: 3.2s per question
✓ Passed: 10
✗ Failed: 2
Failures:
#7 "What's our CAC by channel?"
Expected: breakdown by channel | Actual: total CAC only
#11 "What was expansion revenue last quarter?"
Expected: $184,000 | Actual: $211,500
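Here is a minimal Python sketch of such a run. It assumes each test case is a (question, expected answer) pair and compares answers by exact string match. A real evaluator would need a tolerant numeric comparison. `run_evaluation` and the stub answer function are hypothetical. The sketch only reads answers and returns a report, so nothing is modified.

```python
from typing import Callable, Dict, List, Tuple


def run_evaluation(cases: List[Tuple[str, str]],
                   answer: Callable[[str], str]) -> Dict:
    """Answer every test case and score the run; the ontology is never touched."""
    failures = []
    for number, (question, expected) in enumerate(cases, start=1):
        actual = answer(question)
        if actual != expected:  # exact match, for simplicity
            failures.append((number, question, expected, actual))
    passed = len(cases) - len(failures)
    score = round(100 * passed / len(cases)) if cases else 0
    return {"score": score, "passed": passed,
            "failed": len(failures), "failures": failures}


def stub_answer(question: str) -> str:
    """Stand-in for Cassis: gets questions 7 and 11 wrong."""
    return "wrong" if question in ("question 7", "question 11") else "ok"


cases = [(f"question {n}", "ok") for n in range(1, 13)]
result = run_evaluation(cases, stub_answer)
print(result["score"], [f[0] for f in result["failures"]])  # 83 [7, 11]
```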
Review results
Drill into any test case to see what happened: expected vs actual answer, the SQL generated, and the full reasoning trace. Diagnose failures without guesswork.
Question: "What was expansion revenue last quarter?"
Expected: $184,000
Actual: $211,500
Verdict: FAIL (14.9% deviation)
Reasoning trace:
1. Resolved "expansion revenue" → expansion_revenue metric
2. Definition: SUM of plan upgrades in period
3. SQL: SELECT SUM(upgrade_amount) FROM subscriptions
WHERE upgrade_date BETWEEN '2025-10-01' AND '2025-12-31'
Likely cause: the metric includes plan tier changes and add-on purchases. The expected answer may only count tier upgrades. Check the expansion_revenue definition.
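The deviation in the verdict is the relative error |actual - expected| / |expected|. Here that is (211,500 - 184,000) / 184,000 ≈ 14.9%. The documentation does not state the pass threshold, so the sketch below assumes a hypothetical 1% tolerance. The function name is illustrative.

```python
def verdict(expected: float, actual: float, tolerance_pct: float = 1.0) -> str:
    """PASS when |actual - expected| / |expected| is within tolerance_pct percent."""
    if expected == 0:
        deviation = 0.0 if actual == 0 else float("inf")
    else:
        deviation = abs(actual - expected) / abs(expected) * 100
    status = "PASS" if deviation <= tolerance_pct else "FAIL"
    return f"{status} ({deviation:.1f}% deviation)"


print(verdict(184_000, 211_500))  # FAIL (14.9% deviation)
print(verdict(184_000, 184_000))  # PASS (0.0% deviation)
```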
Track history
Browse past evaluation runs to track quality over time. Each run records the ontology state at that point, so you can compare results and trace what changed between runs.
Run #4 Today 83% (10/12) 12 test cases
Run #3 2026-03-18 80% (8/10) 10 test cases
Run #2 2026-03-11 70% (7/10) 10 test cases
Run #1 2026-03-04 60% (6/10) 10 test cases
Trend: quality improving (+23 percentage points since first run).
⚠ Ontology modified since Run #4. Results may have changed.
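Because scores are percentages, the change between runs is most naturally reported in percentage points. Run #4 scored 10/12 ≈ 83.3% and Run #1 scored 6/10 = 60%, a gain of about 23 points. The Python sketch below computes that change. The function names are hypothetical.

```python
from typing import List, Tuple


def score_pct(passed: int, total: int) -> float:
    return 100 * passed / total


def trend_points(runs: List[Tuple[int, int]]) -> float:
    """Percentage-point change from the oldest run to the newest."""
    return score_pct(*runs[-1]) - score_pct(*runs[0])


history = [(6, 10), (7, 10), (8, 10), (10, 12)]  # Run #1 to Run #4, oldest first
print(f"{trend_points(history):+.0f} percentage points")  # +23 percentage points
```

Measured relative to the first score, the same gain would be roughly +39%, so the unit matters. Run #4 also used 12 test cases while earlier runs used 10, so the runs were not scored on identical datasets.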
Curate
Browse and search
The ontology is a first-class object you can explore. Search by keyword, inspect full definitions with provenance and usage stats, or list everything at a glance.
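The documentation does not describe how Cassis searches or ranks results. As a minimal stand-in, the Python sketch below does a case-insensitive substring match on name and description. The tuple layout and the `search` function are assumptions. The catalog entries are drawn from the example results that follow.

```python
from typing import List, Tuple

Obj = Tuple[str, str, str]  # (name, kind, description)


def search(objects: List[Obj], keyword: str) -> List[Obj]:
    """Case-insensitive substring match on name or description."""
    k = keyword.lower()
    return [o for o in objects if k in o[0].lower() or k in o[2].lower()]


catalog = [
    ("subscriptions", "entity", "One row per active subscription"),
    ("mrr", "metric", "Monthly recurring revenue from subscriptions"),
    ("plan_type", "dimension", "Subscription tier: free, pro, enterprise"),
    ("orders", "entity", "One row per order"),
]
print([name for name, _, _ in search(catalog, "subscription")])
# ['subscriptions', 'mrr', 'plan_type']
```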
subscriptions entity
"One row per active subscription"
mrr metric
"Monthly recurring revenue from subscriptions"
subscription_status dimension
"Current status: active, paused, cancelled"
churn_rate metric
"...lost subscriptions in period..."
customers → subscriptions relationship · one_to_many
plan_type dimension
"Subscription tier: free, pro, enterprise"
Description:
"Sum of all order amounts, excluding refunds and chargebacks"
SQL logic:
SUM(orders.amount)
WHERE orders.status NOT IN ('refunded', 'chargeback')
Entity: orders
Dimensions: order_date, customer_id, product_id, region
Dependencies: orders.amount, orders.status
Sources: dbt_definitions, user_edit
Edit history:
2026-03-01 Created. Bootstrapped from dbt model marts_revenue.
2026-03-05 Updated. Excluded refunds and chargebacks.
Trigger: user feedback.
Usage:
Used in 14 queries | Last used: 2026-03-05
revenue_per_user customers · inferred · Multiple aggregation patterns
cac campaigns · query_logs · No dbt model backing
expansion_revenue subscriptions · context_docs · Conflicting definitions
trial_conversion customers · inferred · Missing denominator logic
Low-confidence metrics are where the bootstrap had the least evidence. Reviewing them first gives you the highest leverage.
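Prioritizing review by confidence amounts to a sort. The following sketch is hypothetical. The confidence values for `revenue` and `mrr` are illustrative, while `cac` and `trial_conversion` are low, as in the list above.

```python
from typing import Dict, List

RANK = {"low": 0, "medium": 1, "high": 2}


def review_queue(confidence_by_metric: Dict[str, str]) -> List[str]:
    """Order metrics so the weakest evidence is reviewed first (ties by name)."""
    return sorted(confidence_by_metric,
                  key=lambda m: (RANK[confidence_by_metric[m]], m))


queue = review_queue({"revenue": "high", "cac": "low",
                      "mrr": "medium", "trial_conversion": "low"})
print(queue)  # ['cac', 'trial_conversion', 'mrr', 'revenue']
```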
Add and edit
Add new definitions, edit existing ones, or remove things that shouldn't be there. Every change is validated against the warehouse schema and recorded in the edit history.
Description: "Average revenue per customer per month of tenure"
SQL logic: total_order_amount / (customer_tenure_days / 30)
Entity: customers
Dependencies: total_order_amount, customer_tenure_days
Validation:
✓ SQL valid against warehouse schema
✓ All dependencies resolved
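The "All dependencies resolved" check can be sketched as a lookup against known ontology objects and warehouse columns. This hypothetical Python function covers only that check and does not show how the SQL itself is validated against the warehouse schema. It also assumes raw columns are named in `table.column` form.

```python
from typing import Dict, List, Set


def unresolved_dependencies(dependencies: List[str], ontology: Set[str],
                            schema: Dict[str, Set[str]]) -> List[str]:
    """Return dependencies that match neither an ontology object nor a column.

    A dependency resolves if it names an existing ontology object
    (e.g. "total_order_amount") or a warehouse column written as
    "table.column" (e.g. "orders.amount").
    """
    missing = []
    for dep in dependencies:
        if dep in ontology:
            continue
        table, _, column = dep.partition(".")
        if column and column in schema.get(table, set()):
            continue
        missing.append(dep)
    return missing


schema = {"orders": {"amount", "status"}}
ontology = {"total_order_amount", "customer_tenure_days"}
print(unresolved_dependencies(["total_order_amount", "customer_tenure_days"],
                              ontology, schema))  # []
print(unresolved_dependencies(["orders.amount", "orders.discount"],
                              ontology, schema))  # ['orders.discount']
```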
Description:
Previous: "One row per order"
Updated: "One row per completed order, excluding test and internal orders"
⚠ 3 past query results may differ with this updated definition.
The revenue and order_count metrics depend on the orders entity.
No dependent objects. No queries affected.
Recorded in edit history.
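Impact analysis like the warning above is a reverse walk over the dependency graph. The sketch below uses a hypothetical graph built from the objects named in this section. It returns every object that depends on a target, directly or transitively. An empty result corresponds to the "No dependent objects" case.

```python
from typing import Dict, List, Set


def dependents(target: str, depends_on: Dict[str, Set[str]]) -> List[str]:
    """Every object that depends on target, directly or transitively."""
    found: Set[str] = set()
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for obj, deps in depends_on.items():
            if current in deps and obj != target and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return sorted(found)


graph = {
    "revenue": {"orders"},
    "order_count": {"orders"},
    "revenue_per_user": {"total_order_amount", "customer_tenure_days"},
}
print(dependents("orders", graph))            # ['order_count', 'revenue']
print(dependents("revenue_per_user", graph))  # []
```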
Maintain
Stay in sync
Your data environment changes: new models, renamed columns, updated metric logic. These tools detect what changed and surface the ontology updates needed.
New table: product_reviews
Suggested: add as new entity with 6 dimensions
Renamed column: customers.signup_date → customers.created_at
Suggested: update dimension reference
Dropped column: orders.legacy_status
Suggested: remove dimension (no metrics depend on it)
New column: orders.discount_amount
Suggested: add as dimension on orders entity
Type change: events.event_value (varchar → numeric)
Suggested: review metrics that reference this column
Auto-applied: 2 (non-breaking)
· orders.discount_amount added as dimension (new column, no dependents)
· orders.legacy_status removed (column dropped, no metrics depend on it)
Needs review: 3
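Detecting structural changes can be sketched as a diff between two schema snapshots. This hypothetical Python version reports new and dropped tables and columns, plus type changes. It auto-applies a change only when it adds a column or drops one that nothing depends on, which matches the two auto-applied items above. A plain snapshot diff sees a rename such as `signup_date` → `created_at` as a dropped column plus a new one. Labeling it a rename would need extra evidence, such as matching types or dbt lineage, which this sketch does not attempt.

```python
from typing import Dict, List, Tuple

Schema = Dict[str, Dict[str, str]]  # table -> {column: type}


def diff_schemas(old: Schema, new: Schema) -> List[Tuple[str, str]]:
    """Return (kind, detail) pairs describing how the warehouse changed."""
    changes: List[Tuple[str, str]] = []
    for table in sorted(new.keys() - old.keys()):
        changes.append(("new_table", table))
    for table in sorted(old.keys() - new.keys()):
        changes.append(("dropped_table", table))
    for table in sorted(old.keys() & new.keys()):
        before, after = old[table], new[table]
        for col in sorted(after.keys() - before.keys()):
            changes.append(("new_column", f"{table}.{col}"))
        for col in sorted(before.keys() - after.keys()):
            changes.append(("dropped_column", f"{table}.{col}"))
        for col in sorted(before.keys() & after.keys()):
            if before[col] != after[col]:
                changes.append(("type_change",
                                f"{table}.{col} ({before[col]} -> {after[col]})"))
    return changes


def auto_applicable(kind: str, has_dependents: bool) -> bool:
    """Non-breaking: a column appeared, or one vanished that nothing uses."""
    return kind in ("new_column", "dropped_column") and not has_dependents


old = {"orders": {"amount": "numeric", "legacy_status": "varchar"},
       "events": {"event_value": "varchar"}}
new = {"orders": {"amount": "numeric", "discount_amount": "numeric"},
       "events": {"event_value": "numeric"},
       "product_reviews": {"review_id": "varchar"}}
for kind, detail in diff_schemas(old, new):
    print(kind, detail)
```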
New metric: net_revenue (dbt)
Suggested: add metric "revenue minus refunds"
Modified metric: expansion_revenue (dbt)
Filter added: WHERE change_type = 'tier_upgrade'
Ontology already matches (updated during benchmark review)
Updated description: customers.status (dbt)
Suggested: update dimension description
Auto-applied: 2 (non-breaking)
· customers.status description updated to match dbt
· expansion_revenue already aligned, no change needed
Needs review: 1
Updated definition: active_customer
Previous: "logged in within 30 days"
Updated: "logged in within 14 days"
Suggested: update business term and review dependent metrics
New term: product_qualified_lead
Found in updated glossary
Suggested: add as business term, check if related metric exists
Export
Export the ontology to standard formats for use in other tools. Define once in Cassis, use everywhere.
Objects exported: 271 (filtered: accepted only, 21 pending review)
Format: dbt_yaml
⚠ 3 virtual dimensions with complex SQL were simplified for dbt compatibility. Review the export for accuracy.
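The filtering step in this export can be sketched as follows. The statuses and catalog are illustrative. The sketch leaves out serialization to dbt YAML because the exact output schema is not documented here.

```python
from typing import Dict, List


def export_accepted(objects: List[Dict[str, str]]) -> Dict:
    """Keep accepted objects; count what was held back for review."""
    accepted = [o for o in objects if o["status"] == "accepted"]
    pending = sum(1 for o in objects if o["status"] == "pending_review")
    return {"objects": accepted, "exported": len(accepted),
            "pending_review": pending}


catalog = [
    {"name": "revenue", "kind": "metric", "status": "accepted"},
    {"name": "orders", "kind": "entity", "status": "accepted"},
    {"name": "expansion_revenue", "kind": "metric", "status": "pending_review"},
]
summary = export_accepted(catalog)
print(f"Objects exported: {summary['exported']} "
      f"({summary['pending_review']} pending review)")
# Objects exported: 2 (1 pending review)
```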