Introduction to the Cassis data intelligence layer

Cassis reads your warehouse, your dbt project, your docs, your query logs, and your dashboards. It cross-references everything and builds a structured ontology: entities, metrics, dimensions, relationships. Every definition tracks where it came from, what it is based on, and how it has evolved.

Anyone, or any AI agent, can then ask a question in natural language. Cassis maps it to the ontology, generates SQL, and returns the answer with full provenance.

This documentation walks through every capability in Cassis, which is available through its own conversational interface or via MCP (Model Context Protocol).

How it works

Reads your environment
Connects to your warehouse, dbt project, query logs, documentation, and BI exports. Combines all of them to infer definitions.
Builds a structured ontology
Entities, metrics, dimensions, relationships, and business terms. Each carries a confidence level based on how much evidence supports it.
Tracks provenance
Every definition records where it came from. An append-only edit history shows who changed what, when, and why.
Improves through use
Every question answered, every ambiguity resolved, every correction made feeds back. The system gets smarter as a side effect of normal work.
Serves humans and agents
Cassis exposes the ontology and enables anyone, human or AI agent, to query the underlying data through it. The ontology is the governed, auditable layer between raw data and wherever that data needs to be queried. Cassis serves queries through its own interface and via MCP, so any compatible tool or agent gets the same trusted context.
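The ontology objects described above can be pictured as records that bundle a definition with its evidence and edit log. A minimal sketch, assuming a Python-style representation (the field names are illustrative, not Cassis's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Definition:
    """One ontology object: an entity, metric, dimension, or term."""
    name: str
    kind: str          # "entity" | "metric" | "dimension" | "term"
    description: str
    sources: list      # where the definition came from
    confidence: str    # "high" | "medium" | "low"
    history: list = field(default_factory=list)  # append-only edit log

# The revenue metric as it appears later in this walkthrough:
revenue = Definition(
    name="revenue",
    kind="metric",
    description="Sum of all order amounts, excluding refunds and chargebacks",
    sources=["dbt_definitions", "user_edit"],
    confidence="high",
)
revenue.history.append({"date": "2026-03-05", "who": "@marc",
                        "change": "Excluded refunds and chargebacks"})
```

Because the history list is append-only, an edit never destroys the previous definition; it only adds a newer entry on top.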

Connect your data

Cassis reads from multiple sources to build the richest possible ontology. Start by connecting your warehouse, then add whatever context you have.

Cassis · setting up
Connect to our Snowflake warehouse.
Connected to acme-corp.snowflakecomputing.com using service account cassis_readonly. Read-only access.
Schemas found: 12
Tables found: 347
Columns: 4,218

Introspected table names, column names, column types,
primary keys, foreign keys, and warehouse comments.
Ingest our dbt project. The repo is github.com/acme/analytics.
Connected to GitHub repo. Parsed dbt project.
Files parsed: 23
Metrics found: 41
Dimensions found: 112
Entities found: 18

⚠ 1 warning: models/staging/stg_events.yml, column 'user_id' has no description
We also have ad-hoc SQL scripts in a separate repo: github.com/acme/data-scripts. Mostly analyst queries and one-off extracts.
Connected to GitHub repo. Scanned SQL files.
SQL files found: 42
Unique query patterns: 28
Transformation logic extracted: 31

Cross-referenced with dbt project. 14 queries reference
models already ingested. 17 use raw tables directly.
Ingest these internal docs. They cover metric definitions and onboarding context. [📎 data-glossary.md, onboarding-guide.md, metric-definitions.md]
Parsed 3 documents. 34 terms extracted, 28 definitions matched.
3 conflicts found (showing 1 of 3):

active_customer has 2 definitions
Notion glossary: "logged in within 30 days"
dbt model: "has_active_subscription = true"

All flagged for review during bootstrap.
Pull the last 90 days of query logs from Snowflake.
Analysed 12,840 queries over 90 days.
Unique query patterns: 347
Most queried tables: orders, customers, events, subscriptions
Common aggregations found: 23
Common joins found: 15
Ingest our Metabase dashboard exports. [📎 dashboards-export.json]
Parsed 4 dashboards, 31 charts.
Metrics mapped to ontology: 22
Metrics not yet in ontology: 9

Unmapped:
Chart 'MRR trend' uses: monthly_recurring_revenue (not in ontology)
Chart 'Cohort retention' uses: retention_rate (not in ontology)
... and 7 more
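The mapping step in the transcript above, where dashboard metrics either match the ontology or get flagged as gaps, can be sketched as a simple name comparison. This is an assumed simplification: the metric names below are hypothetical, and real matching presumably also considers SQL and aliases.

```python
# Metrics the ontology already knows vs. metrics used in BI charts.
ontology_metrics = {"revenue", "churn_rate", "active_customers"}
chart_metrics = {"revenue", "monthly_recurring_revenue", "retention_rate"}

mapped = chart_metrics & ontology_metrics    # already in the ontology
unmapped = chart_metrics - ontology_metrics  # flagged as not yet defined

print(sorted(mapped))    # ['revenue']
print(sorted(unmapped))  # ['monthly_recurring_revenue', 'retention_rate']
```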

Generate the ontology

Bootstrap combines everything Cassis has ingested and produces a structured ontology. Each definition carries a confidence level based on how much evidence supports it.

Cassis · bootstrapping
Bootstrap the ontology from everything we've ingested.
Cross-referencing warehouse schema + dbt definitions + uploaded docs + query logs + BI exports...
Ontology built. 2 minutes 14 seconds.

24 entities · 47 metrics · 138 dimensions · 31 relationships · 52 business terms

Confidence breakdown:
High: 168 (multiple sources agree)
Medium: 96 (single strong source or partial agreement)
Low: 28 (inferred from weak signal)

Each definition tracks its sources and rationale:
Where it came from (dbt, docs, query patterns, schema)
Which sources agree or disagree
When it was last reviewed or edited

11 conflicts need your review. 6 entities have no description yet.
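The high/medium/low breakdown above can be thought of as a rule over how much evidence supports each definition. The rule below is an illustrative guess at that logic, not Cassis's actual algorithm:

```python
def confidence(agreeing_sources: int, disagreeing_sources: int) -> str:
    """Toy confidence rule: multiple independent sources agreeing -> high;
    a single source, or agreement mixed with disagreement -> medium;
    nothing but weak inference -> low."""
    if agreeing_sources >= 2 and disagreeing_sources == 0:
        return "high"
    if agreeing_sources >= 1:
        return "medium"
    return "low"

print(confidence(3, 0))  # high: dbt, docs, and query logs all agree
print(confidence(1, 1))  # medium: sources conflict
print(confidence(0, 0))  # low: inferred from weak signal
```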

Review and validate

The bootstrap summary shows what looks solid and what needs attention. Review objects one at a time, or run benchmarks to validate the ontology against known answers.

Cassis · reviewing
Show me the bootstrap summary. What needs attention?
Entities
Total: 24 | With descriptions: 18 | Missing: 6

Metrics
Total: 47
From dbt: 32 | From query patterns: 10 | From docs: 5

Top issues (showing 3 of 11)

1. Ambiguous definition: active_customer
Context doc says "logged in within 30 days"
dbt model uses "has_active_subscription = true"

2. Missing join: campaign_attribution
No foreign key or join path between 'campaigns' and 'orders'

3. Conflicting aggregation: expansion_revenue
dbt model sums upgrade deltas; context doc uses full new plan amount
For active_customer, the login-based definition is the right one. Accept that.
Resolved: active_customer

Accepted: "logged in within 30 days" (Notion glossary)
Archived: "has_active_subscription = true" (dbt model)

The subscription-based definition is still in the edit history
if you need it later.
Run a benchmark. I know total revenue in Q4 2025 was $2.45M, and expansion revenue was $184K.
Benchmark results: 1 passed, 1 failed

"What was total revenue in Q4 2025?"
Expected: $2,450,000 | Actual: $2,450,000
SQL: SELECT SUM(amount) FROM orders
WHERE order_date BETWEEN '2025-10-01' AND '2025-12-31'

"What was expansion revenue last quarter?"
Expected: $184,000 | Actual: $211,500
SQL: SELECT SUM(upgrade_amount) FROM subscriptions
WHERE upgrade_date BETWEEN '2025-10-01' AND '2025-12-31'

Likely cause: the ontology definition includes add-on purchases
alongside plan tier upgrades. The expected answer may only
count tier upgrades.
Right, expansion revenue should only count plan tier upgrades, not add-ons.
Updated: expansion_revenue

Previous: SUM(upgrade_amount) — all subscription changes
Updated: SUM(upgrade_amount) WHERE change_type = 'tier_upgrade'

Re-running benchmark...
"What was expansion revenue last quarter?"
Expected: $184,000 | Actual: $184,000

Benchmark results: 2/2 passed
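A benchmark check like the one above amounts to comparing the expected answer with what the ontology-generated SQL returns. A sketch, with the tolerance as an assumption (the docs don't state how close is close enough):

```python
def check(expected: float, actual: float, tolerance: float = 0.005) -> bool:
    """Pass if actual is within a relative tolerance of expected.
    The 0.5% default is an assumption, not a documented Cassis setting."""
    return abs(actual - expected) <= tolerance * abs(expected)

# The two benchmark cases from the transcript above:
print(check(2_450_000, 2_450_000))  # True  -> pass
print(check(184_000, 211_500))      # False -> fail, ~14.9% off
```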

Ask a question

Ask a business question in natural language. Cassis maps it to the ontology, generates SQL, and returns the answer with full provenance. If something is ambiguous, missing, or wrong, Cassis surfaces it and the ontology improves as a side effect.

Cassis · connected to prod warehouse
What was net revenue retention last quarter?
Net revenue retention for Q4 2025: 108.3%, computed as (beginning_period_revenue + expansion − contraction − churn) / beginning_period_revenue for the period Oct 1 to Dec 31.
Used: net_revenue_retention (metric) · beginning_period_revenue · expansion_revenue · contraction_revenue · churned_revenue
Source: dbt model marts_revenue
SQL: SELECT (SUM(beginning_period_revenue) + SUM(expansion_revenue) - SUM(contraction_revenue) - SUM(churned_revenue)) / SUM(beginning_period_revenue) * 100 AS nrr FROM marts_revenue WHERE period = '2025-Q4'
What about Q3?
104.7%. Same metric, different period (Jul 1 to Sep 30).
SQL: SELECT (SUM(beginning_period_revenue) + SUM(expansion_revenue) - SUM(contraction_revenue) - SUM(churned_revenue)) / SUM(beginning_period_revenue) * 100 AS nrr FROM marts_revenue WHERE period = '2025-Q3'

Cassis maintains conversation context. Follow-up questions reference the same concepts without restating them.
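The NRR formula Cassis applied above can be checked by hand. A worked example using the formula from the transcript, with hypothetical revenue figures (the actual Q4 inputs aren't shown in the docs):

```python
def net_revenue_retention(beginning, expansion, contraction, churned):
    """NRR as defined in the ontology above: (beginning + expansion
    - contraction - churned) / beginning, as a percentage."""
    return (beginning + expansion - contraction - churned) / beginning * 100

# Hypothetical quarter: $2.0M beginning revenue, $240K expansion,
# $30K contraction, $44K churned.
print(round(net_revenue_retention(2_000_000, 240_000, 30_000, 44_000), 1))
# -> 108.3
```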

When the question is ambiguous

A question maps to multiple definitions. Cassis surfaces the options and asks you to choose. Your choice gets saved as a permanent disambiguation rule in the ontology.

Cassis · connected to prod warehouse
What's our churn rate?
The ontology has two churn metrics. Which one do you mean?
Logo churn
% of customers lost in period
Revenue churn
% of ARR lost in period
Logo churn. Save that as the default.
Logo churn this quarter: 3.2%, computed as COUNT of customers whose status changed to 'churned' / COUNT of customers at period start.
✓ Saved as @sophie's default · Mar 25. "Churn rate" resolves to logo churn for Sophie's queries. Pending team-wide approval by an ontology editor.

This is how the ontology self-heals. Every disambiguation choice is captured immediately for the user and proposed as a team-wide default. Once approved, anyone (or any agent) asking about "churn" gets the right metric without guessing.
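The resolution order implied above, where a personal default applies immediately and a team-wide default applies once approved, can be sketched as a two-level lookup. The structure is an assumption about how such rules might be stored:

```python
# Approved team-wide defaults (empty until an ontology editor approves).
team_defaults = {}
# Per-user defaults, effective immediately for that user's queries.
user_defaults = {"@sophie": {"churn rate": "logo_churn"}}

def resolve(user: str, phrase: str):
    """Personal default first, then team default, else None (ambiguous)."""
    personal = user_defaults.get(user, {})
    if phrase in personal:
        return personal[phrase]
    if phrase in team_defaults:
        return team_defaults[phrase]
    return None  # no rule yet: Cassis asks the user to choose

print(resolve("@sophie", "churn rate"))  # logo_churn
print(resolve("@marc", "churn rate"))    # None -> Cassis asks
```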

When something is missing

A question requires a concept that does not exist in the ontology. Cassis explains what is missing, shows what it can partially answer, and suggests the concrete addition needed.

Cassis · connected to prod warehouse
What's our customer acquisition cost by channel?
I can compute total CAC ($142.50 per customer, total marketing spend / total new customers), but there's no channel-level attribution in the ontology.
Gap detected
I can see total marketing spend (from the campaigns entity) and new customer count (from the customers entity), but there's no mapping between campaigns and acquisition channels. I can't break down CAC by channel without it.
Suggested addition: acquisition_channel (dimension on customers)
"The marketing channel through which a customer was acquired"
Likely source: campaigns table or UTM tracking data
Add it. The column is customers.utm_source in the warehouse.
Created: acquisition_channel dimension on customers

Description: "The marketing channel through which a customer was acquired"
Mapped to: customers.utm_source

Validation:
Column exists in warehouse schema
4 distinct values: organic, paid_search, referral, social

Re-running your question...

CAC by channel:
organic: $0 (482 customers)
referral: $38.20 (294 customers)
paid_search: $187.40 (312 customers)
social: $241.60 (145 customers)

The gap is recorded even if you skip the fix. Next time someone asks about CAC by channel, Cassis remembers what is missing and does not start from scratch.
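Once the acquisition_channel dimension exists, CAC by channel is per-channel spend divided by per-channel new customers. A sketch reproducing the breakdown above; the spend figures are back-derived assumptions, not real numbers from the transcript:

```python
# Hypothetical per-channel marketing spend and new-customer counts.
spend = {"organic": 0, "referral": 11_230.8,
         "paid_search": 58_468.8, "social": 35_032.0}
new_customers = {"organic": 482, "referral": 294,
                 "paid_search": 312, "social": 145}

# CAC = spend attributed to the channel / customers acquired through it.
cac = {ch: round(spend[ch] / new_customers[ch], 2) for ch in spend}
print(cac)  # organic 0.0, referral 38.2, paid_search 187.4, social 241.6
```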

When the answer is wrong

You flag a wrong answer. Cassis diagnoses the issue, proposes a fix to the underlying definition, re-runs the query, and records the correction with full provenance.

Cassis · connected to prod warehouse
What was total revenue last quarter?
Total revenue for Q4 2025: $2.3M
Used: revenue (metric)
SQL: SELECT SUM(amount) FROM orders WHERE order_date BETWEEN '2025-10-01' AND '2025-12-31' AND status = 'completed'
That's wrong. Revenue should exclude refunded orders, not just count completed ones.
Definition update proposed
Previous: SUM(orders.amount) WHERE status = 'completed'
Proposed: SUM(orders.amount) WHERE status NOT IN ('refunded', 'chargeback')
Editing as @marc (role: ontology editor)

Re-running with corrected definition...
Previous result: $2.3M
Corrected result: $2.1M

✓ Definition updated.
Edit history updated:
2026-03-05 Updated by @marc.
Change: "Added exclusion for refunds and chargebacks"
Trigger: user feedback on conversation

The correction is permanent. Every future query using the revenue metric applies the updated definition. The edit history records what changed, who changed it, and why.
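An append-only edit history like the one above means a correction adds a new version rather than overwriting the old one, so earlier definitions stay recoverable. A minimal sketch (field names are illustrative):

```python
import datetime

# Version 1: the original bootstrapped definition.
history = [{"version": 1, "sql": "SUM(amount) WHERE status = 'completed'"}]

def record_edit(history, sql, who, why):
    """Append a new version; never mutate or delete earlier entries."""
    history.append({"version": history[-1]["version"] + 1, "sql": sql,
                    "who": who, "why": why,
                    "date": datetime.date.today().isoformat()})

record_edit(history,
            "SUM(amount) WHERE status NOT IN ('refunded', 'chargeback')",
            who="@marc", why="user feedback on conversation")

print(history[-1]["version"])  # 2; version 1 is still in the log
```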

Manage test cases

Build and maintain a dataset of question/expected-answer pairs. These are the ground truth Cassis is measured against.

Cassis · evaluation
Add these test cases to the evaluation dataset.
5 test cases added:

1. "What was total revenue in Q4 2025?" → $2,100,000
2. "How many active customers do we have?" → 1,823
3. "What's our logo churn rate this quarter?" → 3.2%
4. "What was MRR last month?" → $412,000
5. "What's the average order value for enterprise customers?" → $8,450

Total test cases in dataset: 12

Run evaluations

Launch an evaluation against the current ontology. Cassis runs every test case, compares the result to the expected answer, and reports a summary. Nothing is modified.

Cassis · evaluation
Run an evaluation.
Running 12 test cases against the current ontology...
Evaluation complete. 38 seconds.

Overall score: 83% (10/12 passed)
Avg. latency: 3.2s per question

✓ Passed: 10
✗ Failed: 2

Failures:
#7 "What's our CAC by channel?"
Expected: breakdown by channel | Actual: total CAC only
#11 "What was expansion revenue last quarter?"
Expected: $184,000 | Actual: $211,500
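An evaluation run boils down to answering every test case and rolling the comparisons up into a pass rate. A toy version over three of the cases above, with the actual answers hard-coded where Cassis's query engine would sit:

```python
# (question, expected answer, answer the ontology produced)
cases = [
    ("What's our CAC by channel?", "by channel", "total only"),
    ("What was expansion revenue last quarter?", 184_000, 211_500),
    ("What was total revenue in Q4 2025?", 2_100_000, 2_100_000),
]

passed = sum(1 for _, expected, actual in cases if expected == actual)
score = round(passed / len(cases) * 100)
print(f"{passed}/{len(cases)} passed ({score}%)")  # 1/3 passed (33%)
```

Nothing is modified during the run: the comparison is read-only, which is what makes it safe to re-run after every ontology change.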

Review results

Drill into any test case to see what happened: expected vs actual answer, the SQL generated, and the full reasoning trace. Diagnose failures without guesswork.

Cassis · evaluation
Drill into the expansion revenue failure.
Test case #11: expansion_revenue

Question: "What was expansion revenue last quarter?"
Expected: $184,000
Actual: $211,500
Verdict: FAIL (14.9% deviation)

Reasoning trace:
1. Resolved "expansion revenue" → expansion_revenue metric
2. Definition: SUM of plan upgrades in period
3. SQL: SELECT SUM(upgrade_amount) FROM subscriptions
WHERE upgrade_date BETWEEN '2025-10-01' AND '2025-12-31'

Likely cause: The metric includes plan tier changes
and add-on purchases. The expected answer may only count
tier upgrades. Check the expansion_revenue definition.

Track history

Browse past evaluation runs to track quality over time. Each run records the ontology state at that point, so you can compare results and trace what changed between runs.

Cassis · evaluation
Show me past evaluation runs.
4 past runs:

Run #4 Today 83% (10/12) 12 test cases
Run #3 2026-03-18 80% (8/10) 10 test cases
Run #2 2026-03-11 70% (7/10) 10 test cases
Run #1 2026-03-04 60% (6/10) 10 test cases

Trend: quality improving (+23 points since first run).
⚠ Ontology modified since Run #4. Results may have changed.

Browse and search

The ontology is a first-class object you can explore. Search by keyword, inspect full definitions with provenance and usage stats, or list everything at a glance.

Cassis · browsing ontology
Search the ontology for anything related to subscriptions.
6 results:

subscriptions entity
"One row per active subscription"

mrr metric
"Monthly recurring revenue from subscriptions"

subscription_status dimension
"Current status: active, paused, cancelled"

churn_rate metric
"...lost subscriptions in period..."

customers → subscriptions relationship · one_to_many

plan_type dimension
"Subscription tier: free, pro, enterprise"
Show me the full definition of the revenue metric.
revenue metric | confidence: high

Description:
"Sum of all order amounts, excluding refunds and chargebacks"

SQL logic:
SUM(orders.amount)
WHERE orders.status NOT IN ('refunded', 'chargeback')

Entity: orders
Dimensions: order_date, customer_id, product_id, region
Dependencies: orders.amount, orders.status
Sources: dbt_definitions, user_edit

Edit history:
2026-03-01 Created. Bootstrapped from dbt model marts_revenue.
2026-03-05 Updated. Excluded refunds and chargebacks.
Trigger: user feedback.

Usage:
Used in 14 queries | Last used: 2026-03-05
List all low-confidence metrics.
4 low-confidence metrics:

revenue_per_user customers · inferred · Multiple aggregation patterns
cac campaigns · query_logs · No dbt model backing
expansion_revenue subscriptions · context_docs · Conflicting definitions
trial_conversion customers · inferred · Missing denominator logic

Low-confidence metrics are where the bootstrap had the least evidence. Reviewing them first gives you the highest leverage.

Add and edit

Add new definitions, edit existing ones, or remove things that shouldn't be there. Every change is validated against the warehouse schema and recorded in the edit history.

Cassis · editing ontology
Add a metric: customer_lifetime_value. It's the total order amount divided by customer tenure in months.
Created: customer_lifetime_value metric

Description: "Average revenue per customer per month of tenure"
SQL logic: total_order_amount / (customer_tenure_days / 30)
Entity: customers
Dependencies: total_order_amount, customer_tenure_days

Validation:
SQL valid against warehouse schema
All dependencies resolved
Update the description of the orders entity. It should say "One row per completed order, excluding test and internal orders."
Updated: orders

Description:
Previous: "One row per order"
Updated: "One row per completed order, excluding test and internal orders"

⚠ 3 past query results may differ under the updated definition.
The revenue and order_count metrics depend on the orders entity.
Remove the legacy_status dimension. Deprecated column.
Removed: legacy_status

No dependent objects. No queries affected.
Recorded in edit history.
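The customer_lifetime_value metric created above is a direct formula, which makes it easy to sanity-check by hand. A sketch using the SQL logic from the transcript, with a hypothetical customer:

```python
def customer_lifetime_value(total_order_amount: float,
                            customer_tenure_days: float) -> float:
    """CLV as defined above: average revenue per customer per month
    of tenure, approximating a month as 30 days."""
    return total_order_amount / (customer_tenure_days / 30)

# Hypothetical customer: $3,600 of orders over 450 days of tenure,
# i.e. 15 months -> $240 per month.
print(customer_lifetime_value(3600, 450))  # 240.0
```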

Stay in sync

Your data environment changes: new models, renamed columns, updated metric logic. These tools detect what changed and surface the ontology updates needed.

Cassis · syncing
We shipped a bunch of schema changes last week. What's drifted?
5 changes detected:

New table: product_reviews
Suggested: add as new entity with 6 dimensions

Renamed column: customers.signup_date → customers.created_at
Suggested: update dimension reference

Dropped column: orders.legacy_status
Suggested: remove dimension (no metrics depend on it)

New column: orders.discount_amount
Suggested: add as dimension on orders entity

Type change: events.event_value (varchar → numeric)
Suggested: review metrics that reference this column

Auto-applied: 2 (non-breaking)
· orders.discount_amount added as dimension (new column, no dependents)
· orders.legacy_status removed (column dropped, no metrics depend on it)
Needs review: 3
Re-sync the dbt definitions. We updated some metrics last sprint.
3 changes detected:

New metric: net_revenue (dbt)
Suggested: add metric "revenue minus refunds"

Modified metric: expansion_revenue (dbt)
Filter added: WHERE change_type = 'tier_upgrade'
Ontology already matches (updated during benchmark review)

Updated description: customers.status (dbt)
Suggested: update dimension description

Auto-applied: 2 (non-breaking)
· customers.status description updated to match dbt
· expansion_revenue already aligned, no change needed
Needs review: 1
Our data glossary was updated last week. Re-sync it. [📎 data-glossary.md]
2 changes detected:

Updated definition: active_customer
Previous: "logged in within 30 days"
Updated: "logged in within 14 days"
Suggested: update business term and review dependent metrics

New term: product_qualified_lead
Found in updated glossary
Suggested: add as business term, check if related metric exists
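Schema drift detection of the kind shown above can be pictured as diffing the columns the ontology references against the warehouse's current schema. A sketch using column names from the transcript (the set-diff approach is an assumption about how Cassis works):

```python
# Columns the ontology currently references vs. the live warehouse schema.
known = {"customers.signup_date", "orders.legacy_status", "orders.amount"}
current = {"customers.created_at", "orders.discount_amount", "orders.amount"}

added = current - known    # new columns -> suggest new dimensions
dropped = known - current  # removed columns -> suggest removing references

print(sorted(added))    # ['customers.created_at', 'orders.discount_amount']
print(sorted(dropped))  # ['customers.signup_date', 'orders.legacy_status']
```

Note that a rename shows up as one addition plus one drop; pairing them back into a single "renamed column" suggestion, as Cassis does for signup_date → created_at, requires an extra matching heuristic.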

Export

Export the ontology to standard formats for use in other tools. Define once in Cassis, use everywhere.

Cassis · exporting
Export the ontology as dbt YAML. Only include accepted definitions.
Exported to dbt YAML.

Objects exported: 271 (filtered: accepted only, 21 pending review)
Format: dbt_yaml

⚠ 3 virtual dimensions with complex SQL were simplified
for dbt compatibility. Review the export for accuracy.
Also available as LookML, JSON, or markdown.

Interested?

We're building the data intelligence layer. Sign up to hear from us first.

Stay tuned