Staging LayerOne model per source table. Cleans, deduplicates, and standardizes raw data. Materialized as Views. No business logic — consumed by intermediate models only.
stg_customersCore customer identity and account profile
Primary Hub270K rows · daily
▶
Source System
In this project
Synthetic customer records generated directly in Snowflake using a Python script to simulate a real CRM export. Stored in RAW.customers.
In production
CRM / core banking platform exports a nightly flat file to S3. Snowflake Task runs COPY INTO on a daily schedule into RAW.customers.
Why We Use It
Foundation of the entire data model. Every other source joins to this on customer_id. Provides identity, demographics, and account standing for all downstream models.
Freshness SLA · Grain
Daily at 02:00 UTC. Warn after 24h, error after 48h. · One row per customer. customer_id is unique and non-null.
Contributes Downstream
Customer identity → dim_customersKYC status → account eligibility logicSignup date → tenure in dim_customersGeography → fraud signalsAccount status → active customer filterLanguage → campaign personalization
Columns
Column
Type
Definition
customer_idPK
varchar
Unique customer identifier. Natural key used across all models.
first_name
varchar
Lowercased and trimmed. Combined with last_name → full_name in dim_customers.
last_name
varchar
Lowercased and trimmed.
email
varchar
Primary email. Unique. Used for campaign matching in stg_campaigns.
phone
varchar
Nullable. Not always captured at signup.
signup_date
date
Account creation date. Drives customer_tenure_days in dim_customers.
state
varchar
US state code. Used in geographic segmentation.
country
varchar
ISO 2-letter country code. Used in cross-border fraud signal.
active / suspended / closed / dormant. Filters active population in mart models.
preferred_language
varchar
Defaults to en. Used in campaign personalization logic.
_loaded_at
timestamp
Snowflake load timestamp. Used in source freshness checks.
stg_ordersCustomer purchase events and order lifecycle
Transactional1.2M rows · daily
▶
Source System
In this project
Synthetic order records generated in Snowflake via Python script simulating OMS transaction output. Stored in RAW.orders.
In production
OMS (Order Management System) exports daily flat files to S3. Snowflake Task runs COPY INTO for batch loading. Incremental strategy on order_date for large volumes.
Why We Use It
Drives the Frequency (F) and Monetary (M) dimensions of RFM scoring. Powers all purchase behavior aggregations in intermediate models and fact tables.
Freshness SLA · Grain
Daily at 02:00 UTC. Warn after 24h, error after 48h. · One row per order. Multiple orders per customer.
Contributes Downstream
Total orders → frequency score (RFM)Total spend → monetary score (RFM)Last order date → recency score (RFM)Avg order value → KPI in fct_customer_valueChannel → behavioral segmentationProduct category → int_product_affinity
Columns
Column
Type
Definition
order_idPK
varchar
Unique order identifier from OMS.
customer_idFK
varchar
References stg_customers.customer_id.
order_date
date
Date order was placed. Used for recency calculation and cohort analysis.
FK to stg_products. Used in int_product_affinity model.
quantity
int
Number of units ordered. Used in avg_qty_per_order feature.
channel
varchar
web / mobile / in-store / partner. Used in channel mix reporting.
_loaded_at
timestamp
Snowflake load timestamp.
stg_transactionsPayment-level records powering fraud detection
Fraud Signals5M rows · near-realtime
▶
Source System
In this project
Synthetic payment records generated in Snowflake via Python script. Highest volume source — 5M+ rows. Stored in RAW.transactions.
In production
Core payments ledger exports to S3 on micro-batch schedule. Snowpipe auto-ingest on S3 event notification for near-realtime loading into RAW.transactions.
Why We Use It
Only source with payment-level granularity. Enables velocity checks, amount anomaly detection, failed transaction ratios, and cross-border risk signals that drive the IsolationForest fraud model.
Freshness SLA · Grain
Near-realtime via Snowpipe. Warn after 1h, error after 3h. · One row per payment attempt including failed and reversed transactions.
Contributes Downstream
Total spend → monetary dimensionFailed txn ratio → fraud signalVelocity (txns per day) → fraud signalCross-border flag → fraud signalPayment method diversity → ML featureis_high_risk flag → fct_transactions
Columns
Column
Type
Definition
transaction_idPK
varchar
Unique payment attempt identifier.
customer_idFK
varchar
References stg_customers.customer_id.
order_idFK
varchar
References stg_orders.order_id. Nullable — some transactions are standalone payments.
transaction_date
timestamp
UTC timestamp of payment attempt. Used for velocity and time-of-day risk scoring.
amount
float
Transaction value in local currency. Negative values = refunds.
currency
varchar
ISO currency code. Non-USD values used in cross-border flag.
payment_method
varchar
card / ach / wallet / wire. Used in payment method diversity feature.
transaction_type
varchar
purchase / refund / reversal / fee. Filter on purchase for spend metrics.
status
varchar
success / failed / pending / reversed. Failed ratio is a key fraud signal.
merchant_id
varchar
Merchant identifier. Used for distinct_merchants count in intermediate model.
_loaded_at
timestamp
Snowpipe load timestamp. Critical for near-realtime freshness SLA.
stg_eventsWeb and app clickstream for behavioral features
Behavioral8M rows · hourly
▶
Source System
In this project
Synthetic clickstream events generated in Snowflake via Python script. Highest volume source by row count — 8M+ rows. Stored in RAW.events.
In production
Web and mobile analytics platform batches events hourly to S3. Snowpipe auto-ingest for near-realtime loading. Partitioned by event_date for query efficiency.
Why We Use It
Provides behavioral engagement signals that financial data cannot — session frequency, page depth, device preference, and recency of digital activity. Critical ML features for churn prediction.
Freshness SLA · Grain
Hourly batch. Warn after 3h, error after 6h. · One row per event action. Many events per session, many sessions per customer.
Contributes Downstream
Total sessions → engagement featureAvg session duration → churn signalPages per session → depth of engagementMobile session ratio → device preferenceDays since last session → recency signalNew device flag → fraud signal
Columns
Column
Type
Definition
event_idPK
varchar
Unique event identifier generated by tracking platform.
customer_idFK
varchar
References stg_customers. Nullable for anonymous / pre-login sessions.
session_id
varchar
Groups events into a single browsing session. Used to count total_sessions.
event_type
varchar
page_view / click / form_submit / login / logout. Filter varies by use case.
page_url
varchar
URL where the event occurred.
event_timestamp
timestamp
UTC timestamp. Used for session duration and time-of-day analysis.
device_type
varchar
mobile / desktop / tablet. Used for mobile_session_ratio and new_device_flag.
browser
varchar
Browser identifier. Secondary signal for device fingerprinting in fraud model.
_loaded_at
timestamp
Load timestamp.
stg_productsProduct catalog for affinity and category analysis
Reference / Lookup~5K rows · weekly
▶
Source System
In this project
Synthetic product catalog generated in Snowflake via Python script. Small, slow-changing reference table. Stored in RAW.products.
In production
Product catalog system exports weekly flat file to S3. Snowflake Task runs COPY INTO with TRUNCATE + INSERT strategy given small volume and full refreshes.
Why We Use It
Enriches order data with product category and price. Without this source we cannot build product affinity features or understand which categories drive revenue per segment.
Freshness SLA · Grain
Weekly refresh. Warn after 8 days, error after 14 days. · One row per product SKU. Joined to stg_orders on product_id.
Contributes Downstream
Top category → int_product_affinityCategory diversity score → ML featureUnit price → gross margin in fct_ordersis_active → filter discontinued products
Columns
Column
Type
Definition
product_idPK
varchar
Unique product SKU. Joins to stg_orders.product_id.
product_name
varchar
Display name of the product.
category
varchar
Top-level product category. Primary grouping for affinity analysis.
subcategory
varchar
Secondary classification. Used for granular affinity scoring.
unit_price
float
Listed price in USD. Used to calculate gross_margin in fct_orders.
is_active
boolean
False = discontinued. Filter to active products for recommendation use cases.
_loaded_at
timestamp
Load timestamp.
stg_campaignsMarketing outreach engagement and conversion signals
Marketing500K rows · daily
▶
Source System
In this project
Synthetic campaign send records generated in Snowflake via Python script simulating a marketing automation platform output. Stored in RAW.campaigns.
In production
Marketing automation platform (email / SMS / push) exports nightly to S3. One row per campaign send per customer. Snowflake Task runs COPY INTO with incremental load on sent_at.
Why We Use It
Provides marketing responsiveness signals — open rate, click rate, conversion — used as behavioral features for churn prediction and segment responsiveness scoring.
Freshness SLA · Grain
Daily at 02:00 UTC. Warn after 24h. · One row per campaign send per customer.
Contributes Downstream
Email open rate → feat_customer_segmentationClick-through rate → engagement signalConversion flag → revenue_attributed in fct_campaignsChannel preference → personalization logicCampaign ROI → marketing dashboard KPI
Columns
Column
Type
Definition
campaign_idPK
varchar
Unique send identifier. Composite of campaign + customer + send_date in source.
customer_idFK
varchar
References stg_customers.customer_id.
campaign_name
varchar
Name of the campaign as defined in the marketing platform.
channel
varchar
email / sms / push / in-app. Used for channel mix and preference analysis.
sent_at
timestamp
UTC timestamp of message delivery.
opened_at
timestamp
Nullable. Timestamp of open event. Null = not opened.
clicked_at
timestamp
Nullable. Timestamp of click event. Null = not clicked.
converted
boolean
True if customer made a purchase within 7 days of receiving campaign.
_loaded_at
timestamp
Load timestamp.
stg_support_ticketsCustomer service interactions and satisfaction scores
Support / CSAT50K rows · daily
▶
Source System
In this project
Synthetic support ticket records generated in Snowflake via Python script simulating a helpdesk platform export. Stored in RAW.support_tickets.
In production
Customer support platform exports nightly to S3. Snowflake Task runs COPY INTO with incremental load on created_at. Lowest volume source but high signal-to-noise ratio.
Why We Use It
Support ticket rate and CSAT scores are among the strongest leading indicators of churn. A customer filing multiple high-priority tickets with low satisfaction is significantly more likely to churn within 90 days.
Freshness SLA · Grain
Daily at 02:00 UTC. Warn after 24h. · One row per support ticket. Multiple tickets possible per customer.
Contributes Downstream
Support ticket rate → feat_customer_segmentationAvg CSAT score → customer health signalHigh-priority ticket count → churn risk signalUnresolved ticket flag → at-risk identificationResolution time → operational KPI in dashboards
Columns
Column
Type
Definition
ticket_idPK
varchar
Unique support ticket identifier from helpdesk platform.
customer_idFK
varchar
References stg_customers.customer_id.
created_at
timestamp
When the ticket was opened by the customer.
resolved_at
timestamp
Nullable. When closed. Null = still open.
category
varchar
billing / fraud / technical / account / general. Used for issue type distribution.
priority
varchar
low / medium / high / critical. High + critical tickets are strong churn signals.
status
varchar
open / in-progress / resolved / closed. Open tickets on active customers = at-risk flag.
satisfaction_score
int
CSAT score 1–5. Nullable — not always submitted post-resolution.
_loaded_at
timestamp
Load timestamp.
Intermediate LayerBusiness joins and aggregations. Materialized as Ephemeral — compiled as CTEs inside downstream models. Never queried directly by BI tools or analysts.
int_customer_ordersPer-customer order behavior aggregations
Ephemeral1 row / customer
▶
Materialization
In this project
Ephemeral — compiled as a CTE inside downstream mart models. No physical table created in Snowflake.
In production
Could be promoted to a View or Table if multiple downstream models reference it independently or if query compilation cost becomes significant.
Why This Model Exists
Aggregates raw order rows into one customer-level summary. Keeps mart models clean — no GROUP BY logic in the final layer. Centralizes order metric definitions in one place.
Ephemeral — compiled as a CTE. Aggregates 5M+ transaction rows to 1 row per customer before joining to mart models.
In production
Given the high volume of stg_transactions, this model would likely be promoted to an incremental Table in production to avoid re-aggregating 5M rows on every run.
Why This Model Exists
Reduces 5M+ transaction rows to one customer-level summary. Computes failed transaction ratios and spend metrics used by both fraud and segmentation models.
Source Models
stg_transactions · stg_customers
Consumed By
fct_customer_value → total_spendint_customer_fraud_signals → failed_txn_ratiofeat_fraud_behavior → velocity and spend features
Columns
Column
Type
Definition
customer_idPK
varchar
FK to stg_customers. One row per customer.
total_transactions
int
Count of all payment attempts including failed.
total_spend
float
Sum of successful purchase amounts in USD. Excludes refunds and fees.
avg_transaction_amount
float
Mean transaction value. Used in amount anomaly detection.
max_transaction_amount
float
Largest single transaction. Used to flag unusually large payments.
distinct_merchants
int
Number of unique merchants transacted with. Diversity signal.
Count of failed payment attempts. Used to compute failed_txn_ratio.
last_transaction_date
timestamp
Most recent successful transaction timestamp.
int_customer_fraud_signalsRule-based fraud flags derived from transaction patterns
Fraud LogicEphemeral
▶
Materialization
In this project
Ephemeral — all fraud flag logic compiled as CTEs within feat_fraud_behavior and fct_transactions. Business rules are centralized here for consistency.
In production
Would be promoted to an incremental Table — fraud signal computation on 5M+ rows is expensive and results should be persisted daily rather than recomputed on every model run.
Why This Model Exists
Centralizes all rule-based fraud flag definitions. Ensures the same flag logic is used consistently across the is_high_risk column in fct_transactions and the fraud feature table for ML.
Mean number of transactions per active day. Used in velocity scoring.
max_single_day_amount
float
Highest total spend in a single calendar day.
int_session_featuresPer-customer digital engagement aggregations
Ephemeral1 row / customer
▶
Materialization
In this project
Ephemeral — aggregates 8M+ event rows down to 1 row per customer before joining to mart. Anonymous (null customer_id) events are excluded.
In production
Given 8M+ daily event rows, this would be an incremental Table in production — incrementally processing only new sessions rather than re-aggregating the full event history daily.
Why This Model Exists
Reduces raw clickstream to customer-level engagement metrics. These behavioral signals complement transaction data for a more complete view of customer health and churn risk.
Ephemeral — joins stg_orders and stg_products, then ranks categories by purchase frequency per customer. Compiled as a CTE inside mart models.
In production
Table materialization preferred — product affinity calculations involve window functions over order history which are expensive to recompute on every run.
Why This Model Exists
Translates raw order and product data into customer preference signals. Enables product-based personalization and enriches ML segmentation features with category diversity scores.
Category with highest purchase count for this customer.
second_category
varchar
Second most purchased category. Used for cross-sell recommendations.
category_diversity_score
float
Normalized entropy of category distribution. Higher = more diverse buyer.
repeat_purchase_rate
float
Proportion of orders that are repeat purchases in the same category.
avg_qty_per_order
float
Mean quantity per order. Used in bulk vs single-unit buyer segmentation.
Mart LayerAnalytics-ready star schema. Materialized as Tables. Consumed by Sigma, Power BI, MetricFlow semantic layer, and ML feature models. Is KPI column marks fields surfaced in executive dashboards.
dim_customersCustomer dimension — cleaned profile for all fact joins
Dimension1 row / customer · daily
▶
Materialization
In this project
Full refresh Table. Rebuilt daily. Small enough (~270K rows) that incremental is not needed.
In production
Table with full refresh daily. SCD Type 2 history tracked separately in snap_customer_profile for attributes like kyc_status and account_status.
Why This Model Exists
Single cleaned customer dimension. All fact tables join here on customer_id. Provides stable, deduplicated customer attributes for BI tools and mart models.
Unique customer identifier. FK source for all fact tables.
full_name
varchar
No
Concatenation of first_name + last_name. Display field only.
email
varchar
No
Primary contact email. Used for campaign matching.
signup_date
date
No
Account creation date. Used for cohort analysis and tenure.
state
varchar
No
US state code. Used in geographic segmentation dashboards.
country
varchar
No
ISO 2-letter country code.
kyc_status
varchar
Yes
KYC verification state. Surfaced in compliance dashboard.
account_status
varchar
Yes
active / suspended / closed / dormant. Used in active customer count KPI.
customer_tenure_days
int
Yes
Days since signup_date. Used in cohort and retention dashboards.
is_active
boolean
Yes
Derived from account_status = active. Standard active customer filter.
dbt_updated_at
timestamp
No
Timestamp of last dbt model run. Used for freshness monitoring.
fct_ordersOrder-level fact table with enriched purchase data
Fact1 row / order · daily
▶
Materialization
In this project
Full refresh Table rebuilt daily. Joins stg_orders with dim_customers and stg_products for enriched order facts.
In production
Incremental Table on order_date. Only processes new or updated orders each run. Merge strategy on order_id handles late-arriving status updates.
Why This Model Exists
Provides order-level granularity for revenue reporting, channel analysis, and product performance dashboards. Acts as the bridge between customer and product dimensions.
Source Models
dim_customers · stg_orders · stg_products
Consumed By
Sigma — revenue by channel dashboardPower BI — product performance reportfct_transactions → order_id join for enrichment
Columns
Column
Type
Is KPI
Definition
order_idPK
varchar
No
Unique order identifier.
customer_idFK
varchar
No
References dim_customers.customer_id.
product_idFK
varchar
No
References stg_products.product_id.
order_date
date
No
Date order was placed. Partition key for incremental loads.
order_status
varchar
No
completed / cancelled / pending / returned.
total_amount
float
Yes
Gross order value in USD. Core revenue KPI in Sigma dashboard.
quantity
int
No
Units ordered. Used for volume reporting.
channel
varchar
Yes
web / mobile / in-store / partner. Channel mix is a tracked KPI.
gross_margin
float
Yes
total_amount minus unit_price × quantity. Profitability KPI.
fct_transactionsPayment-level fact table with fraud risk flag
Fact · Fraud1 row / transaction · daily
▶
Materialization
In this project
Incremental Table on transaction_date. Processes only new transactions each run. Merge strategy on transaction_id.
In production
Incremental Table — same strategy. Given 5M+ daily rows, full refresh is not viable. is_high_risk flag recalculated only for transactions within the incremental window.
Why This Model Exists
Provides transaction-level detail for fraud investigation, payment reporting, and compliance dashboards. The is_high_risk flag makes fraud-flagged transactions directly queryable by analysts.
Power BI — fraud monitoring dashboardSigma — transaction explorerCompliance team — high-risk transaction exports
Columns
Column
Type
Is KPI
Definition
transaction_idPK
varchar
No
Unique payment attempt identifier.
customer_idFK
varchar
No
References dim_customers.customer_id.
order_idFK
varchar
No
References fct_orders.order_id. Nullable.
transaction_date
timestamp
No
UTC timestamp. Incremental partition key.
amount
float
Yes
Transaction value in USD. Total flagged amount is a compliance KPI ($4.2M).
currency
varchar
No
ISO currency code.
payment_method
varchar
No
card / ach / wallet / wire.
transaction_type
varchar
No
purchase / refund / reversal / fee.
status
varchar
No
success / failed / pending / reversed.
is_high_risk
boolean
Yes
True if transaction triggered fraud signals. 28K flagged in production. Core fraud KPI.
fct_campaignsCampaign engagement and revenue attribution fact table
Fact · Marketing1 row / send · daily
▶
Materialization
In this project
Full refresh Table rebuilt daily. Joins stg_campaigns with dim_customers and applies revenue attribution logic.
In production
Incremental Table on sent_at date. Revenue attribution recalculated for the 7-day conversion window, so a 7-day lookback is applied on each incremental run.
Why This Model Exists
Provides campaign-level ROI and engagement metrics for the marketing team. Enables channel performance comparison and conversion rate reporting across email, SMS, and push campaigns.
Source Models
dim_customers · stg_campaigns · fct_orders
Consumed By
Sigma — campaign performance dashboardMarketing team — channel ROI reportingPower BI — conversion funnel report
Columns
Column
Type
Is KPI
Definition
campaign_idPK
varchar
No
Unique send identifier per customer per campaign.
customer_idFK
varchar
No
References dim_customers.customer_id.
campaign_name
varchar
No
Campaign name from marketing platform.
channel
varchar
Yes
email / sms / push / in-app. Channel mix is a tracked marketing KPI.
sent_at
timestamp
No
UTC timestamp of message delivery.
opened
boolean
Yes
True if message was opened. Open rate is a campaign performance KPI.
clicked
boolean
Yes
True if CTA was clicked. Click-through rate is a tracked KPI.
converted
boolean
Yes
True if purchase made within 7 days of send. Conversion rate is core marketing KPI.
revenue_attributed
float
Yes
Revenue from orders within 7-day attribution window. Campaign ROI KPI.
Full refresh Table rebuilt daily. Joins all intermediate models into a single Customer 360 row per customer. Central model of the entire project.
In production
Incremental Table on customer_id. Only reprocesses customers with activity in the last 24h. RFM scores recalculated on a rolling 90-day window.
Why This Model Exists
Single source of truth for all customer-level KPIs. Powers Sigma dashboards, Power BI reports, ML feature tables, and the executive scorecard. Every downstream consumer reads from here.