Staging 7
Intermediate 5
Mart 5
Staging Layer One model per source table. Cleans, deduplicates, and standardizes raw data. Materialized as Views. No business logic — consumed by intermediate models only.
stg_customers Core customer identity and account profile
Primary Hub270K rows · daily
Source System
In this project
Synthetic customer records generated directly in Snowflake using a Python script to simulate a real CRM export. Stored in RAW.customers.
In production
CRM / core banking platform exports a nightly flat file to S3. Snowflake Task runs COPY INTO on a daily schedule into RAW.customers.
Why We Use It
Foundation of the entire data model. Every other source joins to this on customer_id. Provides identity, demographics, and account standing for all downstream models.
Freshness SLA · Grain
Daily at 02:00 UTC. Warn after 24h, error after 48h. · One row per customer. customer_id is unique and non-null.
Contributes Downstream
Customer identity → dim_customers KYC status → account eligibility logic Signup date → tenure in dim_customers Geography → fraud signals Account status → active customer filter Language → campaign personalization
Columns
ColumnTypeDefinition
customer_idPKvarcharUnique customer identifier. Natural key used across all models.
first_namevarcharLowercased and trimmed. Combined with last_name → full_name in dim_customers.
last_namevarcharLowercased and trimmed.
emailvarcharPrimary email. Unique. Used for campaign matching in stg_campaigns.
phonevarcharNullable. Not always captured at signup.
signup_datedateAccount creation date. Drives customer_tenure_days in dim_customers.
statevarcharUS state code. Used in geographic segmentation.
countryvarcharISO 2-letter country code. Used in cross-border fraud signal.
kyc_statusvarcharpending / verified / failed / expired. Gates transaction eligibility.
account_statusvarcharactive / suspended / closed / dormant. Filters active population in mart models.
preferred_languagevarcharDefaults to en. Used in campaign personalization logic.
_loaded_attimestampSnowflake load timestamp. Used in source freshness checks.
stg_orders Customer purchase events and order lifecycle
Transactional1.2M rows · daily
Source System
In this project
Synthetic order records generated in Snowflake via Python script simulating OMS transaction output. Stored in RAW.orders.
In production
OMS (Order Management System) exports daily flat files to S3. Snowflake Task runs COPY INTO for batch loading. Incremental strategy on order_date for large volumes.
Why We Use It
Drives the Frequency (F) and Monetary (M) dimensions of RFM scoring. Powers all purchase behavior aggregations in intermediate models and fact tables.
Freshness SLA · Grain
Daily at 02:00 UTC. Warn after 24h, error after 48h. · One row per order. Multiple orders per customer.
Contributes Downstream
Total orders → frequency score (RFM) Total spend → monetary score (RFM) Last order date → recency score (RFM) Avg order value → KPI in fct_customer_value Channel → behavioral segmentation Product category → int_product_affinity
Columns
ColumnTypeDefinition
order_idPKvarcharUnique order identifier from OMS.
customer_idFKvarcharReferences stg_customers.customer_id.
order_datedateDate order was placed. Used for recency calculation and cohort analysis.
order_statusvarcharcompleted / cancelled / pending / returned. Only completed count toward RFM.
total_amountfloatGross order value in USD before refunds.
product_idvarcharFK to stg_products. Used in int_product_affinity model.
quantityintNumber of units ordered. Used in avg_qty_per_order feature.
channelvarcharweb / mobile / in-store / partner. Used in channel mix reporting.
_loaded_attimestampSnowflake load timestamp.
stg_transactions Payment-level records powering fraud detection
Fraud Signals5M rows · near-realtime
Source System
In this project
Synthetic payment records generated in Snowflake via Python script. Highest volume source — 5M+ rows. Stored in RAW.transactions.
In production
Core payments ledger exports to S3 on micro-batch schedule. Snowpipe auto-ingest on S3 event notification for near-realtime loading into RAW.transactions.
Why We Use It
Only source with payment-level granularity. Enables velocity checks, amount anomaly detection, failed transaction ratios, and cross-border risk signals that drive the IsolationForest fraud model.
Freshness SLA · Grain
Near-realtime via Snowpipe. Warn after 1h, error after 3h. · One row per payment attempt including failed and reversed transactions.
Contributes Downstream
Total spend → monetary dimension Failed txn ratio → fraud signal Velocity (txns per day) → fraud signal Cross-border flag → fraud signal Payment method diversity → ML feature is_high_risk flag → fct_transactions
Columns
ColumnTypeDefinition
transaction_idPKvarcharUnique payment attempt identifier.
customer_idFKvarcharReferences stg_customers.customer_id.
order_idFKvarcharReferences stg_orders.order_id. Nullable — some transactions are standalone payments.
transaction_datetimestampUTC timestamp of payment attempt. Used for velocity and time-of-day risk scoring.
amountfloatTransaction value in local currency. Negative values = refunds.
currencyvarcharISO currency code. Non-USD values used in cross-border flag.
payment_methodvarcharcard / ach / wallet / wire. Used in payment method diversity feature.
transaction_typevarcharpurchase / refund / reversal / fee. Filter on purchase for spend metrics.
statusvarcharsuccess / failed / pending / reversed. Failed ratio is a key fraud signal.
merchant_idvarcharMerchant identifier. Used for distinct_merchants count in intermediate model.
_loaded_attimestampSnowpipe load timestamp. Critical for near-realtime freshness SLA.
stg_events Web and app clickstream for behavioral features
Behavioral8M rows · hourly
Source System
In this project
Synthetic clickstream events generated in Snowflake via Python script. Highest volume source by row count — 8M+ rows. Stored in RAW.events.
In production
Web and mobile analytics platform batches events hourly to S3. Snowpipe auto-ingest for near-realtime loading. Partitioned by event_date for query efficiency.
Why We Use It
Provides behavioral engagement signals that financial data cannot — session frequency, page depth, device preference, and recency of digital activity. Critical ML features for churn prediction.
Freshness SLA · Grain
Hourly batch. Warn after 3h, error after 6h. · One row per event action. Many events per session, many sessions per customer.
Contributes Downstream
Total sessions → engagement feature Avg session duration → churn signal Pages per session → depth of engagement Mobile session ratio → device preference Days since last session → recency signal New device flag → fraud signal
Columns
ColumnTypeDefinition
event_idPKvarcharUnique event identifier generated by tracking platform.
customer_idFKvarcharReferences stg_customers. Nullable for anonymous / pre-login sessions.
session_idvarcharGroups events into a single browsing session. Used to count total_sessions.
event_typevarcharpage_view / click / form_submit / login / logout. Filter varies by use case.
page_urlvarcharURL where the event occurred.
event_timestamptimestampUTC timestamp. Used for session duration and time-of-day analysis.
device_typevarcharmobile / desktop / tablet. Used for mobile_session_ratio and new_device_flag.
browservarcharBrowser identifier. Secondary signal for device fingerprinting in fraud model.
_loaded_attimestampLoad timestamp.
stg_products Product catalog for affinity and category analysis
Reference / Lookup~5K rows · weekly
Source System
In this project
Synthetic product catalog generated in Snowflake via Python script. Small, slow-changing reference table. Stored in RAW.products.
In production
Product catalog system exports weekly flat file to S3. Snowflake Task runs COPY INTO with TRUNCATE + INSERT strategy given small volume and full refreshes.
Why We Use It
Enriches order data with product category and price. Without this source we cannot build product affinity features or understand which categories drive revenue per segment.
Freshness SLA · Grain
Weekly refresh. Warn after 8 days, error after 14 days. · One row per product SKU. Joined to stg_orders on product_id.
Contributes Downstream
Top category → int_product_affinity Category diversity score → ML feature Unit price → gross margin in fct_orders is_active → filter discontinued products
Columns
ColumnTypeDefinition
product_idPKvarcharUnique product SKU. Joins to stg_orders.product_id.
product_namevarcharDisplay name of the product.
categoryvarcharTop-level product category. Primary grouping for affinity analysis.
subcategoryvarcharSecondary classification. Used for granular affinity scoring.
unit_pricefloatListed price in USD. Used to calculate gross_margin in fct_orders.
is_activebooleanFalse = discontinued. Filter to active products for recommendation use cases.
_loaded_attimestampLoad timestamp.
stg_campaigns Marketing outreach engagement and conversion signals
Marketing500K rows · daily
Source System
In this project
Synthetic campaign send records generated in Snowflake via Python script simulating a marketing automation platform output. Stored in RAW.campaigns.
In production
Marketing automation platform (email / SMS / push) exports nightly to S3. One row per campaign send per customer. Snowflake Task runs COPY INTO with incremental load on sent_at.
Why We Use It
Provides marketing responsiveness signals — open rate, click rate, conversion — used as behavioral features for churn prediction and segment responsiveness scoring.
Freshness SLA · Grain
Daily at 02:00 UTC. Warn after 24h. · One row per campaign send per customer.
Contributes Downstream
Email open rate → feat_customer_segmentation Click-through rate → engagement signal Conversion flag → revenue_attributed in fct_campaigns Channel preference → personalization logic Campaign ROI → marketing dashboard KPI
Columns
ColumnTypeDefinition
campaign_idPKvarcharUnique send identifier. Composite of campaign + customer + send_date in source.
customer_idFKvarcharReferences stg_customers.customer_id.
campaign_namevarcharName of the campaign as defined in the marketing platform.
channelvarcharemail / sms / push / in-app. Used for channel mix and preference analysis.
sent_attimestampUTC timestamp of message delivery.
opened_attimestampNullable. Timestamp of open event. Null = not opened.
clicked_attimestampNullable. Timestamp of click event. Null = not clicked.
convertedbooleanTrue if customer made a purchase within 7 days of receiving campaign.
_loaded_attimestampLoad timestamp.
stg_support_tickets Customer service interactions and satisfaction scores
Support / CSAT50K rows · daily
Source System
In this project
Synthetic support ticket records generated in Snowflake via Python script simulating a helpdesk platform export. Stored in RAW.support_tickets.
In production
Customer support platform exports nightly to S3. Snowflake Task runs COPY INTO with incremental load on created_at. Lowest volume source but high signal-to-noise ratio.
Why We Use It
Support ticket rate and CSAT scores are among the strongest leading indicators of churn. A customer filing multiple high-priority tickets with low satisfaction is significantly more likely to churn within 90 days.
Freshness SLA · Grain
Daily at 02:00 UTC. Warn after 24h. · One row per support ticket. Multiple tickets possible per customer.
Contributes Downstream
Support ticket rate → feat_customer_segmentation Avg CSAT score → customer health signal High-priority ticket count → churn risk signal Unresolved ticket flag → at-risk identification Resolution time → operational KPI in dashboards
Columns
ColumnTypeDefinition
ticket_idPKvarcharUnique support ticket identifier from helpdesk platform.
customer_idFKvarcharReferences stg_customers.customer_id.
created_attimestampWhen the ticket was opened by the customer.
resolved_attimestampNullable. When closed. Null = still open.
categoryvarcharbilling / fraud / technical / account / general. Used for issue type distribution.
priorityvarcharlow / medium / high / critical. High + critical tickets are strong churn signals.
statusvarcharopen / in-progress / resolved / closed. Open tickets on active customers = at-risk flag.
satisfaction_scoreintCSAT score 1–5. Nullable — not always submitted post-resolution.
_loaded_attimestampLoad timestamp.
Intermediate Layer Business joins and aggregations. Materialized as Ephemeral — compiled as CTEs inside downstream models. Never queried directly by BI tools or analysts.
int_customer_orders Per-customer order behavior aggregations
Ephemeral1 row / customer
Materialization
In this project
Ephemeral — compiled as a CTE inside downstream mart models. No physical table created in Snowflake.
In production
Could be promoted to a View or Table if multiple downstream models reference it independently or if query compilation cost becomes significant.
Why This Model Exists
Aggregates raw order rows into one customer-level summary. Keeps mart models clean — no GROUP BY logic in the final layer. Centralizes order metric definitions in one place.
Source Models
stg_customers · stg_orders · stg_products
Consumed By
fct_customer_value → total_orders, avg_order_value, recency fct_orders → order-level enrichment feat_customer_segmentation → frequency_norm, monetary_norm
Columns
ColumnTypeDefinition
customer_idPKvarcharFK to stg_customers. One row per customer.
total_ordersintCount of completed orders only. Excludes cancelled and returned.
completed_ordersintSame as total_orders. Kept explicit for clarity in downstream joins.
cancelled_ordersintCount of cancelled orders. Used for cancellation rate metric.
first_order_datedateDate of customer's first completed order. Used for cohort analysis.
last_order_datedateMost recent completed order date. Drives recency score in fct_customer_value.
avg_order_valuefloattotal_order_value / total_orders. Pre-computed here to avoid repetition in mart.
total_order_valuefloatSum of total_amount across all completed orders. Maps to monetary dimension.
top_categoryvarcharMost purchased product category. Used in product affinity and segmentation.
days_since_last_orderintCurrent date minus last_order_date. Key recency signal for RFM scoring.
int_customer_transactions Per-customer payment behavior aggregations
Ephemeral1 row / customer
Materialization
In this project
Ephemeral — compiled as a CTE. Aggregates 5M+ transaction rows to 1 row per customer before joining to mart models.
In production
Given the high volume of stg_transactions, this model would likely be promoted to an incremental Table in production to avoid re-aggregating 5M rows on every run.
Why This Model Exists
Reduces 5M+ transaction rows to one customer-level summary. Computes failed transaction ratios and spend metrics used by both fraud and segmentation models.
Source Models
stg_transactions · stg_customers
Consumed By
fct_customer_value → total_spend int_customer_fraud_signals → failed_txn_ratio feat_fraud_behavior → velocity and spend features
Columns
ColumnTypeDefinition
customer_idPKvarcharFK to stg_customers. One row per customer.
total_transactionsintCount of all payment attempts including failed.
total_spendfloatSum of successful purchase amounts in USD. Excludes refunds and fees.
avg_transaction_amountfloatMean transaction value. Used in amount anomaly detection.
max_transaction_amountfloatLargest single transaction. Used to flag unusually large payments.
distinct_merchantsintNumber of unique merchants transacted with. Diversity signal.
payment_methods_usedintCount of distinct payment methods. Higher = lower fraud risk typically.
failed_transactionsintCount of failed payment attempts. Used to compute failed_txn_ratio.
last_transaction_datetimestampMost recent successful transaction timestamp.
int_customer_fraud_signals Rule-based fraud flags derived from transaction patterns
Fraud LogicEphemeral
Materialization
In this project
Ephemeral — all fraud flag logic compiled as CTEs within feat_fraud_behavior and fct_transactions. Business rules are centralized here for consistency.
In production
Would be promoted to an incremental Table — fraud signal computation on 5M+ rows is expensive and results should be persisted daily rather than recomputed on every model run.
Why This Model Exists
Centralizes all rule-based fraud flag definitions. Ensures the same flag logic is used consistently across the is_high_risk column in fct_transactions and the fraud feature table for ML.
Source Models
stg_transactions · stg_customers · int_customer_transactions
Consumed By
fct_transactions → is_high_risk flag feat_fraud_behavior → all normalized fraud features customer_risk_scores → IsolationForest input
Columns
ColumnTypeDefinition
customer_idPKvarcharFK to stg_customers. One row per customer.
high_velocity_flagbooleanTrue if avg daily transactions exceed threshold. Velocity anomaly signal.
unusual_amount_flagbooleanTrue if any transaction exceeds 3x the customer's avg transaction amount.
new_device_flagbooleanTrue if transaction came from a device not seen in prior 30 days.
cross_border_flagbooleanTrue if transactions include non-USD currency. Geographic risk signal.
night_txn_ratiofloatProportion of transactions occurring between 00:00–05:00 UTC.
failed_txn_ratiofloatfailed_transactions / total_transactions. Key fraud signal.
avg_daily_txn_countfloatMean number of transactions per active day. Used in velocity scoring.
max_single_day_amountfloatHighest total spend in a single calendar day.
int_session_features Per-customer digital engagement aggregations
Ephemeral1 row / customer
Materialization
In this project
Ephemeral — aggregates 8M+ event rows down to 1 row per customer before joining to mart. Anonymous (null customer_id) events are excluded.
In production
Given 8M+ daily event rows, this would be an incremental Table in production — incrementally processing only new sessions rather than re-aggregating the full event history daily.
Why This Model Exists
Reduces raw clickstream to customer-level engagement metrics. These behavioral signals complement transaction data for a more complete view of customer health and churn risk.
Source Models
stg_events · stg_customers
Consumed By
fct_customer_value → total_sessions engagement signal feat_customer_segmentation → session_frequency_norm, avg_session_dur_norm
Columns
ColumnTypeDefinition
customer_idPKvarcharFK to stg_customers. One row per customer. Anonymous sessions excluded.
total_sessionsintCount of distinct session_ids linked to this customer.
avg_session_duration_secintMean session length in seconds. Low duration = low engagement / churn signal.
pages_per_sessionfloatAvg number of page_view events per session. Depth of engagement.
mobile_session_ratiofloatProportion of sessions on mobile device. Used for device preference segmentation.
last_active_datedateDate of most recent session. Digital recency signal.
days_since_last_sessionintCurrent date minus last_active_date. Strong churn predictor.
int_product_affinity Per-customer product category preference signals
AffinityEphemeral
Materialization
In this project
Ephemeral — joins stg_orders and stg_products, then ranks categories by purchase frequency per customer. Compiled as a CTE inside mart models.
In production
Table materialization preferred — product affinity calculations involve window functions over order history which are expensive to recompute on every run.
Why This Model Exists
Translates raw order and product data into customer preference signals. Enables product-based personalization and enriches ML segmentation features with category diversity scores.
Source Models
stg_orders · stg_products · stg_customers
Consumed By
feat_customer_segmentation → category_diversity_norm dim_customers → top product category enrichment personalization engine → product recommendations
Columns
ColumnTypeDefinition
customer_idPKvarcharFK to stg_customers. One row per customer.
top_categoryvarcharCategory with highest purchase count for this customer.
second_categoryvarcharSecond most purchased category. Used for cross-sell recommendations.
category_diversity_scorefloatNormalized entropy of category distribution. Higher = more diverse buyer.
repeat_purchase_ratefloatProportion of orders that are repeat purchases in the same category.
avg_qty_per_orderfloatMean quantity per order. Used in bulk vs single-unit buyer segmentation.
Mart Layer Analytics-ready star schema. Materialized as Tables. Consumed by Sigma, Power BI, MetricFlow semantic layer, and ML feature models. Is KPI column marks fields surfaced in executive dashboards.
dim_customers Customer dimension — cleaned profile for all fact joins
Dimension1 row / customer · daily
Materialization
In this project
Full refresh Table. Rebuilt daily. Small enough (~270K rows) that incremental is not needed.
In production
Table with full refresh daily. SCD Type 2 history tracked separately in snap_customer_profile for attributes like kyc_status and account_status.
Why This Model Exists
Single cleaned customer dimension. All fact tables join here on customer_id. Provides stable, deduplicated customer attributes for BI tools and mart models.
Source Models
stg_customers · int_product_affinity
Consumed By
fct_orders · fct_transactions · fct_campaigns · fct_customer_value Sigma dashboards — customer profile views snap_customer_profile — SCD Type 2 snapshot
Columns
ColumnTypeIs KPIDefinition
customer_idPKvarcharNoUnique customer identifier. FK source for all fact tables.
full_namevarcharNoConcatenation of first_name + last_name. Display field only.
emailvarcharNoPrimary contact email. Used for campaign matching.
signup_datedateNoAccount creation date. Used for cohort analysis and tenure.
statevarcharNoUS state code. Used in geographic segmentation dashboards.
countryvarcharNoISO 2-letter country code.
kyc_statusvarcharYesKYC verification state. Surfaced in compliance dashboard.
account_statusvarcharYesactive / suspended / closed / dormant. Used in active customer count KPI.
customer_tenure_daysintYesDays since signup_date. Used in cohort and retention dashboards.
is_activebooleanYesDerived from account_status = active. Standard active customer filter.
dbt_updated_attimestampNoTimestamp of last dbt model run. Used for freshness monitoring.
fct_orders Order-level fact table with enriched purchase data
Fact1 row / order · daily
Materialization
In this project
Full refresh Table rebuilt daily. Joins stg_orders with dim_customers and stg_products for enriched order facts.
In production
Incremental Table on order_date. Only processes new or updated orders each run. Merge strategy on order_id handles late-arriving status updates.
Why This Model Exists
Provides order-level granularity for revenue reporting, channel analysis, and product performance dashboards. Acts as the bridge between customer and product dimensions.
Source Models
dim_customers · stg_orders · stg_products
Consumed By
Sigma — revenue by channel dashboard Power BI — product performance report fct_transactions → order_id join for enrichment
Columns
ColumnTypeIs KPIDefinition
order_idPKvarcharNoUnique order identifier.
customer_idFKvarcharNoReferences dim_customers.customer_id.
product_idFKvarcharNoReferences stg_products.product_id.
order_datedateNoDate order was placed. Partition key for incremental loads.
order_statusvarcharNocompleted / cancelled / pending / returned.
total_amountfloatYesGross order value in USD. Core revenue KPI in Sigma dashboard.
quantityintNoUnits ordered. Used for volume reporting.
channelvarcharYesweb / mobile / in-store / partner. Channel mix is a tracked KPI.
gross_marginfloatYestotal_amount minus unit_price × quantity. Profitability KPI.
fct_transactions Payment-level fact table with fraud risk flag
Fact · Fraud1 row / transaction · daily
Materialization
In this project
Incremental Table on transaction_date. Processes only new transactions each run. Merge strategy on transaction_id.
In production
Incremental Table — same strategy. Given 5M+ daily rows, full refresh is not viable. is_high_risk flag recalculated only for transactions within the incremental window.
Why This Model Exists
Provides transaction-level detail for fraud investigation, payment reporting, and compliance dashboards. The is_high_risk flag makes fraud-flagged transactions directly queryable by analysts.
Source Models
dim_customers · stg_transactions · fct_orders · int_customer_fraud_signals
Consumed By
Power BI — fraud monitoring dashboard Sigma — transaction explorer Compliance team — high-risk transaction exports
Columns
ColumnTypeIs KPIDefinition
transaction_idPKvarcharNoUnique payment attempt identifier.
customer_idFKvarcharNoReferences dim_customers.customer_id.
order_idFKvarcharNoReferences fct_orders.order_id. Nullable.
transaction_datetimestampNoUTC timestamp. Incremental partition key.
amountfloatYesTransaction value in USD. Total flagged amount is a compliance KPI ($4.2M).
currencyvarcharNoISO currency code.
payment_methodvarcharNocard / ach / wallet / wire.
transaction_typevarcharNopurchase / refund / reversal / fee.
statusvarcharNosuccess / failed / pending / reversed.
is_high_riskbooleanYesTrue if transaction triggered fraud signals. 28K flagged in production. Core fraud KPI.
fct_campaigns Campaign engagement and revenue attribution fact table
Fact · Marketing1 row / send · daily
Materialization
In this project
Full refresh Table rebuilt daily. Joins stg_campaigns with dim_customers and applies revenue attribution logic.
In production
Incremental Table on sent_at date. Revenue attribution recalculated for the 7-day conversion window, so a 7-day lookback is applied on each incremental run.
Why This Model Exists
Provides campaign-level ROI and engagement metrics for the marketing team. Enables channel performance comparison and conversion rate reporting across email, SMS, and push campaigns.
Source Models
dim_customers · stg_campaigns · fct_orders
Consumed By
Sigma — campaign performance dashboard Marketing team — channel ROI reporting Power BI — conversion funnel report
Columns
ColumnTypeIs KPIDefinition
campaign_idPKvarcharNoUnique send identifier per customer per campaign.
customer_idFKvarcharNoReferences dim_customers.customer_id.
campaign_namevarcharNoCampaign name from marketing platform.
channelvarcharYesemail / sms / push / in-app. Channel mix is a tracked marketing KPI.
sent_attimestampNoUTC timestamp of message delivery.
openedbooleanYesTrue if message was opened. Open rate is a campaign performance KPI.
clickedbooleanYesTrue if CTA was clicked. Click-through rate is a tracked KPI.
convertedbooleanYesTrue if purchase made within 7 days of send. Conversion rate is core marketing KPI.
revenue_attributedfloatYesRevenue from orders within 7-day attribution window. Campaign ROI KPI.
fct_customer_value Customer 360 unified profile — core analytics table
Fact · Customer 3601 row / customer · daily
Materialization
In this project
Full refresh Table rebuilt daily. Joins all intermediate models into a single Customer 360 row per customer. Central model of the entire project.
In production
Incremental Table on customer_id. Only reprocesses customers with activity in the last 24h. RFM scores recalculated on a rolling 90-day window.
Why This Model Exists
Single source of truth for all customer-level KPIs. Powers Sigma dashboards, Power BI reports, ML feature tables, and the executive scorecard. Every downstream consumer reads from here.
Source Models
dim_customers · int_customer_orders · int_customer_transactions · int_session_features · int_product_affinity
Consumed By
Sigma — retention and segmentation dashboards Power BI — churn and LTV reports feat_customer_segmentation — KMeans input feat_fraud_behavior — IsolationForest input Executive KPI scorecard snap_customer_profile — SCD Type 2 tracking
Columns
ColumnTypeIs KPIDefinition
customer_idPKvarcharNoUnique customer identifier. FK to dim_customers.
total_ordersintYesLifetime completed order count. Drives frequency score and dashboard metrics.
total_spendfloatYesLifetime gross spend in USD. Core monetary KPI in executive scorecard.
avg_order_valuefloatYestotal_spend / total_orders. Tracked in executive KPI scorecard.
days_since_last_orderintNoRecency signal. Input to recency_score. Not surfaced directly in dashboards.
total_sessionsintNoLifetime session count from stg_events. Engagement input feature.
recency_scoreintNoRFM recency score 1–5. Internal scoring. Not a dashboard KPI directly.
frequency_scoreintNoRFM frequency score 1–5. Internal scoring.
monetary_scoreintNoRFM monetary score 1–5. Internal scoring.
rfm_segmentvarcharYesRFM label e.g. Champions, At Risk. Primary segmentation KPI in Sigma dashboard.
predicted_ltvfloatYes12-month predicted lifetime value in USD. Executive scorecard KPI.
churn_probabilityfloatYes0–1 churn risk score. >0.7 = high risk. Drives retention alerts. 85% model accuracy.
customer_segmentvarcharYesKMeans cluster label e.g. High Value, New. Used in personalization and dashboard.
composite_scorefloatNoWeighted blend of RFM + LTV + churn. Internal customer ranking signal.