benchmark / evidence dashboard
WHAT VERIFIED.
One evidence view of what C-DAG actually ran, what was verified, what failed, and what remains bounded.
validation coverage
COVERAGE
- mortgage performance
- mortgage applications
- consumer complaints
- CRT (credit risk transfer)
- CAS (Connecticut Avenue Securities)
- public loss-exposure records
validation lanes
WHAT RAN
| Lane | Rows | Result | Replay | Audit-chain |
|---|---|---|---|---|
| Freddie + Fannie + HMDA | 30,000 | APPROVE 9,584 / REVIEW 4,368 / DECLINE 16,048 | 100% replay success on sampled validation audit records | verified |
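The decision counts in the table can be checked against the lane's row count. A short Python sketch that uses only the figures shown in the table:

```python
# Consistency check for the "WHAT RAN" table: the decision counts should
# account for every row in the lane. Figures are copied from the table.
lane_rows = 30_000
decisions = {"APPROVE": 9_584, "REVIEW": 4_368, "DECLINE": 16_048}

total = sum(decisions.values())
assert total == lane_rows, f"decisions cover {total} of {lane_rows} rows"

# Share of each decision as a percentage of lane rows.
shares = {label: 100 * count / lane_rows for label, count in decisions.items()}
for label, pct in shares.items():
    print(f"{label}: {pct:.1f}%")
```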
verification
WHAT VERIFIED
Replay verification
100% success on sampled validation audit records across reported validation runs.
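The page does not say how replay is performed. A common pattern, assumed here, is to re-run the decision logic on the inputs stored in a sampled audit record and check that it reproduces the logged decision. `AuditRecord`, `replay_success_rate`, and the `toy_decide` rule are hypothetical stand-ins, not C-DAG's API:

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AuditRecord:
    # Hypothetical record shape: the inputs the decision saw and the decision that was logged.
    inputs: dict
    decision: str


def replay_success_rate(records: list[AuditRecord],
                        decide: Callable[[dict], str],
                        sample_size: int,
                        seed: int = 0) -> float:
    """Re-run `decide` on a random sample of audit records and return the
    fraction whose replayed decision matches the logged decision."""
    if not records or sample_size < 1:
        raise ValueError("need at least one record and a positive sample size")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = rng.sample(records, min(sample_size, len(records)))
    matches = sum(decide(r.inputs) == r.decision for r in sample)
    return matches / len(sample)


def toy_decide(inputs: dict) -> str:
    # Hypothetical deterministic rule standing in for the real decision logic.
    if inputs["score"] >= 700:
        return "APPROVE"
    if inputs["score"] >= 620:
        return "REVIEW"
    return "DECLINE"


# Log toy_decide's own output for each input, so every replay should match.
records = [AuditRecord({"score": s}, toy_decide({"score": s}))
           for s in range(550, 800, 5)]
print(replay_success_rate(records, toy_decide, sample_size=20))  # 1.0
```

Replay of this kind is only meaningful when the decision logic is deterministic for the logged inputs, which usually means each record also has to pin the model and rule versions that produced it.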
Audit-chain integrity
Verified across reported validation runs; tamper behavior is covered by tests.
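The page reports audit-chain integrity and lists hash-chain among C-DAG's capabilities, but does not describe the chain format. A minimal sketch of a SHA-256 hash chain, assuming each entry stores its payload, the previous entry's hash, and its own hash (a hypothetical layout), shows how tampering with a single record is detected:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def link_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON (sorted keys, no spaces) so equal payloads always hash identically.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(payloads: list[dict]) -> list[dict]:
    chain, prev = [], GENESIS
    for payload in payloads:
        h = link_hash(prev, payload)
        chain.append({"payload": payload, "prev": prev, "hash": h})
        prev = h
    return chain


def verify_chain(chain: list[dict]) -> int:
    """Return the index of the first broken link, or -1 if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev"] != prev or link_hash(prev, entry["payload"]) != entry["hash"]:
            return i
        prev = entry["hash"]
    return -1


chain = build_chain([{"id": i, "decision": d}
                     for i, d in enumerate(["APPROVE", "REVIEW", "DECLINE"])])
print(verify_chain(chain))  # -1: intact
chain[1]["payload"]["decision"] = "APPROVE"  # tamper with one record
print(verify_chain(chain))  # 1: first broken link
```

A chain like this only detects edits if the latest hash is also kept somewhere the editor cannot rewrite; otherwise every downstream hash could simply be recomputed.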
Holdout baseline
Rows: train 58,579 / test 41,421
Test positives: 201
Metrics: AUC 0.573062 / PR-AUC 0.006059
Decision distribution (test set): APPROVE 36,336 / REVIEW 5,085 / DECLINE 0
Validation against real public outcomes. Signal remains limited (AUC 0.573, where 0.5 corresponds to random ranking) and should be treated as baseline governance evidence, not production model performance.
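The holdout numbers can be put in context with a little arithmetic. With 201 positives among 41,421 test rows, the positive rate is about 0.49%. A random ranking scores ROC-AUC 0.5 and has an expected PR-AUC close to the positive rate, so the reported PR-AUC is roughly 1.25 times random. A short Python check using only the reported figures:

```python
# Holdout figures copied from the dashboard.
train_rows, test_rows = 58_579, 41_421
test_positives = 201
auc, pr_auc = 0.573062, 0.006059
test_decisions = {"APPROVE": 36_336, "REVIEW": 5_085, "DECLINE": 0}

# Sanity checks: the split covers 100,000 rows and every test row got a decision.
assert train_rows + test_rows == 100_000
assert sum(test_decisions.values()) == test_rows

# A random ranking scores ROC-AUC 0.5 and, in expectation, a PR-AUC
# (average precision) close to the positive rate.
base_rate = test_positives / test_rows
pr_lift = pr_auc / base_rate

print(f"positive rate: {base_rate:.4%}")                # 0.4853%
print(f"PR-AUC lift over random: {pr_lift:.2f}x")       # 1.25x
print(f"ROC-AUC margin over random: {auc - 0.5:.3f}")   # 0.073
```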
estimated exposure mapping
LOSS EXPOSURE MAPPING
Each parsed public record is mapped from its failure type to replayable evidence and audit-ready artifacts. This mapping does not claim prevention or savings.
Records parsed: 5. Sources present: CFPB, FINRA, SEC, OCC, AI operational-loss research. Not yet parsed: FFIEC.
CFPB / Wells Fargo $3.7B order
Exposure: $3.7B
Failure type: Consumer-harm and servicing-control breakdown.
Missing artifact: Cross-workflow, replayable decision trace.
C-DAG fit: trace, counterfactual, replay, hash-chain, evidence pack, risk-exposure mapping
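One way to hold a mapping card like the one above as structured data is a small record type whose fields mirror the card's labels. This is a hypothetical sketch: the `ExposureMapping` class, its field names, and the whole-dollar `exposure_usd` encoding are assumptions, not the format used in C-DAG's files. Exposure is kept as a recorded figure only, consistent with the note that the mapping does not claim prevention or savings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExposureMapping:
    # Field names mirror the labels on the card; the class itself is hypothetical.
    source: str
    case: str
    exposure_usd: int  # whole dollars, avoiding float rounding in stored data
    failure_type: str
    missing_artifact: str
    cdag_fit: tuple[str, ...] = ()

    def exposure_label(self) -> str:
        # Render billions as "$X.YB", matching the card's format.
        return f"${self.exposure_usd / 1e9:.1f}B"


wells = ExposureMapping(
    source="CFPB",
    case="Wells Fargo order",
    exposure_usd=3_700_000_000,
    failure_type="Consumer-harm and servicing-control breakdown",
    missing_artifact="Cross-workflow, replayable decision trace",
    cdag_fit=("trace", "counterfactual", "replay", "hash-chain",
              "evidence pack", "risk-exposure mapping"),
)
print(wells.exposure_label())  # $3.7B
```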
EVALUATE THE EVIDENCE.
Benchmark data is stored in validation/benchmark_metrics.json and summarized in validation/benchmark_report.md.
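To start evaluating the evidence, a reader can open the metrics file directly. The sketch below assumes only that `validation/benchmark_metrics.json` is valid JSON. Its schema is not documented on this page, so the code lists top-level keys and value types rather than guessing field names. `summarize_metrics` is a hypothetical helper.

```python
from __future__ import annotations

import json
from pathlib import Path


def summarize_metrics(path: str | Path) -> dict[str, str]:
    """Load a JSON file and map each top-level key to its value's type name.
    No particular schema is assumed."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        # Non-object root (e.g. a list): report its type under a placeholder key.
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}


metrics_path = Path("validation/benchmark_metrics.json")
if metrics_path.exists():  # only runs from a checkout that contains the file
    for key, kind in summarize_metrics(metrics_path).items():
        print(f"{key}: {kind}")
```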