benchmark / evidence dashboard

WHAT VERIFIED.

A single evidence view of what C-DAG actually ran, what was verified, what failed, and what remains bounded.

validation coverage

COVERAGE

100k+ public financial rows processed

6 validation lanes

117-file corpus inspected

102 usable structured candidates

mortgage performance
mortgage applications
consumer complaints
CRT
CAS
public loss-exposure records

validation lanes

WHAT RAN

Lane	Rows	Result	Replay	Audit-chain
Freddie + Fannie + HMDA	30,000	APPROVE 9,584 / REVIEW 4,368 / DECLINE 16,048	100% replay success on sampled validation audit records	verified
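
The row above can be sanity-checked with a few lines of arithmetic. The sketch below confirms that the three decision counts reconcile to the reported row count and prints each decision's share. The function name `decision_shares` is hypothetical and is not part of C-DAG; only the numbers come from the table.

```python
# Decision counts copied from the lane table above.
rows = 30_000
decisions = {"APPROVE": 9_584, "REVIEW": 4_368, "DECLINE": 16_048}


def decision_shares(counts, total):
    """Return each decision's share of total, after checking the counts reconcile.

    Hypothetical helper for illustration; not a C-DAG API.
    """
    if sum(counts.values()) != total:
        raise ValueError("decision counts do not sum to the row count")
    return {name: n / total for name, n in counts.items()}


shares = decision_shares(decisions, rows)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Failing loudly on a mismatch is deliberate: a distribution that does not sum to the row count usually means rows were dropped or double-counted upstream.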

verification

WHAT VERIFIED

Replay verification

100% success on sampled validation audit records across reported validation runs.
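
As a rough illustration of what a replay check measures, the sketch below re-runs a deterministic decision function on each recorded input and compares the result with the recorded decision. Everything here is an assumption made for illustration: the `decide` rule, its thresholds, the record layout, and the three sample records are invented and do not describe C-DAG's actual logic or data.

```python
def decide(features):
    """Hypothetical deterministic rule standing in for the real decision logic."""
    score = features["score"]
    if score >= 700:
        return "APPROVE"
    if score >= 620:
        return "REVIEW"
    return "DECLINE"


def replay_success_rate(audit_records):
    """Re-run each recorded input and return the fraction whose decision matches."""
    if not audit_records:
        raise ValueError("no audit records to replay")
    matches = sum(
        1 for rec in audit_records if decide(rec["input"]) == rec["decision"]
    )
    return matches / len(audit_records)


# Made-up audit records, used only to exercise the functions above.
sample = [
    {"input": {"score": 720}, "decision": "APPROVE"},
    {"input": {"score": 640}, "decision": "REVIEW"},
    {"input": {"score": 580}, "decision": "DECLINE"},
]
rate = replay_success_rate(sample)
print(f"replay success: {rate:.0%}")
```

A rate below 100% on such a check would point to non-determinism or to a recorded decision that the stored inputs cannot reproduce.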

Audit-chain integrity

Verified across reported validation runs; tamper behavior covered by tests.
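
A minimal sketch of how a hash-chained audit log can detect tampering, assuming SHA-256 over canonical JSON with each entry committing to the previous entry's hash. The source does not specify C-DAG's hash function, record layout, or genesis value, so `GENESIS`, `build_chain`, and `verify_chain` are hypothetical names chosen for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry's prev_hash


def _digest(prev_hash, payload):
    """Hash the previous hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Link payloads so each entry commits to everything before it."""
    chain, prev = [], GENESIS
    for payload in payloads:
        entry_hash = _digest(prev, payload)
        chain.append({"payload": payload, "prev_hash": prev, "hash": entry_hash})
        prev = entry_hash
    return chain


def verify_chain(chain):
    """Recompute every hash; any edited payload or broken link returns False."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != _digest(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


chain = build_chain([
    {"id": 1, "decision": "APPROVE"},
    {"id": 2, "decision": "DECLINE"},
])
print(verify_chain(chain))  # True for the untouched chain
chain[0]["payload"]["decision"] = "DECLINE"  # simulate tampering
print(verify_chain(chain))  # False once a payload has been edited
```

Sorting keys and fixing separators before hashing keeps the digest stable across serializations, which matters when the chain is re-verified on a different machine.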

Holdout baseline

Rows: train 58,579 / test 41,421

Test positives: 201

Metrics: AUC 0.573062 / PR-AUC 0.006059

Distribution: APPROVE 36,336 / REVIEW 5,085 / DECLINE 0

Validated against real public outcomes. Signal remains limited, so treat these results as baseline governance evidence, not production model performance.
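
The "limited signal" reading can be made concrete by comparing the reported metrics with chance-level references. For a classifier with no skill, ROC AUC is 0.5 and the expected PR-AUC is approximately the positive prevalence. The sketch below uses only the holdout figures reported above; the variable names are illustrative.

```python
# Holdout figures copied from the dashboard.
test_rows = 41_421
test_positives = 201
auc = 0.573062
pr_auc = 0.006059

# Chance-level references: AUC 0.5, PR-AUC roughly equal to prevalence.
prevalence = test_positives / test_rows
pr_auc_lift = pr_auc / prevalence
auc_margin = auc - 0.5

print(f"prevalence:       {prevalence:.6f}")
print(f"PR-AUC lift:      {pr_auc_lift:.2f}x over prevalence")
print(f"AUC above chance: {auc_margin:.3f}")
```

With roughly 0.49% positives, a PR-AUC of 0.006059 is only about 1.25 times the prevalence, which is consistent with treating this run as a baseline rather than as evidence of model performance.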

estimated exposure mapping

LOSS EXPOSURE MAPPING

Parsed public records map failure types to replayable evidence and audit-ready artifacts. This mapping makes no claim of loss prevention or savings.

Records parsed: 5. Sources present: CFPB, FINRA, SEC, OCC, AI operational-loss research. Not yet parsed: FFIEC.

CFPB / Wells Fargo $3.7B order

Exposure: $3.7B

Failure type: Consumer-harm and servicing-control breakdown.

Missing artifact: Cross-workflow, replayable decision trace.

C-DAG fit: trace, counterfactual, replay, hash-chain, evidence pack, risk-exposure mapping

EVALUATE THE EVIDENCE.

Benchmark data is stored in validation/benchmark_metrics.json and summarized in validation/benchmark_report.md.