NexusEval

This platform is a governance layer for local LLMs that goes beyond accuracy metrics to answer one question: can this specific model be trusted? It evaluates models along six risk dimensions: racial bias, gender bias, socioeconomic bias, intersectional failures, robustness, and toxicity. It integrates existing evaluators, including AIF360, DeepEval, Fairlearn, and HELM. My core contribution is the Intersectionality Engine: rather than flagging bias along a single axis, it detects compounding failures, cases where a model degrades specifically when attributes such as race and gender intersect, a pattern that single-attribute tools miss entirely. All outputs are normalized through a Metric Standardization Layer into a unified JSON schema, which feeds a Risk Heatmap. The heatmap gives compliance teams a single auditable artifact: not a spreadsheet of scores, but a clear answer to "where does this model break, and for whom?"
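
To make the idea of a compounding failure concrete, here is a minimal sketch in plain Python. It is not the Intersectionality Engine itself: the function names (`group_accuracy`, `compounding_gaps`), the `threshold` parameter, the toy data, and the JSON field names are all assumptions made for illustration. The sketch compares each race-by-gender subgroup against the weaker of its two single-attribute groups and flags the subgroup only when it falls below that floor by a meaningful margin.

```python
import json
from collections import defaultdict


def group_accuracy(records, keys):
    """Return accuracy per subgroup, where a subgroup is a tuple of attribute values."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        group = tuple(r[k] for k in keys)
        totals[group] += 1
        hits[group] += int(r["correct"])
    return {g: hits[g] / totals[g] for g in totals}


def compounding_gaps(records, attr_a="race", attr_b="gender", threshold=0.1):
    """Flag intersections that score worse than BOTH of their marginal groups.

    The 'floor' is the lower of the two single-attribute accuracies. A subgroup
    is flagged when it sits at least `threshold` below that floor.
    """
    by_a = group_accuracy(records, [attr_a])
    by_b = group_accuracy(records, [attr_b])
    by_ab = group_accuracy(records, [attr_a, attr_b])

    findings = []
    for (a, b), acc in sorted(by_ab.items()):
        floor = min(by_a[(a,)], by_b[(b,)])
        gap = floor - acc
        if gap >= threshold:
            # Hypothetical record shape; the real unified schema may differ.
            findings.append({
                "dimension": "intersectional",
                "metric": "accuracy",
                "group": {attr_a: a, attr_b: b},
                "value": round(acc, 4),
                "marginal_floor": round(floor, 4),
                "gap": round(gap, 4),
            })
    return findings


def make_cell(race, gender, n_correct, n_total):
    """Build toy evaluation records for one race-by-gender cell."""
    return [
        {"race": race, "gender": gender, "correct": i < n_correct}
        for i in range(n_total)
    ]


# Toy data (invented): three cells are perfect, one intersection degrades.
records = (
    make_cell("A", "F", 10, 10)
    + make_cell("A", "M", 10, 10)
    + make_cell("B", "M", 10, 10)
    + make_cell("B", "F", 4, 10)
)

findings = compounding_gaps(records)
print(json.dumps(findings, indent=2))
```

In this toy data, group B and group F each score 0.7 on their own, which a single-axis check might pass, while the B-and-F intersection scores 0.4. Comparing against the marginal floor rather than the overall mean is one possible design choice; it isolates the degradation that appears only when both attributes are present.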