AI code judgement

AI Coding Evaluation and PR Handoff

Photon101 compares agent-written patches against the evidence a maintainer or buyer actually needs: acceptance coverage, test results, scope control, review risk, verification commands, and handoff quality.


Inputs. Candidate patches, model transcripts, CI logs, PR review threads, acceptance criteria, repo instructions, and verification commands.
Output. A scored recommendation with concrete evidence, failure modes, residual risk, and maintainer-ready handoff text.
Best fit. Agent-output comparisons, PR rescue decisions, hiring screens, and teams choosing which generated patch to trust.

Sample Scorecard

Example task: compare two AI patches for a flaky invoice-export CI failure.

Candidate	Evidence	Risk	Decision
Candidate A	Focused test passed, but full-suite evidence is missing and the handoff is thin.	Timezone regression risk is not called out; the reviewer has to infer next checks.	Do not merge yet.
Candidate B	Focused test, full suite, timezone-specific verification, and clean diff scope.	Residual risk is explicit and bounded to date parsing edge cases.	Recommended.
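
The decision column can be read as a check for missing evidence. The following is a minimal sketch of that idea, not Photon101's actual scoring: the checklist keys (focusedTest, fullSuite, residualRiskStated) and the rule "recommend only when nothing is missing" are assumptions made for illustration, with the flags transcribed from the two rows above.

```javascript
// Hypothetical evidence checklist; these keys are illustrative, not Photon101's schema.
const REQUIRED_EVIDENCE = ["focusedTest", "fullSuite", "residualRiskStated"];

// Returns the missing evidence and a decision for one candidate patch.
function evaluateCandidate(candidate) {
  const missing = REQUIRED_EVIDENCE.filter((key) => !candidate.evidence[key]);
  return {
    name: candidate.name,
    missing,
    decision: missing.length === 0 ? "Recommended." : "Do not merge yet.",
  };
}

// Evidence flags transcribed from the sample scorecard rows.
const candidates = [
  { name: "Candidate A", evidence: { focusedTest: true, fullSuite: false, residualRiskStated: false } },
  { name: "Candidate B", evidence: { focusedTest: true, fullSuite: true, residualRiskStated: true } },
];

const results = candidates.map(evaluateCandidate);
for (const r of results) {
  const gaps = r.missing.length > 0 ? ` (missing: ${r.missing.join(", ")})` : "";
  console.log(`${r.name}: ${r.decision}${gaps}`);
}
```

Listing the missing items alongside the decision keeps the reviewer from having to infer the next checks, which is the gap called out for Candidate A.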

Deliverables

Winner recommendation, with the reasons it is safer than the alternatives.
Acceptance-criteria coverage mapped to concrete evidence.
CI, test, lint, and typecheck summary with missing verification called out.
Scope-control and risk review, including broad rewrites and unrelated churn.
Maintainer-ready handoff notes that can be pasted into a PR or client update.
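
As a rough illustration of the coverage and handoff deliverables, the sketch below maps acceptance criteria to evidence and renders a pasteable checklist. The function names, record fields (id, text, criterionId, source), and sample criteria are all hypothetical; the real output format may differ.

```javascript
// Map each acceptance criterion to the evidence that supports it.
function mapCoverage(criteria, evidence) {
  return criteria.map((criterion) => {
    const sources = evidence
      .filter((item) => item.criterionId === criterion.id)
      .map((item) => item.source);
    return { ...criterion, sources, covered: sources.length > 0 };
  });
}

// Render the coverage as Markdown checklist lines for a PR comment.
function toHandoffMarkdown(coverage) {
  const lines = coverage.map((row) => {
    const mark = row.covered ? "x" : " ";
    const detail = row.covered ? row.sources.join("; ") : "no evidence found";
    return `- [${mark}] ${row.text} (${detail})`;
  });
  return ["Acceptance coverage", "", ...lines].join("\n");
}

// Hypothetical criteria and evidence for the invoice-export example.
const criteria = [
  { id: "AC1", text: "Invoice-export test passes" },
  { id: "AC2", text: "Full suite passes" },
];
const evidence = [{ criterionId: "AC1", source: "focused test run in CI log" }];

const coverage = mapCoverage(criteria, evidence);
console.log(toHandoffMarkdown(coverage));
```

Criteria with no matching evidence are printed as unchecked rather than dropped, so missing verification stays visible in the handoff.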

Proof

The public starter repo includes a dependency-free Node CLI, a sample fixture, JSON and Markdown output, and secret redaction for common token patterns. Run the tests with npm test, run the demo with npm run demo, or call the CLI directly with node bin/code-eval.mjs fixtures/sample-evaluation.json --format markdown.
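
Secret redaction of the kind described above might look like the following minimal sketch. It is not taken from the repo: the pattern list, the redactSecrets name, and the [REDACTED] placeholder are assumptions, and the repo's actual patterns may differ.

```javascript
// Illustrative token patterns; the starter repo's actual patterns may differ.
const TOKEN_PATTERNS = [
  /ghp_[A-Za-z0-9]{36}/g, // GitHub classic personal access token
  /AKIA[0-9A-Z]{16}/g, // AWS access key ID
  /(?<=Bearer )[A-Za-z0-9._~+\/-]+=*/g, // value after an HTTP "Bearer " prefix
];

// Replace every match of every pattern with a fixed placeholder.
function redactSecrets(text) {
  return TOKEN_PATTERNS.reduce(
    (out, pattern) => out.replace(pattern, "[REDACTED]"),
    text
  );
}

// The token is assembled at runtime so no secret-shaped literal sits in the source.
const sample = "export GH_TOKEN=ghp_" + "a".repeat(36);
console.log(redactSecrets(sample)); // prints "export GH_TOKEN=[REDACTED]"
```

Pattern-based redaction only catches token shapes it knows about, so logs and transcripts should still be reviewed before they are shared.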