AI code judgement

AI Coding Evaluation and PR Handoff

Photon101 compares agent-written patches against the evidence a maintainer or buyer actually needs: acceptance coverage, test results, scope control, review risk, verification commands, and handoff quality.

  • Inputs. Candidate patches, model transcripts, CI logs, PR review threads, acceptance criteria, repo instructions, and verification commands.
  • Output. A scored recommendation with concrete evidence, failure modes, residual risk, and maintainer-ready handoff text.
  • Best fit. Agent-output comparisons, PR rescue decisions, hiring screens, and teams choosing which generated patch to trust.
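As a rough illustration of the Output bullet, the sketch below turns one evaluated candidate into maintainer-ready Markdown handoff text. It is a minimal JavaScript sketch, written to match the Node CLI mentioned under Proof. The `renderHandoff` function and the `id`, `decision`, `evidence`, `failureModes`, and `residualRisk` fields are hypothetical names, not the starter repo's actual fixture schema.

```javascript
/**
 * Hypothetical shape of one evaluated candidate (illustrative field names).
 * @typedef {Object} CandidateResult
 * @property {string} id
 * @property {string} decision
 * @property {string[]} evidence
 * @property {string[]} failureModes
 * @property {string} residualRisk
 */

/**
 * Render Markdown handoff text that can be pasted into a PR or client update.
 * Empty sections are stated explicitly so a reviewer never has to guess.
 * @param {CandidateResult} c
 * @returns {string}
 */
function renderHandoff(c) {
  // Turn a list into Markdown bullets, or say plainly that nothing was recorded.
  const bullets = (items) =>
    items.length > 0 ? items.map((item) => `- ${item}`).join("\n") : "- None recorded";

  return [
    `## ${c.id}: ${c.decision}`,
    "",
    "### Evidence",
    bullets(c.evidence),
    "",
    "### Failure modes",
    bullets(c.failureModes),
    "",
    "### Residual risk",
    c.residualRisk || "Not stated; treat as unknown.",
  ].join("\n");
}

// Example input loosely based on Candidate B from the Sample Scorecard.
const note = renderHandoff({
  id: "Candidate B",
  decision: "Recommended",
  evidence: ["Focused test", "Full suite", "Timezone-specific verification", "Clean diff scope"],
  failureModes: [],
  residualRisk: "Bounded to date parsing edge cases.",
});

console.log(note);
```

Writing "None recorded" and "Not stated" instead of omitting empty sections keeps missing information visible in the handoff rather than silently absent.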

Sample Scorecard

Example task: compare two AI patches for a flaky invoice-export CI failure.

Candidate A
  • Evidence. Focused test passed, but full-suite evidence is missing and the handoff is thin.
  • Risk. Timezone regression risk is not called out; the reviewer has to infer the next checks.
  • Decision. Do not merge yet.

Candidate B
  • Evidence. Focused test, full suite, and timezone-specific verification, with a clean diff scope.
  • Risk. Residual risk is explicit and bounded to date parsing edge cases.
  • Decision. Recommended.
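The decision in the Sample Scorecard can be read as a gate: a candidate is recommended only when every required check has passing evidence, the diff stays in scope, and residual risk is written down. The JavaScript sketch below encodes that reading. The check names in `REQUIRED_CHECKS`, the `decide` function, and the candidate fields are assumptions for illustration, not Photon101's actual scoring rules.

```javascript
// Hypothetical required checks for the invoice-export example; a real task
// would derive these from its own acceptance criteria.
const REQUIRED_CHECKS = ["focusedTest", "fullSuite", "timezoneVerification"];

/**
 * Gate a candidate patch: every required check must have passed, the diff
 * must stay in scope, and residual risk must be stated.
 * Returns the decision plus the blockers that explain it.
 */
function decide(candidate) {
  // A check that was never run counts the same as a failed one.
  const missing = REQUIRED_CHECKS.filter((name) => candidate.checks[name] !== "passed");
  const blockers = [];
  if (missing.length > 0) blockers.push(`missing or failing: ${missing.join(", ")}`);
  if (!candidate.scopeClean) blockers.push("diff touches unrelated files");
  if (!candidate.residualRisk) blockers.push("residual risk not stated");
  return {
    decision: blockers.length === 0 ? "Recommended" : "Do not merge yet",
    blockers,
  };
}

const candidateA = {
  checks: { focusedTest: "passed" }, // full-suite evidence missing
  scopeClean: true, // scope not flagged in the sample row; assumed clean here
  residualRisk: "", // timezone regression risk not called out
};

const candidateB = {
  checks: { focusedTest: "passed", fullSuite: "passed", timezoneVerification: "passed" },
  scopeClean: true,
  residualRisk: "Bounded to date parsing edge cases.",
};

console.log(decide(candidateA));
console.log(decide(candidateB));
```

Returning the blockers alongside the decision means a "Do not merge yet" verdict always comes with the specific checks a reviewer should run next.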

Deliverables

  • Winning recommendation explaining why it is safer than the alternatives.
  • Acceptance-criteria coverage mapped to concrete evidence.
  • CI, test, lint, and typecheck summary with missing verification called out.
  • Scope-control and risk review that flags broad rewrites and unrelated churn.
  • Maintainer-ready handoff notes that can be pasted into a PR or client update.
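Acceptance-criteria coverage and scope control both reduce to simple set checks. The JavaScript sketch below maps each criterion to passing evidence, lists criteria with no passing verification, and flags changed files outside the paths a task is expected to touch. The `mapCoverage` and `findUnrelatedChurn` functions, the criterion IDs, the file paths, and the prefix-based scope rule are all hypothetical; a real review would also weigh diff size and intent.

```javascript
// Hypothetical acceptance criteria and evidence for the invoice-export task.
const criteria = [
  { id: "AC1", text: "Export succeeds for invoices in any timezone" },
  { id: "AC2", text: "Invoice totals are unchanged" },
];
const evidence = [
  { criterionId: "AC1", command: "npm test", status: "passed" },
  { criterionId: "AC2", command: "npm test", status: "not run" },
];

/**
 * Map each acceptance criterion to the commands whose passing results support it.
 * Evidence that did not pass does not count as coverage.
 */
function mapCoverage(criteria, evidence) {
  return criteria.map((criterion) => {
    const supporting = evidence.filter(
      (e) => e.criterionId === criterion.id && e.status === "passed",
    );
    return {
      id: criterion.id,
      covered: supporting.length > 0,
      commands: supporting.map((e) => e.command),
    };
  });
}

/**
 * Flag changed files outside the expected paths.
 * Prefix matching is a deliberate simplification of a real scope policy.
 */
function findUnrelatedChurn(changedFiles, allowedPrefixes) {
  return changedFiles.filter(
    (file) => !allowedPrefixes.some((prefix) => file.startsWith(prefix)),
  );
}

const coverage = mapCoverage(criteria, evidence);
const missingVerification = coverage.filter((c) => !c.covered).map((c) => c.id);
const churn = findUnrelatedChurn(
  ["src/invoice/export.js", "test/invoice/export.test.js", "src/billing/legacy.js"],
  ["src/invoice/", "test/invoice/"],
);

console.log(coverage);
console.log("Missing verification:", missingVerification);
console.log("Unrelated churn:", churn);
```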

Proof

The public starter repo includes a dependency-free Node CLI, a sample fixture, JSON and Markdown output, and secret redaction for common token patterns. Run the tests with npm test, the demo with npm run demo, or the CLI directly with node bin/code-eval.mjs fixtures/sample-evaluation.json --format markdown.
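Secret redaction for common token patterns usually comes down to a list of regular expressions applied to text before it is written out. The JavaScript sketch below shows one way that could look; it is an assumption, not the starter repo's code. The `redactSecrets` function and its pattern list (GitHub-style `ghp_`/`gho_`/`ghu_`/`ghs_`/`ghr_` token prefixes, AWS access key IDs, `sk-` prefixed API keys, and bearer tokens) are illustrative and deliberately incomplete.

```javascript
// Illustrative patterns only; the starter repo's actual list may differ.
const SECRET_PATTERNS = [
  /\bgh[pousr]_[A-Za-z0-9]{36,}/g, // GitHub-style token prefixes
  /\bAKIA[0-9A-Z]{16}\b/g, // AWS access key IDs
  /\bsk-[A-Za-z0-9_-]{20,}/g, // "sk-" prefixed API keys used by some providers
];

/**
 * Replace anything that looks like a secret with a fixed placeholder.
 * Bearer tokens keep their "Bearer " scheme so readers can see a token was present.
 */
function redactSecrets(text) {
  let out = text;
  for (const pattern of SECRET_PATTERNS) {
    out = out.replace(pattern, "[REDACTED]");
  }
  // "$1" re-inserts the captured "Bearer " prefix in front of the placeholder.
  out = out.replace(/(Bearer\s+)[A-Za-z0-9._~+\/-]+=*/gi, "$1[REDACTED]");
  return out;
}

const logLine =
  "token=ghp_" + "a".repeat(36) + " key=AKIAABCDEFGHIJKLMNOP Authorization: Bearer abc.def-123";
console.log(redactSecrets(logLine));
```

Pattern lists like this trade coverage for false positives: a longer, provider-specific list catches more secrets but risks redacting legitimate identifiers in CI logs and transcripts.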