
Reading a verdict

Every task has exactly one current verdict, drawn from a set of four. The bar is asymmetric on purpose: a green light needs strong, reproduced positive evidence; a red light needs a reproduced failure; everything ambiguous stays at “needs more data” and is never a guess.

| Status | Means | Bar to assert it |
| --- | --- | --- |
| 🟢 Safe for local | A named local model reliably clears the task’s verifier, unattended. | Strong positive: reproduced eval passes and/or corroborated practitioner reports. |
| 🟡 Local with a check | A local model does it if the output is gated by a cheap verification step. | The model passes only when the verifier catches its misses. |
| 🔴 Needs a bigger model | No local model reliably clears it; use the fallback. | Strong negative: reproduced failure, never one flaky run. |
| 🔶 Needs more data | Not enough evidence to assert either way. | The default: the honest answer when ambiguous. |
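
As a rough sketch, the bars in the table can be read as a decision rule. The `Evidence` fields, the `classify` function, and the two-run reading of "reproduced" are illustrative assumptions, not DoesItLocal's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SAFE_FOR_LOCAL = "Safe for local"
    LOCAL_WITH_CHECK = "Local with a check"
    NEEDS_BIGGER_MODEL = "Needs a bigger model"
    NEEDS_MORE_DATA = "Needs more data"


# Assumption: "reproduced" is read here as "seen at least twice".
MIN_REPRODUCTIONS = 2


@dataclass
class Evidence:
    """Hypothetical evidence counters for one task; not a documented schema."""
    unattended_passes: int = 0     # local model cleared the verifier with no gate
    gated_passes: int = 0          # acceptable only after the check caught a miss
    hard_failures: int = 0         # local model failed even with the check
    corroborated_reports: int = 0  # independent practitioner reports of success


def classify(e: Evidence) -> Verdict:
    strong_positive = (
        e.unattended_passes >= MIN_REPRODUCTIONS
        or e.corroborated_reports >= MIN_REPRODUCTIONS
    )
    any_positive = (
        e.unattended_passes > 0 or e.gated_passes > 0 or e.corroborated_reports > 0
    )

    # Green: strong positive evidence and no sign that a gate was ever needed.
    if strong_positive and e.hard_failures == 0 and e.gated_passes == 0:
        return Verdict.SAFE_FOR_LOCAL
    # Yellow: reproduced passes that depended on the verification step.
    if e.gated_passes >= MIN_REPRODUCTIONS and e.hard_failures == 0:
        return Verdict.LOCAL_WITH_CHECK
    # Red: reproduced failure with nothing pointing the other way.
    if e.hard_failures >= MIN_REPRODUCTIONS and not any_positive:
        return Verdict.NEEDS_BIGGER_MODEL
    # Everything else, including a single bad run or mixed evidence, stays unknown.
    return Verdict.NEEDS_MORE_DATA
```

In this sketch a single failure, or passes mixed with failures, falls through to `NEEDS_MORE_DATA`, which is how the default-to-unknown behaviour shows up in code.
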
  • 🟡 Local with a check is the product’s most useful verdict. For many tasks the unlock isn’t a bigger model — it’s adding the verifier (a test runner, a JSON schema, a lint pass) that makes a local one safe. The check is named on the task page.
  • 🔶 Needs more data is the honest default. A task stays here until evidence moves it, and a single bad run is held here pending a second — so DoesItLocal never publishes a false green light, and is slow to publish a false red.
  • Capability-safe — the output is reliably correct enough to trust.
  • Privacy-safe — for a privacy-bound task, keeping data on your machine is the point, so “local with a check” can be the recommended route over a more capable cloud model.
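
A "local with a check" route might look something like the sketch below: the local model's output is accepted only if a cheap verifier passes. Everything named here is hypothetical, including `run_with_check`, the retry count, and the `vendor`/`total` fields; the hand-written structural check stands in for a real JSON schema.

```python
import json
from typing import Callable, Optional, Tuple


def is_valid_invoice(raw: str) -> bool:
    """Cheap structural check standing in for a JSON schema (hypothetical fields)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    total = data.get("total")
    return (
        isinstance(data.get("vendor"), str)
        and isinstance(total, (int, float))
        and not isinstance(total, bool)  # bool is a subclass of int in Python
    )


def run_with_check(
    prompt: str,
    local_model: Callable[[str], str],
    check: Callable[[str], bool],
    fallback: Optional[Callable[[str], str]] = None,
    max_attempts: int = 2,  # assumed retry budget
) -> Tuple[str, Optional[str]]:
    """Return (source, output); output is None if nothing passed the gate."""
    for _ in range(max_attempts):
        output = local_model(prompt)
        if check(output):
            return ("local", output)
    # Leave fallback unset for privacy-bound tasks so data never leaves the machine.
    if fallback is not None:
        return ("fallback", fallback(prompt))
    return ("rejected", None)
```

With no `fallback` supplied, a miss is surfaced as `("rejected", None)` rather than silently routed to a cloud model, which matches the privacy-safe reading above.
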

This default-to-unknown design mirrors the proven tiers used by DoesItARM, Steam Deck Verified (…/Unknown), and ProtonDB (…/Pending).

Built by Sam Carlton