Reading a verdict

Every task has exactly one current verdict, from a small set. The bar is asymmetric on purpose: a green light needs strong, reproduced positive evidence; a red needs reproduced failure; everything ambiguous stays at “needs more data” — never a guess.

Status	Means	Bar to assert it
🟢 Safe for local	A named local model reliably clears the task’s verifier, unattended.	Strong positive: reproduced eval passes and/or corroborated practitioner reports.
🟡 Local with a check	A local model does it if the output is gated by a cheap verification step.	The model passes only when the verifier catches its misses.
🔴 Needs a bigger model	No local model reliably clears it; use the fallback.	Strong negative — reproduced failure, never one flaky run.
🔶 Needs more data	Not enough evidence to assert either way.	The default — the honest answer when ambiguous.
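
The asymmetric bar in the table can be sketched as a small decision function. This is a minimal illustration, not DoesItLocal's actual logic: the `Evidence` fields, the `REPRODUCED` threshold of two agreeing runs, and the name `read_verdict` are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SAFE_FOR_LOCAL = "Safe for local"
    LOCAL_WITH_CHECK = "Local with a check"
    NEEDS_BIGGER_MODEL = "Needs a bigger model"
    NEEDS_MORE_DATA = "Needs more data"


@dataclass
class Evidence:
    # Hypothetical counters; the field names are illustrative only.
    clean_passes: int = 0  # runs that cleared the verifier unattended
    gated_passes: int = 0  # runs that passed only after the check caught a miss
    failures: int = 0      # runs that failed even with the check in place


# Assumption: "reproduced" means at least two agreeing runs.
REPRODUCED = 2


def read_verdict(e: Evidence) -> Verdict:
    """Map evidence to a verdict; anything ambiguous stays at 'needs more data'."""
    any_pass = e.clean_passes > 0 or e.gated_passes > 0

    # Green: reproduced unattended passes and nothing that contradicts them.
    if e.clean_passes >= REPRODUCED and e.gated_passes == 0 and e.failures == 0:
        return Verdict.SAFE_FOR_LOCAL

    # Yellow: the check had to catch misses, but with it the task is reproducibly done.
    if e.gated_passes >= REPRODUCED and e.failures == 0:
        return Verdict.LOCAL_WITH_CHECK

    # Red: reproduced failure with no passing evidence at all.
    if e.failures >= REPRODUCED and not any_pass:
        return Verdict.NEEDS_BIGGER_MODEL

    # Default: one flaky run, mixed evidence, or no evidence.
    return Verdict.NEEDS_MORE_DATA
```

In this sketch a single failure, or a mix of passes and failures, falls through to the default instead of being forced into a green or red.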

The two states that matter most

🟡 Local with a check is the product’s most useful verdict. For many tasks the unlock isn’t a bigger model — it’s adding the verifier (a test runner, a JSON schema, a lint pass) that makes a local one safe. The check is named on the task page.
🔶 Needs more data is the honest default. A task stays here until evidence moves it, and a single bad run is held here pending a second — so DoesItLocal never publishes a false green light, and is slow to publish a false red.
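
The "local with a check" pattern described above can be sketched as a gate around the model call. This is an illustration under stated assumptions: `generate` stands in for any local model call, and `looks_like_ticket`, `run_with_check`, and the `title`/`priority` fields are hypothetical names chosen for the example, not part of DoesItLocal.

```python
import json
from typing import Callable, Optional


def looks_like_ticket(raw: str) -> bool:
    """Cheap check: a JSON object with a string 'title' and an integer 'priority'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, dict)
        and isinstance(data.get("title"), str)
        # type(...) is int rejects booleans, which isinstance(..., int) would accept
        and type(data.get("priority")) is int
    )


def run_with_check(
    generate: Callable[[str], str],  # hypothetical stand-in for a local model call
    check: Callable[[str], bool],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first output that passes the check, or None if none does."""
    for _ in range(max_attempts):
        output = generate(prompt)
        if check(output):
            return output
    # Nothing passed: the caller decides what to do, e.g. use the fallback model.
    return None
```

The check here is deliberately cheap (a parse and two type tests); a test runner or lint pass would slot into the same `check` parameter.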

“Safe” means two things

Capability-safe — the output is reliably correct enough to trust.
Privacy-safe — for a privacy-bound task, keeping data on your machine is the point, so “local with a check” can be the recommended route over a more capable cloud model.

This default-to-unknown design mirrors the proven tiers used by DoesItARM, Steam Deck Verified (…/Unknown), and ProtonDB (…/Pending).

Built by Sam Carlton