Reading a verdict
Every task has exactly one current verdict, from a small set. The bar is asymmetric on purpose: a green light needs strong, reproduced positive evidence; a red needs reproduced failure; everything ambiguous stays at “needs more data” — never a guess.
| Status | Means | Bar to assert it |
|---|---|---|
| 🟢 Safe for local | A named local model reliably clears the task’s verifier, unattended. | Strong positive: reproduced eval passes and/or corroborated practitioner reports. |
| 🟡 Local with a check | A local model does it if the output is gated by a cheap verification step. | The model passes only with the verifier in place to catch its misses. |
| 🔴 Needs a bigger model | No local model reliably clears it; use the fallback. | Strong negative — reproduced failure, never one flaky run. |
| 🔶 Needs more data | Not enough evidence to assert either way. | The default — the honest answer when ambiguous. |
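The table and the asymmetric bar can be read as a small decision rule. The sketch below is illustrative only. The `Evidence` counts, the `REPRODUCED = 2` threshold, and the choice to treat mixed passes and failures as ambiguous are all assumptions, not DoesItLocal's actual criteria. It also counts only eval runs and leaves corroborated practitioner reports out.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SAFE_FOR_LOCAL = "🟢 Safe for local"
    LOCAL_WITH_CHECK = "🟡 Local with a check"
    NEEDS_BIGGER_MODEL = "🔴 Needs a bigger model"
    NEEDS_MORE_DATA = "🔶 Needs more data"


@dataclass
class Evidence:
    # Hypothetical counts of independent runs for one task.
    unattended_passes: int = 0  # cleared the verifier with no gate
    gated_passes: int = 0       # cleared it only because the verifier caught misses
    failures: int = 0           # no local model cleared it, even gated


# Assumption: "reproduced" means at least two independent runs agree.
REPRODUCED = 2


def assign_verdict(e: Evidence) -> Verdict:
    positive = e.unattended_passes + e.gated_passes
    # Mixed passes and failures are ambiguous, so they stay at the default.
    if positive > 0 and e.failures > 0:
        return Verdict.NEEDS_MORE_DATA
    # Green: reproduced unattended passes, none of which leaned on the check.
    if e.unattended_passes >= REPRODUCED and e.gated_passes == 0:
        return Verdict.SAFE_FOR_LOCAL
    # Yellow: reproduced passes where at least one needed the verifier.
    if positive >= REPRODUCED and e.gated_passes > 0:
        return Verdict.LOCAL_WITH_CHECK
    # Red: reproduced failure, never one flaky run.
    if e.failures >= REPRODUCED:
        return Verdict.NEEDS_BIGGER_MODEL
    # Everything else, including any single run, is "needs more data".
    return Verdict.NEEDS_MORE_DATA
```

For example, `assign_verdict(Evidence(failures=1))` returns `Verdict.NEEDS_MORE_DATA`: one bad run is held at the default rather than published as a red, while `Evidence(failures=2)` yields `Verdict.NEEDS_BIGGER_MODEL`.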
The two states that matter most
- 🟡 Local with a check is the product’s most useful verdict. For many tasks the unlock isn’t a bigger model; it’s adding the verifier (a test runner, a JSON schema, a lint pass) that makes a local one safe. The check is named on the task page.
- 🔶 Needs more data is the honest default. A task stays here until evidence moves it, and a single bad run is held here pending a second. Because ambiguity never resolves to a green and one failure never resolves to a red, DoesItLocal never publishes a false green light and is slow to publish a false red.
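A 🟡 verdict implies a simple runtime shape: run the local model, gate its output with the named check, and use the output only if it passes. This is a minimal sketch under stated assumptions. `run_with_check` and `is_json_object_with_keys` are hypothetical helpers, the models are plain callables standing in for whatever local and fallback runtimes you use (not a real inference API), and routing a miss to the fallback is an assumption. A test runner or lint pass could replace the JSON check.

```python
import json
from typing import Callable, Tuple

# Hypothetical types: a "model" is any function from prompt to text,
# and a "check" is any cheap verifier over that text.
Model = Callable[[str], str]
Check = Callable[[str], bool]


def run_with_check(prompt: str, local_model: Model, check: Check, fallback: Model) -> Tuple[str, str]:
    """Return (route, output). Hypothetical helper, not a DoesItLocal API."""
    output = local_model(prompt)
    # The cheap verifier is what makes the local model safe to use.
    if check(output):
        return "local", output
    # Assumption: a miss is routed to the fallback instead of being returned.
    return "fallback", fallback(prompt)


def is_json_object_with_keys(text: str, keys: set) -> bool:
    """Example check: output must parse as a JSON object holding every required key."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and keys.issubset(data)
```

For a privacy-bound task, one could pass a second local model (or a function that raises) as `fallback` so data never leaves the machine; this sketch leaves that choice to the caller.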
“Safe” means two things
- Capability-safe: the output is reliably correct enough to trust.
- Privacy-safe: for a privacy-bound task, keeping data on your machine is the point, so “local with a check” can be the recommended route over a more capable cloud model.
This default-to-unknown design mirrors the proven tiers used by DoesItARM, Steam Deck Verified (…/Unknown), and ProtonDB (…/Pending).
Built by Sam Carlton