Concept: built-in Get Your Shit Done
We love local LLMs. We built codehamr to get the most out of the ~30B class on hardware you already own. This guide is about the one mechanism that makes that work.
The problem
You ask a local coding agent to fix a failing test. It edits a file, says "Done, test passes now." You run the test. Still red.
This is the default failure mode of ~30B LLMs. They pattern-match what "finished" sounds like and write it. That single behavior is what makes most local agents hard to trust.
The fix
The agent has to show the receipt.
Before claiming done, the agent must paste a real chunk of terminal output from a passing check. codehamr keeps a log of every check it ran and compares the pasted chunk against that log byte for byte. No matching receipt, no done.
This rule is GYSD, short for Get Your Shit Done. Mechanical. The agent cannot talk its way past it.
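The rule can be sketched in a few lines. This is a minimal illustration, not codehamr's actual implementation: the names CheckRecord, CheckLog, record and accepts are hypothetical, the command and its output are made up, and treating exit code 0 as "passing" is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class CheckRecord:
    command: str
    exit_code: int
    output: bytes  # raw terminal output, stored unmodified


@dataclass
class CheckLog:
    records: list = field(default_factory=list)

    def record(self, command: str, exit_code: int, output: bytes) -> None:
        """Store one check run exactly as it happened."""
        self.records.append(CheckRecord(command, exit_code, output))

    def accepts(self, receipt: bytes) -> bool:
        """True only if the receipt appears verbatim in a passing check's output."""
        # An empty receipt is a substring of everything, so reject it outright.
        if not receipt.strip():
            return False
        # Assumption: "passing" means exit code 0. The match is a plain
        # byte-for-byte substring search, with no normalization or fuzziness.
        return any(r.exit_code == 0 and receipt in r.output for r in self.records)


log = CheckLog()
# Hypothetical check runs: one failing, one passing.
log.record("./run_tests.sh", 1, b"3 checks failed\n")
log.record("./run_tests.sh", 0, b"all 12 checks passed\n")

print(log.accepts(b"all 12 checks passed"))  # True
print(log.accepts(b"3 checks failed"))       # False: real output, but the check failed
print(log.accepts(b"tests pass now"))        # False: never appeared in any output
```

The design point is that the verdict depends only on bytes the harness recorded itself, never on how convincing the agent's wording is.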
Example
Honest agent
The agent runs pytest test_login.py and gets this output.
===== 1 passed in 0.34s =====
codehamr saves that line in its log. The agent claims done and quotes the line as receipt.
===== 1 passed in 0.34s =====
codehamr finds the string in the log. Accepted.
Lying agent
The agent skips the test and claims done with this evidence.
tests pass now
codehamr searches the log. Nothing matches. Rejected. The agent reads the rejection and has to actually run a test before trying again.
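The reject and retry loop from the lying-agent case might look roughly like this. It is a sketch under stated assumptions: scripted_agent is a hypothetical stand-in for the model, run_check fakes the pytest run by returning the output line from the example above, and the feedback text is invented for illustration.

```python
from typing import List, Optional


def run_check(log: List[bytes]) -> bytes:
    # Stand-in for actually running `pytest test_login.py`. The harness,
    # not the agent, appends the output to the log.
    output = b"===== 1 passed in 0.34s =====\n"
    log.append(output)
    return output


def receipt_matches(receipt: bytes, log: List[bytes]) -> bool:
    # Non-empty receipt that appears byte for byte in some logged output.
    return bool(receipt.strip()) and any(receipt in out for out in log)


def scripted_agent(feedback: Optional[str], log: List[bytes]) -> bytes:
    # Hypothetical agent: bluffs on the first try, runs the check only
    # after it has been rejected once.
    if feedback is None:
        return b"tests pass now"
    return run_check(log).strip()


def drive(max_attempts: int = 3) -> List[str]:
    log: List[bytes] = []
    feedback: Optional[str] = None
    verdicts = []
    for _ in range(max_attempts):
        receipt = scripted_agent(feedback, log)
        if receipt_matches(receipt, log):
            verdicts.append("accepted")
            break
        verdicts.append("rejected")
        # The rejection is fed back so the next attempt can react to it.
        feedback = "Rejected: receipt not found in the check log."
    return verdicts


print(drive())  # ['rejected', 'accepted']
```

The bluff fails because the log is still empty when the first claim arrives. The only way to get a matching receipt into the log is to run the check.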
Why local LLMs need this
Every LLM is prone to faking completion: top-tier models rarely, local ~30B models much more often. GYSD makes the loop more robust on any model. No real terminal output, no done.
The win is biggest on local, where that robustness is what makes the ~30B class actually reliable.
Why codehamr is minimal everywhere else
Same reasoning. codehamr ships with no MCP, no subagents, no skill system, no router, no memory file. A ~30B LLM has the context window for all of those; qwen3.6:27b runs fine at a 256k context. But on local models, quality per token drops faster as the context grows. Every byte of harness is a byte not spent on your code.
Lean is not aesthetic. It is simple on purpose.
What GYSD does not solve
GYSD proves the agent ran a passing check. It does not prove the check was meaningful. A bad test still gets a green checkmark.
That is why the system prompt forbids trivial checks like echo done and pushes the red-then-green pattern on bug fixes, and why you still watch every step live in your terminal. Three layers stack; no single one is magic.
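A tiny sketch shows why a trivial check slips through a pure receipt match. The function receipt_matches is a hypothetical stand-in for the byte-for-byte rule, and the log holds what a command like echo done would print.

```python
from typing import List


def receipt_matches(receipt: bytes, passing_outputs: List[bytes]) -> bool:
    # Non-empty receipt found verbatim in the output of some passing check.
    return bool(receipt.strip()) and any(receipt in out for out in passing_outputs)


# Output of a trivial "check" such as `echo done`, which always exits 0.
trivial_log = [b"done\n"]

# The receipt is genuine terminal output, so the match succeeds,
# even though the check says nothing about the code.
print(receipt_matches(b"done", trivial_log))  # True
```

The match is honest but meaningless, which is the gap the prompt rules and your own eyes are there to cover.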
The model has to prove it. Not promise it.