My first eval killed v1
I built an LLM classifier, shipped it on vibes, then ran my first hand-labelled eval. Three of nine verdicts held up. Here's what that killed and what I'm rebuilding.
Essays and notes. Reverse-chron. Sporadic by design.
Claude Code on Max plans hides the standard 200k-context Opus 4.7. Here's how to surface it without disabling 1M context.
OpenAI is capitalizing on Anthropic's third-party tool ban with new Codex pricing — but the fine print isn't great for solo devs.
Browser agents don't need jailbreaks when a rendered page can already act as instructions.
Whisper + a hotkey + one small Python script. Thirty minutes to install. Pays for itself in a week.