Work
I turn expert workflows into production AI agents.
AI agents built to the Every-Tenth Standard. Trained by the domain expert, tested on the edge cases, designed to learn from every case they see.
The Premise
Expert domains don't have templates. The best practices that generic AI consulting leans on don't exist here.
What "good" means in a calisthenics gymnastics workout structure, in a business-development workflow, in a covert-channel system, in a legal-discovery workflow: none of it is on the shelf. Every engagement is a blank canvas, and the values have to be created from scratch with the expert in the room.
That's the work. Not deploying a model. Defining what "good" means with the expert in the room, so the model has a real bar to meet.
The Every-Tenth Standard
In gymnastics, a tenth is the difference between the champion and fourth place. In production AI, it's the difference between a demo and an agent your domain expert will actually trust. And that's the difference between a product customers try once and one they rely on. The real point isn't even the bar. It's that there's always another tenth. Champions don't graduate. Neither do production agents.
01
Honest
Grounded in real data, transparent about its reasoning and decision-making. Doesn't bluff in front of the expert. If the model isn't sure, the agent says so and routes to the human. Non-negotiable in expert domains.
02
Anchored
Fits the expert's existing workflow, methodology, and values, not LLM defaults. The agent attaches to a behavior the expert already does, not a new ritual.
03
Minimal
Every surface earns its place. Every feature justifies its presence against the cognitive cost it adds for the expert.
04
Refined
The edge cases set the bar. The 10th edge case, the 10th wrong response, the 10th feature you almost added but cut. Production agents are won at the edges.
05
Durable
Holds up over time. Doesn't degrade as the world changes. Stable under load, drift, and the edge cases the expert flagged.
06
Compounding
Gets better with every case it sees. Memory and adaptation are part of the spec, not a phase two. The agent gets smarter the longer it runs, the same way the expert it shadows does.
Underneath, the system is calibrated: domain-specific where the expert's judgment lives, reusable where the infrastructure lives.
Working on an agent in an expert domain?
Open to selective contract engagements
I take on a small number of engagements where the domain expertise is real, the problem is hard, and the standard for "done" is the expert's judgment, not a benchmark.
jeremyrbischoff@gmail.com