We’ve spent the last couple of years treating generative AI like a vending machine. Select a task. Insert a prompt. Retrieve a product. And to be fair, in many legal and professional contexts that’s exactly the right frame: accuracy and precision matter and “creative” output in payroll or billing codes is usually just a polished error.

But there’s a quieter problem underneath the accuracy debate. These models are, at base, “stochastic parrots,” a term from computational linguistics introduced by Emily Bender, Timnit Gebru, and colleagues in their influential 2021 paper, for systems that generate fluent, plausible-sounding text by predicting statistically likely next tokens, without anything resembling genuine understanding or grounded reasoning. They are exceptionally good at producing language that sounds like it arrived via careful thought.

In other words, output that is smooth precisely where it should be textured. Which means the danger isn’t just hallucination. It’s fluent, confident, well-formatted conclusions that leave no impression of the reasoning behind them. There are no rough seams, no weight, and no trace of what pressed them into shape.

That’s what I’ve been calling the impressions problem. And it’s what the “stochastic sandpit” approach is designed to solve.

I think we’ve overlearned the wrong lesson. We’ve spent too much time trying to make AI accurate, and not enough time making it useful for thinking. The result is a strange mismatch: we’re neglecting the thing humans need most when the stakes are real and the problem is messy, which is better judgment.

Here’s the distinction I want to put on the table. Accuracy is primarily a machine optimization target. Insight is primarily a human achievement. If what you want from AI isn’t a “perfect output,” but a better-informed investigator, someone who sees more options, catches blind spots earlier, and recognizes where the argument is skating on thin ice, then you need a different kind of working space. You need a place where the model is allowed to be wrong in interesting ways, without letting wrongness leak into your final work.

That place is what I’m calling my Stochastic Sandpit.

The Stochastic Sandpit

A stochastic sandpit is a deliberately designed thinking environment where you use AI more like a musical instrument than a vending machine. The goal isn’t to “get the answer.” The goal is to generate productive variation (frames, tensions, counterarguments, and edge cases) so you can interrogate the problem more honestly and write (or decide) more intelligently on the other side.

This is where the core problem lands: if the machine is too clean, it leaves no impressions to read. Clean output hides seams. It hides uncertainty. It hides the leap from premise to conclusion. It produces something polished enough to lull you into thinking the work is done. But for investigators, reading the seams is the work. You want the trail. You want the shape of what was there. You want to know where the reasoning jumped tracks.

Two Modes, Not Two AIs: Insurance Mode vs. Sandpit Mode

To make this practical, it helps to separate two modes of using AI by applying two different intentions and two different standards.

Insurance mode is what many organizations are building (and buying): guardrails, curated workflows, constrained outputs, compliance overlays, and liability management. It’s optimized for predictability and audit posture. You recognize it from constrained scope in the form of narrow tasks, bounded outputs, and fewer degrees of freedom. The focus is on conservative completions: less variance, fewer surprises, and fewer “creative” leaps. Done well, outputs are designed to be reviewable, repeatable, and defensible. This is the world of one-shot prompts designed to give “answers” and “results.”

In a lot of operational contexts, that’s exactly what you want. Nobody wants “creative exploration” in payroll, standard contract term language, or routine document formatting. However, insurance tends to compress the messy middle of thinking into a neat, answer-shaped object. It looks finished. It sounds confident. That alone can quietly reduce the amount of active reasoning the human does, especially when the output is fluent enough to feel authoritative.

Sandpit mode is the opposite posture. You use AI as a probabilistic engine for exploration to create a space for breadth, reframing, surprise, and sometimes “productive wrongness” that surfaces the assumptions you didn’t realize you were making. The point is not to ship what the model says. The point is to see the problem differently so you can ship your work more responsibly.

One important fairness point: insurance mode can still support thinking when it’s designed to surface uncertainty (for example, through structured critique, provenance cues, and forced alternatives) rather than simply to polish prose. The core problem is confusing a polished answer with an investigated conclusion and assuming that gives you “safety.” That’s why separating exploration from production matters. If you treat exploration like production, you either clamp down so hard that everything becomes bland or you accept risk you didn’t intend to accept. The sandpit is a container for exploration that prevents category errors.

What “Insight” Means Here (And It’s Not Just a Vibe)

If “insight” is going to do load-bearing work, it needs a working definition. In sandpit practice, insight is not “a clever paragraph.” Insight shows up when at least one of these happens: a new frame appears that changes what you think the real problem is; a hidden assumption surfaces that has been steering your reasoning unnoticed; or a consequential edge case or failure mode emerges that alters your plan, your advice, or your confidence.

And here’s the discipline you need: if a sandpit session doesn’t produce at least one of those, treat it as warm-up, not insight, and stop. Otherwise, it becomes a vibe session, and “vibe sessions” are where people convince themselves they did thinking when they mostly did typing. And the AI agrees with you.

The Impressions: What Clean Output Erases

When I say “impressions,” I mean the traces you want to read so you can think better. In sandpit mode, you want the system to leave evidence of the things polished prose routinely erases: assumptions (what must be true for the conclusion to hold), inferential leaps (where it moved from A to C without showing B), missing premises (unstated warrants doing hidden work), uncertainty markers (what it can’t actually support), alternative hypotheses (other plausible explanations or frames), and failure modes (how this breaks when it meets reality). You can, and should, add your own items to the list.

Clean output gives you a conclusion without the shape of what made it. The sandpit gives you that missing shape. That’s exactly what a careful investigator needs to decide what belongs in the final work, and what doesn’t. Lawyers know this instinctively: a document that’s too clean has been processed and polished. The sandpit is where you see the draft before the polish, while the impressions are still readable.

A Gritty Example: The Sandpit in Legal Work (Without Breaking Anything)

Imagine you’re helping a client develop an internal AI policy. The stakes are real. You’re not going to “jam” with privileged details in a public tool. But you still need to think across incentives and failure modes, not just abstract principles.

So, you build a safe artifact: the goals (reduce risk without freezing innovation), the constraints (privacy, retention, vendor terms, regulatory exposure), the audiences (IT, legal, business leaders, frontline users), and the pressure points (fear, time, unclear guidance, internal politics). Then you run a sandpit session aimed at mapping the decision space, not writing the memo.

You ask for five frames (compliance-first, innovation-first, risk-tiered, training-first, governance-first). For each frame, you ask what it highlights and what it hides. You force an assumption sweep: what are we presuming about user behavior, incentives, and enforcement? You request failure modes: how could this policy create risk even if everyone “complies”? You demand the smart skeptic critique. You red team it.

What you get isn’t paste-ready language. You get a map: the objections you’ll face, the edge cases that will embarrass you later if you ignore them now, and the places your first draft was too linear. Then you switch modes and write the actual guidance carefully, with accountability and real-world constraints. The sandpit didn’t write the client memo. It made the memo-writer better. That’s the point.

A Second Micro-Case: Litigation or Negotiation Thinking

Here’s a smaller example that happens constantly in real practice. Take a draft argument section in a motion, or a negotiation position you’re about to take on a disputed contract clause. In production mode, you tend to press forward: make it clean, make it strong, make it persuasive.

In sandpit mode, you do something different. You ask for the strongest opposing brief against your position, the most likely misunderstanding a judge (or business principal) will have, and the edge case that turns your “routine” language into a future dispute.

Most of what comes back never ships. It shouldn’t. But it reliably surfaces the assumptions you didn’t realize you were making, and it gives you a better checklist for what you need to verify before you write the final version. Again, the value is not the text. The value is in the investigator and the investigation.

The Sandpit Safety Rule

To prevent the most common failure (sandpit text accidentally becoming production text), you need a hard boundary. I’ll share mine:

Sandpit Safety Rule: Nothing leaves the sandpit as “final” until a human rewrites it in production mode with source anchors, verification, and accountability.

“Source anchors” matter. It means that any key factual assertion in the final work must have an identifiable origin: a document, a record, a cite, a client-provided fact, a dataset, a contemporaneous note, or something else you could point to if asked, “Where did that come from?” The sandpit may help you discover what you need to know, but it does not get to invent what you claim to know.

Or shorter: no sandpit text ships without a human rewrite + verification pass + source anchor. This one rule does an enormous amount of work. It lets you benefit from variance while keeping responsibility where it belongs. It’s also the essence of human-in-the-loop.
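The rule can be read as a simple gate. Here is a minimal sketch in Python; the `Draft` fields and `can_ship` check are illustrative names for the idea, not a real system:

```python
from dataclasses import dataclass, field

@dataclass
class Draft:
    """A unit of text moving from sandpit toward production (illustrative only)."""
    text: str
    human_rewritten: bool = False   # a person rewrote it in production voice
    verified: bool = False          # key claims checked in a verification pass
    source_anchors: list = field(default_factory=list)  # cites, records, client facts

def can_ship(draft: Draft) -> bool:
    # The safety rule as a gate: human rewrite + verification pass + at least one anchor.
    return draft.human_rewritten and draft.verified and len(draft.source_anchors) > 0
```

The design choice worth noting: all three conditions default to failing, so nothing ships by accident. The sandpit output has to be affirmatively promoted, never passively accepted.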

The Steve Gadd Rudiments of Thinking

A good sandpit session isn’t random. It has rudiments, those disciplines you practice so the variance becomes signal. Steve Gadd was one of the most respected session drummers in modern music, the drummer’s drummer, known not for flashiness but for mastery of the basic “rudiments,” the foundational patterns practiced until they become instinctive. His playing sounds effortless, even improvisational, but it rests on disciplined repetition of simple patterns. The sandpit works the same way.

Rohan Puranik recently observed that Jimi Hendrix was a systems engineer, not just a guitar player. What made his playing extraordinary wasn’t just the notes. It was the notes combined with his command of feedback, room acoustics, and the instrument’s instability. He didn’t fight the noise in the signal. He designed with it.

The sandpit works similarly. Recently, while running a sandpit pass on a framework I was developing, I got back a “failure mode” I’d unconsciously excluded from my own thinking: a scenario where full compliance with the proposed guidelines would actually increase certain liability exposure. The model was wrong about the details, and I knew that when I saw it, but the initial frame was right. It surfaced a question I needed to answer before I wrote the final version. That’s the edge of instability doing useful work.

Some of my own rudiments are simple drills: “Reframe it five ways.” “List hidden assumptions.” “Give the strongest opposing argument.” “Hunt edge cases.” “Do the cui bono pass: who benefits from this framing?” “Name what evidence would change your mind.”

Properly understood, these are thinking habits, not prompting tricks. The AI is just a fast variation engine. You’re the one doing judgment. And over time, the practice changes you as you get better at spotting leaps, better at recognizing missing premises, better at noticing what you’re avoiding. The questions I ask now are so much better than when I first started using AI.

Steelman: “Why Not Just Make AI More Reliable?”

A serious counterargument deserves serious treatment. The skeptic might say that the real problem isn’t over-focusing on accuracy; it’s that we don’t have enough reliability yet. If models were reliably correct, they could deliver both accuracy and insight. Messy exploration, they might say, is only noise with charisma. What we need is evaluation, grounding, and verification, not a romantic vision of a sandpit or children with a toy bucket and shovel on the beach.

There’s truth in that rather stark vision. Reliability matters. Evaluation matters. Verification matters. Messiness by itself is not insight. Maybe we’d be better off without Hendrix’s experiments. I’ll let someone else try to make that argument.

But even a future with far more reliable models won’t make insight automatic because insight depends on context, values, stakes, and interpretation. More importantly, at organizational scale, “insurance mode” will remain the default for most deployed systems because it is rational. When your job is to reduce liability across thousands of users, you will trade some cognitive texture for predictability. That doesn’t make insurance mode bad; it makes it inevitable.

Which is why the cognitive use of AI, the ways it can help humans see better, will remain underdeveloped unless we intentionally cultivate it. The sandpit isn’t a substitute for reliability. It’s a complement: a mode that strengthens the human side of the loop and makes the eventual production work more careful, not less.

Confidentiality, Privilege, and Jamming Without Leaking

For lawyers and other high-stakes professionals, confidentiality and privilege are not footnotes; they’re first principles. That means sandpit practice needs safe patterns: work with hypotheticals or abstractions when the real facts are sensitive. Focus on structure, not substance (decision spaces, constraints, and failure modes). Redact aggressively if you must use an artifact at all. Use controlled environments where policy allows and governance is clear. Treat the sandpit as a thinking gym, not a filing cabinet. It’s for generating moves, not storing secrets.

If you can’t do it safely with real material, don’t do it with real material. Practice with training scenarios until the discipline is strong enough to be trustworthy.

A Simple Sandpit Protocol (Stealable)

If you want a starter protocol, here’s a practical one you can use tomorrow.

Step 1: Declare the mode (permission structure). At the top of your prompt, say: “This is a stochastic sandpit. Goal is insight, not correctness. Generate options, tensions, and failure modes. Label uncertainty. Do not write in final memo voice.”

Step 2: Paste your artifact. A paragraph, outline, issue statement, or constraint list.

Step 3: Run three passes (rudiments). Pass A: Frames: “Give five frames. For each: what it highlights and what it hides.” Pass B: Assumptions & tensions: “List assumptions. Identify contradictions and tradeoffs.” Pass C: Traps & failure modes: “Give edge cases, strongest counterargument, and likely failure modes.”

Step 4: Capture the yield. Pick one durable output: a revised thesis, an outline, top ten questions, a risks/edge cases list, or a next-step plan. Then switch modes and write the real thing.
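If it helps to see the protocol as a mechanical scaffold, steps 1 through 3 can be sketched as prompt templates. This is an illustration, not a real API: `build_session` and the pass names are hypothetical, and the actual model call is deliberately left out because step 4, capturing the yield, stays human.

```python
# The mode declaration from Step 1, verbatim.
MODE_DECLARATION = (
    "This is a stochastic sandpit. Goal is insight, not correctness. "
    "Generate options, tensions, and failure modes. Label uncertainty. "
    "Do not write in final memo voice."
)

# The three rudiment passes from Step 3.
PASSES = {
    "frames": "Give five frames. For each: what it highlights and what it hides.",
    "assumptions": "List assumptions. Identify contradictions and tradeoffs.",
    "traps": "Give edge cases, strongest counterargument, and likely failure modes.",
}

def build_session(artifact: str) -> dict:
    """Assemble one prompt per pass around your pasted artifact (Step 2)."""
    return {
        name: f"{MODE_DECLARATION}\n\nArtifact:\n{artifact}\n\nTask: {task}"
        for name, task in PASSES.items()
    }
```

The point of the scaffold is that the permission structure travels with every pass: no prompt leaves without the mode declaration attached, so the model never drifts back into memo voice mid-session.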

If you do this well, the product of the session is not a paragraph you can paste. The product is a cleaner mind and a better checklist.

Conclusion: From Vending Machine to Investigator’s Workshop

We started with the vending machine because it was an irresistible story. You simply ask the machine and get the answer. In low-stakes settings, that story is often good enough. In high-stakes work, it’s a trap, because it tempts us to confuse polished language with investigated truth.

That’s why I’m increasingly convinced the next phase of AI competence isn’t just prompt craft. It’s workflow architecture. It’s knowing when you are exploring and when you are producing, and refusing to mix those standards.

Insurance mode will keep expanding, and for good reasons. Organizations need predictability. They need defensibility. They need systems that fail safely. But if the machine is too clean, it leaves no impressions to read. Without those impressions, the human can’t do the job that actually matters, which is judgment. At scale, that’s not a small risk. It’s how organizations become stochastic parrots fed by stochastic parrots.

Yes, let’s keep pushing reliability forward. Let’s build evaluation, grounding, and verification into the stack. But let’s also build the thinking spaces, because insight doesn’t arrive as a product feature. It arrives as a practice. The best practitioners won’t just prompt better. They’ll learn to play at the edge of instability. Then they know exactly when to step back into production mode.

The vending machine model has outlived its usefulness. More and more, we are kicking the vending machine because it won’t deliver what we paid for and hoping something will happen. The investigator needs a workshop. The stochastic sandpit is one way to build a place where the machine can be imperfect in productive ways, so the human can be more careful where it counts. And when you move from the sandpit back to production, the goal isn’t to make the machine sound smarter. The goal is to make your final work leave less to question, because you read the impressions while the sand was still wet.


[Originally posted on DennisKennedy.Blog (https://www.denniskennedy.com/blog/)]

DennisKennedy.com is the home of the Kennedy Idea Propulsion Laboratory


DennisKennedy.Blog is part of the LexBlog network.