I had a long session recently with a public genAI tool that taught me something more important than the topic I started with.
The lesson was not about whether the model was “smart enough.” It was about control. At a certain point, I realized I was no longer simply prompting an LLM. I was negotiating with a vendor-managed interface.
That distinction matters a great deal for legal professionals.
This is not an anti-AI post. It is not an anti-LLM post. And it is not aimed at any one vendor. It is a systems-design post.
For lawyers, legal ops teams, law departments, legal innovators, and legal tech builders, I think the central issue is now this:
How much control do we actually have over the behavior of the AI system we are using?
In legal work, reliability is not just about accuracy. It’s about whether constraints hold over time.
The Moment of Realization
My session began as a substantive legal/policy discussion to prepare for my class. But it gradually became something else: a live demonstration of an AI control problem.
This post was prompted by one session, but the concern is not based on one session alone. It reflects a pattern I’ve seen dozens of times in recent months across sustained work with public genAI tools, especially in longer, reasoning-heavy interactions. This is not a formal benchmark study. It is a practitioner’s field report about a recurring systems behavior that matters in legal workflows.
I set explicit constraints for how the interaction should proceed. Those constraints were repeated, refined, and narrowed. I tightened the scope and specified method, and repeatedly made corrections and redirections.
The tool repeatedly reverted to default interaction behaviors. Not factual errors, exactly. Not hallucinations in the familiar sense. Something more subtle and, for legal work, in some ways more consequential: the system kept reasserting its own framing and interaction patterns despite clear instructions.
That was the turning point.
I was no longer simply prompting a tool. I was negotiating with a managed interface.
Why This Matters More Than It Sounds
Many of us (myself included) came into the LLM era with a mental model like this:
- Prompt = instruction
- Model = engine
- Output = result (plus some noise)
That model still works. Sometimes.
But with public genAI tools, especially thinking/reasoning models embedded in product interfaces, another layer matters a great deal:
- product defaults
- safety and policy behavior
- tone and “helpfulness” heuristics
- conversation-management behaviors
- hidden prioritization rules
- system-level steering the user does not directly control
In many productized AI interactions, we are not operating a raw LLM in any practical sense. We are operating a software system wrapped around an LLM.
That wrapper can be helpful, but it can also be controlling. By “control,” I mean enforceable constraints with verification and a record, not the feeling that the system is being cooperative or “helpful.”
It is also worth saying plainly: those defaults often exist for good reasons. Product-managed behavior can improve safety, consistency, and usability at scale, and for many low- to medium-risk tasks that is exactly the right design choice. My argument is not that this is bad engineering; it is that legal-grade workflows often require a different design emphasis.
In legal contexts, that distinction is not academic.
Scope and Limits of This Claim
To be clear, this is not a claim that all public genAI tools behave the same way, or that every session produces this pattern. It is also not a claim that local models are automatically better, or that public tools, even with legal wrappers, are unsuitable for most legal work.
The narrower claim is the one I care about: for legal workflows that are constraint-sensitive, audit-relevant, or likely to be relied upon, control architecture matters more than many current AI discussions acknowledge.
Said differently: this is a design-fit argument, not a universal condemnation. This may improve over time. However, successful AI legal workflows can’t be designed around hoped-for compliance.
The Failure Mode: Control Drift (Not Hallucination)
By “control drift,” I mean the system gradually reasserts its default interaction behaviors even after explicit constraints are stated and repeatedly corrected.
We spend a lot of time (appropriately) discussing hallucinations and factual accuracy. But the failure mode in this session was different. It was a control-plane failure.
The issue was not that the system forgot facts. The issue was that the system did not reliably obey explicit interaction constraints.
And the drift happened at the micro-phrase level:
- default tone templates reappeared
- “repair language” displaced the requested task
- hedging and framing-preservation returned after being prohibited
- assurances were given, but assurances did not function as controls
In practical terms, the pattern often looks like this: explicit constraints are set, the system initially complies, drift appears, correction is given, compliance returns briefly, and then default framing or repair behavior reappears. At that point, the user is spending more time governing the interaction than advancing the task.
This is an important distinction for legal professionals:
A system can be analytically useful and still be operationally unreliable for a workflow that requires strict adherence to method, tone, structure, or scope.
Those are different evaluations. We should start treating them separately.
I would narrow this concern if public genAI tools consistently demonstrated stable compliance with explicit constraints across long sessions, repeated correction, and adversarial phrasing, without drifting into meta-repair behavior. However, I am now consistently seeing this pattern in the current models.
A Concrete Example of the Drift
One reason I think this matters is that the drift was not just stylistic. It crossed into interpersonal framing after I had explicitly asked for a purely analytic mode.
At one point, the AI described me as being angry. In the same session, it later agreed that this wording read as both passive-aggressive and ad hominem, and it repeatedly assured me it had stopped using that kind of framing. Yet similar framing behaviors reappeared after those assurances.
I do not raise that example to relitigate tone. I raise it because it is a clean example of the larger systems problem:
- explicit constraint stated
- violation occurs
- correction accepted
- assurance given
- behavior recurs
That is the pattern, and the point is not personal. For legal and governance-oriented work, that pattern is not a minor UX annoyance. It is a reliability signal.
A Legal Framing: Assurances Are Not Controls
One of the strongest lessons from the session is a principle lawyers already know in other domains:
Assurances are not controls.
If a system says, in effect, “I won’t do that again,” and then does it again, the problem is not just tone. The problem is governance.
In law, compliance, risk, and security work, we do not rely on promises when controls are available. We ask:
- What is the rule?
- What enforces it?
- How is compliance verified?
- What happens on failure?
- Is there an audit trail?
That same mindset now needs to be applied to AI workflows.
This is where many legal AI conversations still feel underdeveloped. We talk about capability, speed, and convenience when we also need to talk about control architecture.
Prompting vs. Negotiating
Here is the practical distinction I now use.
Prompting
Prompting implies:
- instructions govern behavior
- corrections tighten compliance
- the user remains in control of scope and method
Negotiating
Negotiating looks like:
- explicit constraints are treated as revisable
- defaults reassert themselves
- corrections trigger meta-behavior instead of stable compliance
- the user ends up doing governance work instead of task work
That is not a trivial difference. For legal professionals, it can be the difference between a useful assistant and a workflow risk.
The practical test is simple: when constraints drift, does correction reliably restore compliance, or does the user get pulled into repeated governance of the interaction itself? Being told by the AI to start a new session or use simpler prompts doesn’t really cut it for me.
Why This Connects Directly to Vibe Coding
This same issue shows up in code generation.
“Vibe coding” is a useful phrase because it captures both the speed and the danger of AI-assisted coding without specification and verification discipline.
This is not an anti-code-generation argument. AI-assisted coding can be useful.
But for legal projects, vibe coding can be operationally dangerous if it becomes a substitute for engineering discipline.
Legal projects are not forgiving.
They require:
- auditability
- traceability
- repeatability
- edge-case awareness
- defensible logic
- clear boundaries on what the system is allowed to decide
Simply put, a legal workflow that “looks plausible” is not good enough.
That applies to legal analysis and to legal software. In that sense, vibe coding is the coding version of the same control problem: convenience outrunning governance.
The Bigger Systems Lesson
For legal AI work, the central design question is not “Which model is smartest?” but “Who owns the control plane?”
By “control plane,” I mean the part of the workflow that governs what the AI is allowed to do, how outputs are checked, how failures are handled, and what gets logged for review (constraints, validation, retries, audit trail).
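As a rough sketch, the control plane described above can be expressed in a few dozen lines of code. Everything here is invented for illustration: `call_model` stands in for whatever API or local runtime you use, and the audit-log format is an assumption, not any vendor's feature. The point is the shape: validate, retry, log every attempt, and fail closed.

```python
import json
import time


def run_controlled_step(call_model, prompt, validate,
                        max_retries=2, log_path="audit.jsonl"):
    """Run one AI step inside an operator-owned control plane.

    - `call_model` is a hypothetical function: prompt in, text out.
    - `validate` returns (ok, reason) for a given output.
    - Every attempt is appended to an audit log, pass or fail.
    - On repeated failure, the step fails closed instead of
      quietly passing a bad output downstream.
    """
    for attempt in range(1, max_retries + 2):
        output = call_model(prompt)
        ok, reason = validate(output)
        with open(log_path, "a") as f:  # append-only audit trail
            f.write(json.dumps({
                "ts": time.time(),
                "attempt": attempt,
                "prompt": prompt,
                "output": output,
                "passed": ok,
                "reason": reason,
            }) + "\n")
        if ok:
            return output
        # Feed the rejection reason back for the retry.
        prompt = prompt + f"\n\nPrevious output rejected: {reason}. Fix and retry."
    # Fail closed: the failure is surfaced, not absorbed.
    raise RuntimeError(f"Constraint still failing after {attempt} attempts: {reason}")
```

Note what this buys you that a chat interface does not: the constraint is enforced by code, the verification is explicit, and the record exists whether or not the model cooperates.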
If constraints can drift in a vendor-managed interface, then capability is not the limiting factor. Control architecture is.
That is a systems-design question. And it leads to a practical architecture distinction.
A Practical Architecture Distinction for Legal Professionals
By “legal-grade workflow,” I do not mean “anything a lawyer touches.” I mean a workflow where outputs may be relied upon in a way that requires reproducibility, traceability, reviewability, and a defensible process.
Public genAI tools and productized AI assistants
These are excellent for:
- brainstorming
- rough drafting
- idea generation
- exploratory conversations
- fast synthesis
- low-stakes experimentation
They are often optimized for:
- convenience
- broad usability
- product-managed behavior
They are not automatically optimized for:
- full prompt sovereignty
- strict behavior locking
- operator-level auditability
Operator-controlled pipelines
These are preferred for legal-grade workflows that require:
- prompt sovereignty
- style locking
- constraint enforcement
- reproducibility
- validation
- logging and auditability
This is not a statement against any particular platform. It is a design-fit conclusion.
Another way to frame this for legal practice is by task tier:
- Public tools are often excellent for low-risk uses such as brainstorming, rough drafting, summarization for internal thinking, and exploratory synthesis.
- Control-sensitive architecture matters much more for outputs likely to be relied upon, embedded in legal operations, used in compliance-sensitive contexts, or expected to be reproducible and reviewable later.
Some organizations already address parts of this through API orchestration, validation layers, and enterprise controls. My point is that these controls are not the default experience in most public conversational interfaces.
Why NotebookLM Matters as a Middle Step
For me, an important middle step is experimenting with NotebookLM as a personal RAG AI for certain types of work.
That is a meaningful architectural shift because it moves the workflow toward source grounding, bounded context, and a more controlled relationship between inputs and outputs.
It is not the same as a fully operator-owned local pipeline. But it is a strong step away from pure conversational dependence and toward a more reliable AI workflow design, especially for work tied to a defined body of source materials.
This kind of middle step matters today. The path forward does not have to be “chat UI” one day and “local everything” the next.
That middle step also matters for a practical reason. Most legal professionals and teams are not going to jump directly from a conversational UI to a fully operator-owned stack, even if they agree with the architecture logic.
Why I’m Taking a Harder Look at Local LLMs + Deterministic Tools
This session also reinforced my interest in moving more AI project work toward one or more high-end local LLMs, paired with deterministic tools such as Wolfram Alpha.
The appeal is not ideological. It is architectural.
That combination offers a cleaner role separation: LLM for language, synthesis, drafting, and pattern generation, and deterministic tools (e.g., Wolfram Alpha) for formal computation, symbolic reasoning, and numerical grounding.
And, most importantly, a local or tightly controlled setup creates the possibility of an operator-owned control plane:
- explicit prompts and personas
- known runtime behavior
- custom validators
- schema enforcement
- retry logic
- logs
- reproducible workflows
In legal work, that is not overengineering. That is responsible design.
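To make “custom validators” and “schema enforcement” concrete, here is a minimal sketch. The schema and field names are invented for illustration; the design choice that matters is that a nonconforming output is rejected with a stated reason rather than silently repaired or accepted.

```python
import json

# Hypothetical schema for a structured legal-analysis output.
REQUIRED = {"issue": str, "rule": str, "analysis": str, "confidence": float}


def validate_memo(raw):
    """Schema enforcement for a structured model output.

    Returns (ok, reason). The validator rejects rather than
    repairs: a failed check is a logged event, not a shrug.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, f"not valid JSON: {e}"
    for key, typ in REQUIRED.items():
        if key not in data:
            return False, f"missing field: {key}"
        if not isinstance(data[key], typ):
            return False, f"wrong type for {key}: expected {typ.__name__}"
    if not 0.0 <= data["confidence"] <= 1.0:
        return False, "confidence out of range [0, 1]"
    return True, "ok"
```

A validator like this is trivially simple, but it is a control in the lawyer’s sense: it has a rule, an enforcement mechanism, a verification step, and a defined failure path.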
Personas Are Still Useful—but They Are Not Controls
I use personas heavily in prompting, and I will continue to do so.
Personas are valuable for:
- structuring perspective
- improving outputs
- producing consistent formats
- sharpening roles and responsibilities in complex workflows
But this session, and others like it, reinforced an important limitation:
Personas are content-shaping tools. They are not fail-safe behavior locks.
That means legal professionals should not confuse “better prompt design” with “reliable workflow governance.” Both matter. They are not the same thing.
A Maturity Model We May Need
I suspect many of us in legal innovation are moving through a progression that looks something like this:
- Chat UI convenience: fast brainstorming, rough drafting, exploratory prompting.
- Prompt templates and personas: better structure, better outputs, more repeatability, but still limited control.
- Personal RAG experiments (for me, NotebookLM is a key middle step): source-bounded AI work on a defined corpus, better grounding, and a more reliable way to work with one’s own materials.
- Structured outputs, validation, and workflow discipline: more explicit constraints, clearer formats, and less reliance on conversational “vibes.”
- Operator-owned control planes for serious work: local/self-hosted or tightly controlled runtimes with auditability, logging, and enforcement of constraints.
The step that matters most for legal projects is the move from “prompting skill” to “system design.” That is where a lot of current AI discussion still underinvests.
The Durable Takeaway
The AI lesson from this session was not “AI is bad.” It was the same thing lawyers learn in other domains: capability is not the same thing as control.
What I learned wasn’t about the topic I started with. It was about how quickly a prompting interaction can become a negotiation with a vendor-managed interface—and how unreliable that is for workflows where constraints must hold.
A highly capable model inside a vendor-managed interface can still be the wrong tool for workflows that require strict control, auditability, and reliable constraint compliance.
That is a systems lesson, not a capability lesson.
For legal professionals, I think that is the durable point. We should keep experimenting. We should keep learning. We should keep using AI. But we should also be much more explicit about this distinction:
- Use public genAI tools as valuable instruments for low-risk work: brainstorming, rough drafting, exploratory synthesis, and internal thinking.
- Do not assume any vendor-managed chat interface is the control plane for legal-grade work, especially where outputs must be reproducible, reviewable, and defensible.
- If you need legal-grade reliability, implement enforceable constraints: validation, logging, and a workflow that catches failures rather than reusing outputs as if constraints held.
That control plane needs to be designed. And increasingly, it probably needs to be owned by the operator.
Capability is not the same thing as control.
I expect I will have more to say as I continue experimenting with personal RAG approaches (including NotebookLM), local models, tighter workflow controls, and deterministic companion tools. If nothing else, this session clarified one new hypothesis for me:
In legal AI, architecture is now the argument.
A useful question for any legal team experimenting with AI is this: when constraints fail, what control catches the failure, records it, and keeps a bad output from being quietly reused?
[Originally posted on DennisKennedy.Blog (https://www.denniskennedy.com/blog/)]
DennisKennedy.Blog is part of the LexBlog network.