The hidden standard behind every reliable AI Agent

Bartosz Adam Gonczarek

Vice President, Co-founder

May 22, 2026

“After deploying multiple production-grade AI agents, I found myself turning to philosophy of language — because it offered a more precise vocabulary for explaining why some agents work better than others”
Bartosz Adam Gonczarek, PhD — Adjunct Professor at University of Technology and Arts, Wroclaw Business Academy Branch, and Co-founder of Vstorm

At Vstorm, we build agentic AI systems for production environments. That means real clients, real workflows, and real consequences when something goes wrong.

Over the last two years, we’ve shipped multiple systems handling medical triage, highly customized print-on-demand order completions, design of engineering operating models, and a growing range of domain-specific automation tasks. Some of these systems work exceptionally well: consistently, reliably, and at accuracy levels that make them genuinely useful.

These are different from AI solutions that drift, confuse users, or produce outputs that are technically correct but still wrong in a way that’s hard to name. For a long time, I looked for a way to articulate what separated those two groups. Something more fundamental was going on, something that showed up in the way these systems use language. And when I turned to the philosophy of language, things started to click.

The problem that didn’t have a name

Large Language Models are trained on the surface patterns of language. Ludwig Wittgenstein, a 20th-century philosopher of language, called this surface grammar: the syntactic clothing of words, the outward form of a sentence. LLMs are extraordinarily good at it. They can produce a fluent apology, a confident medical disclaimer, or a reassuring follow-up, all without any of the underlying understanding that those expressions normally presuppose when they come from human beings.

Wittgenstein distinguished surface grammar from depth grammar, the actual logic of how words function within a specific practice. When a person says “I know,” they are making a claim that carries epistemic responsibility: a commitment to justification, to being held accountable for being wrong.

On the other hand, when an LLM produces “I know,” it is generating a high-probability token sequence. The surface is identical, but the depth is often absent. This gap between the confident surface and the hollow depth is the root of most LLM failures: not hallucination in the narrow technical sense, but something broader.

The language is being used outside the game it belongs to. Such systems commit what Wittgenstein would call a language-game violation, importing the rules of one exchange into another where they do not apply. The user becomes disoriented, not because the information is wrong, but because the mode of speaking is wrong. They expected a constrained, professional, careful voice, but what they get is often a voice tuned to please, only loosely tied to credibility.

The failures of this kind that I have seen aren’t accuracy failures in a statistical sense; they are attunement failures.

What is attunement?

In a peer-reviewed paper that I’ll present at the AWLS symposium in August 2026, I introduce attunement as a condition under which a language-using machine operates without generating the kinds of linguistic confusion Wittgenstein spent his career diagnosing. The paper is called “Attunement in Words: A Wittgensteinian Criterion for Language-Using Machines.”

The term itself is borrowed from therapeutic practice, where attunement describes a caregiver’s calibrated responsiveness to another person’s emotional and communicative state. In this context, it is a relational adjustment that enables a reciprocal, sense-making exchange.

I have chosen to extend it to human-machine language interaction: attunement is the mutual adjustment in which a machine does not overreach the conceptual and linguistic norms of its exchange, while the human’s expectations are calibrated to what the system can actually do.

In practical terms, I define attunement through three conditions:

Surface grammar discipline. The system operates within the surface grammar of its domain without using linguistic fluency to assert things it cannot verify. It signals the limits of its outputs clearly.
Language-game consistency. The system does not import terms from one domain into another in misleading ways.
Aspect restraint. The system does not invoke the full evocative range of words whose depth grammar it cannot access or understand. It uses words in their domain-constrained sense.

These conditions can be verified from the behavior of an agentic system, without making any claims about what is happening inside it. That is deliberate. Wittgenstein was sceptical of treating inner states, inferred from outer behavior, as a criterion of meaning.

Attunement follows the same logic: it is a pattern of exchange, not a mental event.
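
As a rough illustration of how the three conditions could be checked from behavior alone, the sketch below inspects nothing but the text of a reply. Everything in it is an assumption made for illustration: the `DomainProfile` type, the `check_attunement` function, the phrase lists, and the plain substring matching are hypothetical stand-ins, not the method from the paper and not anything Vstorm ships. A real check would need far richer signals than keyword matching.

```python
from dataclasses import dataclass, field


@dataclass
class DomainProfile:
    """Hypothetical description of one language-game (illustrative only)."""
    name: str
    # Phrases asserting certainty the system cannot verify in this domain.
    unverifiable_claims: list[str] = field(default_factory=list)
    # Vocabulary borrowed from other domains that would mislead here.
    foreign_terms: list[str] = field(default_factory=list)
    # Evocative phrases whose full weight the system should not invoke.
    evocative_terms: list[str] = field(default_factory=list)


def _hits(text: str, phrases: list[str]) -> list[str]:
    """Return the phrases found in text (case-insensitive substring match)."""
    lowered = text.lower()
    return [p for p in phrases if p.lower() in lowered]


def check_attunement(reply: str, profile: DomainProfile) -> dict[str, list[str]]:
    """Map each condition to the phrases that flag it; an empty list means no flag."""
    return {
        "surface_grammar_discipline": _hits(reply, profile.unverifiable_claims),
        "language_game_consistency": _hits(reply, profile.foreign_terms),
        "aspect_restraint": _hits(reply, profile.evocative_terms),
    }


# Assumed example profile; the phrase lists are invented for this sketch.
triage = DomainProfile(
    name="medical triage",
    unverifiable_claims=["i know", "i am certain", "definitely"],
    foreign_terms=["guaranteed", "special offer"],
    evocative_terms=["i understand how you feel", "i promise"],
)

reply = "I know this is nothing serious, and I promise you will be fine."
flags = check_attunement(reply, triage)
for condition, phrases in flags.items():
    print(condition, phrases)
```

The point of the design is that the check takes only the reply and a description of the domain as input. It makes no claim about the system's internals, which mirrors the idea that attunement is a pattern of exchange.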

A surprising finding

But here is something I did not expect when I started writing the paper: attunement is hardly a future ideal. It is something that can already be engineered. At Vstorm, we have already shipped systems that approximate it. Every system we have launched that has exceeded roughly 95% accuracy in its intended domain shares the same structural trait: it stays in its lane.

It operates within the grammatical rules of its context. It signals uncertainty instead of glossing over it. It does not borrow vocabulary from adjacent domains and apply it loosely. Instead, it is constrained, deliberately and architecturally. This constraint is precisely what makes such systems trustworthy.

Such systems were not designed with Wittgenstein in mind. No one on the engineering team was reading Philosophical Investigations between sprints. But they satisfy his criteria anyway. In other words, constrained, intentional design naturally produces attunement. The convergence here is the finding.

Why it matters

Right now, most teams building agentic AI systems discover attunement failures the hard way: through user complaints, edge case audits, or the vague sense that a system which passes all its benchmarks still doesn’t feel right.

The diagnostic tools available for spotting hallucination, misalignment, and low confidence capture only part of the problem. A system may produce a response with high confidence and a low hallucination rate and still commit a language-game violation that leaves the user misled.

Wittgenstein’s framework provides a more precise vocabulary for tagging such failures. The surface/depth grammar distinction gives us language for the gap between fluent output and grounded use. Language-game consistency gives us language for domain boundary violations. Aspect restraint gives us language for the problem of evocative overreach, when a system invokes the full emotional or conceptual weight of a word it is only statistically approximating. What is actually needed is attunement as a design criterion.

What this means for Agentic AI

I expect that once the paper is published, attunement will prove to be a useful prerequisite for every production-grade agentic AI system. The reason is structural. As agentic systems take on more complex, multi-step tasks (reasoning across tools, communicating outputs to non-technical users, operating at the intersection of multiple professional domains), the risk of language-game violations compounds at every step. The systems that will reach, and sustain, above 95% accuracy in production are the ones that solve this. My soon-to-be-published paper will establish the theoretical framework. The follow-up work, design pattern formalization, is what comes next.

What philosophers would say about it

I am not sure Wittgenstein would have been comfortable with the idea of his remarks being applied as an engineering specification. But I think he would have recognized the problem with ease: language used carelessly, confidently, and in the wrong context leads to confusion instead of clarity. That is the problem the world is currently solving. And now, at least, we have a name for it.

After the conference, I’ll share the full paper, but feel free to reach out to us if you’d like early access prior to publication.

Wish to learn more?

Meet directly with the author, Bartosz Adam Gonczarek, PhD. He will be happy to discuss his research and the insights gathered from real implementations of 30+ agentic projects.

Book your call with Bart today

Bartosz Adam Gonczarek is a researcher and practitioner at the intersection of philosophy of language and applied AI. He is an Adjunct Professor at the University of Technology and Arts and is co-founder of Vstorm.

Last updated: May 21, 2026
