Agentic AI claim processing: from hours of manual review to minutes of verified analysis

A US healthcare insurance company processing complex, multi-document accident claims engaged Vstorm to reduce administrative burden and error rates in its claim review workflow. The resulting system combines two LLMs, algorithmic validation, document parsing via LlamaParse, and live API access to benefits databases, cutting processing time from hours to minutes while maintaining human-level accuracy. The solution delivers a verified, human-reviewer-ready summary at the end of every claim cycle.

  • 2 LLMs used
  • 4 sources of data
  • 3 hours to process a claim manually before
  • 8 minutes to process a claim now

Client: undisclosed

The client is one of the leading US healthcare insurance companies, offering multiple plan types tailored to different financial circumstances and coverage needs.

  • Industry: Healthcare / Finances
  • Country: United States
  • Size: 200+ employees

A significant share of the client's claims is accident-related, though its product range extends to maternity coverage, non-accident medical treatment, and other benefit categories.

Vstorm’s impact, the TL;DR:

  • GPT and Gemini used together to ensure quality
  • LlamaParse used to parse the documents
  • Triple check on trustworthiness: GPT, Gemini, and algorithmic procedures
  • Document processing time cut without sacrificing accuracy or trustworthiness
  • Multi-step process covering digital documents, scans, and API analysis
  • Delivered summary that is ready for review by a human specialist

The harsh reality of healthcare processing

According to the 2025 State of Claims report, 41% of survey respondents say that at least one in ten insurance claims is denied, creating a huge administrative burden and putting pressure on the healthcare system. Of those denials, 26% are said to arise from inaccurate or incomplete data collected at patient intake. These are errors that could be caught earlier in the process and that cause further problems downstream.

The client faces the same challenge. Administrative overhead grows further when patients request reevaluation, which triggers a second review cycle, additional paperwork, and further strain on an already stretched operations team.

The best possible scenario is to process each claim quickly and without errors, so that it is clearly approved or clearly denied based on complete information, with no ambiguity left for appeals to exploit. That is where Vstorm came in.

The complicated case of healthcare insurance

Claim processing in healthcare, especially for accident claims, requires a lot of complicated manual work. Insured individuals need to deliver an exhaustive list of documents, digital and physical alike. These include, but are not limited to:

  • The original insurance policy, where conditions, limitations and coverage are specified
  • Medical documents that provide detailed information on the procedures performed
  • Detailed descriptions of related accidents
  • Any additional documents where applicable, such as a police report or confirmation that a particular accident had actually taken place

Within these documents, it is necessary to find several pieces of interconnected information that together determine whether the insurer will pay for the procedure. At a high level, the processing looks as follows:

  • Checking the time of the accident and whether the insurance was active at that time
  • Checking whether the procedures the patient received were covered and fall within the limits of the insurance
  • Checking possible exclusions and additional terms and their possible impact on the payment
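The three checks above can be sketched as plain rules over already-extracted data. This is a minimal illustration only, not the production logic: the `Policy` and `Claim` structures, the procedure codes, and the exclusion tags are all hypothetical, and the real system has to extract these values from documents before any such comparison is possible.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Policy:
    start: date                       # first day of coverage
    end: date                         # last day of coverage (inclusive)
    covered_limits: dict[str, float]  # hypothetical procedure code -> maximum payable amount
    exclusions: set[str]              # hypothetical exclusion tags, e.g. "dui"


@dataclass
class Claim:
    accident_date: date
    procedures: dict[str, float]      # procedure code -> billed amount
    circumstances: set[str]           # tags describing the accident


def review_claim(policy: Policy, claim: Claim) -> list[str]:
    """Return human-readable findings; an empty list means no issue was found."""
    findings = []
    # Check 1: was the policy active on the accident date?
    if not (policy.start <= claim.accident_date <= policy.end):
        findings.append("policy not active on accident date")
    # Check 2: is each procedure covered and within its limit?
    for code, amount in claim.procedures.items():
        limit = policy.covered_limits.get(code)
        if limit is None:
            findings.append(f"procedure {code} not covered")
        elif amount > limit:
            findings.append(f"procedure {code} exceeds limit by {amount - limit:.2f}")
    # Check 3: do any policy exclusions match the accident circumstances?
    for tag in sorted(policy.exclusions & claim.circumstances):
        findings.append(f"exclusion applies: {tag}")
    return findings
```

The function returns findings rather than an approve/deny verdict, which mirrors the idea that the final decision stays with a human reviewer.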

The challenges

This process sounds straightforward, but there are challenges along the way that make it difficult to carry out manually and that contribute to the denial rates mentioned above.

Extracting dates

There are many dates to extract from the documents, and even more irrelevant ones that obscure the picture. For example, the documentation includes the date of the accident alongside the patient's birth date and the start and end dates of the insurance. Timestamps from systems and emails, visit dates, and other date-like information appear as well.

The problem gets even more complicated when a claimant holds several overlapping insurance policies, for example one private, one from an employer, and one provided by a veteran program.

Identifying which dates are relevant, and in what relationship to each other, requires careful, structured reading. A human reviewer doing this at volume, across dozens of claims per day, is exposed to compounding fatigue risk.
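To make the "which date is which" problem concrete, here is a deliberately simple rule-based sketch that tags each date by the words that precede it. It is not how the system described here works (that relies on LLMs), and the keyword table, the `MM/DD/YYYY` format, and the `tag_dates` helper are assumptions made for illustration.

```python
import re
from datetime import date

# Hypothetical keyword -> role mapping; real documents need far richer context.
ROLE_KEYWORDS = {
    "accident": "accident_date",
    "birth": "birth_date",
    "effective": "coverage_start",
    "expir": "coverage_end",
}

DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")  # US-style MM/DD/YYYY


def tag_dates(text: str, window: int = 60) -> list[tuple[str, date]]:
    """Tag each date with a role based on the text between it and the previous date."""
    tagged = []
    prev_end = 0
    for match in DATE_PATTERN.finditer(text):
        # Context: at most `window` characters, never reaching past the previous date.
        ctx_start = max(prev_end, match.start() - window)
        context = text[ctx_start:match.start()].lower()
        prev_end = match.end()
        month, day, year = (int(g) for g in match.groups())
        try:
            value = date(year, month, day)
        except ValueError:
            continue  # skip impossible dates such as 13/45/2024
        # Pick the role whose keyword appears closest to the date.
        role, best_pos = "unknown", -1
        for keyword, candidate in ROLE_KEYWORDS.items():
            pos = context.rfind(keyword)
            if pos > best_pos:
                role, best_pos = candidate, pos
        tagged.append((role, value))
    return tagged
```

Dates with no recognisable context come back as `"unknown"`, which is the kind of case a rule-based approach handles poorly and where contextual reading is needed.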

Tackling the handwriting and messy papers

The dates arrive in both digital documents (for example, a copy of the insurance policy) and physical forms completed by hand. Handwriting can be tricky, misleading, and often illegible. It is easy to confuse, for example, a flat 7 with a 1, or to make comparable mistakes.

The papers themselves are not always in perfect shape either. They can be slightly torn or creased, or printed on low-quality paper. Each defect increases the chance of a misread that carries through the rest of the review.

Spotting exclusions

Depending on the insurance, there are multiple exclusions and special cases that need to be taken into account when deciding whether to pay the full cost of treatment. A common exclusion, for example, is refusing liability when the accident occurred while the insured was driving under the influence. It is also common not to pay for treatment if the standard order of medical procedures was not followed. For example, an X-ray should not be taken without a direct request from a qualified medical professional.

This is contextual, policy-specific work that cannot be handled by rule-based systems alone.

Multimodal analysis

Last but not least, the documents include multiple modalities. Apart from the text, claims documentation can include images, graphs, and occasionally maps, particularly when the insured party is demonstrating that emergency transport was necessary.

TriStorm process

Strategic alignment and planning

Our consultants and engineers at Vstorm began by examining the existing claim processing workflow in detail. The objective was not to automate everything, but to identify where the highest operational leverage existed and where a targeted agentic solution would return measurable results as quickly as possible. That meant mapping where errors concentrated, where time was lost, and where a system could act with confidence without removing human oversight.

Proof of Value

Before committing to a production architecture, the engineering team worked alongside the client to test multiple approaches and technology combinations. The evaluation criteria were ROI potential, reliability under realistic document conditions, and compliance with healthcare-specific regulatory requirements. This phase eliminated approaches that performed well in controlled conditions but introduced unacceptable risk at scale.

Process augmentation

Following validation, the system was handed over to the client’s team for operational deployment. The design was built for integration into existing workflows. The goal was for specialists to receive better-prepared input, not for specialists to be removed from the process.

The technical solution: how it works

The system we built is a multi-step agentic workflow that processes each claim from raw document input through to a structured, reviewer-ready summary. It draws on four data sources: digital policy documents, scanned physical forms, supporting modalities (images, maps), and live API connections to the client’s benefits database.

Dual-LLM QA analysis

Once OCR has been run, two large language models analyse the claim documents independently. Running two models in parallel is a deliberate architectural decision. Where both models reach the same conclusion, confidence in that output is high. Where they diverge, the discrepancy is flagged for closer review rather than resolved automatically.

The outcome is then validated using algorithmic tools, which adds another layer of quality assurance.
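A minimal sketch of this agree-or-flag logic, assuming each model returns a flat dictionary of extracted fields. The `reconcile` function, the `FieldResult` structure, the field names, and the ISO date validator are illustrative assumptions rather than the actual implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class FieldResult:
    name: str
    value: Optional[str]                           # agreed value, None if the models disagree
    needs_review: bool                             # True on disagreement or failed validation
    candidates: tuple[Optional[str], Optional[str]]  # raw output of each model


def is_iso_date(value: str) -> bool:
    """Algorithmic check: the value must be a real calendar date in YYYY-MM-DD form."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def reconcile(
    model_a: dict[str, str],
    model_b: dict[str, str],
    validators: dict[str, Callable[[str], bool]],
) -> list[FieldResult]:
    """Accept a field only if both models agree and the algorithmic check passes."""
    results = []
    for name in sorted(model_a.keys() | model_b.keys()):
        a, b = model_a.get(name), model_b.get(name)
        agreed = a if (a is not None and a == b) else None
        validator = validators.get(name)
        valid = agreed is not None and (validator is None or validator(agreed))
        # Disagreements are never resolved automatically; they are only flagged.
        results.append(FieldResult(name, agreed, not valid, (a, b)))
    return results
```

Keeping both candidate values in the result lets a reviewer see exactly where the two models diverged.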

Identifying exclusions

As exclusions are usually policy-specific and case-specific, the only way to learn about them is to extract them from the provided documentation. Again, this is done by LLMs that follow carefully tuned prompts and procedures, which allows them to compare the exclusions in the policy against the circumstances described in the rest of the documentation.

This makes it possible to determine whether the exclusion or other case is relevant or not.
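One way such a comparison could be structured is sketched below. The prompt wording, the JSON response shape, and the `ask_model` callable (a stand-in for whichever LLM client is used) are all assumptions made for illustration; the case study does not describe the actual prompts.

```python
import json
from typing import Callable

# Hypothetical prompt; the real prompts are not published in this case study.
PROMPT_TEMPLATE = """You are reviewing an insurance claim.
Policy exclusions:
{exclusions}

Accident description:
{description}

Return a JSON list of objects with keys "exclusion", "applies" (true/false) and "evidence"."""


def find_applicable_exclusions(
    exclusions: list[str],
    description: str,
    ask_model: Callable[[str], str],
) -> list[dict]:
    """Ask a model which policy exclusions apply and keep only well-formed answers."""
    prompt = PROMPT_TEMPLATE.format(
        exclusions="\n".join(f"- {e}" for e in exclusions),
        description=description,
    )
    raw = ask_model(prompt)
    try:
        items = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model did not return valid JSON; route to manual review") from exc
    if not isinstance(items, list):
        raise ValueError("model did not return a JSON list; route to manual review")
    known = set(exclusions)
    # Drop findings that cite an exclusion not present in the policy text.
    return [
        item for item in items
        if isinstance(item, dict)
        and item.get("exclusion") in known
        and item.get("applies") is True
    ]
```

Filtering against the exclusions actually present in the policy is a cheap guard against a model inventing an exclusion that the document never states.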

Benefits database integration

The system is connected via API to the client’s central benefits database, which enables real-time verification of what has already been paid against a particular policy. For example, particular procedures may be subject to monthly, quarterly, or yearly limits. Without this integration, the LLMs would be reasoning over documents alone, without visibility into the current state of the policy.

The integration of this API and the LLMs lets the system build a complete picture of the case and prepare an analysis of all possible circumstances that may influence the final decision.
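The limit check can be illustrated with a small sketch. The client's real API is not described here, so the `BenefitsClient` interface, its two methods, and the in-memory `FakeBenefitsClient` are hypothetical stand-ins, and the amounts are made-up example values.

```python
from dataclasses import dataclass
from typing import Protocol


class BenefitsClient(Protocol):
    """Hypothetical interface to the benefits database."""
    def period_limit(self, policy_id: str, procedure: str, period: str) -> float: ...
    def paid_in_period(self, policy_id: str, procedure: str, period: str) -> float: ...


class FakeBenefitsClient:
    """In-memory stand-in used only for this example."""
    def __init__(self, limits: dict[str, float], paid: dict[str, float]):
        self._limits, self._paid = limits, paid

    def period_limit(self, policy_id: str, procedure: str, period: str) -> float:
        return self._limits[period]

    def paid_in_period(self, policy_id: str, procedure: str, period: str) -> float:
        return self._paid[period]


@dataclass
class LimitCheck:
    period: str
    remaining: float  # limit minus what has already been paid in this period
    payable: float    # part of the billed amount that fits under this limit


def check_limits(client: BenefitsClient, policy_id: str, procedure: str, billed: float,
                 periods: tuple[str, ...] = ("monthly", "quarterly", "yearly")) -> list[LimitCheck]:
    checks = []
    for period in periods:
        limit = client.period_limit(policy_id, procedure, period)
        paid = client.paid_in_period(policy_id, procedure, period)
        remaining = max(0.0, limit - paid)
        checks.append(LimitCheck(period, remaining, min(billed, remaining)))
    return checks


def payable_amount(checks: list[LimitCheck]) -> float:
    """The tightest limit decides how much of the bill can be paid."""
    return min(c.payable for c in checks)


client = FakeBenefitsClient(
    limits={"monthly": 500.0, "quarterly": 1000.0, "yearly": 3000.0},
    paid={"monthly": 200.0, "quarterly": 900.0, "yearly": 1200.0},
)
checks = check_limits(client, "P-1", "XRAY", billed=400.0)
print(payable_amount(checks))  # prints 100.0 (the quarterly limit is the tightest)
```

This is the kind of information that cannot be read from the claim documents themselves, since it depends on payments already made under the policy.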

Human-reviewed summary

At the end of the workflow, the system produces a structured summary of findings: extracted dates, applicable exclusions, benefit status, and any flags raised during the dual-LLM or algorithmic review. That summary is passed to a human specialist, who makes the final decision. The system does not approve or deny claims. It prepares the specialist to do so with complete, verified information in front of them.
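The summary described above could be represented roughly as follows. The `ClaimSummary` structure and its field names are assumptions based on the items listed in the text, not the system's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ClaimSummary:
    claim_id: str
    extracted_dates: dict[str, str]      # role -> ISO date string
    applicable_exclusions: list[str]
    benefit_status: str
    flags: list[str] = field(default_factory=list)  # raised by dual-LLM or algorithmic review
    # Deliberately empty: the system never decides, the human specialist does.
    final_decision: Optional[str] = None

    @property
    def needs_attention(self) -> bool:
        """True when any review step raised a flag."""
        return bool(self.flags)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Leaving `final_decision` unset by default reflects the division of labour in the text: the workflow prepares the evidence, and the specialist records the outcome.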

Vstorm’s impact

  • 2 LLMs combined
  • Algorithmic evaluation
  • Summary ready for review
  • Multi-source data

Summary

The architecture is built for extensibility. The system is currently being extended to cover additional policy types, including maternity care and hospital treatment, applying the same workflow to a broader claim surface with minimal rearchitecting required.

The solution runs on Microsoft Azure. CosmosDB serves as the primary database. Document parsing is handled by LlamaParse. LLM analysis uses OpenAI’s GPT and Google’s Gemini. The benefits verification layer connects via API to the client’s existing internal systems.
