Can AI Understand Your Health? Why LLMs Struggle with Wearable Data

Written by:
André De Oliveira Gomes

A few weeks ago, a Washington Post journalist invited the new ChatGPT Health to audit a decade of his life, feeding the model 29 million steps and 6 million heartbeat measurements. He requested a simple grade for his cardiovascular fitness; the bot handed him an "F". Yet further assessments swung erratically between an "F" and a "D" across different conversations. Anthropic’s Claude offered no better clarity. His physician dismissed the results, and Dr. Eric Topol, one of the world's leading voices in digital medicine, characterized the findings as entirely "baseless". But this is not merely a chatbot having a bad day. It reveals a profound dissonance between the probabilistic architecture of Large Language Models (LLMs) and the rigorous requirements of human physiology.

Today, we explore why these systems fail when confronted with real physiological data, where the technical mismatch between language models and the human body originates, and what it actually takes to turn wearable signals into reliable health insights.  

What Went Wrong with LLMs in Healthcare?

The Washington Post experiment exposed a triad of failures: unstable reasoning, unvetted metrics, and opaque privacy.

First, inconsistency. LLMs are probabilistic systems. They generate responses based on patterns rather than grounded physiological reasoning. Ask the same question twice, and small contextual shifts can lead to different outputs. There is no internal mechanism to verify whether one answer is more accurate than another. Parameters like "temperature" manage output randomness for creativity, not clinical confidence.
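To make this concrete, here is a minimal sketch, in plain numpy rather than any vendor's API, of how temperature-scaled sampling produces different answers to an identical question. The "grade" logits are invented purely for illustration.

```python
# Minimal sketch: why two identical queries can diverge.
# An LLM samples each token from a probability distribution; "temperature"
# rescales that distribution but never verifies which answer is correct.
import numpy as np

def sample_token(logits: np.ndarray, temperature: float, rng: np.random.Generator) -> int:
    """Sample one token index from temperature-scaled logits."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

# Hypothetical next-token scores for the grades "A".."F" given the same prompt.
logits = np.array([0.2, 0.5, 1.0, 1.4, 1.5])  # "F" only slightly preferred over "D"
grades = "ABCDF"
rng = np.random.default_rng()

print([grades[sample_token(logits, temperature=0.8, rng=rng)] for _ in range(5)])
# e.g. ['F', 'D', 'F', 'C', 'D'] -- plausible-looking answers, no clinical grounding
```

Each run is internally consistent and confidently phrased; none of them is verified against physiology.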

Second, the misuse of metrics. The models treated estimates such as heart rate variability or VO₂max as definitive truths. In reality, these are approximations. Wearable-derived VO₂max, for example, can deviate significantly from values measured in a laboratory. Without understanding these limitations, LLMs present uncertain signals as objective facts.
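One way to respect that limitation is to carry the uncertainty with the number itself. A minimal sketch, with illustrative (not validated) error bounds:

```python
# Sketch: keeping an estimate's plausible range attached to it,
# instead of treating the point value as a measured fact.
from dataclasses import dataclass

@dataclass
class Estimate:
    name: str
    value: float
    lower: float   # illustrative lower bound
    upper: float   # illustrative upper bound
    source: str

vo2max = Estimate(
    name="VO2max (ml/kg/min)",
    value=41.0, lower=36.0, upper=46.0,  # made-up numbers for illustration
    source="wrist-worn optical sensor, vendor algorithm",
)

print(f"{vo2max.name}: {vo2max.value} (range {vo2max.lower}-{vo2max.upper}; {vo2max.source})")
# Downstream reasoning should work with the range, not the single point value.
```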

Finally, privacy concerns. Many of these tools operate outside regulated healthcare environments, raising questions about how sensitive health data is handled.

Why Can’t LLMs Understand the Human Body?

At the core, this is a technical mismatch.

LLMs are built to process language. They break information into tokens, discrete chunks of text, and predict what comes next. But the human body is not a sequence of words. It is a continuous, dynamic system.

Physiological data unfolds over time. Heart rate, movement, and sleep are all defined by patterns, variability, and context. When this data is compressed into text, the temporal dynamics that define health are lost.
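A tiny worked example makes the point: two heart-rate traces with very different dynamics can collapse into exactly the same sentence once the time dimension is averaged away.

```python
# Two toy heart-rate traces: same mean, very different physiology.
hr_steady = [70, 70, 70, 70, 70, 70]      # low variability
hr_swinging = [50, 90, 50, 90, 50, 90]    # high variability, same average

summary_a = f"average heart rate {sum(hr_steady) / len(hr_steady):.0f} bpm"
summary_b = f"average heart rate {sum(hr_swinging) / len(hr_swinging):.0f} bpm"

print(summary_a == summary_b)  # True -- the text is identical,
# but the physiological stories behind the two traces are not.
```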

Wearable ecosystems add to the challenge. Different devices use different sampling rates, algorithms, and definitions. A “step” is not always a step. Sleep stages vary. Heart rate variability is calculated differently across platforms.

Without a unified framework, this creates fragmentation. For an LLM, that fragmentation leads to confusion rather than insight.

There is also the issue of missing data. Wearables produce gaps—when devices are not worn, batteries die, or sensors disconnect. These gaps are meaningful. But generic AI models do not understand them. Instead of acknowledging uncertainty, they often fill in the blanks.
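In practice, the safer pattern is to make gaps explicit before any model sees the data. A minimal sketch using pandas, with an assumed five-minute grid and illustrative column names:

```python
# Making gaps explicit instead of silently filling them.
import pandas as pd

hr = pd.DataFrame(
    {"timestamp": pd.to_datetime(["2024-05-01 08:00", "2024-05-01 08:05",
                                  "2024-05-01 09:40", "2024-05-01 09:45"]),
     "heart_rate": [62, 64, 78, 80]}
).set_index("timestamp")

# Resample onto a regular grid; windows with no samples stay NaN
# (device not worn, battery dead, sensor disconnected).
regular = hr.resample("5min").mean()
regular["worn"] = regular["heart_rate"].notna()

coverage = regular["worn"].mean()
print(f"Coverage: {coverage:.0%} of 5-minute windows contain data")
# Downstream models see the gap and the coverage figure,
# rather than an interpolated value presented as a measurement.
```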

In short, when faced with incomplete or inconsistent data, LLMs improvise.

Precision Harmonization: Turning Signal into Narrative

Health data needs to be harmonized before it can be interpreted. That means translating signals from multiple devices into a unified model where data is consistent and comparable. At Thryve, this is achieved through our wearable API layer that integrates data from over 500 sources. Instead of treating each device as its own system, it creates a shared language for physiological data.

This does more than standardize inputs. It embeds domain knowledge: how metrics relate, how they should be interpreted, and where their limitations lie. Only once this foundation is in place can AI be applied reliably. In this setup, LLMs are not the source of truth. They act as interfaces, helping translate structured insights into human-readable outputs.
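To illustrate the idea (this is a sketch, not Thryve's actual schema), a harmonization layer maps every vendor's payload onto one shared observation model that carries the derivation method and a confidence score along with the value:

```python
# Illustrative unified observation model; field names, the vendor payload,
# and the confidence value are hypothetical.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Observation:
    metric: str        # e.g. "steps", "hrv_rmssd_ms", "sleep_stage"
    value: float
    start: datetime
    end: datetime
    source: str        # originating device or platform
    method: str        # how the vendor derives the metric
    confidence: float  # 0..1, how much the estimate can be trusted

def normalize_vendor_a_steps(raw: dict) -> Observation:
    """Map one vendor's step payload onto the shared model."""
    return Observation(
        metric="steps",
        value=float(raw["stepCount"]),
        start=datetime.fromisoformat(raw["startTime"]),
        end=datetime.fromisoformat(raw["endTime"]),
        source="vendor_a",
        method="accelerometer_threshold",
        confidence=0.9,
    )

obs = normalize_vendor_a_steps(
    {"stepCount": 5230, "startTime": "2024-05-01T00:00:00", "endTime": "2024-05-02T00:00:00"}
)
print(obs.metric, obs.value, obs.source)
```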

How Can Privacy Be Ensured in AI Health Systems?

Handling health data requires more than technical capability; it requires trust.

One emerging solution is Time-Series Retrieval-Augmented Generation (RAG). Instead of sending raw personal data to a model, relevant insights are retrieved from secure systems and passed in an anonymized format.

This ensures that:

  • Personally identifiable information stays protected
  • Data remains within controlled environments
  • Insights can still be generated without exposing raw signals

Privacy, in this context, is not an add-on. It is built into the architecture.
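A minimal sketch of that retrieval flow, with placeholder function names and made-up summary values: only aggregated, de-identified findings ever reach the language model.

```python
# Time-series RAG pattern, sketched with hypothetical helpers.
def retrieve_features(user_ref: str, window: str) -> dict:
    """Query the secure store for pre-computed, de-identified summaries."""
    # Illustrative output of an analytics layer; no raw signals, no identifiers.
    return {"resting_hr_trend": "-3 bpm over 8 weeks",
            "sleep_regularity": "moderate",
            "data_coverage": "82% of nights"}

def build_prompt(features: dict) -> str:
    """Pass only structured, anonymized insights to the LLM."""
    facts = "; ".join(f"{k}: {v}" for k, v in features.items())
    return ("Explain the following wearable-derived findings in plain language, "
            "noting their uncertainty: " + facts)

prompt = build_prompt(retrieve_features(user_ref="internal-ref-123", window="8w"))
print(prompt)
# The LLM acts as an interface over these findings; it is not the source of truth.
```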

What Comes After Generative AI in Health?

The limitations of current systems point toward a broader shift.

We are moving into an era of Physical Intelligence, where AI systems are designed to understand cause-and-effect relationships in the human body, not just patterns in text.

This means:

  • Learning from continuous physiological data
  • Understanding how behavior impacts health over time
  • Moving from correlation to causation

In this model, LLMs still play a role, but as interfaces rather than decision engines. They help communicate insights generated by specialized systems rather than producing them independently.

What’s the Real Takeaway?

The Washington Post experiment may seem like an isolated case, but it highlights a fundamental issue.

The human body cannot be reduced to text. It cannot be accurately interpreted through probabilistic outputs alone. It requires systems designed for continuous, complex, and contextual data.

AI will play a major role in the future of health. But success will depend on how well we adapt these systems to the realities of human physiology, not the other way around.

The next generation of digital health will not be built on language models alone. It will be built on systems that truly understand the body.

Book a demo with Thryve and unlock the full potential of LLMs in Healthcare! 

About the Author

André De Oliveira Gomes

Head of AI Strategy & Development

André serves as the Head of AI Strategy & Development at Thryve Health, where he bridges the gap between pure mathematics and real-world medical impact. He specializes in building "World Models" for healthcare, combining JEPA architectures that decode the physics of human biosignals with Sensor-LLMs that provide clinical reasoning. With a background in dynamical systems and stochastic noise, André focuses on developing mathematically grounded, reliable AI that transforms messy sensor data into trustworthy decisions for risk and recovery.