Apple Watch, Oura, WHOOP and Fitbit: What a Personal Experiment Reveals—and What It Doesn't

Jun 8

Written By M C

A recent article caught my attention because it attempted something many wearable users have wondered about:

Which wearable is actually the most accurate?

The author wore multiple devices simultaneously, compared them against a sleep laboratory study, and documented the results in detail. As consumer health journalism goes, it was thoughtful, transparent, and more rigorous than most wearable reviews.

As a neurologist with a particular interest in sleep, biometrics, metabolism, and brain health, I appreciated the effort. But I also found myself asking a different question:

Can a single-person experiment tell us which wearable is truly the most accurate?

The answer is both yes and no.

The article provides useful real-world insights into how these devices perform for one individual. However, determining accuracy requires a different type of evidence: validation studies involving dozens or hundreds of participants, compared against established gold-standard measurements.

Fortunately, those studies now exist.

What the Article Did Well

Before discussing limitations, it's worth acknowledging what the author did right.

First, she compared wearable data against polysomnography (PSG), the gold standard for sleep measurement.

Second, multiple devices were tested simultaneously, allowing direct comparisons under identical conditions.

Third, the article examined several metrics that matter to consumers:

Sleep duration
Sleep stages
Heart rate
Recovery scores
User experience

Most wearable reviews never go beyond comparing app screenshots or subjective impressions. This was a more thoughtful approach.

The Challenge of N = 1

The primary limitation is not the quality of the effort. It is the sample size.

Wearable performance is not fixed. Accuracy varies based on numerous factors, including:

Age
Sex
Skin tone
Body composition
Sleep disorders
Fitness level
Arrhythmias
Movement patterns
Device placement

A wearable that performs exceptionally well in one person may perform differently in another.

From a scientific perspective, a sample size of one can generate interesting observations, but it cannot establish overall accuracy.

That requires larger validation studies.

What the Scientific Literature Shows

Over the past several years, researchers have compared many popular wearables—including Apple Watch, Oura Ring, WHOOP, Fitbit, Garmin, and Polar devices—against laboratory standards such as polysomnography and electrocardiography (ECG).

The results are surprisingly consistent.

Sleep Duration: Better Than Many People Think

One of the strongest findings across studies is that modern wearables are generally quite good at estimating total sleep time.

Most devices perform reasonably well when measuring:

Time asleep
Bedtime
Wake time
Sleep duration trends

For most users, these metrics are accurate enough to identify meaningful behavioral patterns.

Sleep Staging: Still a Work in Progress

This is where things become more complicated.

Consumers often assume that their wearable can precisely determine how much REM sleep, deep sleep, and light sleep they obtained.

In reality, wearables do not directly measure:

Brain activity (EEG)
Eye movements
Muscle tone

These are the physiological signals used by sleep laboratories to classify sleep stages.

Instead, wearables infer sleep stages by applying proprietary algorithms to combinations of:

Heart rate
Heart rate variability
Movement
Skin temperature

As a result, even the best devices frequently struggle when compared against PSG.

Interestingly, this mirrors one of the key findings from the journalist's experiment: different devices often disagreed substantially about sleep stages.

The scientific literature suggests that this disagreement is not unusual.
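Validation studies usually compare a wearable with PSG epoch by epoch (PSG is scored in 30-second epochs). They summarize the result with percent agreement and with Cohen's kappa, which discounts the agreement expected by chance alone. The Python sketch below shows both measures. The function name and the two hypnograms are hypothetical and were made up for illustration. They are not data from any study or device.

```python
from collections import Counter

def epoch_agreement(reference, device):
    """Return (percent agreement, Cohen's kappa) for two equal-length stage sequences."""
    if len(reference) != len(device) or not reference:
        raise ValueError("sequences must be non-empty and the same length")
    n = len(reference)
    # Observed agreement: fraction of epochs where both label the same stage
    observed = sum(r == d for r, d in zip(reference, device)) / n
    ref_counts, dev_counts = Counter(reference), Counter(device)
    # Chance agreement: probability both pick the same stage if labeling independently
    expected = sum(ref_counts[s] * dev_counts[s] for s in ref_counts) / (n * n)
    # If chance agreement is total (one identical stage throughout), treat kappa as 1
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return observed, kappa

# Hypothetical 30-second epochs: W = wake, L = light, D = deep, R = REM
psg    = ["W", "L", "L", "D", "D", "D", "L", "R", "R", "L"]
device = ["W", "L", "L", "L", "D", "D", "L", "L", "R", "L"]
agree, kappa = epoch_agreement(psg, device)
print(f"agreement={agree:.0%}, kappa={kappa:.2f}")  # agreement=80%, kappa=0.70
```

Raw agreement flatters a device because light sleep dominates most nights. Kappa corrects for this, so it is typically lower than percent agreement.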

Heart Rate: One of the Strongest Metrics

Heart rate measurements have become one of the most reliable outputs from modern wearables.

Multiple validation studies show that devices such as Apple Watch, Oura Ring, and WHOOP agree closely with ECG-based reference standards, particularly during rest and sleep.
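"Agreement" in these studies is commonly quantified with a Bland-Altman analysis. The bias is the mean difference between device and reference. The 95% limits of agreement are the bias plus or minus 1.96 standard deviations of those differences. The Python sketch below computes both. The function name and the paired heart rate values are hypothetical and were invented for illustration. The limits describe spread across many paired measurements, which is one reason a single participant cannot establish a device's accuracy.

```python
from statistics import mean, stdev

def bland_altman(device, reference):
    """Return (bias, lower limit, upper limit) of agreement for paired measurements."""
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need at least two paired measurements")
    # Difference for each pair: positive means the device reads high
    diffs = [d - r for d, r in zip(device, reference)]
    bias = mean(diffs)
    # 95% limits of agreement use the sample standard deviation of the differences
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread

# Hypothetical overnight resting heart rates (bpm), one pair per participant
ecg      = [52, 58, 61, 49, 66, 55, 70, 60]
wearable = [53, 57, 62, 50, 65, 56, 72, 60]
bias, low, high = bland_altman(wearable, ecg)
print(f"bias={bias:+.1f} bpm, 95% limits of agreement {low:.1f} to {high:.1f} bpm")
```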

For tracking trends over time, resting heart rate appears to be one of the most useful wearable metrics available today.

HRV: Useful, But More Variable

Heart rate variability (HRV) has become increasingly popular as a marker of recovery, stress resilience, and autonomic nervous system function.

Recent studies suggest that wearables can estimate nocturnal HRV reasonably well, but accuracy varies considerably depending on:

The device used
The specific HRV metric measured
The timing of measurement
The underlying algorithm

This likely explains why users often see very different HRV values when comparing Oura, WHOOP, Apple Watch, Polar, and other platforms.
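Part of the explanation is that "HRV" is not one number. Several metrics can be computed from the same beat-to-beat (RR) intervals. Two common ones are RMSSD, the root mean square of successive differences, and SDNN, the standard deviation of all intervals. The Python sketch below computes both from a made-up series of RR intervals. It does not assume anything about which metric, time window, or artifact filtering a particular device actually uses.

```python
import math
from statistics import stdev

def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences, in ms."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def sdnn(rr_ms):
    """Sample standard deviation of all RR intervals, in ms."""
    return stdev(rr_ms)

# Hypothetical RR intervals (milliseconds) from a short stretch of sleep
rr = [1000, 1040, 990, 1050, 1010, 980, 1030, 1000]
print(f"RMSSD = {rmssd(rr):.1f} ms, SDNN = {sdnn(rr):.1f} ms")
```

On this identical input, RMSSD comes out near 44 ms and SDNN near 25 ms. Two platforms reporting different metrics can therefore disagree even when both process the signal correctly.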

In clinical practice, the trend over time is often more informative than any single daily HRV value.
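One simple way to put "trend over time" into practice is to compare each night with the person's own recent baseline. The Python sketch below flags nights that depart from that baseline. It assumes a 7-night rolling baseline and an arbitrary threshold of 2 standard deviations. Neither is a clinical standard or any device's actual method. The function name and the nightly values are hypothetical.

```python
from statistics import mean, stdev

def flag_deviations(values, window=7, threshold=2.0):
    """Yield (night index, value, z-score) for nights far from the prior `window` nights."""
    for i in range(window, len(values)):
        baseline = values[i - window:i]  # the preceding nights only
        mu, sd = mean(baseline), stdev(baseline)
        if sd == 0:
            continue  # no variability in the baseline, so a z-score is undefined
        z = (values[i] - mu) / sd
        if abs(z) >= threshold:
            yield i, values[i], z

# Hypothetical nightly resting heart rate (bpm); night 9 is elevated
rhr = [54, 55, 53, 54, 56, 55, 54, 55, 54, 61, 55]
for night, value, z in flag_deviations(rhr):
    print(f"night {night}: {value} bpm (z = {z:+.1f})")
```

Only night 9 is flagged in this example. Note that an unusual night also widens the baseline for the following week, which is one reason real systems may handle outliers differently.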

Calorie Burn: Still the Weakest Metric

If there is one area where wearables continue to struggle, it is energy expenditure.

Despite increasingly sophisticated sensors and algorithms, calorie estimates remain highly variable when compared with laboratory measurements.

This finding has been remarkably consistent across multiple studies.

For that reason, calorie burn estimates should generally be viewed as rough approximations rather than precise physiological measurements.
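Studies of energy expenditure often express error as mean absolute percentage error (MAPE) against a laboratory reference such as indirect calorimetry. The Python sketch below computes MAPE for a handful of workouts. The function name and numbers are hypothetical and were invented for illustration.

```python
def mape(estimates, reference):
    """Mean absolute percentage error of estimates relative to reference values."""
    if len(estimates) != len(reference) or not reference:
        raise ValueError("need equal-length, non-empty inputs")
    if any(r <= 0 for r in reference):
        raise ValueError("reference values must be positive")
    # Per-session error as a fraction of the reference value
    errors = [abs(e - r) / r for e, r in zip(estimates, reference)]
    return 100 * sum(errors) / len(errors)

# Hypothetical kcal per workout: laboratory reference vs. wearable estimate
lab      = [300, 450, 250, 500]
wearable = [360, 400, 310, 520]
print(f"MAPE = {mape(wearable, lab):.1f}%")
```

The average here is about 15%, yet individual sessions range from 4% to 24% off. Some are overestimates and some are underestimates, so any single workout's number is a rough guide at best.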

The Question That Matters Most

When discussing wearable accuracy, I think we often focus on the wrong question.

The question is not:

"Can my wearable perfectly measure my biology?"

The more useful question is:

"Can my wearable help me recognize patterns that improve my health?"

A wearable may not perfectly quantify:

Deep sleep
Recovery
Stress
Readiness

Yet it may still help users recognize:

Chronic sleep deprivation
Reduced physical activity
Early signs of illness
Elevated resting heart rate
Changes in recovery patterns
The impact of lifestyle choices

From a brain health perspective, that may be where the greatest value lies.

My Takeaway

The journalist's experiment provided an engaging and informative look at how modern wearables perform in everyday life.

But the broader scientific literature suggests a more nuanced conclusion.

Today's wearables are generally quite good at measuring:

Heart rate
Sleep duration
Activity trends
Changes from an individual's baseline

They are less reliable for:

Precise sleep staging
Energy expenditure
Drawing conclusions from a single day's data
Diagnosing medical conditions

The real power of wearables may not be their ability to perfectly measure physiology.

Their greatest value may be that they encourage people to pay attention to behaviors that influence long-term health: sleep, exercise, recovery, stress management, and metabolic health.

For those interested in brain health, that may ultimately matter more than whether last night's deep sleep was 72 minutes or 88.

References

de Zambotti M, et al. Validation of Six Wearable Devices for Estimating Sleep, Heart Rate and Heart Rate Variability in Healthy Adults. Nature and Science of Sleep. 2022.
Robbins R, et al. Validation of Apple Watch Series 8, Oura Ring Gen3, and Fitbit Sense 2 Against Polysomnography. Sleep. 2024.
Van den Eynde J, et al. Consumer Wearables and Their Usefulness in Health Research: An Umbrella Review. JMIR mHealth and uHealth. 2024.
Miller DJ, et al. Validation of Oura Gen3, Oura Gen4, WHOOP 4.0, Garmin and Polar HRV Metrics Against ECG. Sensors. 2025.
Chinoy ED, et al. Performance of Commercial Wearables in Measuring Sleep and Wakefulness. Sleep. 2021.
de Zambotti M, et al. Wearable Sleep Technology in Clinical and Research Settings. Sleep Medicine Reviews. 2019.

About the author:

Myrna Cardiel, MD is a neurologist, former Clinical Professor of Neurology at NYU Langone Health, and founder of Cardiel Precision Brain Health. She specializes in migraine, women's brain health, sleep, cognitive health, and the intersection of metabolism and brain function.

An early adopter of wearable technologies and health tracking tools, Dr. Cardiel is particularly interested in how devices such as Oura, Apple Watch, continuous glucose monitors, and other digital health tools can be used to support evidence-based, personalized approaches to brain health and healthy aging.

As both a clinician and wearable user, she believes the greatest value of these technologies lies not in producing perfect measurements, but in helping individuals recognize patterns that support better long-term health.
