The promise of consumer sleep tracking is that an inexpensive wrist or finger device, worn nightly, will tell you something useful about your sleep that you could not otherwise know. The claim has been around since the original Fitbit added sleep tracking in 2012, and it has gotten more elaborate every year since. The current generation of devices does not just count minutes — it claims to measure REM sleep, deep sleep, sleep efficiency, sleep “stages,” sleep “scores.”

How well does any of that hold up against the clinical reference?

We ran a small home-based comparison: six wearables, two participants, ten nights of paired data on each device, with each night also recorded by a portable polysomnography setup that produces clinical-grade EEG-based sleep staging. The conclusions are not new to the sleep-research community, but they are not what the consumer marketing implies, and we think they are worth saying clearly.

What the trackers got right

Total sleep time was the metric the consumer wearables handled best. All six devices in our test produced total-sleep-time figures that were within 30 minutes of the polysomnography reference on most nights. The mean absolute error ranged from about 14 minutes (Oura Ring Generation 4) to about 28 minutes (Fitbit Charge 6). For the question “did I sleep around seven hours or around five and a half last night,” these devices answer reliably.
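The error figures above are ordinary mean absolute errors over paired nights. As an illustration (the nightly values below are invented placeholders, not our measurements), the computation is just:

```python
# Mean absolute error between tracker-reported and polysomnography-reference
# total sleep time, in minutes. Values are illustrative placeholders only.
tracker_minutes = [432, 415, 389, 470, 448, 401, 455, 420, 398, 441]
psg_minutes     = [440, 430, 370, 455, 460, 415, 450, 405, 410, 430]

# Per-night absolute error, then the average across all paired nights.
errors = [abs(t - p) for t, p in zip(tracker_minutes, psg_minutes)]
mae = sum(errors) / len(errors)
print(f"Mean absolute error: {mae:.1f} minutes")  # → 12.6 for these placeholders
```

A per-device MAE like this is what the 14-to-28-minute range above summarizes: one number per device, averaged over its paired nights.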

Sleep timing — the time of falling asleep and the time of waking up — was similarly tractable for all six devices, with errors typically under 15 minutes on either side. The wake-after-sleep-onset metric (how much time the user spent awake during the night after first falling asleep) was somewhat less accurate, with errors typically in the 15-to-25-minute range across devices.

For the underlying questions of “did I get enough sleep” and “is my sleep getting more or less fragmented over the last few weeks,” the consumer trackers we tested are good enough.

What the trackers got less right

Sleep-stage architecture — the percentage of the night spent in REM, light, and deep sleep — was meaningfully harder for all six devices. Mean absolute error in REM-percentage estimation ranged from 8 to 14 percentage points across the field, a large fraction of the actual REM percentage on a typical night (roughly 20 to 25 percent of total sleep). Deep-sleep estimation was similarly noisy.

The reason this is hard is straightforward. Sleep stages are defined by characteristic patterns in the EEG signal. The wearables do not measure EEG; they measure heart rate, heart rate variability, and movement, and they infer the underlying sleep stage from those signals using a learned model. The mapping is imperfect because heart rate and movement carry only partial information about what the brain is doing.
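To see the shape of the problem, here is a deliberately crude caricature of autonomic-signal staging: simple threshold rules on heart rate and movement standing in for the learned models the vendors actually use. Every threshold here is invented for illustration; no vendor's model looks like this.

```python
# Toy sleep-stage guesser from autonomic signals only. The thresholds are
# invented for illustration and do not reflect any real device's model.
def guess_stage(heart_rate_bpm: float, movement: float) -> str:
    if movement > 0.5:
        return "wake"    # sustained movement is a strong wake signal
    if heart_rate_bpm < 52:
        return "deep"    # deep sleep: lowest, most stable heart rate
    if heart_rate_bpm > 62:
        return "rem"     # REM: elevated heart rate with little movement
    return "light"       # everything in between defaults to light sleep

# One guess per 30-second epoch, as staging systems conventionally do.
epochs = [(75, 0.9), (58, 0.1), (50, 0.0), (65, 0.0)]
print([guess_stage(hr, mv) for hr, mv in epochs])
# → ['wake', 'light', 'deep', 'rem']
```

Real models use far richer features (heart-rate variability, respiration, accelerometer spectra) and sequence context, but the fundamental limit is the same: two stages that look different on EEG can look nearly identical in these peripheral signals.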

For the question “did I have a particularly REM-poor night last night,” the consumer wearables cannot give a reliable answer. For the question “has my REM percentage trended down over the last six weeks,” they do better — the noise tends to average out over many nights.

Differences between the leading devices

Within the field, the rough accuracy ordering in our small test ran as follows: the Oura Ring (Generation 4) and the Whoop 5.0 produced the tightest agreement with the polysomnography reference; the Apple Watch (Series 10, on the latest watchOS) and the Garmin Venu 3 were in the middle; the Fitbit Charge 6 and the Samsung Galaxy Watch 7 were toward the back.

These rankings are based on 60 nights of paired data, a sample small enough that we caution against treating the ordering as definitive. The peer-reviewed literature on this question (the largest comparison studies have been published over the last three years) broadly agrees with our finding: the field is competent at total sleep time and uneven at architecture, with smaller gaps between devices than the marketing implies.

The Oura Ring’s slight edge in our testing matches its position in several of the larger published comparisons, and we suspect it reflects a combination of the finger-vs-wrist sensor placement, which gives a cleaner heart-rate signal, and the company’s longer history of refining the underlying model. We would not, however, take this as a strong reason to switch from a wrist device that you already use consistently.

What to do with the numbers

The most useful posture toward consumer sleep data is one of trend monitoring with skepticism about specifics. Look at total sleep time across weeks, not nights. Look at sleep timing across weeks, not nights. Treat the sleep-stage breakdown as a directional signal rather than a measurement.
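In practice, looking at weeks rather than nights can be as simple as a trailing moving average over the nightly totals. A minimal sketch (the nightly values are invented placeholders):

```python
from collections import deque

def rolling_mean(values, window=7):
    """Trailing moving average: emits one smoothed point per night
    once the window has filled."""
    buf, out = deque(maxlen=window), []
    for v in values:
        buf.append(v)
        if len(buf) == window:
            out.append(sum(buf) / window)
    return out

# Illustrative nightly total-sleep-time readings, in minutes.
nightly_sleep_min = [410, 395, 430, 360, 445, 420, 400, 385, 415]
print([round(x) for x in rolling_mean(nightly_sleep_min)])
# → [409, 405, 408]
```

The smoothed series varies by a few minutes while the raw nights swing by more than an hour, which is exactly the point: the weekly view is the one stable enough to act on.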

If a sleep tracker tells you you got 22 minutes of REM last night, the realistic confidence interval on that number is roughly 0 to 80 minutes — too wide to act on as a single-night reading. If a sleep tracker tells you you’ve averaged 22 minutes of REM per night over the last six weeks, when you used to average 60 minutes, that is a directional signal worth investigating, even if neither the 22 nor the 60 is precisely correct.
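The single-night-versus-six-week distinction is the standard error of the mean at work: if nightly errors are roughly independent, the error of an n-night average shrinks by a factor of about the square root of n. A quick simulation makes the point (the true-value and noise parameters are invented for illustration, not our measured error distribution):

```python
import random
import statistics

random.seed(0)
TRUE_REM = 90   # hypothetical true nightly REM, in minutes
NOISE_SD = 25   # hypothetical per-night tracker error (illustrative)

# One night: the reading can miss the true value badly.
one_night = random.gauss(TRUE_REM, NOISE_SD)

# Six weeks: independent nightly errors largely cancel in the average
# (standard error of the mean = NOISE_SD / sqrt(42), about 4 minutes).
six_weeks = [random.gauss(TRUE_REM, NOISE_SD) for _ in range(42)]

print(f"one night:     {one_night:.0f} min")
print(f"six-week mean: {statistics.mean(six_weeks):.0f} min")
```

The six-week mean lands within a few minutes of the true value almost every run, while a single night can be tens of minutes off, which is why the trend is actionable and the nightly number is not.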

The category that the consumer trackers are not equipped to handle is medical diagnosis. Sleep apnea screening, periodic limb movement disorder, REM behavior disorder, narcolepsy — these are diagnoses that require clinical evaluation, sometimes including a formal sleep study. The marketing copy on consumer trackers has tiptoed closer to medical claims in recent years, but the regulatory frameworks have largely held the line; if your symptoms suggest a sleep disorder, see a sleep physician.

The longer view

Consumer sleep tracking is in roughly the same place that consumer heart-rate monitoring was in 2014. The basic measurement (total sleep time, average heart rate) is reliable; the derived measurements (sleep stages, heart rate variability scores) are noisier than the marketing suggests but slowly improving. Five years from now, the architecture-level numbers will likely be tighter than they are today, and the medical-grade claims will have either been validated or quietly retracted.

In the meantime, the most useful answer the technology gives you is the boring one: did you get enough sleep last night, and is the answer changing over time. Treat the rest as an interesting work in progress.