
How we test

We favor long-term hands-on testing over launch-week impressions. Here is how we run that work in practice.

The general principle

The most useful question we can answer for a reader is not "is this product good?" but "is this product still good in six months?" Most consumer testing is done in the first two weeks of ownership, when novelty effects, a fresh battery, and out-of-the-box build quality are all working in the manufacturer's favor. Most consumer regret happens later. Our testing windows are intentionally long.

Hardware reviews

For physical product reviews — appliances, audio gear, monitors, mattresses, kitchen tools — our default testing window is six to twelve weeks. We use the product as part of normal household use rather than under contrived lab conditions, and we keep a daily-or-weekly log of issues, repairs, surface wear, software updates, and anything else that would change a buying recommendation.

Where the product category includes a known durability failure mode (mattress sag, headphone hinge cracks, coffee-grinder burr wear), we extend the testing window or run a deliberate stress test to surface the issue. Multiple Curated Weekly reviews have ended with a staff member breaking the product as part of testing; we report the failure in the review.

App and software reviews

For app and software reviews, we run a minimum eight-week testing period before publishing a review. For categories where the user behavior we care about is itself slow to develop — habit-tracking apps, calorie counters, budgeting tools — we extend the window to twelve weeks or more. Two staff members use each app under review daily throughout testing; we do not rely on a single reviewer's experience in categories where individual user behavior varies meaningfully.

For nutrition and dietary-tracking apps specifically, we use a weighed-food reference: meals consumed during testing are weighed on a kitchen scale, the components are looked up against USDA FoodData Central nutrient values, and the app's reported numbers are compared against that reference. We report the app's absolute percent error against that reference rather than repeating headline accuracy claims.
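
For readers who want the arithmetic spelled out, the comparison reduces to a one-line calculation, sketched below in Python. The figures are invented for illustration, not numbers from any published review.

    # Absolute percent error of an app's reported value against a
    # weighed-food reference built from USDA FoodData Central values.
    def absolute_percent_error(app_value: float, reference_value: float) -> float:
        return abs(app_value - reference_value) / reference_value * 100

    # Hypothetical example: the app logs 612 kcal for a meal our weighed
    # reference puts at 548 kcal.
    error = absolute_percent_error(app_value=612, reference_value=548)
    print(f"{error:.1f}% absolute error")  # 11.7% absolute error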

Roundup methodology

Roundup pieces (best-of features) start with a written category brief: what readers are likely to be choosing between, what the dominant decision criteria are, and what the field of products looks like at the time of writing. We then assemble a tested set, typically four to seven products, that covers the realistic range of buyer needs — including budget, mid-tier, and premium options where applicable.

Our rankings are not numerical scores aggregated across categories. They are written judgments by a named reviewer, expressed as a ranked list with explicit reasoning for each position. We do not auto-generate scores from a weighted spreadsheet of features; readers can do that math themselves from a feature comparison.

Recipe testing

Every recipe published on The Curated Weekly is cooked at least twice in a home kitchen with consumer-grade equipment. The second cook is done by a staff member who did not develop the recipe. Failures, surprises, and timing differences from the first cook are folded back into the published version. Recipes do not appear on this site if they have not been independently verified by a second cook.

Nutrition information, where provided, is calculated from USDA FoodData Central per-ingredient values, summed across the ingredient list, and divided by the stated serving count.
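
In sketch form, and assuming the per-ingredient lookup against FoodData Central has already been done, the calculation is a sum and a division. The Python below uses placeholder ingredient figures, not values from a published recipe.

    # Sum per-ingredient calorie values, then divide by the stated
    # serving count to get the per-serving figure we publish.
    ingredients_kcal = {
        "rolled oats": 380,   # placeholder per-ingredient totals
        "whole milk": 300,
        "banana": 110,
    }
    servings = 2

    total_kcal = sum(ingredients_kcal.values())
    per_serving_kcal = total_kcal / servings
    print(f"{per_serving_kcal:.0f} kcal per serving")  # 395 kcal per serving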

Health and clinical reporting

Health features that draw on clinical research cite the original peer-reviewed literature. We do not present preprint findings as established fact, and we explicitly distinguish in the body of the article between findings from randomized trials, observational studies, and mechanistic or preclinical work.

For consumer health-technology reviews — wearables, blood-pressure cuffs, sleep trackers — we report device output against a clinical or research-grade reference where one is available. We name the reference in the methodology paragraph and report sample sizes.
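
A minimal sketch of the summary that comparison produces, a sample size alongside a mean absolute error, is below in Python. The heart-rate readings are invented placeholders, not data from any review.

    # Compare device readings against paired reference readings and
    # report the sample size alongside the mean absolute error.
    device_bpm    = [62, 71, 88, 95, 110]   # placeholder wearable readings
    reference_bpm = [60, 73, 85, 97, 108]   # placeholder chest-strap readings

    n = len(device_bpm)
    mean_abs_error = sum(abs(d - r) for d, r in zip(device_bpm, reference_bpm)) / n
    print(f"n={n}, mean absolute error = {mean_abs_error:.1f} bpm")  # n=5, 2.2 bpm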

Sample sizes and statistics

We report sample sizes plainly. When a result comes from a single reviewer's experience, we say so. When a result comes from a multi-week multi-tester structured test, we describe the structure. We do not present anecdote as data.

Pricing

Pricing in our reviews is the retail price at the time of publication. Where prices change frequently — kitchen appliances, electronics — we update the article's modified date when we update a price. Our prices are not pulled from an affiliate API; we check them by hand against the retailer's posted price.

Failure to recommend

Sometimes a roundup ends without a top recommendation. If we cannot honestly recommend a product in a category, we say so and explain why. There is no editorial expectation that every roundup must produce a buy recommendation; a candid "none of these is good enough" is a perfectly acceptable conclusion.