
Lift Test Design Glossary

Updated over a week ago

Lift tests are the foundation of WorkMagic’s approach to measuring true marketing impact. Before setting up or analyzing a test, it’s important to understand the key concepts behind geo-randomization, holdouts, and measurement accuracy.

This glossary serves as a quick reference for all major terms used in WorkMagic’s lift test design. Each definition includes practical context to help you apply these concepts correctly during setup, validation, and analysis, so that every test you run produces statistically sound and actionable insights.

Term

Quick Definition

Explanation

DMA

Geographic region used for geo-lift randomization

A DMA (Designated Market Area) is a geographic region whose population receives the same media offerings (TV, digital, etc.).

In geo-lift testing, DMAs are used as units of randomization: some are exposed to ads (test group), others are not (control group).

For example, DMAs in Tennessee and Kentucky might be test DMAs, while DMAs in Kansas and Maryland might be control DMAs.
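A minimal Python sketch of geo-randomization, assuming a simple shuffle-and-split. The `assign_geos` helper, the DMA names, and the 50/50 split are illustrative assumptions, not WorkMagic's actual assignment logic.

```python
import random

def assign_geos(dmas, holdout_fraction=0.5, seed=0):
    """Randomly split DMAs into exposed (test) and holdout (control) groups."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(dmas)
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return {
        "holdout": sorted(shuffled[:n_holdout]),
        "exposed": sorted(shuffled[n_holdout:]),
    }

# Hypothetical DMA names, for illustration only.
dmas = ["Nashville", "Louisville", "Wichita", "Baltimore", "Memphis", "Topeka"]
groups = assign_geos(dmas, holdout_fraction=0.5, seed=42)
print(groups)
```

Every DMA lands in exactly one group, and rerunning with the same seed reproduces the same split.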

Holdout Group

Regions where ads are withheld

The geographic areas where advertising is turned off (in pause-to-measure tests) or never launched (in launch-to-measure tests). This group serves as the baseline to measure what would happen without the advertising treatment.

Exposed Group

Regions that are exposed to ads

The geographic areas that receive the advertising treatment—either continuing to run ads (in pause-to-measure tests) or launching new campaigns (in launch-to-measure tests). Performance in this group is compared against the holdout group to measure incremental impact.

Pause-to-Measure

Turn off ads in some regions to measure their impact

A test method where you pause advertising in selected regions (the holdout group) and keep running ads everywhere else. This shows you what happens when ads are absent.

Launch-to-Measure

Launch ads in new regions to measure their impact

A test method where you launch a new campaign only in specific test regions while keeping the other regions at business as usual, without the campaign. This shows you what happens when the new campaign is present.

AA Test

A validation check before the real test

A pre-test to make sure your measurement setup isn't broken. You split your regions into fake "test" and "control" groups during a period when nothing changes, then check if the analysis falsely detects a difference. If it does, your test design needs adjustment.
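The AA check can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the data is synthetic, the analysis is a plain normal-approximation comparison of group means, and the helper names `two_sided_p_value` and `aa_false_positive_rate` are hypothetical. A production lift analysis would typically use a more sophisticated model.

```python
import random
from statistics import NormalDist, mean, variance

def two_sided_p_value(test_vals, control_vals):
    """Normal-approximation p-value for a difference in group means."""
    diff = mean(test_vals) - mean(control_vals)
    se = (variance(test_vals) / len(test_vals)
          + variance(control_vals) / len(control_vals)) ** 0.5
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def aa_false_positive_rate(geo_values, n_splits=500, alpha=0.05, seed=0):
    """Share of random fake test/control splits flagged as significant."""
    rng = random.Random(seed)
    values = list(geo_values)
    half = len(values) // 2
    false_positives = 0
    for _ in range(n_splits):
        rng.shuffle(values)  # new fake assignment; nothing actually changed
        if two_sided_p_value(values[:half], values[half:]) < alpha:
            false_positives += 1
    return false_positives / n_splits

# Synthetic per-region daily orders from a period with no change (made-up data).
data_rng = random.Random(1)
geo_orders = [data_rng.gauss(100, 15) for _ in range(40)]
rate = aa_false_positive_rate(geo_orders)
print(f"False positive rate: {rate:.1%}")
```

A rate close to `alpha` suggests the analysis behaves as expected when nothing changes. A rate well above it suggests the design or the analysis needs adjustment.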

Power

The probability that the test detects a statistically significant result when a real effect exists. It is calculated during test design to ensure the test can detect a meaningful treatment effect.

The likelihood that your test will successfully detect a real advertising effect if such an effect truly exists. Standard practice aims for ≥80% power, meaning at least an 80% chance of identifying true lift. In hypothesis testing terms, power = 1 - β, where β is the Type II error rate (false negative). Higher power reduces the risk of missing real effects.

Power Calculation

Pre-test analysis determining if your design can detect meaningful effects

The process conducted during test design to determine whether your experiment has sufficient power to detect the expected treatment effect.

Power improves by:

(1) reducing variance through smart geo selection,

(2) increasing sample size (more DMAs, a larger holdout %, or a longer duration),

(3) increasing treatment intensity (higher budgets), or

(4) testing hypotheses expected to produce larger effects (e.g. testing larger tactics/strategies that are meaningfully different from each other).

This calculation ensures you're not running an underpowered test that wastes time and money.
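The levers above can be illustrated with a textbook power approximation. The sketch below assumes a two-sided z-test on the difference between two group means with equal variance. The `approx_power` function and every input value are hypothetical, and the approach is a standard approximation rather than WorkMagic's backend model.

```python
from statistics import NormalDist

def approx_power(effect, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means.

    effect: true difference in the metric between exposed and holdout
    sigma: standard deviation of the metric across units (same in both groups)
    n_per_group: number of observations (e.g. DMA-days) in each group
    """
    z = NormalDist()
    se = sigma * (2 / n_per_group) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect / se
    # Probability that the test statistic lands beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

baseline = approx_power(effect=5, sigma=20, n_per_group=100)
print(f"baseline:       {baseline:.2f}")
print(f"lower variance: {approx_power(5, 15, 100):.2f}")
print(f"larger sample:  {approx_power(5, 20, 250):.2f}")
print(f"larger effect:  {approx_power(8, 20, 100):.2f}")
```

In this simplified model, levers (3) and (4) both act through a larger `effect`, while levers (1) and (2) shrink the standard error.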

Statistical Significance

Indicates how likely it is that an observed test result reflects a real effect rather than random chance. It is assessed after the experiment, based on the observed results.

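A measure (usually evaluated at 95% confidence) of whether the lift you observe reflects a real effect or just random noise. A p-value under 0.05 means the results are statistically significant: a difference this large would be unlikely to appear by chance alone if the ads had no effect, which supports attributing the change to your ads.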
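As a rough illustration, the Python sketch below turns a lift estimate and its standard error into a p-value and confidence interval under a normal approximation. The `significance_summary` helper and the input numbers are hypothetical. The actual analysis may rely on a different statistical model.

```python
from statistics import NormalDist

def significance_summary(lift_estimate, standard_error, confidence=0.95):
    """Two-sided p-value and confidence interval under a normal approximation."""
    z = NormalDist()
    z_stat = lift_estimate / standard_error
    p_value = 2 * (1 - z.cdf(abs(z_stat)))
    margin = z.inv_cdf(1 - (1 - confidence) / 2) * standard_error
    return {
        "p_value": p_value,
        "ci": (lift_estimate - margin, lift_estimate + margin),
        "significant": p_value < 1 - confidence,
    }

# Hypothetical readout: 120 incremental orders with a standard error of 50.
result = significance_summary(lift_estimate=120, standard_error=50)
print(result)
```

Here the p-value is about 0.016 and the interval excludes zero, so the hypothetical lift would count as significant at the 95% level.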

Holdout %

The estimated share of orders coming from the regions in the holdout group.

The percentage of total orders or revenue that comes from regions assigned to the holdout group during the test. This represents the portion of business not receiving the advertising treatment. Example: A 10% holdout means approximately 10 out of every 100 daily orders come from holdout regions. The holdout % is determined by the power calculation, and also indicates the potential business impact of withholding ads from these regions.
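A small Python sketch of this share calculation, using made-up regional order counts and a hypothetical `holdout_share` helper:

```python
def holdout_share(daily_orders_by_region, holdout_regions):
    """Fraction of total daily orders that come from holdout regions."""
    total = sum(daily_orders_by_region.values())
    held_out = sum(daily_orders_by_region[r] for r in holdout_regions)
    return held_out / total

# Made-up average daily orders per region.
orders = {"Region A": 40, "Region B": 10, "Region C": 30, "Region D": 20}
share = holdout_share(orders, holdout_regions=["Region B"])
print(f"Holdout %: {share:.0%}")  # prints "Holdout %: 10%"
```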

Daily Budget Recommendation

The minimum daily budget required to meet sufficient statistical power for the test to detect lift if there is truly one.

In general, the daily budget in the WorkMagic UI is calculated as the input CPA estimate × Expected Daily Orders, where the expected daily orders come from the backend model. See the test design question section below for more details.

Example from the doc:
Axon CPA = $22 → $22 × 377 daily orders ≈ $8,298/day
Meta CPA = $36 → $36 × 304 daily orders ≈ $10,963/day
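The multiplication can be reproduced in a few lines of Python. The `daily_budget` helper is hypothetical. With the rounded order counts quoted above, the products come out slightly below the quoted figures, presumably because the backend model works with unrounded expected daily orders (an assumption, not something this glossary confirms).

```python
def daily_budget(cpa_estimate, expected_daily_orders):
    """Daily budget as CPA estimate times expected daily orders."""
    return cpa_estimate * expected_daily_orders

# CPA and rounded daily-order figures quoted in the example above.
for channel, cpa, orders in [("Axon", 22, 377), ("Meta", 36, 304)]:
    print(f"{channel}: ${daily_budget(cpa, orders):,.0f}/day")
# Axon: $8,294/day
# Meta: $10,944/day
```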

CPA (Cost per Action, usually Cost per Order)

Average cost to get one conversion

The average cost to acquire one conversion (e.g., an order or new customer).
Calculated as Total Ad Spend / Total Conversions
A lower CPA suggests higher cost efficiency.
In the doc: Axon = $22; Meta = $36 (based on prior 3 months).

MDL (Minimum Detectable Lift) or MDE (Minimum Detectable Effect)

The smallest lift your test can reliably measure given the test design. It is derived from the power calculation.

The minimum change in your key metric (e.g., orders, revenue) that your test design can detect with the specified statistical power (usually 80%) and confidence level (usually 95%, i.e. a 5% significance level).

Example: If MDL = 2.5%, your test can reliably identify lift of 2.5% or greater, but effects smaller than 2.5% may not achieve statistical significance even if real. MDL can be expressed in percentage change or absolute orders change.
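A common textbook approximation links the MDL to the standard error of the lift estimate: MDL ≈ (z for the confidence level + z for the power) × standard error. The Python sketch below applies it. The `minimum_detectable_effect` function and the 0.9-point standard error are hypothetical, chosen so the result lands near the 2.5% example.

```python
from statistics import NormalDist

def minimum_detectable_effect(standard_error, power=0.80, alpha=0.05):
    """Smallest true effect detectable with the given power (two-sided z-test)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * standard_error

# Hypothetical: the lift estimate has a standard error of 0.9 percentage points.
mde = minimum_detectable_effect(standard_error=0.9)
print(f"MDL ≈ {mde:.1f}%")  # prints "MDL ≈ 2.5%"
```

Anything that shrinks the standard error (more DMAs, longer duration, lower variance) lowers the MDL in this approximation.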

Test Duration

The period during which the test is actively running, with the holdout and exposed groups in place. This is typically 21 days.

Duration when ads are paused or launched (e.g., 21 days in this plan).
→ Measures immediate impact while treatment is active.

Cooldown Period

The period after the test ends, during which lagged conversions from the test are still being measured.

Time after the test (e.g., 7 days) to allow delayed ad effects (lagged conversions, browsing) to mature before reading results. During this period, the media setup returns to business as usual and both the holdout and exposed groups are released. This is usually set at 7 days, but should be determined based on the specific product's purchase cycle.

2-Cell Test

A test with only 2 cells, an exposed group and a holdout group, that measures one treatment (the test subject).

2-Cell Test:
One test group and one control group (e.g., “Axon ON” vs. “Axon OFF”).
→ Simpler, used when testing a single channel.

3-Cell/4-Cell Test

A test with more than 2 cells that measures multiple treatments, each isolated in its own cell and compared against the same baseline reference group.

3-Cell Test: Adds an additional test cell (e.g., “Axon only,” “Meta only,” “Axon+Meta combined”). → Measures cross-channel halo effects and interaction between media platforms.
