A/B Testing
A/B testing in OpenScouter lets you compare two or more design variants to understand which performs better for neurodivergent users. Rather than guessing which layout, copy, or interaction pattern is more accessible, you gather real feedback from real people.
When to use A/B testing
Use A/B testing when you have a specific accessibility question with more than one plausible answer. Good examples:
- Two navigation structures where it’s unclear which is easier to scan
- Two button label styles where you want to know which reads more clearly for dyslexic users
- A redesigned form flow compared to the existing one
- Colour contrast levels above WCAG minimum where you want to know if higher contrast genuinely helps
A/B testing is not the right tool for open-ended discovery. If you don’t yet know what to fix, run an exploratory study first.
Setup process
Create variants
Start a new A/B test from the Studies dashboard and select A/B Test as the study type. Give each variant a clear name that your team will recognise. Upload prototypes, screenshots, or live URLs for each variant. You can test up to four variants in a single study.
Define metrics
Choose the metrics you want to compare across variants. OpenScouter supports:
- Task completion rate - whether testers complete a defined action successfully
- Time on task - how long each variant takes to complete
- Error rate - how often testers make a recoverable mistake
- Confidence rating - how certain testers feel about their actions
- Preference - which variant testers say they prefer and why
You can enable multiple metrics. Each one is tracked independently in the results.
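To illustrate how enabled metrics are tracked independently, here is a minimal sketch of per-variant summarisation. The function name `summarize_variant` and the session field names (`completed`, `seconds`, `errors`, `confidence`) are assumptions for illustration, not OpenScouter's actual data model.

```python
from statistics import mean

def summarize_variant(results):
    """Summarize one variant's raw results, one dict per tester session.

    Each session records `completed` (bool), `seconds` (time on task),
    `errors` (count of recoverable mistakes), and `confidence` (a 1-5
    self-rating). Each metric is computed independently, mirroring how
    enabled metrics are reported separately in the results view.
    """
    n = len(results)
    return {
        "completion_rate": sum(r["completed"] for r in results) / n,
        "mean_time_on_task": mean(r["seconds"] for r in results),
        "error_rate": sum(r["errors"] for r in results) / n,
        "mean_confidence": mean(r["confidence"] for r in results),
    }

sessions = [
    {"completed": True, "seconds": 40, "errors": 0, "confidence": 5},
    {"completed": False, "seconds": 90, "errors": 2, "confidence": 2},
]
summary = summarize_variant(sessions)
# completion_rate is 0.5 and mean_time_on_task is 65.0 for this sample
```

Because each metric is an independent aggregate, a variant can win on one metric (say, completion rate) while losing on another (say, time on task); the results page surfaces each comparison on its own.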
Select ND categories
Choose which neurodivergent categories you want represented in your test. OpenScouter currently supports:
- Dyslexia
- ADHD
- Autism
- Low vision
- Motor differences
- Anxiety
Selecting categories here tells OpenScouter’s matching system which testers to recruit. You set a minimum number of testers per category, not a total sample size. The system handles the rest.
Stratified randomization
When testers are assigned to variants, OpenScouter uses stratified randomization rather than simple random assignment. Each ND category is treated as a separate stratum. Testers within each category are randomly distributed across your variants in equal proportions.
This prevents a common problem in accessibility testing where one variant ends up with an unrepresentative sample. If variant A happened to get most of your ADHD testers, any difference in task completion rates could reflect the tester mix rather than the design. Stratified randomization removes that risk.
You do not need to configure this manually. It runs automatically whenever you include two or more ND categories in your study.
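The assignment logic described above can be sketched as follows. This is a minimal illustration of stratified randomization, not OpenScouter's actual implementation; the function name and data shapes are assumptions.

```python
import random
from collections import defaultdict

def stratified_assign(testers, variants, seed=None):
    """Assign testers to variants, randomizing within each ND category.

    `testers` is a list of (tester_id, nd_category) pairs; `variants` is
    a list of variant names. Each category forms a stratum: its testers
    are shuffled, then dealt round-robin across the variants, so every
    category is spread across variants in equal proportions.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for tester_id, category in testers:
        strata[category].append(tester_id)

    assignment = {}
    for category, ids in strata.items():
        rng.shuffle(ids)
        for i, tester_id in enumerate(ids):
            assignment[tester_id] = variants[i % len(variants)]
    return assignment

# Example: 4 ADHD testers and 4 dyslexic testers across two variants.
testers = [(f"adhd-{i}", "ADHD") for i in range(4)] + \
          [(f"dys-{i}", "Dyslexia") for i in range(4)]
assignment = stratified_assign(testers, ["A", "B"], seed=1)
# Each stratum contributes exactly 2 testers to each variant.
```

Contrast this with simple random assignment over the whole pool, where one variant could, by chance, receive most of a category's testers.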
Round management
Starting a round
Once your study is configured, click Launch Round to open it to testers. Testers who match your ND category criteria and are available will receive an invitation. You can monitor acceptance rates from the study dashboard.
Pausing and resuming
You can pause a live round at any time. Testers who have already accepted will keep access to their assigned variant. New invitations stop going out. Use pause when you need to fix a broken prototype URL or hold the study while reviewing early results.
Resume from the same dashboard control. The round picks up from where it left off, and new invitations go out to fill any remaining spots.
48-hour expiry and reminders
Each round has a 48-hour completion window: testers who accept an invitation have 48 hours to complete their test. OpenScouter sends an automatic reminder at the 24-hour mark if the test is still incomplete.
If a tester does not complete within 48 hours, their slot is released and offered to another matched tester. This keeps studies moving without you having to chase individuals manually.
You will receive a notification when a round closes, either because all testers have completed or the window has expired.
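The lifecycle of a single invitation slot can be sketched as a small state check. The function name `slot_status` and the status labels are illustrative assumptions; only the 24-hour reminder and 48-hour expiry behaviour come from the documentation above.

```python
from datetime import datetime, timedelta

REMINDER_AFTER = timedelta(hours=24)
WINDOW = timedelta(hours=48)

def slot_status(accepted_at, completed_at, now):
    """Classify an invitation slot under the 48-hour completion window.

    Returns one of: "completed", "active", "reminder_due" (past the
    24-hour mark with no completion yet), or "expired" (the slot is
    released and offered to another matched tester).
    """
    if completed_at is not None and completed_at - accepted_at <= WINDOW:
        return "completed"
    if now - accepted_at >= WINDOW:
        return "expired"
    if now - accepted_at >= REMINDER_AFTER:
        return "reminder_due"
    return "active"

accepted = datetime(2024, 5, 1, 9, 0)
status = slot_status(accepted, None, accepted + timedelta(hours=30))
# 30 hours in with no completion: past the reminder mark, not yet expired
```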
Reading Bayesian analytics
OpenScouter uses Bayesian statistics rather than traditional null hypothesis testing. Instead of a p-value, you get a direct probability that one variant is better than another.
Convergence
As more testers complete the study, the probability estimates become more stable. The results page shows a convergence indicator for each metric. Early results can shift significantly. Once you have enough data, the estimates stop moving and the indicator shows as converged.
Do not make decisions based on early results before convergence. The system will warn you if you try to export or act on data from a study that has not yet converged.
Confidence thresholds
OpenScouter flags a result as actionable when the probability that one variant outperforms another reaches 95%. This is the default confidence threshold.
The 95% threshold means: given the data collected, there is a 95% posterior probability that one variant genuinely outperforms the other, rather than the observed difference being random variation.
You can lower the threshold to 80% for exploratory decisions where speed matters more than certainty. You can raise it to 99% for high-stakes changes. Adjust the threshold in study settings before you launch.
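To make the "probability that one variant is better" concrete, here is a minimal Beta-Binomial sketch for a task completion rate comparison. OpenScouter's actual model is not shown in this documentation; this is a standard Monte Carlo estimate under a uniform Beta(1, 1) prior, with illustrative names throughout.

```python
import random

def prob_a_beats_b(successes_a, trials_a, successes_b, trials_b,
                   draws=100_000, seed=0):
    """Monte Carlo estimate of P(variant A's completion rate > B's).

    Each variant's completion rate gets a Beta(1, 1) prior, updated by
    its successes and failures. Sampling from the two posteriors and
    counting how often A's draw exceeds B's approximates the posterior
    probability that A is genuinely better.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        a = rng.betavariate(1 + successes_a, 1 + trials_a - successes_a)
        b = rng.betavariate(1 + successes_b, 1 + trials_b - successes_b)
        if a > b:
            wins += 1
    return wins / draws

# 18/20 completions for variant A versus 12/20 for variant B.
p = prob_a_beats_b(18, 20, 12, 20)
actionable = p >= 0.95  # compare against the study's confidence threshold
```

Raising the threshold to 0.99 or lowering it to 0.80 changes only the final comparison, which is why adjusting it before launch is cheap: the underlying probability estimate is the same.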
Results below the active threshold are shown in amber. Results that have crossed the threshold are shown in green. Results moving in the wrong direction are shown in red regardless of confidence.
Progress grid
The progress grid gives you a live view of how your study is filling in. It is a matrix with your ND categories on one axis and your variants on the other.
Each cell shows:
- How many testers have completed in that category and variant combination
- Your target number for that cell
- A fill indicator so you can see at a glance where gaps remain
If one cell is filling faster than others, that is normal. Stratified randomization ensures the final distribution is balanced even if completion happens unevenly through the study.
Use the progress grid to decide whether to extend a round. If several cells are far from target, extending gives you more complete data before reading results.
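The grid itself is a simple category × variant matrix. The sketch below shows one plausible shape for it, assuming hypothetical inputs: `targets` maps each (category, variant) cell to its target count, and `completions` lists one (category, variant) pair per finished test.

```python
def progress_grid(targets, completions):
    """Build the category x variant progress grid.

    Each cell reports how many testers have completed, the target for
    that cell, and a fill fraction so remaining gaps are visible at a
    glance.
    """
    grid = {cell: {"completed": 0, "target": t} for cell, t in targets.items()}
    for cell in completions:
        if cell in grid:
            grid[cell]["completed"] += 1
    for cell, c in grid.items():
        c["fill"] = min(1.0, c["completed"] / c["target"]) if c["target"] else 1.0
    return grid

targets = {("ADHD", "A"): 5, ("ADHD", "B"): 5}
done = [("ADHD", "A")] * 5 + [("ADHD", "B")] * 2
grid = progress_grid(targets, done)
# ("ADHD", "A") is full; ("ADHD", "B") sits at 2 of 5
```

A cell well below its target (a low fill fraction) is the signal to extend the round before reading results.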