A/B Testing
A/B testing in OpenScouter lets you compare two or more design variants to understand which performs better for neurodivergent users. Rather than guessing which layout, copy, or interaction pattern is more accessible, you gather real feedback from real people.
When to use A/B testing
Use A/B testing when you have a specific accessibility question with more than one plausible answer. Good examples:
- Two navigation structures where it’s unclear which is easier to scan
- Two button label styles where you want to know which reads more clearly for dyslexic users
- A redesigned form flow compared to the existing one
- Colour contrast levels above WCAG minimum where you want to know if higher contrast genuinely helps
A/B testing is not the right tool for open-ended discovery. If you don’t yet know what to fix, run an exploratory study first.
Setup process
Create variants
Start a new A/B test from the Studies dashboard and select A/B Test as the study type. Give each variant a clear name that your team will recognise. Upload prototypes, screenshots, or live URLs for each variant. You can test up to four variants in a single study.
Define metrics
Choose the metrics you want to compare across variants. OpenScouter supports:
- Task completion rate - whether testers complete a defined action successfully
- Time on task - how long each variant takes to complete
- Error rate - how often testers make a recoverable mistake
- Confidence rating - how certain testers feel about their actions
- Preference - which variant testers say they prefer and why
You can enable multiple metrics. Each one is tracked independently in the results.
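To illustrate how enabled metrics are tracked independently, here is a minimal sketch of per-variant summarisation. The function name `summarize_variant` and the session field names (`completed`, `seconds`, `errors`, `confidence`) are assumptions for illustration, not OpenScouter's actual data model.

```python
from statistics import mean

def summarize_variant(results):
    """Summarize one variant's raw results, one dict per tester session.

    Each session records `completed` (bool), `seconds` (time on task),
    `errors` (count of recoverable mistakes), and `confidence` (a 1-5
    self-rating). Each metric is computed independently, mirroring how
    enabled metrics are reported separately in the results view.
    """
    n = len(results)
    return {
        "completion_rate": sum(r["completed"] for r in results) / n,
        "mean_time_on_task": mean(r["seconds"] for r in results),
        "error_rate": sum(r["errors"] for r in results) / n,
        "mean_confidence": mean(r["confidence"] for r in results),
    }

sessions = [
    {"completed": True, "seconds": 40, "errors": 0, "confidence": 5},
    {"completed": False, "seconds": 90, "errors": 2, "confidence": 2},
]
summary = summarize_variant(sessions)
# completion_rate is 0.5 and mean_time_on_task is 65.0 for this sample
```

Because each metric is an independent aggregate, a variant can win on one metric (say, completion rate) while losing on another (say, time on task); the results page surfaces each comparison on its own.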
Select ND categories
Choose which neurodivergent categories you want represented in your test. OpenScouter currently supports:
- Dyslexia
- ADHD
- Autism
- Low vision
- Motor differences
- Anxiety
Selecting categories here tells OpenScouter’s matching system which testers to recruit. You set a minimum number of testers per category, not a total sample size. The system handles the rest.
Stratified randomization
When testers are assigned to variants, OpenScouter uses stratified randomization rather than simple random assignment. Each ND category is treated as a separate stratum. Testers within each category are randomly distributed across your variants in equal proportions.
This prevents a common problem in accessibility testing where one variant ends up with an unrepresentative sample. If variant A happened to get most of your ADHD testers, any difference in task completion rates could reflect the tester mix rather than the design. Stratified randomization removes that risk.
You do not need to configure this manually. It runs automatically whenever you include two or more ND categories in your study.
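The assignment logic described above can be sketched as follows. This is a minimal illustration of stratified randomization, not OpenScouter's actual implementation; the function name and data shapes are assumptions.

```python
import random
from collections import defaultdict

def stratified_assign(testers, variants, seed=None):
    """Assign testers to variants, randomizing within each ND category.

    `testers` is a list of (tester_id, nd_category) pairs; `variants` is
    a list of variant names. Each category forms a stratum: its testers
    are shuffled, then dealt round-robin across the variants, so every
    category is spread across variants in equal proportions.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for tester_id, category in testers:
        strata[category].append(tester_id)

    assignment = {}
    for category, ids in strata.items():
        rng.shuffle(ids)
        for i, tester_id in enumerate(ids):
            assignment[tester_id] = variants[i % len(variants)]
    return assignment

# Example: 4 ADHD testers and 4 dyslexic testers across two variants.
testers = [(f"adhd-{i}", "ADHD") for i in range(4)] + \
          [(f"dys-{i}", "Dyslexia") for i in range(4)]
assignment = stratified_assign(testers, ["A", "B"], seed=1)
# Each stratum contributes exactly 2 testers to each variant.
```

Contrast this with simple random assignment over the whole pool, where one variant could, by chance, receive most of a category's testers.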
Round management
Starting a round
Once your study is configured, click Launch Round to open it to testers. Testers who match your ND category criteria and are available will receive an invitation. You can monitor acceptance rates from the study dashboard.
Pausing and resuming
You can pause a live round at any time. Testers who have already accepted will keep access to their assigned variant. New invitations stop going out. Use pause when you need to fix a broken prototype URL or hold the study while reviewing early results.
Resume from the same dashboard control. The round picks up from where it left off, and new invitations go out to fill any remaining spots.
48-hour expiry and reminders
Each round has a 48-hour completion window: testers who accept an invitation have 48 hours to complete their test. OpenScouter sends an automatic reminder at the 24-hour mark if the test is still incomplete.
If a tester does not complete within 48 hours, their slot is released and offered to another matched tester. This keeps studies moving without you having to chase individuals manually.
You will receive a notification when a round closes, either because all testers have completed or the window has expired.
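The lifecycle of a single invitation slot can be sketched as a small state check. The function name `slot_status` and the status labels are illustrative assumptions; only the 24-hour reminder and 48-hour expiry behaviour come from the documentation above.

```python
from datetime import datetime, timedelta

REMINDER_AFTER = timedelta(hours=24)
WINDOW = timedelta(hours=48)

def slot_status(accepted_at, completed_at, now):
    """Classify an invitation slot under the 48-hour completion window.

    Returns one of: "completed", "active", "reminder_due" (past the
    24-hour mark with no completion yet), or "expired" (the slot is
    released and offered to another matched tester).
    """
    if completed_at is not None and completed_at - accepted_at <= WINDOW:
        return "completed"
    if now - accepted_at >= WINDOW:
        return "expired"
    if now - accepted_at >= REMINDER_AFTER:
        return "reminder_due"
    return "active"

accepted = datetime(2024, 5, 1, 9, 0)
status = slot_status(accepted, None, accepted + timedelta(hours=30))
# 30 hours in with no completion: past the reminder mark, not yet expired
```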
Reading Bayesian analytics
OpenScouter uses Bayesian statistics rather than traditional null hypothesis testing. Instead of a p-value, you get a direct probability that one variant is better than another.
Convergence
As more testers complete the study, the probability estimates become more stable. The results page shows a convergence indicator for each metric. Early results can shift significantly. Once you have enough data, the estimates stop moving and the indicator shows as converged.
Do not make decisions based on early results before convergence. The system will warn you if you try to export or act on data from a study that has not yet converged.
Confidence thresholds
OpenScouter flags a result as actionable when the probability that one variant outperforms another reaches 95%. This is the default confidence threshold.
The 95% threshold means: given the data collected, there is a 95% posterior probability that one variant genuinely outperforms the other, rather than the observed difference being random variation.
You can lower the threshold to 80% for exploratory decisions where speed matters more than certainty. You can raise it to 99% for high-stakes changes. Adjust the threshold in study settings before you launch.
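To make the "probability that one variant is better" concrete, here is a minimal Beta-Binomial sketch for a task completion rate comparison. OpenScouter's actual model is not shown in this documentation; this is a standard Monte Carlo estimate under a uniform Beta(1, 1) prior, with illustrative names throughout.

```python
import random

def prob_a_beats_b(successes_a, trials_a, successes_b, trials_b,
                   draws=100_000, seed=0):
    """Monte Carlo estimate of P(variant A's completion rate > B's).

    Each variant's completion rate gets a Beta(1, 1) prior, updated by
    its successes and failures. Sampling from the two posteriors and
    counting how often A's draw exceeds B's approximates the posterior
    probability that A is genuinely better.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        a = rng.betavariate(1 + successes_a, 1 + trials_a - successes_a)
        b = rng.betavariate(1 + successes_b, 1 + trials_b - successes_b)
        if a > b:
            wins += 1
    return wins / draws

# 18/20 completions for variant A versus 12/20 for variant B.
p = prob_a_beats_b(18, 20, 12, 20)
actionable = p >= 0.95  # compare against the study's confidence threshold
```

Raising the threshold to 0.99 or lowering it to 0.80 changes only the final comparison, which is why adjusting it before launch is cheap: the underlying probability estimate is the same.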
Results below the active threshold are shown in amber. Results that have crossed the threshold are shown in green. Results moving in the wrong direction are shown in red regardless of confidence.
Progress grid
The progress grid gives you a live view of how your study is filling in. It is a matrix with your ND categories on one axis and your variants on the other.
Each cell shows:
- How many testers have completed in that category and variant combination
- Your target number for that cell
- A fill indicator so you can see at a glance where gaps remain
If one cell is filling faster than others, that is normal. Stratified randomization ensures the final distribution is balanced even if completion happens unevenly through the study.
Use the progress grid to decide whether to extend a round. If several cells are far from target, extending gives you more complete data before reading results.
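The grid itself is a simple category × variant matrix. The sketch below shows one plausible shape for it, assuming hypothetical inputs: `targets` maps each (category, variant) cell to its target count, and `completions` lists one (category, variant) pair per finished test.

```python
def progress_grid(targets, completions):
    """Build the category x variant progress grid.

    Each cell reports how many testers have completed, the target for
    that cell, and a fill fraction so remaining gaps are visible at a
    glance.
    """
    grid = {cell: {"completed": 0, "target": t} for cell, t in targets.items()}
    for cell in completions:
        if cell in grid:
            grid[cell]["completed"] += 1
    for cell, c in grid.items():
        c["fill"] = min(1.0, c["completed"] / c["target"]) if c["target"] else 1.0
    return grid

targets = {("ADHD", "A"): 5, ("ADHD", "B"): 5}
done = [("ADHD", "A")] * 5 + [("ADHD", "B")] * 2
grid = progress_grid(targets, done)
# ("ADHD", "A") is full; ("ADHD", "B") sits at 2 of 5
```

A cell well below its target (a low fill fraction) is the signal to extend the round before reading results.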