Answer
Jul 28, 2018 - 05:14 PM
I don't fully understand how it works, but after going through their presentation and reading this blog post, it seems worth it.
Here is my limited understanding of what it does and how it could be helpful. First, let's start with the three common problems in A/B testing today, or "traditional statistics", to use Optimizely's vernacular.
1. False Positive (Type I error): Calling a winner when there is no winner, i.e., the testing tool determines that a lift was significant when it wasn't. The business then decides to use the new page, expecting a rise in sales, leads, app downloads or whatever metric is being optimized.
2. False Negative (Type II error): Failing to call a winner when one should have been called, i.e., the testing tool fails to declare a winner because statistical significance was not attained. This is generally a problem with smaller lifts, e.g. a 3.2% lift that failed to meet the desired threshold for statistical significance (usually 95% or higher).
Type II errors are costly because the business won't use the newer page and will miss out on potential revenue gains. If your company is doing $100 million in revenue (fairly common for our clients), 3.2% is $3.2 million. Crucially, small gains compound multiplicatively across a funnel, so a 3.2% gain on the landing page, a 1.8% gain on the order form and a 2.6% lift on the final cart submission page combine to 1.032 x 1.018 x 1.026 - 1 ≈ 7.8% more revenue! (A quick sanity check on this arithmetic follows after this list.) If your testing tool fails to declare a winner on lots of small gains, you will be leaving much dinero on the table!
3. Tests take too long to attain statistical significance: A company's success is largely determined by how many experiments it can run in a given period. As Jeff Bezos famously said, if you'd like to double your winners, double your experiments. But of course, you are constrained by how long it takes to complete a single experiment.
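To sanity-check the compounding arithmetic from point 2, here is a quick snippet. It's plain Python and nothing Optimizely-specific, just the observation that funnel gains multiply rather than add:

```python
# Compounding small lifts across funnel steps.
lifts = [0.032, 0.018, 0.026]   # landing page, order form, cart page

combined = 1.0
for lift in lifts:
    combined *= 1.0 + lift      # gains multiply, they don't add

print(f"Combined lift: {combined - 1:.1%}")   # -> Combined lift: 7.8%
```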
Enter the Optimizely Stats Engine...
...where all the women are strong, all the men are tall and handsome, and all the kids are above average. From now to eternity we can sing kumbayah and live happily ever after :)
Optimizely's stats engine does the following things to address the problems described above.
a) It uses a technique known as 'sequential testing' to let you 'peek' at the results after every visitor. According to their stats guru, the results are valid at any point during the course of an experiment. Maybe that is an exaggeration, but the basic claim is that traditional fixed-horizon statistics are too restrictive, and that a properly constructed sequential test reaches statistical significance faster without sacrificing accuracy.
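To make the idea concrete, here is a minimal sketch of a classic sequential test, Wald's SPRT. This is not Optimizely's actual algorithm (their materials describe a more sophisticated mixture variant), and the baseline rate, lifted rate and error levels (p0, p1, alpha, beta) below are made-up assumptions, but it shows how you can check the result after every single visitor while keeping error rates controlled:

```python
import math
import random

def wald_sprt(observations, p0=0.05, p1=0.07, alpha=0.05, beta=0.20):
    """Wald's SPRT for a conversion rate: test H0: p = p0 vs H1: p = p1.

    The log-likelihood ratio is updated after every visitor, so you can
    'peek' continuously; alpha and beta stay controlled whenever you stop.
    """
    upper = math.log((1 - beta) / alpha)   # cross this -> declare a winner
    lower = math.log(beta / (1 - alpha))   # cross this -> declare no lift
    llr = 0.0
    for n, converted in enumerate(observations, start=1):
        if converted:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "winner", n
        if llr <= lower:
            return "no lift", n
    return "keep testing", len(observations)

# Example: simulate 20,000 visitors converting at the lifted rate.
random.seed(1)
visitors = [random.random() < 0.07 for _ in range(20000)]
print(wald_sprt(visitors))   # typically stops well before 20,000 visitors
```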
b) It uses machine learning and 'adaptive reinforcement' to allocate more traffic to the variation that has the highest likelihood of attaining a lift. Say you have 3 variations and have set the traffic to 33/33/33; adaptive reinforcement favors whichever variation is showing early promise and may shift the split to 45/30/25, for example (see the sketch below).
This way you reach statistical significance, and a business decision, faster, while also maximizing your results during the experiment. That last point matters for time-sensitive campaigns, e.g. marketing an event where you want to get the most attendees possible and there is little benefit to running the 'winning page' after the experiment ends.
This is not a new idea. Google Experiments has offered this option for years. But Optimizely purports to have found a superior algorithm.
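Optimizely doesn't publish the exact algorithm, but a standard way to implement this kind of adaptive allocation is Thompson sampling: give each variation's conversion rate a Beta posterior and send the next visitor to whichever variation wins a random draw. A minimal sketch, with made-up counts for illustration:

```python
import random

def thompson_choose(stats):
    """stats maps variation -> (conversions, non-conversions).

    Draw one sample from each variation's Beta posterior and show the
    next visitor the variation with the highest draw. Promising
    variations win more draws, so they automatically get more traffic,
    while the others still get explored occasionally.
    """
    draws = {name: random.betavariate(conv + 1, miss + 1)   # Beta(1,1) prior
             for name, (conv, miss) in stats.items()}
    return max(draws, key=draws.get)

# B is showing early promise, so it ends up with the biggest share:
stats = {"A": (30, 970), "B": (45, 955), "C": (25, 975)}
counts = {"A": 0, "B": 0, "C": 0}
for _ in range(10_000):
    counts[thompson_choose(stats)] += 1
print(counts)   # B gets the lion's share; A and C still see some traffic
```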
c) Lastly, Optimizely introduces a new correction to reduce 'false discovery'. When you run tests with multiple variations, the danger of a false discovery rises with each variation you add. If each comparison carries a 10% chance of a false discovery, then with 5 variations you face up to a 5 x 10% = 50% chance (about 1 - 0.9^5 ≈ 41% if the comparisons are independent). Optimizely's corrective feature reduces this danger by 30% to 50%. The more variations you have, the more useful it is.
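The classic way to control false discovery across many simultaneous comparisons is the Benjamini-Hochberg procedure. I don't know whether Optimizely's correction is exactly this, but it illustrates the idea; the p-values in the example are made up:

```python
def benjamini_hochberg(p_values, q=0.10):
    """Benjamini-Hochberg: control the false discovery rate at level q.

    Sort the p-values, find the largest rank k with p(k) <= q * k / m,
    and declare the k smallest p-values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k = rank
    return sorted(order[:k])   # indices of the significant comparisons

# Five variations compared against a control (made-up p-values):
print(benjamini_hochberg([0.002, 0.010, 0.030, 0.200, 0.650]))  # -> [0, 1, 2]
```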
That is my understanding of the stats engine, which requires an upgrade to their premium tier. The features all seem useful and innovative and, to me, should come standard.
But I've also heard that Optimizely is under pressure to attain profitability after raising $300 million. And so they are trying to push premium features on customers who may not need them. They seem to have vacated the mid-market and want all their customers to be enterprise customers.