# A Super Simple A/B Test for Super Simple Draw

I wrote a drawing app named Super Simple Draw, and the “Super Simple” in the name is meant literally. The main idea is to make drawing simple, and the code behind it is equally basic.

For example, to support undo, I simply store every stroke in an array, and undo means redrawing from the first stroke to the (N-1)th stroke. This implementation is easy and allows unlimited undo, but it can be slow, especially when a drawing contains many strokes.
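
The naive approach can be sketched like this (a minimal sketch; `drawStroke` is a hypothetical stand-in for the real canvas rendering call):

```javascript
// Naive undo: strokes live in an array, and undoing means re-rendering every
// remaining stroke from the beginning. drawStroke is a hypothetical stand-in
// for the real canvas code.
function undo(strokes, drawStroke) {
  strokes.pop();               // drop the last stroke
  for (const s of strokes) {   // redraw strokes 0..N-1 from scratch
    drawStroke(s);
  }
  return strokes.length;       // number of strokes redrawn
}
```

The cost of one undo is therefore proportional to the number of strokes that remain, which is what makes large drawings slow.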

So I started an optimization effort. What if the image at a certain point is cached, so that when the user taps the undo button, instead of starting from stroke 0, the app restores the cached image and only redraws the strokes after that cached point?

Naturally, the algorithm becomes caching an image every k strokes (e.g. if k = 5, an image is cached at the 5th stroke, then the 10th, then the 15th). I don’t want to overload memory, so each newly cached image replaces the old one. Now the question is: what is the optimal value of k?
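
The caching logic can be sketched roughly as below. `snapshotCanvas` and `restoreCanvas` are hypothetical stand-ins for copying the canvas bitmap; they are stubbed here so the focus stays on the redraw-count logic:

```javascript
// A sketch of the single-checkpoint cache: an image is cached every k-th
// stroke, and each new snapshot replaces the previous one.
const k = 5;
let cache = null; // { image, strokeCount } - only one snapshot is kept

const snapshotCanvas = () => "bitmap";  // stub for copying the canvas
const restoreCanvas = (image) => {};    // stub for blitting it back

function onStrokeFinished(strokes) {
  if (strokes.length % k === 0) {
    cache = { image: snapshotCanvas(), strokeCount: strokes.length };
  }
}

// Returns how many strokes had to be redrawn for this undo.
function undo(strokes) {
  strokes.pop();
  if (cache && strokes.length >= cache.strokeCount) {
    restoreCanvas(cache.image);                 // start from the snapshot
    return strokes.length - cache.strokeCount;  // redraw only the tail
  }
  cache = null;            // snapshot is ahead of us, so it is now invalid
  return strokes.length;   // fall back to redrawing everything
}
```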

If k is larger, on average every undo needs to redraw more strokes. E.g. if we cache every 1000 strokes, on average each undo needs to redraw 500 strokes; if k is 10, each undo only needs 5 on average. But the downside of a smaller k is that, since only one image is cached, once the user undoes past the cached point the cache becomes invalid, and every single stroke has to be redrawn from scratch. E.g. if k is 5 and the drawing has 10,003 strokes, hitting undo 4 times is very fast for the first 3 presses, but the 4th takes forever: to return to the 9,999th stroke, the image cached at the 10,000th stroke can’t be used, so all 9,999 strokes must be redrawn from scratch.
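
The arithmetic above can be captured in a tiny cost model (a sketch; `redrawCost` and its checkpoint rule are a simplification of the real logic):

```javascript
// Hypothetical cost model: the number of strokes that must be redrawn to
// undo back to `target` strokes, given a single snapshot cached at the last
// multiple of k at or below the current stroke count `n`.
function redrawCost(k, n, target) {
  const checkpoint = Math.floor(n / k) * k; // last cached stroke count
  // The cache is only usable if the target is at or past the checkpoint.
  return target >= checkpoint ? target - checkpoint : target;
}

console.log(redrawCost(5, 10003, 10002)); // 2 strokes - fast
console.log(redrawCost(5, 10003, 10000)); // 0 strokes - instant
console.log(redrawCost(5, 10003, 9999));  // 9999 strokes - cache invalid
```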

In reality, everyone draws differently, uses different devices, and undoes at different points, so we don’t know the exact cost of one stroke and can’t mathematically derive the optimal value. So the natural next step is A/B testing. Since the code for Super Simple Draw is dead simple, I also made a minimalist implementation for A/B testing; the pseudo code roughly looks like this:

```javascript
class ExperimentUtil {
  // Returns a sticky random assignment: the first call picks a group
  // uniformly at random and persists it, so later calls return the same one.
  static treatment(experimentName, groups) {
    const key = "xpkey_" + experimentName;
    const stored = localStorage.getItem(key);
    if (stored) {
      return JSON.parse(stored);
    }
    // Math.random() is always < 1, so the index is always in range.
    const index = Math.floor(Math.random() * groups.length);
    const treatment = groups[index];
    localStorage.setItem(key, JSON.stringify(treatment));
    return treatment;
  }
}

class Experiments {
  static cacheCheckpoint() {
    // "control" means no cache; each group gets 25% of users.
    return ExperimentUtil.treatment("cacheCheckpoint", ["5", "10", "20", "control"]);
  }
}
```

When I implement the caching logic, I get k by calling Experiments.cacheCheckpoint() and use it in the implementation (note the treatment is stored as a string, so the numeric groups need parsing). Then, at the end of the undo function, I send an analytics event with these parameters:

- totalTime: the measured time for completing the undo action
- treatmentGroup: set to the value returned by Experiments.cacheCheckpoint()
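
The instrumentation can be sketched like this. The stubs stand in for the real undo implementation and for Mixpanel’s track() call; the event and property names here are just what this write-up uses, not anything the platform mandates:

```javascript
// Stubs so the sketch is self-contained; the real app has its own undo,
// experiment lookup, and the actual Mixpanel SDK.
const mixpanel = { track: (event, props) => console.log(event, props) };
const undo = (strokes) => strokes.pop();
const Experiments = { cacheCheckpoint: () => "5" };

function instrumentedUndo(strokes) {
  const start = Date.now();
  undo(strokes);                           // the actual undo implementation
  const totalTime = Date.now() - start;    // measured undo duration in ms
  mixpanel.track("undo", {
    totalTime,
    treatmentGroup: Experiments.cacheCheckpoint(),
  });
  return totalTime;
}
```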

I use Mixpanel to collect the data, but I imagine many other analytics platforms (Flurry, Google Analytics, Localytics, Firebase, etc.) can do the same. Basically, look for an aggregation of the totalTime value for the undo event, grouped by treatmentGroup. If the analytics platform allows downloading all the data points, a thorough significance test can be done; e.g. if the numbers follow normal distributions, a t-test can determine whether the results are statistically significant.
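
For example, the t statistic for two downloaded sample sets can be computed by hand (a sketch of Welch’s two-sample t statistic; turning it into a p-value needs a t-distribution CDF from a stats library, which is omitted here):

```javascript
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Sample variance (divides by n - 1).
function variance(xs) {
  const m = mean(xs);
  return xs.reduce((a, x) => a + (x - m) ** 2, 0) / (xs.length - 1);
}

// Welch's t statistic: does not assume equal variances or sample sizes.
function welchT(a, b) {
  return (mean(a) - mean(b)) /
    Math.sqrt(variance(a) / a.length + variance(b) / b.length);
}
```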

One interesting result I found is that small k values tend to perform better in the happy cases (e.g. P25 and P50 of undo completion time) but worse in the worst cases (e.g. P90 and P99). This is expected: more frequent caching makes the typical case faster, but the cache is invalidated more often, which makes the worst case worse. The lower end doesn’t really matter, since an improvement from 0.0002 seconds to 0.0001 won’t be noticeable. What matters is the P90 or P99: for complex drawings with many strokes, if undo performance improves from 1.2 seconds to 0.8, it’s a big win. The optimization should focus on the range where the difference is noticeable (i.e. close to or above 1 second).
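
Analytics platforms usually surface these percentiles directly, but for eyeballing downloaded samples locally, a nearest-rank percentile is enough (a sketch, not part of the app):

```javascript
// Nearest-rank percentile: pick the sample at rank ceil(p% of n).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}
```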

The exact result doesn’t matter too much here, since it’s very specific to this app’s setup. The key message is that you don’t necessarily need a fancy framework to run an experiment. Of course, something more thorough and elegant would be nice, but with limited time and resources, this scrappy approach can be a viable option.