Andrew Anderson on SaaS Experiments

February 2021

In this episode, I talk to one of my optimization heroes, Andrew Anderson.

We cover discovery-based optimization: how to move beyond idea validation and increase your win rate several times over.

Listen on Apple Podcasts, Stitcher, Spotify, Google Podcasts, SoundCloud, or YouTube.

Quotes

“You’re always testing things you don’t think are gonna work”
“You’re not creating a winner, all you’re doing is discovering what the truth is”
“You don’t get gold stars for your opinions being validated”
“I don’t control what the users react to, I only control my ability to find it out”
“If I’m not including one or two options on a test that … you really hate, I’m not doing it right”
“You should never be able to say that this is a test I’m gonna run 4 months from now … because that test should be dependent on every test that ran before”
“It doesn’t matter what wins a test, as long as something wins and you can act on it”
“How much time does it take everyone to agree on the perfect execution … of a single design? Where, if you’re focusing on a pool … how much time is saved by not doing that?”
“Your job is to tell people they’re wrong … and get them to see that’s a good thing”

Transcript

⚠️ This transcript was generated by a 🤖 - there will be errors.

Brian:
Listen, y’all, I’m here with Andrew Anderson. He has done optimization with over 400 companies and he’s currently leading growth at Merchbar. Andrew, welcome.

Andrew:
Thanks for having me on. Looking forward to talking with you.

Brian:
Yeah, super excited. That resume, 400 companies, that is a big number. We were talking before we started recording, and you mentioned that you’re a proponent of discovery-based optimization. Did I get that right?

Andrew:
Yeah. I think most people in the space kind of get stuck on validation. They have concepts or ideas, and they just want to see what the impact of them is. But that’s a massive mistake with resources, and it leaves so much of the impact on the table. If you take a different mindset and use testing as a means to explore and discover what works, and really go where the data tells you, you get so much more out of it. It just requires a different way to think about things.

Brian:
So can you say a little bit more about what it looks like to do the one approach versus the other, the idea-validation approach versus discovery-based? How would that change my day-to-day as an optimization manager or CRO consultant?

Andrew:
Absolutely. If you think about classic validation, the standard testing practice, it’s a lot of “Hey, I have this idea, why don’t we throw that up and see what the impact is?” People come with ideas and you’re just kind of an execution arm. Exploration takes a very different view. It’s not about a hypothesis (a term that gets used incorrectly a lot). It’s about where we want to focus and how we explore what the best option is. You can have an idea, like “Hey, I think this is a great way to do this,” or someone comes to you with an idea. But what exploration is really about is deconstructing the assumptions that go into that idea and figuring out how to discover them. It’s about focusing on as many different options as possible, on the pool of options, and not on any specific idea.

Andrew:
The ideas themselves are almost the least important part. Optimization is really about whether I can find the core pieces and the things that work best, and then go where that takes me. That means you’re always testing things you don’t think are going to work, or that no one is really pushing for. It means you’re always trying to find ways to prove yourself wrong, not right, and that’s where all the value is. In classic testing, the best-case scenario is that you’re right. You got something that performed how you wanted, and otherwise nothing happens. But in exploration, the absolute worst-case scenario is that you’re right. If you’re not, either you’ve explored an idea and proven there’s no value there, which, beyond a single data point, is incredibly valuable for future resources, or you’ve found something that outperforms what you thought was going to happen.

Andrew:
In both cases, you learn something new and you get a better outcome. If I’m just testing one or two versions of an idea, all I can say is that the idea was good or bad. In exploration, I can see what the pool of options is, and I can go wherever that goes. The most important thing to remember in both cases, but especially in exploration, is that you’re not creating a winner. You’re not making this better than that. All you’re doing is discovering what the truth is out there. If you limit yourself to looking at one thing, very little can happen. But if you’re exploring the realm of possibilities, there are so many more possibilities, so many more things than we pretend to understand. Being able to limit your ego, and letting that drive everything, opens up all these possibilities. You learn so much more, and you find so many new answers you would never have guessed.

Brian:
So the realm of possibilities versus “I have this idea.” I want to add this element to a page, I want to reword this copy in a certain way, I want to switch out this image on this particular piece of the website, because I have a theory in my head that says this is going to be better. Say that’s where I’m starting. I’m in idea-validation mode, I’ve got a big pile of ideas, maybe a framework for prioritizing them, and we all voted. What does it look like to go from my pile of ideas to probing the realm of possibilities? How should I think about that?

Andrew:
Well, you take any of those individual ideas and deconstruct the assumptions that go into it. With copy: does copy even matter? What are you focusing on? What are you writing about, to whom, and why? All of those are built-in assumptions. With the image example: do images matter? Where is it on the page? How does it interact with other sections? So you start deconstructing things into their core components and tackling those at every level you can. I usually break stuff down into the four types of changes that can go on a site. First there’s real estate: existence or nonexistence, relative position, and size of items. Over time, that has trumped every other type of change on every site I’ve ever seen. It’s not even close. A better way to think about it: it doesn’t matter what a Lego piece looks like or what’s written on it if that Lego piece doesn’t belong, or if no one’s seen it.

Andrew:
Right. So if you take that, the order of the other three types of changes varies, and one of the first things you want to discover is what the priority is for a given site. Generally they are presentation (what it looks like), function (how it’s programmed and how you interact with it), and copy (what it says). Function tends to be higher impact, but it’s also much higher cost and requires a lot more dev resources. So part of this is also discovering how efficient each type of change is. If I can write five copy tests and get a 5% win each time, that will end up trumping a 30% function win over time, just because of the cost and the resources. The other thing to keep in mind is that by going this route and breaking things down into those assumptions, you’ve given yourself so many more options to win.
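A little arithmetic helps with Andrew’s efficiency point. Compounding alone doesn’t do it: five 5% wins give 1.05^5 ≈ 1.276, a bit under a single 30% win, so the balance is tipped by cost. Below is a minimal Python sketch under loudly hypothetical assumptions. There is a fixed budget measured in weeks of team time and a made-up cost per test for each change type. As in his example, every test produces its win.

```python
def compounded_lift(lift_per_win: float, cost_per_test: float, budget: float) -> float:
    """Total lift from spending `budget` on tests of one change type.

    Hypothetical simplification: every test wins by `lift_per_win`,
    and wins compound multiplicatively.
    """
    n_tests = int(budget // cost_per_test)  # how many tests fit in the budget
    return (1 + lift_per_win) ** n_tests - 1


BUDGET_WEEKS = 20  # hypothetical budget of team time

copy_total = compounded_lift(0.05, cost_per_test=1, budget=BUDGET_WEEKS)      # cheap copy tests
function_total = compounded_lift(0.30, cost_per_test=6, budget=BUDGET_WEEKS)  # expensive dev work
print(f"copy: {copy_total:.1%}  function: {function_total:.1%}")
```

With these made-up costs, twenty cheap copy wins (about 165%) outrun three function wins (about 120%). Make function tests cheap enough and the ranking flips, which is why the efficiency has to be discovered site by site.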

Andrew:
So say I think this one image needs to change from blue to red. If I’m looking at which parts of the page even matter, there’s more than just that image, right? There are other parts of the page, other layouts, other relative positions. Now go from one option to seven or eight. With one option, it can only be better or worse. By the third option, each of them has those same two outcomes, and the third one could also be better than both, better than one and worse than the other, or worse than both. You keep opening those up, and that lets you look at both success rate and scale of impact. As long as you manage your resources that way, you can very easily go from the industry success rate for tests, somewhere between 7 and 12%, to an 80, 90, or 95% success rate, even with much harder rules, because you’re giving yourself so many more ways to impact the site.

Andrew:
You’re exploring all those pieces with each individual action.
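A deliberately crude model shows what Andrew’s point about adding options does to the numbers. This sketch hypothetically assumes that every challenger independently beats control with the same probability. That probability is set to 10%, inside the 7 to 12% industry range he quotes. Real variants are correlated, and more experiences also split traffic thinner. So the model only shows the direction of the effect. It does not reproduce his 80 to 95% figures.

```python
def p_at_least_one_winner(n_experiences: int, p_win: float = 0.10) -> float:
    """Probability that at least one challenger beats control.

    Assumes (hypothetically) n_experiences - 1 challengers, each winning
    independently with the same probability p_win.
    """
    challengers = n_experiences - 1
    return 1 - (1 - p_win) ** challengers


for n in (2, 4, 8, 12):
    print(f"{n:>2} experiences: {p_at_least_one_winner(n):.0%} chance of at least one winner")
```

Under these assumptions a plain A/B test (two experiences) wins 10% of the time. Twelve experiences push the chance of at least one winner to roughly 69%.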

Brian:
So if I go back to my genius idea to reword the copy on a particular element on a particular page, it’s possible that I’m right: it wins, it’s a better experience, we’re making more money, and I’m a hero. It’s also possible that it’s worse, or that it doesn’t matter. If it doesn’t matter, maybe it’s just equivalent to the previous copy I tested against. Or maybe the element I’m testing within, this piece of real estate, just doesn’t matter at all.

Andrew:
But you’ve only tested one execution of the copy, or in most cases one or two. You don’t even have enough data at that point to say copy doesn’t matter. Think about other tests you could run instead. What if I do a four-by-three partial factorial test of the presentation, the copy, and, say, the background? Or what if I look at relative changes across four or five sections, with a couple of executions each? I’m still testing the copy in all these examples, but I’ve added so many more options that the test can go in ways I would never have guessed. It’s unbelievable to me how many times the thing people don’t like, or the thing people think doesn’t matter, turns out to be the most important.

Andrew:
And the thing that people think is the most important isn’t. I used to have a running joke with every e-com site: their homepage always has that big promo section. There’s a type of test we call inclusion/exclusion, where you remove each of the main elements, one per experience. If the result is positive, that item is negative for the page, right? Removing it improved performance. If it’s negative, the item was adding performance. I have yet to run that test where the main promo image was a positive factor on the page. It has sometimes been neutral, but it’s almost always negative. Yet how much time is spent figuring out what image goes there, what promotion, all those pieces? That’s a classic example of arguing over ideas for something that just doesn’t matter.
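An inclusion/exclusion test as Andrew describes it can be read mechanically. Each experience removes one element, and the sign of its lift against control shows whether that element was helping or hurting. The Python sketch below is a minimal illustration. The element names and counts are made up. The significance check is a standard two-proportion z-test, not necessarily the stricter rules Andrew mentions later. It also makes no correction for comparing several experiences against one control.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def read_exclusion_test(control, removals, alpha=0.05):
    """control: (conversions, visitors) for the unchanged page.
    removals: {element: (conversions, visitors)} for the experience
    where that single element was removed."""
    c_conv, c_n = control
    baseline = c_conv / c_n
    verdicts = {}
    for element, (conv, n) in removals.items():
        lift = (conv / n) / baseline - 1  # relative lift from removing the element
        if two_proportion_p_value(c_conv, c_n, conv, n) >= alpha:
            verdict = "no detectable effect"
        elif lift > 0:
            verdict = "net negative: the page does better without it"
        else:
            verdict = "net positive: removing it hurts"
        verdicts[element] = (round(lift, 3), verdict)
    return verdicts


# Hypothetical results, one removed element per experience.
results = read_exclusion_test(
    control=(500, 10_000),
    removals={
        "promo_banner": (580, 10_000),
        "testimonials": (430, 10_000),
        "footer_links": (505, 10_000),
    },
)
for element, (lift, verdict) in results.items():
    print(f"{element}: {lift:+.1%} -> {verdict}")
```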

Andrew:
This approach also gets rid of the “I feel, I think, I believe” conversations, which is what a lot of business ends up being. “Gee, I think this.” “Well, I really feel our users…” “You know, I believe this is the best way to do that.” Those are just opinions. Great, you can have them or not. The classic thing I always tell people is that I can believe I can fly, but until I jump off the roof, it doesn’t really matter. What matters is the instant I jump off the roof, the instant I take an action based on that belief. So you take those beliefs, but you don’t act solely on them. You find out what the right answer is.

Brian:
So with all the assumptions that go into my pile of hypotheses, I think you said real estate is the primary factor, the type of change, or type of thing to test, where you start.

Andrew:
Yeah. You could have the perfect piece of copy, but if no one sees it, or if it’s about the wrong type of offer or the wrong type of thing, it doesn’t matter. I’m currently looking at different types of sales offers going into the Christmas season, the holiday season. I can have the perfect free-shipping offer, but if free shipping isn’t the right thing to do, what does that do for me? So you always start with the higher-level element. Also, a lot of copy or contextual changes tend to be very dependent on other factors, so they tend not to be as permanent or long-lasting. Real estate tends to be more of a primary factor and much longer-lasting, especially when you look at results over time.

Andrew:
So you’re getting kind of a double benefit there. And again, you can use the copy concept to ask where this should be, or what to focus on. What matters most here: is it copy? Great, let’s find out. I can still test that idea, and I can still include the alternative you want in that test, but it’s the least important part of the test. If you’re right, great. But if three other options would have been better, then even if yours is positive, those other options are more important, right? It’s about what’s best, not just what’s better or what your opinion is. You don’t get gold stars for your opinions being validated.

Brian:
So, on the question of whether copy even matters: I think I heard you say that for a given site, each of the other three areas (presentation/design, copy, and functionality) will either matter or not, and they’ll have a relative ranking?

Andrew:
Relative. A copy change might be worth a 10% gain over a few weeks, while a function change might be worth, on average, a 20% gain over a few months. That doesn’t mean copy has no impact. It’s just nowhere near the same scale. Then you also have to look at the probability of an impact. In other words, how often does copy produce a win, and how much cost and time does it take? There’s opportunity cost: if I run a copy test, I’m not running another test. All those factors together give you the efficiency of each type of change, and you learn that over time. The other thing to keep in mind is that sites change and users change. So you always have to keep evaluating. Just because something was true four years ago doesn’t mean it’s true today.

Andrew:
Something that was true six months ago may not be true today either. But if you’re constantly testing and going in every direction, your site is constantly adapting to those new needs. The trouble comes when you make a single choice between things, push it, and that’s the end of the thought. Things change, and something that was the best option back then may not be the best now. But if you’re constantly evaluating what matters and looking for the best options, you’re constantly adapting to your users and your site’s needs.
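The efficiency Andrew describes combines three things: how big a win tends to be, how often a type of change produces one, and how long it ties up the testing slot. The sketch below folds them into one hypothetical number: expected lift per week of testing time. The 10% and 20% lifts echo his copy-versus-function example. The win probabilities and durations are invented placeholders. A real program would estimate all of them from its own test history.

```python
from dataclasses import dataclass


@dataclass
class ChangeType:
    name: str
    p_win: float           # hypothetical share of tests of this type that produce a win
    avg_lift: float        # average lift when it does win (0.10 = 10%)
    weeks_per_test: float  # build + run time, a stand-in for cost and opportunity cost

    def expected_lift_per_week(self) -> float:
        """Expected lift per week the testing slot is occupied."""
        return self.p_win * self.avg_lift / self.weeks_per_test


change_types = [
    ChangeType("copy", p_win=0.3, avg_lift=0.10, weeks_per_test=3),
    ChangeType("function", p_win=0.3, avg_lift=0.20, weeks_per_test=12),
]
for t in sorted(change_types, key=ChangeType.expected_lift_per_week, reverse=True):
    print(f"{t.name}: {t.expected_lift_per_week():.2%} expected lift per week")
```

With these placeholder inputs, copy comes out at 1% expected lift per week against 0.5% for function, even though a function win is twice as large. Different inputs can reverse that, which is why it has to be measured.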

Brian:
So in the early days, starting from zero or from the current state of a site, you begin with real estate. You begin with experiments that tell you where to focus: which elements matter and which ones don’t. You mentioned show/hide tests up and down the page. Is that how you arrive at an understanding of what matters?

Andrew:
There are a couple of different tools you can use, MVTs, depending on the page layout. Inclusion/exclusion is almost always the easiest. It’s called inclusion and exclusion because adding something to the page is the same test: just because I add something, I still need to know its relative performance. Those types of tests tell you that this thing matters and that thing doesn’t, so you can then apply your resources accordingly. Let me go a little different and use a classic button MVT. The three main factors people test on buttons are size, color, and copy. On some sites color matters, on some size matters, on some copy matters. But if I’m only focusing on color and color doesn’t matter, it doesn’t matter how good a color I choose.

Andrew:
So figuring that out helps me focus. A simple three-by-two partial factorial test is four experiences. I can even add other things on there. But knowing that copy matters most and color doesn’t shapes everything we do in the future, right? So I’ve learned a bunch there, and I’ve gotten a win. The same thing can be true when I’m discovering a page. These three sections of the page have no impact. These two sections, just remove them and get a win. And these two sections are really important. I’ve completely shaped my page from some basic information, and I’ve also shaped my next four or five actions, right? So take the element that matters. Which of its factors matter? What other concepts can I do with that space? What else can I do? You have all these big assumptions, and you can tackle them in whatever order makes the most sense. So you’re always taking things a level or a couple of levels up. You’re still tackling the same question, just in a very different way.
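One way to read the “three-by-two partial factorial test is four experiences” remark is as a standard half-fraction design. Three factors at two levels each would need 2^3 = 8 experiences for a full factorial, and a 2^(3-1) fraction covers them in four. That reading of Andrew’s phrase is an assumption. The Python sketch below builds that design for size, color, and copy and estimates each main effect from conversion rates. The levels and rates are hypothetical. The price of the smaller design is aliasing: each main effect is confounded with the two-factor interaction of the other two factors.

```python
from itertools import product

# Hypothetical levels for each factor, coded -1 / +1.
FACTORS = {
    "size": ("regular", "large"),
    "color": ("blue", "orange"),
    "copy": ("Download", "Start free"),
}


def half_fraction():
    """2^(3-1) design: size and color vary freely, copy = size * color.

    Four experiences instead of the eight of a full factorial; every
    level of every factor appears in exactly two experiences.
    """
    return [
        {"size": size, "color": color, "copy": size * color}
        for size, color in product((-1, 1), repeat=2)
    ]


def main_effects(runs, rates):
    """Mean conversion rate at +1 minus mean at -1, per factor.

    In this design each main effect is aliased with the interaction of
    the other two factors, so a large interaction would bias it.
    """
    effects = {}
    for factor in FACTORS:
        high = [r for run, r in zip(runs, rates) if run[factor] == 1]
        low = [r for run, r in zip(runs, rates) if run[factor] == -1]
        effects[factor] = sum(high) / len(high) - sum(low) / len(low)
    return effects


runs = half_fraction()
for run in runs:
    print({f: FACTORS[f][(v + 1) // 2] for f, v in run.items()})

# Hypothetical observed conversion rates, one per experience, in run order.
rates = [0.060, 0.050, 0.051, 0.061]
print(main_effects(runs, rates))
```

With these made-up rates, copy shows a one-point effect and color shows none: the “copy matters most, color doesn’t” finding in miniature.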

Brian:
Got it. So you don’t start with what you think might be better. You start with what even matters on this page and on this site. And in the process of figuring that out, you get some wins just by removing stuff that was having a net negative impact simply by being there.

Andrew:
Sure. But go back to that button example. Once you figure out that color matters, you can do the classic 41-shades-of-blue test, or whatever matters, and you can take inputs, right? That’s fine as long as it doesn’t limit your pool. If I figure out color matters and I think red matters most, great, I’m going to include red. But I’m also going to include yellow, black, green, blue, and other options. If I’m right, I’m right. If I’m wrong, great, something else outperformed. That’s the core of it.

Brian:
I see it as honing in on the exact experiment you need to run: first which elements on the page or site, then what kind of change. After a series of experiments measuring the relative impact of different kinds of changes (presentation, copy, functionality), you arrive at “OK, we know which type of change will be most impactful relative to the effort it takes to test it. Now we test.”

Andrew:
Yeah. But there’s another piece in there that’s really important and gets left out: if you did only what you said, you’d still arrive at a local max. So the goal in every test is to include at least one option that’s just a challenger, like a new layout or something else entirely. I’m always bringing in outside inputs that could change the system, and you incorporate those different pieces. You’ll notice I’m not specifying which experience or which specific idea. That’s not the important part. All I care about is the pool: how independent and how wide those options are. That’s the only part that matters. Blue, red, yellow, or green, this section or that section, it doesn’t matter, because everything is evaluated the same way. If you’re right, you’re right. If you’re wrong, great, we get a better outcome.

Brian:
So what about the challenger variant, or whatever we want to call it? Say we’ve decided it’s this element on the page, this button, and we’re going to test color. So we’re throwing in several colors, not just the one I like and the one you like, but a spectrum. What does the challenger look like in that context? Is it a crazy color, or are we throwing in something outside of a color change?

Andrew:
Let me give an example from a company I worked for way back in the day. I used to work for Malwarebytes, helping with onsite and in-app optimization for both the B2B and B2C sides of the business. Our primary goal on the site was to get people to download the original tool. So we looked at the main download page and came up with as many different options and executions of those concepts as we could. Some changed out the main interaction piece, some tried a different value prop, and some tried different layouts. Along with that, we also asked, “What happens if we take our B2B piece and include it in this? How much does it cost us? How much do we gain?” It had nothing to do with the core concept. It was a left-field idea, but it let us discover something. And the key here is that we didn’t just take that idea: we did two executions of it, right?

Andrew:
Because a single execution doesn’t tell you whether it’s the execution or the concept. What we actually discovered was that the B2B piece, included in the B2C-type test as a specific type of offer (in this case a free trial), actually increased B2C downloads. It also blew up our B2B side so much that we had to double our sales staff. So we took this outside idea and learned something I never would have guessed. Why did it work? I’m not going to explain why; your guess is as good as mine. It also doesn’t matter. “Why” is the least important part. We got both outcomes from an idea that had nothing to do with what we were doing but was feasible in that spot and relevant to the business, so we brought it in. By going through that discovery process, we ended up somewhere, with an impact on the business, that we never imagined. But the key wasn’t that it won. The key was that it was included in the first place, right? I don’t control what the users react to. I only control my ability to find it out.

Brian:
So let’s dig into this example, because it frames a lot of the discussion we’ve had that was kind of theoretical. Starting with Malwarebytes, can you talk a little more about it? There’s a B2C line and a B2B line. Can you briefly say who each is for, what they get out of it, and a little about the model and how it’s monetized?

Andrew:
So, Malwarebytes is anti-malware security software that’s also part of a larger suite covering anti-exploit and a lot of business tools. Before this, the business side had always run on an initial sales interaction, a manual-touch sales process. B2C users were on more of a freemium model, right? Download it, try it for free, upgrade. Those lines had never really crossed at that point. Apart from one or two messages, people getting the B2C offers weren’t pointed to the business side; it wasn’t tied together. So we said: if our goal is to increase downloads, or increase value to the business (we thought downloads were the way to do that), let’s look at the main interaction points, and the download page was the key one, right?

Andrew:
Highest population, highest scale. So we looked at what types of changes we could make on that page. We looked at the structure and at our resources and narrowed things down. I think in this case we had a pool of 12 ideas and narrowed it to, I believe, five, with a couple of executions of each. Then we took whatever other concepts could fit in there. We took the B2B side of things, which no one had even thought about for that page, and tested it. I’ll also point out that we had tried B2B messaging in other parts of this user flow, and it didn’t work. In fact, as I mentioned, this test had two B2B executions, and the other one was very negative. But this specific execution of that concept worked. We tested it again afterward to prove the effect, and it increased not just the B2B side but total downloads, everything.

Andrew:
That’s also why it’s important to know how you’re going to make a decision before you go in. It’s not just “this goes up, this goes down.” We had a way to value every action a user could take that made sense to the business, and that decided the best option. In this case it happened to be the one tied to B2B. It did help B2C, but that wasn’t the basis of the decision. The basis was that it improved overall performance. So we started with concepts, filtered the pool down to the widest beta (in other words, the widest range of things we could feasibly do), and added outside input to that same test. We gave it exactly the same work and treatment as the executions of the other concepts, then let the data discover what worked for us. We got an answer that blew our minds.

Andrew:
It also had a massive impact on the business, and those are the best moments. I live for moments when everything I believe is wrong, and that’s great. I hate it when a test just confirms what I thought. I’ll commonly get people to vote on what’s going to win, and sometimes the thing they wanted to win does win. But honestly, I’ve never seen that happen more than two times in a row, and I’m running hundreds of tests a year. I’ve never had anyone pick winners at more than an 18% success rate over a year, and again, the industry average is 10%. Think about that: 90% of the time, or 82% of the time, they were wrong. So why would I use them to pick the thing that matters? I take those inputs, but they’re not the system, not the process, and not the thing I’m evaluating.
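Andrew stresses deciding up front how results will be judged. His team had “a way to value every action a user could take,” and the winner was the experience with the best overall value, not the one that moved a single metric. The sketch below shows the general shape of such a rule under stated assumptions. The action names, their dollar weights, and the counts are all hypothetical, not Malwarebytes data. It also deliberately leaves out the statistical checks a real decision would need.

```python
# Hypothetical value per action; a real program would derive these from revenue data.
ACTION_VALUES = {"download": 1.0, "b2b_trial": 25.0}


def value_per_visitor(action_counts: dict, visitors: int) -> float:
    """Business value generated per visitor, summing every tracked action."""
    total = sum(ACTION_VALUES[action] * count for action, count in action_counts.items())
    return total / visitors


def best_experience(results: dict):
    """results: {experience: (visitors, {action: count})}.
    Returns the highest-value experience and every experience's score."""
    scores = {
        name: value_per_visitor(actions, visitors)
        for name, (visitors, actions) in results.items()
    }
    return max(scores, key=scores.get), scores


winner, scores = best_experience({
    "control": (10_000, {"download": 2_000, "b2b_trial": 10}),
    "new_layout": (10_000, {"download": 2_300, "b2b_trial": 8}),
    "b2b_free_trial": (10_000, {"download": 2_150, "b2b_trial": 40}),
})
print(winner, {name: round(score, 3) for name, score in scores.items()})
```

Here the new layout drives the most downloads, but the B2B free-trial experience wins on total value. That mirrors the shape, not the numbers, of the story above.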

Brian:
Well, you just see that there are so many ways to be wrong, so it’s actually not that surprising. It’s not necessarily that your idea is bad. It just isn’t different enough to make a difference.

Andrew:
It just might not be the best option. If it’s a 5% win, but I could do just as much work and get a 50% win, the 5% is a waste of time, right? So “this was better, this was worse” doesn’t matter; it has no impact on the business. What matters is the best use of resources and the best answer for our users and our business, and how I discover that.

Brian:
I love the story. I’ve not seen a lot of people testing like this. If I heard you right, you started with a dozen, 14 concepts, and narrowed it down to, what, six?

Andrew:
I believe we ended up with a control, four main alternatives to the B2C experience, and the B2B alternative, with two executions of each. So I believe the test ended up being 11 or 12 experiences. So again: you start with the concept phase, narrow down, and then do the work to build it out. Concepts are chosen partly on “Hey, I think this works, it looks great,” but it’s also about the pool. A lot of concepts end up being very small changes on the same thing, right? I can write a value prop; I can do 50 different versions of a value prop. If the value prop doesn’t matter, who cares? So you have to start with those core assumptions.

Brian:
Can you talk a little bit about concepts versus execution?

Andrew:
Sure. Say I think I should promote a sale on a specific type of product on my e-comm site. OK, that’s a concept, a test someone comes to you with. The concepts in there are “sale” and “product,” right? There are other assumptions and factors in there, but let’s just take those two core ones. What other types of things could I do? I could promote a value prop, a brand, all these concepts that compete with that one, right? To figure that out, I include as many of them as I can. Then come the executions of those. In other words, not just this specific sale but a couple of different types of sales, not just one value prop but a couple of types, with a pretty good range between them. It’s not about picking the specific one at that point. Once I find out that sales matter, or value props matter, I can say, “Hey, let’s dive deeper into this and get a win.”

Andrew:
So if I explore, say, that download page, I might want a direct sales message, the standard freemium download, a value breakdown, or a breakdown of other products. Those are all concepts, right? And I can execute each of them in a million different ways. That’s what we did: we picked those high-level concepts and then executed them differently. I do want to point out that ideally you do more than two executions, because two is only two data points, though that’s certainly better than one. Whether you do three or four, which I’ve done, or just a couple, each one is another data point, and it also tells you what to do next. Not only do you get a breakdown of the core concepts, but every test has a natural next step. You may choose a different direction, but this treats optimization as a process, an ongoing thing. It’s not “I ran a test.” The test is just the means to an end. It’s all about how I evolve this, build on it, and keep adapting as the systems, the inputs, and the users change. It’s ongoing. As I tell my teams all the time, the grind never ends.

Brian:
But it sounds like you’re learning a lot more per experiment than the folks over in idea-validation land.

Andrew:
Yeah, but I’m also just winning a lot more. Over the last four years, I had over a 90% success rate. I went six years without a failed test. Those numbers seem insane to people, and I will walk through any statistical breakdown you want. I’m a math nerd at heart, and my rules are much harder than the confidence thresholds people usually rely on. But I’m learning, I’m getting a win, and I’m allowing myself to get a higher-scale win, because this process lets me have multiple winners. Another way to look at it is to model it out. Say I run 100 tests with two experiences, 20 tests with 10 experiences, or 10 tests with 20 experiences. I’m testing 200 options either way.

Andrew:
I can model out, based on expected outcomes for most businesses or just normal industry numbers, what the best mix is, because there’s a happy medium. What you’ll find with most sites is that 10, 12, or 15 experiences is ideal. You may not be able to run that many, but it shows the most you can usefully do. You can also have too many. I could have 50 experiences and only run two tests; it takes forever, and the data is useless. So it’s about finding the happy medium for you and maximizing that each time. You also learn the rules for acting on a test, what matters, and how to learn on a site. These things all feed each other, right? They all improve the optimization process and the business outcomes. And it’s all built on the same concept: how do I explore, how do I discover, and how do I exploit information?
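Andrew’s “too many experiences” warning follows from sample-size arithmetic. Every added experience needs its own traffic. Guarding against false positives across more comparisons also raises the bar for each one. The sketch below uses the common normal-approximation sample-size formula for a two-proportion test, with alpha split across the challenger-versus-control comparisons (a Bonferroni correction). The traffic, baseline rate, and detectable lift are hypothetical. This is a generic power calculation, not the “much harder rules” Andrew uses.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(baseline: float, rel_mde: float, alpha: float = 0.05,
              power: float = 0.80, comparisons: int = 1) -> int:
    """Visitors per experience to detect a relative lift of rel_mde over baseline.

    Two-sided two-proportion z-test (normal approximation), with alpha
    split across `comparisons` challenger-vs-control tests (Bonferroni).
    """
    p1, p2 = baseline, baseline * (1 + rel_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / (2 * comparisons))
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def weeks_to_run(n_experiences: int, weekly_visitors: int,
                 baseline: float, rel_mde: float) -> float:
    """Weeks until every experience reaches its required sample size."""
    n = n_per_arm(baseline, rel_mde, comparisons=n_experiences - 1)
    return n * n_experiences / weekly_visitors


# Hypothetical site: 50,000 visitors/week, 3% baseline, aiming to detect a 10% relative lift.
for k in (2, 5, 10, 15, 50):
    print(f"{k:>2} experiences: {weeks_to_run(k, 50_000, 0.03, 0.10):6.1f} weeks")
```

Under these inputs, a two-experience test finishes in a couple of weeks, while fifty experiences run for more than two years: the “it takes forever” case.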

Brian:
I want to go back to something; I think you’ve explained it, but I want to hit it again. You mentioned the widest beta, and that phrase isn’t super familiar to me, so I want to make sure it’s clear to anybody tuning in. Does widest beta refer to the range, how different the experiences are from one another?

Andrew:
So if you think of delta, delta is the difference between two things, right? If you think about lift, it’s a delta between two points. Beta is looking at a pool, a series, and what the range is within there, right? If I’m testing out, hey, I’m going to do a direct message copy, a personal direct message copy, and an impersonal direct message copy, that’s a small range. But if I’m going to do a value proposition versus an offer versus a generic message versus nothing versus other things, that’s a much wider range, right? It can still include the same option; it’s about how you tackle it. And so what you’re looking for is a way to get your team to go where they’re uncomfortable. I always tell people: if I’m not including one or two options on a test that make you uncomfortable, or that you really hate, I’m not doing it right.

Andrew:
Right. So you’re just trying to create rules that make people think in terms of: how do I get ideas that are independent of each other, so they’re not all pinned on one opinion, and how do I get them to be as different from each other as possible? That gives me more signals. I can refine down; it’s really hard to refine out, right? If I think color matters on the button and I only test color, but copy matters most, I’m just wasting time. I’m just spinning my wheels. But once I’ve figured out that copy matters, I can test out 12 different options of the copy, or 41 shades of blue, and get to the right answer. But that’s the last step. It’s not the first step. The first step is trying to get the most learning and the most things I can go toward. You should never be able to say that this is a test I’m going to run four months from now, because that test should be dependent on every test that ran before.

Brian:
So you don’t operate from a six-month testing roadmap?

Andrew:
I may have concepts I want to tackle, and places and locations and things I want to focus on, but every test should be relevant to, or interact with, the test prior to it, even if it’s just discovering a new part of the user experience and figuring out how that prioritizes against other efforts. Every test is a choice not to do other things. Every test is also a chance to improve other things. And so you’ve got to think of it as an evolving, adaptive system. It’s not a series of actions where I test this versus that. That’s not how you should ever think about testing. Testing is the constant process of discovering how to improve the user experience and to improve the business outcomes.

Brian:
So let’s go back to this download page for Malwarebytes. If I’m to tie in what you were just talking about, which is getting out of the team’s comfort zone and including experiences or options that don’t immediately seem obvious or comfortable: is it fair to say that this download page was thought of as a B2C asset, and that throwing in the B2B offer was a little out there?

Andrew:
Our head of sales absolutely hated us including it in that test and was adamantly opposed to it.

Brian:
Okay. But you ran it anyway. Okay. That’s a healthy decision.

Andrew:
What’s the worst case? He’s right. Great, it’s just one of six or seven experiences. Best case, he’s wrong, which he was, and we’ve now found a way to completely change our business. It’s when you start going with assumptions and opinions that you get all the problems. You can take his opinion as an input, absolutely. An input. You may even have a higher bar for action for that thing to happen, right? If it’s equal to a B2C option, in this case you can go with the lesser option. But if it’s dramatically better and you have the ability to act on it, why in God’s name would you not do that?

Brian:
Let’s talk about that too: the success metric, or the bar that you have to cross to declare a winner, because you have two very different types of actions that you’re measuring. And I think I heard you say you have a way to measure the value of every action, or every key action.

Andrew:
Every action that’s relevant to business outcomes. So here’s a classic example: bounce rate, getting people to go to the shopping cart, getting people to interact with search or with the recommended-items module, getting people to click on a button. None of those are outcomes. Those are just means to an end. But if I know what the average value of a download is, and the average value of a trial, or of people signing up for a newsletter, or reaching out, or downloading a white paper, whatever those are, and if I can aggregate those, then I can monetize the value of the session or the user. And then I have a common language to evaluate things. And it does not matter if it’s exactly 37 cents or 38 cents. That is an arbitrary decision that’s going to change over time.

Andrew:
What matters is that it’s consistent and representative of the business, right? If I know that a B2B download is worth 10 B2C downloads, then it doesn’t matter if I value them at 1 and 10, or 10 and 100, or 5 and 50; it’s all the same value. And so I just need a way to measure that and validate it over time. So the key here is that you’re constantly reviewing that and adapting it. You don’t just set a rule and come back to it five years later; you’re constantly checking whether it still has that same value.
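To make the “common language” idea concrete, here is a minimal sketch that collapses every tracked action into one value per session and ranks arms by it. The action names, counts, and per-action values are all hypothetical. It also checks the point about ratios: valuing B2C and B2B actions at 1 and 10, 10 and 100, or 5 and 50 gives the same ranking, because multiplying every value by the same positive constant can’t reorder the arms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArmResult:
    name: str
    sessions: int
    actions: dict  # action name -> count, e.g. {"b2c_download": 800}


def value_per_session(arm: ArmResult, action_values: dict) -> float:
    """Collapse every tracked action into one monetized number per session."""
    total = sum(action_values[action] * count for action, count in arm.actions.items())
    return total / arm.sessions


def ranking(arms: list, action_values: dict) -> list:
    """Arm names ordered from highest to lowest value per session."""
    ordered = sorted(arms, key=lambda a: value_per_session(a, action_values), reverse=True)
    return [a.name for a in ordered]


# Hypothetical results: one B2C action and one B2B action per arm.
arms = [
    ArmResult("control", 10_000, {"b2c_download": 800, "b2b_trial": 5}),
    ArmResult("b2b_offer", 10_000, {"b2c_download": 760, "b2b_trial": 20}),
    ArmResult("new_copy", 10_000, {"b2c_download": 820, "b2b_trial": 0}),
]

# Three scalings that all keep the same 1:10 ratio between B2C and B2B.
for b2c, b2b in [(1, 10), (10, 100), (5, 50)]:
    print((b2c, b2b), ranking(arms, {"b2c_download": b2c, "b2b_trial": b2b}))
```

Note that `new_copy` gets more B2C downloads than the control and still ranks last, because the single monetized number already weighs what it lost on the B2B side.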

Brian:
So it doesn’t always play out the way it did in this particular instance, where on the download page, by bringing in another type of conversion, both went up. If both go up, well, that’s easier to analyze, I assume. But in the case where one goes up and the other goes down, it’s a matter of degree. It’s a matter of, well, how much up was up, and…

Andrew:
That’s the entire point. If I’ve turned it into a monetized value, it’s not two things I’m comparing, it’s one, right? It’s not multiple variables; it’s one. What is the value of this interaction? Did it go up? Did it go down? Period. It’s not: does it help this group or that group, or does it help this one person hit his goal, or, hey, this one person really, really wants to focus on what he’s focusing on this month. It’s: what is best for the business? And again, one of the biggest problems is that people always think they’re great at it, and people suck at making rational decisions. So you’ve got to make it as easy and painless as possible. Always think of it this way: success and failure are determined before you launch, not after. If you don’t have a way to act on it, it doesn’t matter. It doesn’t matter what wins the test, as long as something wins and you can act on it. So you’ve got to make sure you can do those two things.
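A sketch of “success and failure determined before you launch”, under our own assumptions: the rule is written down as data before the test starts, every arm is judged on one monetized value per session, arms the team couldn’t ship are ignored, and a contested option can be given a higher bar, as with the B2B offer discussed earlier. Arm names and numbers are hypothetical, and significance testing is deliberately left out; Andrew mentions using stricter statistical rules but doesn’t describe them here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    name: str
    value_per_session: float  # one monetized number per arm
    actionable: bool = True   # could we actually ship this if it wins?
    min_lift: float = 0.0     # lift over control required to act; fixed before launch


def decide(control: Arm, challengers: list) -> str:
    """Return the arm to act on under a rule agreed before launch.

    Statistical significance is intentionally omitted; this only shows the
    structure of the rule: one metric, a per-arm bar, actionable arms only.
    """
    best = control
    for arm in challengers:
        if not arm.actionable:
            continue  # a win we can't act on doesn't count
        lift = arm.value_per_session / control.value_per_session - 1
        if lift > arm.min_lift and arm.value_per_session > best.value_per_session:
            best = arm
    return best.name


control = Arm("control", 0.085)
challengers = [
    Arm("new_copy", 0.090),
    Arm("b2b_offer", 0.096, min_lift=0.10),    # contested option: higher bar
    Arm("redesign", 0.120, actionable=False),  # can't ship it, so it can't win
]
print(decide(control, challengers))
```

Here `b2b_offer` clears its 10% bar (about a 13% lift), so it is chosen; at 0.092 it would fall short (about 8%) and `new_copy` would be chosen instead.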

Brian:
Got it. Well, while we’re on revenue and value to the business, I want to talk a little bit about resources, because I think one thing that leads to the idea-validation school of optimization, where it’s just have an idea, implement it, run a test, repeat, is maybe the relative simplicity of the design, implementation, and analysis. Can you talk about the team behind this? What did you say, refined down to maybe half a dozen concepts with multiple executions, up to 10 to 12 variations? That’s a bigger test than a lot of people are running. How does that come together? Do you need more of a team, or just a different mindset?

Andrew:
You make plans to fit the resources you have, not get resources to fit the plan. But let’s be clear: the same test is executed by the same resources in almost the same time. So think about this: how much time does it take everyone to agree on the perfect execution, get sign-off, do revision rounds, and focus on the exact execution of every detail of a single design? Whereas if you’re focusing on a pool, and I always tell people 80% is better than 100%, how much time is saved by not doing that? Most of these things aren’t reinventing the wheel; you’re taking something and making a slight tweak to it. The thing you actually can’t get back is time. And when I say time, I mean time for a test to be live. So say I have two weeks that this test has to run.

Andrew:
I want to get the most out of those two weeks, because I can’t do other things during that time. And so I make plans based on the resources I have, what traffic allows, and those things. But the piece everyone gets stuck on is thinking it’s so hard to get a single thing up, and that it has to go through all this validation and sign-off. There should be as little sign-off as possible on a test, because worst case, it wins and you keep refining it. It’s live for a day before you start running the next test on it, or a couple of weeks if you want to run a test in between. But what everyone does is get stuck in this world of: I have to go through all this work to get the single thing, and it lives forever, so it has to be perfect. And so you’re just changing the mentality of how you use the resources, not necessarily the amount of resources. But that being said, even if you have to run fewer tests, as I said before, mathematically it’s always in your favor.

Brian:
Right. If you count experiences, you’re running fewer tests with more experiences per test. So you’re not actually testing less. You’re just giving yourself better odds.

Andrew:
It’s not like it’s linear, either. It doesn’t take 10 times as much effort to run a 10-experience test as it does a two-experience test. If it did, that would be a real expense. What you’re looking at is: what do I do to maximize the time I have, and also make sure that something can happen as soon as something else is done? Because the other big problem is: we run a test, we’re not thinking about it, it ends, we execute on it, and then, okay, what are we going to do next? You can never get that time back. So I’ve launched the test? Great, let’s work on the next one. Great, let’s work on the next one. Great, let’s work on the next one. The grind never ends. A test going live is just the next action in the day. And so if you’re constantly doing this, it frees you up.

Andrew:
It lets those resources be used in the way they find most efficient. It allows people to explore new ideas; it allows people to do things. I’ll be honest, sometimes you’re doing the same thing over and over again, but at least there’s value coming out of it. Now imagine doing that same thing for six months and nothing actually changes: you just made a new color, a new button, a new function. And that’s really what happens with the classic product roadmap. That’s what that is. Sometimes it works. Everyone believes their idea is going to work, but as we’ve seen before, belief is a scratcher ticket: one in 10. Let’s not rely on dumb luck.

Brian:
Okay. That’s a surprise. I really didn’t see this answer coming, but it does make sense to me. I come from an execution background; I used to just build tests in JavaScript all day. And so I think of that phase of the test life cycle as being very time-constrained, deadline-driven, and I guess optimizable. But when you look at the full test life cycle, you’re right: approvals and conversations are the bulk of that time. And if you’re adding another 10 variants, those don’t get any…

Andrew:
How much simpler is your life if you just code ten to start with and say, kill two? There’s no review process; just kill the two least favorite. Is that really more time? That’s a simple way to execute. It takes less than you’d think, and you get to be freed up. Here’s the thing that everyone’s missing: the designer, the engineer, the product people, the senior executives, they all get freed up. They’re no longer stuck on “I have to defend this point and validate this point and find ways to prove this point.” It’s “what’s possible? Let’s think, let’s go outside the box,” right or wrong. It frees up all that time. It’s not using more time. It’s just using time differently.

Brian:
All right. Well, I love this story. Again, it’s different from a lot of the stories that I hear, because the typical narrative is: there was this great idea, and then we tested it and it won, or some variation of that story. In this one, we’ve heard very little about the actual winning experience. We know that you were on the download page and you brought in the B2B free-trial call to action in some format. But as for the actual details, it was one of 11 experiences you were running, and it’s the one that just…

Andrew:
I could not care less what wins. What wins is the least interesting part of the entire process. What matters is: did I do all the work to get there? And that work is more mental than it is technical. Not to devalue the technical in any way; I know that work. But again, freeing people up to think differently, use their time differently, and use their skills differently while still getting an output just makes everyone’s life better. And it makes the business go.

Brian:
As someone who’s done a lot of technical work in this field, I personally appreciate the forethought. I don’t know that every developer cares, per se, but I think on average, if you’re going to spend your time building out experiment experiences, you want to think that it matters. You want to think that some forethought went into the experiment design, that a winner will come out, and that your code will live on, versus just being tasked to…

Andrew:
Not just that: if you have an idea, we include it. The ideas come from anywhere. It’s not just top-down. You’re working on six of these? Hey, here’s this other idea, let’s just do it, right? There’s no difference there. It frees you up. It makes you as much a part of the thought process as the senior executive or the optimization manager or anyone. It just makes it so that everyone’s focused on the same outcomes. It’s not just, you know, handoff, handoff, handoff.

Brian:
I’d like to take this further. Say I just stepped into a role running growth for some sort of SaaS company. I just walked in the door, it’s my first week, and I’ve got to get this stuff off the ground. I don’t know where to start. What would you tell me? How would you tell me to approach it? Maybe just think in terms of the website; we can gloss over things like the team and whatnot. Assume I can get stuff done. But where do I start? I’ve got ideas, but I’m not just going to start testing ideas.

Andrew:
Well, let’s take a step back. What are the big things that you have to tackle to make a test work? I need to make sure I have the infrastructure to track and test something. I need to make sure it’s executable, which means the organization. And then I need to figure out what matters, and I need to make sure people know what matters. So those are the things I’m going to tackle. I’m going to look at my infrastructure while helping to educate people, and also try to maximize the amount of work being done in the short term. What I mean is, nothing teaches better than doing. So if I can get a team to even force its way through one or two of these actions, it becomes natural, even though at first it seems so forced. I used to joke that whenever I come in and start things, I’m causing a lot of whiplash, because I want people to think differently.

Andrew:
I want people to act differently. So it might be as simple as: hey, I want to see if we can execute on this. How about we run an inclusion/exclusion test, which code-wise is almost nothing, like a display: none. But let’s make sure everyone knows this is what we’re trying to accomplish, here’s how we’re going to measure it, here are the tools we’re going to use, and let’s execute on it. And that takes so little thought, because: okay, I think this matters most? Great, let’s include it. I don’t think this matters? Great, let’s include it. Everyone has opinions there. You’re not saying that this goes away and is never coming back. You’re just saying that, in its current execution, this is positive or negative. Let’s focus on what matters and get rid of things that don’t. And so it’s a very simple concept, but it allows people to see all these pieces, and I’m always trying to optimize: can I act on stuff? Can I execute stuff? Can I get the beta of the pool? And I learn all three of those through that process.

Brian:
Okay. So let’s say we do have good enough analytics and we are able to execute on this include/exclude test. Let’s see if we can start to design it here, just to make sure I can see how this is going to go.

Andrew:
Sure. So with every page, I used to joke, I make pastel boxes of it. If you take a big screenshot of an entire webpage, or whatever, and block out sections for the key functional areas, it will literally look like Legos. I used to work for Adobe. I don’t know if they still do it, but at the Adobe headquarters, right behind the front door, if you looked up, it had all the main webpages broken down into blocks and colored pieces. So figure out what all those big blocks are, and then each experience just removes one of the blocks at a time. So the concepts are really easy: one block per experience. And then I figure out the most I can do from a data standpoint. Ideally, a test runs two to three weeks, sometimes a little longer, sometimes a little shorter, but never less than about 10 days at the fastest.
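The “most I could do from a data standpoint” step is arithmetic on traffic. A minimal sketch, assuming you already know roughly how many sessions each arm needs (the `sessions_per_arm` input, which in practice would come from a power calculation not shown here) and treating the 10-day floor from the conversation as a hard minimum. All traffic numbers are hypothetical.

```python
import math

MIN_DAYS = 10  # "never less than about 10 days", treated here as a hard floor


def max_experiences(daily_sessions: int, days: int, sessions_per_arm: int) -> int:
    """How many non-control experiences the traffic supports in `days` days."""
    if days < MIN_DAYS:
        raise ValueError(f"tests should run at least {MIN_DAYS} days")
    arms = (daily_sessions * days) // sessions_per_arm
    return max(arms - 1, 0)  # one arm is always the control


def days_needed(daily_sessions: int, experiences: int, sessions_per_arm: int) -> int:
    """Days required to give the control plus `experiences` arms enough sessions."""
    arms = experiences + 1
    return max(MIN_DAYS, math.ceil(arms * sessions_per_arm / daily_sessions))


# Hypothetical site: 6,000 sessions a day, two-week test, 10,000 sessions per arm.
print(max_experiences(6_000, 14, 10_000))  # experiences that fit in two weeks
print(days_needed(6_000, 9, 10_000))       # days needed for nine experiences
```

Reading it both ways answers the “can I test six, eight, nine?” question: either fix the duration and see how many experiences fit, or fix the number of experiences and see whether the duration stays in the two-to-three-week range.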

Andrew:
But how can I maximize the learning in that time? Can I only test six things? Can I test eight? Can I test nine? As soon as I have that number, it’s just picking the ones that make the most sense, and I’m picking them not based on what I think matters, but on what gives the biggest swings, right? If I’m only going to remove these little tiny sections, that’s not going to have as big an impact as removing the entire promo section. So I’m just picking the biggest pieces. And, yeah, there’s a control; everyone knows that. That part is clear. The only thing that’s confusing at that point is that you have to think backwards: if I get a positive outcome, the block is a negative; if I get a negative outcome, the block is a positive. That’s the hardest challenge, honestly. But the other piece of this is: can you get people to align on how to define success, so that it’s not a positive for this group or that group, but a positive for the business?
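The block selection and the “think backwards” reading can be sketched the same way. This assumes on-screen area is a fair stand-in for “biggest swing”, which is our simplification, and that each experience hides exactly one block. Block names, sizes, and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str     # functional area of the page, e.g. "promo"
    area_px: int  # on-screen size, used here as a rough proxy for impact


def exclusion_test(blocks: list, slots: int) -> list:
    """Control plus one experience per block, each hiding a single block.

    Only the `slots` largest blocks get an experience.
    """
    biggest = sorted(blocks, key=lambda b: b.area_px, reverse=True)[:slots]
    return ["control"] + [f"without_{b.name}" for b in biggest]


def read_exclusion(control_value: float, without_value: float, tolerance: float = 0.0) -> str:
    """Interpret an exclusion result backwards.

    If removing the block raises value, the block was hurting; if removing
    it lowers value, the block was helping.
    """
    lift = without_value / control_value - 1
    if lift > tolerance:
        return "hurts"
    if lift < -tolerance:
        return "helps"
    return "neutral"


page = [
    Block("hero", 400_000),
    Block("promo", 250_000),
    Block("badges", 20_000),
    Block("testimonials", 180_000),
]
print(exclusion_test(page, slots=3))
print(read_exclusion(0.10, 0.12))  # removing the block raised value
```

The inversion lives in one function on purpose: reports can then say “this block hurts” or “this block helps” directly, instead of asking every reader to flip the sign of a winning exclusion in their head.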

Andrew:
So that’s usually the hardest work, and it’s very rarely a technical problem. It’s almost always the organization defining a success metric that everyone agrees on, or at least valuing actions, right? Because everyone has a difference between what is good for them and what’s good for the company. Are you trying to make someone happy, or trying to grow the company? It puts you in some very weird spots, but at the same time, it allows you to cross teams in a way that no other position can. You’ve got to be everyone’s best friend and everyone’s worst enemy. You’ve got to be the one who speaks truth to power. Your job is to tell people they’re wrong, period, and get them to see that’s a good thing. Best case scenario, you’re wrong; worst case, you’re right.

Andrew:
And so education starts before day one. It’s part of every conversation. It’s the hardest part, but you’re educating the executives. You’re educating the design team. You’re educating the engineers. You’re educating your analytics team. You’re educating everyone, because you’re asking them to go down the same path. You’re asking them to never stop. It doesn’t end. That test goes live? We designed that test. You know what I’m doing the next second after it goes live? Getting the next test ready. It never stops. I’m also educating people on what we’ll do based on that result, because when that test ends, I’ve got to be able to do something. And so what you end up with is a bunch of tests, or things, happening sequentially, and how do you execute on those? Does that make sense? Yeah. And so the other part is you need to think about how to value actions differently.

Andrew:
So you mentioned that everyone has their, you know, ICE is common, or other things. I break it down in the most mathematical way possible: scale, meaning the population, times the number of options and the beta of those options, right? How big a pool do I have, and can I actually get them independent? Divided by cost. Cost is really easy: it’s time, money, or resources, some combination, and the risk. It’s not “I have to go buy it as is.” It’s “I need eight dev hours versus 10 dev hours versus 87 dev hours.” Remember I talked about functional areas before, and it comes down to the options. If I’m really confident that red is better than blue and I only test red versus blue, it fails this test. But if I test every color, my odds go way up, and red happens to be one of them. You might be right. And so it’s: how do I maximize those resources? How do I prioritize those resources? It’s the same thought process.

Brian:
Got it. Can you run through the formula one more time? This is for prioritizing different tests that you might run.

Andrew:
I think about any action, any part of the business, as scale times influence. Influence, in this case, is going to be the number of options and the beta of those options, right? Okay? Over cost.

Brian:
Okay. So the idea loses points if the scale is low, as in, it’s only going to be seen by a small subset of visitors, for example. It loses points if you don’t have very many options, or if those options aren’t very different from one another. And then it also loses points if it’s going to cost a bunch of time and resources to actually execute.
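Written out, the formula is priority = scale × influence / cost, with influence = number of options × beta. A minimal sketch under our own assumptions: beta is a 0 to 1 judgment of how different the options are (the conversation doesn’t give it a scale), and cost is counted in hours, with review and approval time weighted the same as build time, in line with Andrew’s point that everyone’s time counts equally. Unlike ICE-style scores, there is no confidence term. Idea names and numbers are hypothetical, and only the relative ordering matters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Idea:
    name: str
    scale: int            # how many users/sessions would see it, e.g. per week
    options: int          # how many experiences the test could include
    beta: float           # 0..1: how different those options are from each other
    build_hours: float    # design and engineering time
    review_hours: float   # sign-off, approvals, decisioning (same hourly weight)


def priority(idea: Idea) -> float:
    """scale x influence / cost, where influence = options x beta."""
    influence = idea.options * idea.beta
    cost = idea.build_hours + idea.review_hours
    return idea.scale * influence / cost


ideas = [
    Idea("button_color", 50_000, options=2, beta=0.1, build_hours=2, review_hours=8),
    Idea("homepage_exclusion", 200_000, options=8, beta=0.8, build_hours=10, review_hours=6),
    Idea("checkout_copy", 20_000, options=10, beta=0.6, build_hours=6, review_hours=2),
]
for idea in sorted(ideas, key=priority, reverse=True):
    print(f"{idea.name:<20} {priority(idea):>10,.0f}")
```

In this toy ranking, the button-color test scores lowest mostly because its two options barely differ and its review time outweighs its build time, which is why approval overhead belongs in the cost rather than being treated as free.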

Andrew:
But remember, that includes things like the review process, approval, and decisioning. And at the same time, your time is not less valuable or more valuable than the engineer’s or the designer’s or the executive’s. I tell everyone they have the same job title: everyone’s job title is janitor; we all clean up messes. And so you value everyone’s time in there: how much effort is it going to take to get this going? You know, so there’s no confidence score in there. There’s no voting, there are no opinions. There’s none of that.

Brian:
Okay. Well, I’m about to have a very different first week at this startup after this conversation, rather than coming up with some ideas, throwing tests out there, and just hoping that something wins.

Andrew:
Understand, this is all probability. There’s a chance that I buy a lottery ticket and win the lottery. That doesn’t mean it’s a good investment strategy. So you’re just trying to improve your odds and constantly get the best outcome. Think differently, execute differently, speak differently. It starts from day one and it never ends; you’re doing every part of this process differently. And that’s why you have to start from day one. You can’t suddenly go, “Hey, you’ve been doing this thing right for the last three years, but I think I’m going to try something different.” You’ve got to be able to tackle yourself first and set that groundwork first. It starts before you walk in that door and it ends well after you leave that door: you’re doing everything you can to make sure that everyone’s on the same page.

Brian:
We could go on and on. We’ll have to get back together and go on and on. But I think that’s a lot for now. That’s a lot for me to digest. So thank you so much for your time. If somebody is listening or watching and they want to find you on the internet, where should they go?

Andrew:
Yeah. Thank you again for having me. You can find a lot of my writing on sites like ConversionXL’s blog. I have my own personal blog called testingdiscipline.com. I’m active on Twitter under @antfoodz, but really, feel free to reach out. You can find me on LinkedIn, Twitter, or anywhere else. I’m happy to help anyone in the industry who’s trying to get better. So I really appreciate you giving me the time to talk to you and to everyone.

Brian:
Thank you so much.
