Andrew Anderson on SaaS Experiments
February 2021
In this episode, I talk to one of my optimization heroes, Andrew Anderson.
We cover discovery-based optimization - how to move beyond idea validation and increase your win rate several times over.
Listen on Apple Podcasts, Stitcher, Spotify, Google Podcasts, or right here:
Listen on SoundCloud
SaaS Experiments · Discovery-based optimization with Andrew Anderson
… or on YouTube
Links
- The Discipline Based Testing Methodology - an article on the CXL blog that outlines some of the ideas from this podcast, and includes screenshots from a homepage redesign experiment
- Andrew on Twitter and LinkedIn
- Andrew’s blog
Quotes
- “You’re always testing things you don’t think are gonna work”
- “You’re not creating a winner, all you’re doing is discovering what the truth is”
- “You don’t get gold stars for your opinions being validated”
- “I don’t control what the users react to, I only control my ability to find it out”
- “If I’m not including one or two options on a test that … you really hate, I’m not doing it right”
- “You should never be able to say that this is a test I’m gonna run 4 months from now … because that test should be dependent on every test that ran before”
- “It doesn’t matter what wins a test, as long as something wins and you can act on it”
- “How much time does it take everyone to agree on the perfect execution … of a single design? Where, if you’re focusing on a pool … how much time is saved by not doing that?”
- “Your job is to tell people they’re wrong … and get them to see that’s a good thing”
Transcript
⚠️ This transcript was generated by a 🤖 - there will be errors.
Brian:
Listen, y’all, I’m here with Andrew Anderson. He has done optimization with over 400 companies and he’s currently leading growth at Merchbar. Andrew, welcome.
Andrew:
Thanks for having me on. Looking forward to talking with you.
Brian:
Yeah. Super excited. So that résumé, 400 companies, that is a big number. And we were talking before we started recording, and you mentioned that you’re a proponent of discovery-based optimization. Did I get that right?
Andrew:
Yeah. I think that most people in the space kind of get stuck on validation. They have concepts or ideas, and they just want to see what the impact is. But really that’s a massive mistake in resources, and it leaves so much of the impact on the table. So if you take a different mindset and use testing as a means to explore and discover what works, and to really go where the data tells you, you get so much more out of it. It just requires a different way to think about things.
Brian:
So can you say a little bit more about what it looks like to do the one approach versus the other, the idea validation approach versus discovery-based? How would that change my day-to-day as an optimization manager or CRO consultant?
Andrew:
Absolutely. So if you think about the classic validation, standard testing practice, it’s a lot of: hey, I have this idea, why don’t we throw that up and see what the impact is? People come with ideas and you’re just kind of an execution arm. Exploration takes a very different view. It’s not about hypotheses, a term that gets used incorrectly, but about where we want to focus and how we explore what the best option is. Someone can come to you with an idea, “hey, I think this is a great way to do this,” but really what exploration is about is deconstructing what assumptions go into that idea and what you need to discover. It’s about focusing on as many different options as you can, on the pool of options, and not on any specific idea.
Andrew:
The ideas themselves are almost the least important part of optimization. Really, optimization is about: can I find the core pieces and the things that work best, and go where that takes me? What that means is you’re always testing things that you don’t think are going to work, or that you don’t really have anyone pushing for. It means that you’re always trying to find ways to prove yourself wrong, not right. And that’s where all the value is. If you think about classic testing, the best-case scenario is you’re right: you got something that performed how you wanted. With everything else, nothing happens. But in exploration, the absolute worst-case scenario is you’re right. Because otherwise, either you’ve explored an idea and proven that there’s no value there, which, beyond being a single data point, is incredibly valuable for future resources, or you’ve found something that outperforms what you thought was going to happen.
Andrew:
In both cases, you learn something new and you get a better outcome. If I’m just testing one or two versions of an idea, then all I can do is say that idea was good or bad. But in exploration, I can see what the pool of options is, and I can go wherever that goes. And the most important thing to remember in both cases, but especially in exploration, is you’re not creating a winner. You’re not creating “this is better than that.” All you’re doing is discovering what the truth is out there. So if you limit yourself to looking at one thing, very little can happen. But if you’re exploring the realm of possibilities, there are so many more possibilities and so many more things than we pretend to even understand. Being able to limit your ego, and using that to drive everything, really opens up all these possibilities: you learn so much more and you find so many new answers that you would’ve never guessed.
Brian:
So the realm of possibilities versus “I have this idea”: I want to add this element to a page, I want to reword this copy in a certain way, I want to switch out this image on this particular piece of the website, because I have a theory in my head that says this is going to be better. If that’s where I’m starting, if I’m in this idea validation mode, I’ve got a big pile of ideas and maybe a framework for prioritizing them, and we all voted. If that’s what I’m doing, what does it look like to go from my pile of ideas to probing the realm of possibilities? How should I think about that?
Andrew:
Well, you take any of those individual ideas and you deconstruct the assumptions that go into it. So if we talk about copy: does copy even matter? What are you even focusing on? What are you writing about, to whom, and why? All those things are assumptions built in. If you think about the image example: do images matter? Where is it on the page? How does it interact with other sections? And so you start deconstructing things into the core components and start tackling those at every level that you can. I usually break stuff down into the four types of changes that can go on a site. First there’s real estate, which is existence or non-existence, relative position, and size of items. And that has trumped every other type of change on every site I’ve ever seen over time. It’s not even close. The way to think about that is: it doesn’t matter what a Lego piece looks like or what’s written on it if the Lego piece doesn’t belong, or if no one sees it.
Andrew:
Right. Okay. So if you take that, then the other three types of changes (and one of the first things you want to discover is what their priority is for a given site) generally follow: presentation, what it looks like; function, how it’s programmed and how you interact with it; and copy, obviously what it says. And the reality is that function tends to be higher impact, but it’s also much higher cost: it requires a lot more dev resources and things like that. So part of this is also discovering what the efficiency of changes is, right? If I can run five copy tests and get a 5% win each time, that’s going to end up trumping, over time, a 30% function win, just because of the cost and the resources. The other thing to keep in mind is that by going this route and breaking things down into those assumptions, you’ve given yourself so many more options to win.
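To put rough numbers on the efficiency point, here is a small sketch. The 5% and 30% lifts come from the conversation; the relative costs (a function test costing ten times a copy test) are purely hypothetical placeholders, and combining wins multiplicatively is an assumption of this sketch, not something stated in the interview.

```python
def compounded_lift(lifts):
    """Combine sequential relative lifts multiplicatively.

    Example: two 5% wins give 1.05 * 1.05 - 1 = 10.25%.
    """
    total = 1.0
    for lift in lifts:
        total *= 1.0 + lift
    return total - 1.0


def lift_per_cost(lifts, cost_per_test):
    """Total compounded lift divided by the total cost of running the tests."""
    return compounded_lift(lifts) / (cost_per_test * len(lifts))


# Lifts quoted in the conversation.
copy_lifts = [0.05] * 5      # five copy tests, 5% win each
function_lifts = [0.30]      # one function test, 30% win

# Hypothetical cost units (assumption): a function test needs 10x the resources.
COPY_COST = 1.0
FUNCTION_COST = 10.0

copy_total = compounded_lift(copy_lifts)
function_total = compounded_lift(function_lifts)
copy_efficiency = lift_per_cost(copy_lifts, COPY_COST)
function_efficiency = lift_per_cost(function_lifts, FUNCTION_COST)

print(f"copy:     total lift {copy_total:.1%}, lift per cost unit {copy_efficiency:.3f}")
print(f"function: total lift {function_total:.1%}, lift per cost unit {function_efficiency:.3f}")
```

Under these made-up costs the five small copy wins land close to the single function win in total lift while using half the resources. With different cost ratios the ranking can flip, which is why the efficiency has to be measured per site.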
Andrew:
So if I think that this one image needs to change from blue to red, well, if I’m looking at what parts of the page even matter, there’s more than just that image, right? There are other parts of the page, other layouts, other relative positions. And say I go from one option to seven or eight options. With one option, it can be better or worse. But even at the third option, I’ve now added those same two outcomes for each of them, and also the third one could be better than both, better than one and worse than the other, or worse than both. And so you keep opening those up, and it allows you to really look at the factors of success rate and scale of impact. What you can very easily do, as long as you manage your resources that way, is go from an industry average, which is somewhere between 7 and 12% success rate for tests, to an 80, 90, 95% success rate, especially with much harder rules, because you’re giving yourself so many more ways to really impact the site.
Andrew:
These are experience all those pieces with each individual action.
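One way to see why a wider pool raises the success rate is a back-of-the-envelope calculation. This is a simplified sketch, not Andrew’s method: it assumes every challenger independently has the same chance of beating the control, and it borrows the roughly 10% industry figure from the conversation as that per-challenger chance. Real options are neither independent nor equally likely to win, and a deliberately wide pool is meant to do better than this.

```python
def chance_of_any_winner(challengers, win_probability):
    """Probability that at least one challenger beats the control.

    Assumes independent challengers with an identical win probability
    (a simplification made for this sketch).
    """
    if challengers < 0:
        raise ValueError("challengers must be non-negative")
    if not 0.0 <= win_probability <= 1.0:
        raise ValueError("win_probability must be between 0 and 1")
    return 1.0 - (1.0 - win_probability) ** challengers


BASELINE_WIN_PROBABILITY = 0.10  # assumption: reuse the ~10% industry figure

for challengers in (1, 3, 7, 11):
    chance = chance_of_any_winner(challengers, BASELINE_WIN_PROBABILITY)
    print(f"{challengers:>2} challengers -> {chance:.0%} chance of at least one winner")
```

Even in this crude model, seven challengers push the chance of a winning test past 50%, compared with 10% for a single idea.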
Brian:
So if I go back to my genius idea to reword the copy on a particular element on a particular page, there’s the possibility that I’m right and it wins, and it’s a better experience and we’re making more money and I’m a hero. And there’s also the possibility that it’s worse, or that it doesn’t matter. And if it doesn’t matter, it’s possibly because it’s just equivalent to the previous copy that I tested against, or it’s also possible that this element, this piece of real estate within which I’m testing, just doesn’t matter at all.
Andrew:
Now, you’ve only tested one execution of the copy in most cases, or one or two. I don’t even have enough data at that point to say copy doesn’t matter. But think about other tests I could run instead. What if I do a four-by-three partial factorial test of the presentation, the copy, and, say, the background of it? Or what happens if I look at relative changes of four or five sections with a couple of executions each? I’m still testing the copy in all these examples, but I’ve taken the test and added so many more options, and it can go in ways that I would’ve never guessed. It’s unbelievable to me how many times the thing that people don’t like, or the thing that people think doesn’t matter, is the most important.
Andrew:
And the thing that people think is the most important isn’t. I used to have a running joke with every e-com site: on their homepage, you know, they always have that big promo section. There’s a type of test called inclusion/exclusion, where you just take each of the main elements and remove it, one per experience. If the result is positive, that means that item is negative to the page, right? Removing it improved performance. And if it’s negative, that means it was adding performance. I have yet to run that test where the main promo image was a positive factor on the page. It’s sometimes been neutral, but it’s almost always negative. But how much time is spent trying to figure out what image goes there, and what promotion, and all those pieces? That’s a classic example: you could be arguing over ideas about something that just doesn’t matter.
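The inclusion/exclusion setup described here can be sketched as follows. The element names and the 2% noise band are hypothetical placeholders for illustration; a real test would use the site’s own sections and its own decision rules.

```python
def exclusion_experiences(elements):
    """Build a control plus one experience per element with that element removed."""
    experiences = {"control": list(elements)}
    for removed in elements:
        experiences[f"without_{removed}"] = [e for e in elements if e != removed]
    return experiences


def read_element(lift_when_removed, noise_band=0.02):
    """Interpret the lift seen when an element is removed from the page.

    Positive lift: the page did better without it, so the element was hurting.
    Negative lift: the page did worse without it, so the element was helping.
    noise_band is a hypothetical threshold, not a statistical test.
    """
    if lift_when_removed > noise_band:
        return "hurting the page"
    if lift_when_removed < -noise_band:
        return "helping the page"
    return "no measurable influence"


# Hypothetical homepage sections (assumption, not from the interview).
PAGE_ELEMENTS = ["hero_promo", "category_nav", "best_sellers", "newsletter_signup"]

experiences = exclusion_experiences(PAGE_ELEMENTS)
for name, shown in experiences.items():
    print(f"{name}: shows {shown}")
```

With four elements this yields five experiences, and each non-control result says something about one element: `read_element(0.08)` reads as an element that was hurting the page, because removing it produced a gain.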
Andrew:
And also, by taking this approach, you get rid of the “I feel, I think, I believe” conversations, because that’s what a lot of business ends up being. “Gee, I think this was a...” “Well, I really feel our users...” “I believe that this is the best way to do that.” Those are just opinions. Great. You can have them, you can not have them. The classic thing I always tell people is: I can believe I can fly, but until I jump off the roof, it doesn’t really matter. The instant I jump off the roof, the instant I take an action based off of that, that’s what matters. So you take those beliefs, but you don’t take actions solely based off them. You find what the right answer is.
Brian:
So, all the assumptions that go into my pile of hypotheses: I think you mentioned real estate as being kind of the primary factor, the type of change or the type of thing to test where you start.
Andrew:
Yeah. I mean, if you think about it, you could have the perfect piece of copy, but if no one sees it, or if it’s the perfect piece of copy about the wrong type of offer or wrong type of thing, it doesn’t matter. I’m currently going through a process of looking at different types of offers for sales going into, you know, Christmas season, holiday season. And if you think about it, I can have the perfect free shipping offer, but if free shipping isn’t the right thing to do, what does that do for me? So you always start with what the higher element is. There’s also another piece of that, which is that a lot of copy or contextual changes tend to be very dependent on other factors, so they don’t tend to be as permanent or as long-lasting. Whereas real estate tends to be more of a primary factor and tends to be much longer-lasting, especially if you see results over time.
Andrew:
So you’re getting kind of a double benefit there. And again, you can use the concept of the copy to say: where should this be? What should I even focus on? What matters most here? Is it copy? Great, let’s find out. And I can still test that idea. I can still include in that test this alternative that you want to do, but that’s the least important part of that test. Yeah, you’re right? Great. But if three other options would have been better, even if yours is positive, the other options are more important, right? So it’s about what’s best, not just what’s better or what your opinion is. You don’t get gold stars for your opinions being validated.
Brian:
So in your experience, the question of “does copy even matter”: I think I heard you say that for a given site, you will find that among the other three areas, the presentation or design, the copy, and the functionality, they’ll matter or they won’t, and the relative ranking of them...
Andrew:
Relative. So a copy change might be worth a 10% gain over a few weeks, where a function change might be worth, on average, a 20% gain over a few months. It doesn’t mean copy doesn’t have impact. It’s just that, relatively, it’s nowhere near the same scale. Right? And then you also have to look at the probability of an impact. In other words, how often do I get something out of copy, and how much cost and time does it take? Because there’s opportunity cost. If I run a test for copy, I’m not running another test. Right? All those factors give you the efficiency of any type of change. You learn that over time. The other thing to keep in mind is that sites change and users change. So you’ve got to always be evaluating. Just because something was true four years ago doesn’t mean it’s true today.
Andrew:
Just because something was true six months ago, it may not be true today. But if you’re constantly testing and you’re constantly going in every direction, your site’s constantly adapting to those new needs. The problem is when you just make a single choice between things and push it, and that’s the end of the thought. Things change. Even if that was the best for the environment at the time, it may not be the best now. But if you’re constantly in a process of evaluating what matters and looking at the best options, you’re constantly adapting to your users and your site’s needs.
Brian:
So, early days, if we talk about starting from zero or from the current state of a site: you begin with the real estate. You begin with experiments that tell you where to focus, which elements matter and which ones don’t. You mentioned the show/hide, show/hide, kind of up and down the page. Is that how you land at that understanding of what matters?
Andrew:
There’s a couple of different tools you can use, like MVTs, depending on the page layout. Inclusion/exclusion is almost always the easiest. And the reason it’s called inclusion and exclusion is that it’s the same test if you add something to the page: just because I add something, I still need to know the relative performance of it. The thing those types of tests will give you is: this thing matters, this thing doesn’t. And what that means is you can then apply your resources towards that. I’m going to go a little different and use a classic button MVT. The three main factors that people test on buttons are size, color, and copy. On some sites color matters, on some sites size matters, on some sites copy matters. But if I’m only focusing on color and color doesn’t matter, it doesn’t matter how good a color I choose.
Andrew:
And so figuring that out helps me focus. A simple three-by-two partial factorial test is four experiences. I can add other things on there too. But knowing that copy matters most and color doesn’t matter shapes everything we do in the future, right? So I’ve learned a bunch there and I’ve gotten a win. And the same thing can be true if I’m discovering a page: hey, these three sections of the page have no impact; these two sections, just remove them and get a win; and these two sections are really important. Well, I’ve completely shaped my page based on some basic information, but I’ve also shaped my next four or five actions. Right? So let’s take that element that matters. What factors of it matter? What other concepts can I do with that space? You have all these big assumptions, and you can tackle them in whatever order makes the most sense. So you’re always taking things a level or a couple of levels above. You’re still tackling the same question. You just tackle it in a very different way.
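Three factors at two levels each would be eight experiences as a full factorial; a half-fraction design covers them in four, which matches the “four experiences” mentioned here. The sketch below is one standard way to build such a design, not necessarily the one Andrew used. It picks the half that contains the current page (every factor at its existing level), and the level names are hypothetical.

```python
from itertools import product

# Hypothetical two-level settings for the three button factors.
FACTORS = {
    "size": ("current", "larger"),
    "color": ("current", "alternate"),
    "copy": ("current", "alternate"),
}


def half_fraction(factors):
    """Return the 4 runs of a 2^(3-1) fractional factorial design.

    Levels are coded -1 (first level) and +1 (second level). The third
    factor is set to -(first * second), which keeps each level of each
    factor in exactly two runs and includes the all-current control.
    """
    names = list(factors)
    if len(names) != 3:
        raise ValueError("this sketch handles exactly three two-level factors")
    first, second, third = names
    runs = []
    for a, b in product((-1, 1), repeat=2):
        coded = {first: a, second: b, third: -(a * b)}
        # Map coded level -1 -> index 0, +1 -> index 1.
        runs.append({name: factors[name][(level + 1) // 2] for name, level in coded.items()})
    return runs


runs = half_fraction(FACTORS)
for number, run in enumerate(runs, start=1):
    print(f"experience {number}: {run}")
```

The trade-off is that a half fraction cannot separate each main effect from the interaction of the other two factors, which is acceptable when the goal is only to learn which factor is worth exploring next.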
Brian:
Got it. So not starting with “what do I think might be better,” but starting with “what even matters on this page, on this site?” And in the process of figuring that out, you get some wins just by removing some stuff that was actually having a net negative impact just by being there.
Andrew:
Sure. But if you go back to that button thing, once you figure out color matters, you can do the classic 41 shades of blue test or whatever. You can take inputs, right? Just as long as it doesn’t limit your pool. If I figure out color matters, and I think that red matters most, great, I’m going to include red. But I’m also going to include yellow, black, green, blue, and other options, so that if I’m right, I’m right, and if I’m wrong, great, something else outperformed. That’s what the core is there.
Brian:
I see it like a sort of honing in on the exact experiment that you need to run. First, where: which elements on the page, where on the site. And then what kind of change. And after a series of experiments measuring the relative impacts of different kinds of changes, right, presentation changes, copy changes, functionality changes, you land at: okay, we know which of those types of changes is going to be most impactful relative to the effort it takes to test it. Now we test.
Andrew:
Yeah. But there’s another piece in there too, which is really important and gets left out of this: if you did only what you said, you’re still going to arrive at a local maximum. So the goal in every test is to include at least one option that’s just a challenger: here’s a new layout, or here’s something else. I’m always bringing in outside inputs to possibly change the system, and you incorporate those different pieces. You’ll notice that in all of this I’m not specifying which experience or which specific idea is the important part. All I care about is the pool, and how independent and how wide those options are. That’s the only part of it that matters. It doesn’t matter: blue, red, yellow, green. It doesn’t matter: this section, that section. Because everything is evaluated the same way. If you’re right, you’re right. And if you’re wrong, great, we get a better outcome.
Brian:
So in the case of this challenger variant, or whatever we want to call it: if we have landed at the point where we’ve decided, okay, it’s this element on the page, it’s this button, and we are going to test color, and so we’re throwing in several colors, not just the one I like and the one you like but a spectrum of colors, what does the challenger look like in that context? Is it a crazy color? Are we throwing in something that is outside of a color change?
Andrew:
So let me give an example from a company I used to work for way back in the day. I used to work for Malwarebytes, helping with onsite and in-app optimization for both the B2B and B2C sides of the business. Our primary goal on the site was to get people to download the original tool. So we looked at the main download page and came up with as many different options and executions of those concepts as we could. Some of it was changing out the main interaction piece, some of it was a different value prop, some of it was different layouts. But in addition to that, we also said: hey, what happens if we take our B2B piece and include it in this? How much does it cost us? How much do we gain? It had nothing to do with that core concept. It was a left-field idea, but we were able to discover that. And the key here is not just taking that idea; we did two executions of it, right?
Andrew:
Because one execution doesn’t tell you if it’s the execution or the concept. And what we actually discovered was that, with the B2B piece included on the B2C-type test, a specific type of offer, in this case a free trial, actually increased the B2C downloads, but also blew up our B2B side of things so much that we had to double our sales staff. So we took this outside thing. I’m not going to explain why it worked; your guess is as good as mine. It also doesn’t matter. Online, why is the least important part. But we got both outcomes. We took this idea that had nothing to do with what we were doing, but it was another thing that was feasible in that spot and relevant to the business, and we brought it in. And by going through that discovery process, we ended up in a place, and with an impact to the business, that we never imagined. But the key wasn’t that that one won. The key was that it was included in the first place, right? I do not control what the users react to. I only control my ability to find it out.
Brian:
So let’s dig into this example, because it sounds like it frames a lot of the discussion we’ve had that was kind of theoretical. Starting with Malwarebytes: can you talk a little bit more? So there’s a B2C line and there’s a B2B line. Can you talk about, briefly, who it’s for, what they get out of it, and a little bit about the model and how it’s monetized?
Andrew:
So Malwarebytes is anti-malware security software. It’s also part of a larger suite that does anti-exploit and a lot of business tools. Prior to this time, the business side of things had always been an initial sales interaction, a manual-touch sales process, whereas the B2C users were more of a freemium model, right? Download it, try it for free, upgrade. But those lines had never really crossed at that point. People were getting the B2C offers with no mention of the business side, apart from one or two messages that weren’t really tied together. So what we said was: if our goal is to increase downloads, or increase the value to the business (we thought downloads were the way it was going to work), let’s look at the main interaction points for that, and that download page was the obvious point, right?
Andrew:
Highest population, highest scale. So then we looked at what types of changes we could do on that page. We looked at the structure and looked at what our resources were. I think in this case we started with a pool of 12 ideas and narrowed it down to, I believe, five ideas with a couple of executions of each. And then we took whatever other concepts could fit in there. We took the B2B side of things, which was not something that had even been thought of on that page, and tested it out. Now, I’m also going to point out that we tried B2B messaging in other parts of this user flow and it didn’t work. In fact, in this test, I mentioned that there were two B2B executions; the other one was very negative. But in this case, with this specific execution of that concept (and we tested it afterwards to prove it), these things tied in and increased not just the B2B side but total downloads, everything.
Andrew:
And that’s also why it’s important that you know how you’re going to make a decision on these things going in. It’s not just this goes up, this goes down. We had a way to value every action a user could take that made sense to the business, and the question was which was the best option. In this case it happened to be the one that tied to B2B. It did help B2C, but that wasn’t what drove the decision; the decision was about improving overall performance. So we started with concepts, filtered the pool down to the widest beta, in other words, the widest range of things that we could feasibly do, took other input, added it to that same pool, gave it the exact same work and treatment and executions as the other concepts, and then let the data discover what worked for us. And we got an answer that blew our minds.
Andrew:
It also had a massive impact on the business, and those are the best moments. I live for the moments when everything I believe is wrong. That’s great. I hate it when I get people to vote on what’s going to win, which I commonly do, and the thing that people wanted to win wins. But I’ll also be honest: I’ve never had that happen more than two times in a row, and I’m running hundreds of tests a year. I’ve never had someone, over a year, have more than an 18% success rate at picking winners, and again, the industry average is 10. Let’s think about that. That means 90% of the time, or 82% of the time, they were wrong. So why am I trying to use them to pick the thing that matters? I’m taking those inputs, but that’s not the system. It’s not the process. It’s not the thing that I’m evaluating.
Brian:
Well, there are just so many ways to be wrong that it’s actually not that surprising. It’s not necessarily that your idea is bad. It’s just that it isn’t different enough to make a difference.
Andrew:
It just might not be the best option. If it’s a 5% win, but I could do just as much work and get a 50% win, well, the 5% is a waste of time, right? So “this was better, this was worse” doesn’t matter; that’s not what has an impact on the business. It’s: what’s the best use of resources? What’s the best answer for our users, for our business? How do I discover that?
Brian:
I love the story. I’ve not seen a lot of people testing like this. If I heard you right, you said you started with a dozen, 14 concepts and narrowed it down to, what, six?
Andrew:
I believe we ended up with a control, four main B2C alternatives, plus the B2B alternative, and then we did two executions of each. So I believe the test ended up being 11 or 12 experiences. Again, you start with the concept phase, you narrow down, then you do the work to build it out. And the concepts are chosen partly by “hey, I think this works, this looks great,” but it’s also just: what’s the pool? A lot of the concepts end up being just small changes on the same thing, right? I can write a value prop, I can do 50 different versions of a value prop, but if the value prop doesn’t matter, who cares? So you’ve got to start at those core assumption pieces.
Brian:
Can you talk a little bit about concepts versus execution?
Andrew:
Sure. So say I think that I should promote a sale on a specific type of product on my e-comm site. Okay, that’s a concept. That’s a test idea someone comes to you with. Well, the concepts in there are sale and product, right? Those are the assumptions in there. There are other factors, but let’s just take those two core ones. What other types of things could I do? I could do a main promotion, a value prop, a merch brand, all these types of concepts, including that one. Right? To figure that out, I include as many of those as I can. And then the executions of those: in other words, not just this specific sale but a couple of different types of sales, not just one value prop but a couple of types of value props, with a pretty good range between them. It’s not about picking the specific one at that point. It’s about finding out that sales matter or value props matter, and then saying, hey, let’s dive deeper into this and get a win.
Andrew:
And so if I explore, say, that download page, and I want to do a direct sales message versus the standard freemium download versus, hey, a value breakdown or another product breakdown, those are all concepts, right? And I can execute those in a million different ways. That’s what we did. We took those high-level concepts and then executed them differently. Now, I do want to point out that ideally you’re doing more than two executions, because two is only two data points, though it’s certainly better than one. But whether you do three or four, which I’ve done, or just a couple, it’s giving you a data point to say what I’m going to do in the future. Not only do you have a breakdown of the core concepts here, but every test has a natural next step. Now, you may choose to go in a different direction, but this is optimization as a process, as an ongoing thing. It’s not “I ran a test.” The test is just a means to an end. It’s all about how I evolve this, how I constantly build on it and keep adapting as the systems change, as the inputs change, as the users change. It’s an ongoing thing. As I tell my teams all the time, the grind never ends.
Brian:
But it sounds like you are learning a lot more per experiment than the folks over in idea-validation land, where the only way...
Andrew:
Yeah, but I’m also just winning a lot more. In my last four years, I had over a 90% success rate. I went six years without a failed test. Those seem insane to people, and I will give you every type of statistical breakdown you want. I’m a math nerd at heart, and I have much harder rules than the confidence stuff people rely on. But I’m learning, I’m getting a win, and I’m allowing myself to get a higher-scale win, because I can have multiple winners by doing this process. Another way to look at this is you can model it out. Let’s say I run a hundred tests with two experiences, 20 tests with 10 experiences, or 10 tests with 20 experiences, right? I’m testing 200 options every single time, and I can model out, based on expected outcomes for most businesses, just at normal industry rates, what the best mix is for them, because there’s a happy medium there, right?
Andrew:
And what you’ll find with most sites is it’s either 10, 12, or 15 experiences as the ideal. Now, you may not be able to run that many, but it’s still showing you to do the most you can. There’s also too many, right? I could have 50 experiences and only run two tests, and it takes forever and the data is useless. So it’s finding what the happy medium is for you and maximizing that each time. And as soon as you learn the rules of how to act on a test, what matters on a site, and how to learn on a site, these things all feed each other, right? It all improves the optimization process and the business outcomes. It’s all built on the same concept: how do I explore, how do I discover, and how do I exploit information?
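The three mixes mentioned above (100 tests of 2 experiences, 20 of 10, 10 of 20) can be compared with a toy model. This is only a sketch under loud assumptions, not the model Andrew describes: each experience beyond the control is treated as an independent challenger with a 10% chance of winning (the industry figure quoted earlier), and the only thing counted is how many tests end with at least one winner. Traffic, run time, and the size of each win are ignored.

```python
def expected_winning_tests(tests, experiences_per_test, win_probability):
    """Expected number of tests that produce at least one winner.

    One experience per test is the control, so challengers = experiences - 1.
    Challengers are assumed independent with equal win probability.
    """
    challengers = experiences_per_test - 1
    chance_any_winner = 1.0 - (1.0 - win_probability) ** challengers
    return tests * chance_any_winner


# (number of tests, experiences per test): each mix uses 200 experiences in total.
MIXES = [(100, 2), (20, 10), (10, 20)]
WIN_PROBABILITY = 0.10  # assumption: per-challenger chance of beating control

results = {mix: expected_winning_tests(mix[0], mix[1], WIN_PROBABILITY) for mix in MIXES}
best_mix = max(results, key=results.get)

for (tests, experiences), winners in results.items():
    print(f"{tests:>3} tests x {experiences:>2} experiences -> about {winners:.1f} winning tests")
print(f"best mix under these assumptions: {best_mix[0]} tests of {best_mix[1]} experiences")
```

Even this crude model shows a happy medium: the middle mix yields the most winning tests, while the narrowest tests waste options on controls and the widest ones run too few tests. Where the peak sits for a real site depends on traffic and decision rules that the sketch leaves out.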
Brian:
I want to go back, and I think you've explained this, but I just want to hit this again. You mentioned the widest beta, and that's not a phrase that's super familiar to me. So I just want to be sure that it's clear to anybody tuning in. Widest beta refers to the range, the differentness among experiences?
Andrew:
So if you think of delta, delta is the difference between two things, right? So if you think about lift, it's a delta between two points. Beta is looking at a pool, a series, and what the range is within there, right? If I'm testing out, hey, I'm going to do a direct message copy, a personal direct message copy, and an impersonal direct message copy, that's a small range. But if I'm going to do a value proposition versus an offer versus a generic message versus nothing versus other things, that's a much wider range, right? It's still including the same option on there, but it's just how you tackle that. And so what you're looking for is a way to get your team to go where they're uncomfortable. I always tell people, if I'm not including one or two options on a test that make you uncomfortable, or that you really hate, I'm not doing it right.
Andrew:
So you're just trying to create rules to make people think in terms of: how do I get ideas that are independent of each other, so they're not all just pinned on opinion, and how do I get them to be as different from each other as possible? That mostly gives me more signals. I can refine down; it's really hard to refine out, right? If I think that color matters on the button and only test color, and copy matters most, I'm just wasting time. I'm just spinning my wheels. But if I can figure out not just that copy matters, but then test out 12 different options of the copy, or 41 shades of blue, then I'm going to get the right answer. But that's the last step. It's not the first step. The first step is trying to get to the most learning and the most things I can go towards. You should never be able to say that this is a test I'm gonna run four months from now, because that test should be dependent on every test that ran before.
Brian:
So you don't operate from a six-month testing roadmap?
Andrew:
I may have concepts I want to tackle, and places and locations and things I want to focus on, but every test should be relevant to, or interact with, the test prior to it, even if it's just discovering a new part of the user experience and figuring out how that prioritizes against other efforts. Every test is a choice not to do other things. Every test is also a chance to improve other things. And so you've got to think of that as an evolving, adaptive system. It's not a series of actions where I test this versus that. That's not how you should ever think about testing. Testing is the constant process of discovering how to improve the user experience and to improve the business outcomes.
Brian:
So let's go back to this download page for Malwarebytes. If I'm to tie in what you were just talking about, which is getting out of the team's comfort zone and including experiences or options that don't immediately seem obvious or necessarily comfortable: is it fair to say that this download page is thought of as a B2C asset, and that throwing in the B2B offer was a little out there?
Andrew:
Our head of sales absolutely hated us including it in that test and was adamantly opposed to it.
Brian:
Okay. But you ran it anyway. Yeah. Okay. That's a healthy, healthy decision.
Andrew:
What's the worst case? He's right. Great. It's just one of six or seven other experiences. Best case, he's wrong, which he was, and we've now found a way to completely change our business. It's when you start going with assumptions and just opinions that you get all the problems. You can take it as an input, absolutely, an input. You may even have a higher bar for action for that thing to happen, right? If it's equal to a B2C option, in this case you can go with the lesser option. But if it's dramatically better and you have the ability to act on it, why in God's name would you not do that? Yeah.
Brian:
Let's talk about that too: the success metric, or the bar that you have to cross to declare a winner, because you do have two very different types of actions that you're measuring. And I think I heard you say you have a way to measure the value of every action, or every key action ...
Andrew:
Action that is relevant to business outcomes. So here's a classic example: getting people to go to the shopping cart, or getting people to interact with search or with the recommended items module, or getting people to click on a button. None of those are outcomes. Those are just means to an end. But if I know what my average value of a download is, and my average value of a trial, or my average value of people signing up for a newsletter, or the average value of people reaching out and downloading a white paper, whatever those are, if I can aggregate those, then I can monetize the value of the session or the user. And then I have a common language to evaluate things. And it does not matter if it's perfectly 37 cents or 38 cents. That is an arbitrary decision that's going to change over time.
Andrew:
What matters is that it's equal and it's representative of the business, right? If I know that a B2B download is worth 10 B2C downloads, then it doesn't matter if I value them at one and 10, or 10 and a hundred, or five and 50; it's all the same value, right? And so I just need to have a way to measure that and validate it over time. So the key here is that you're constantly reviewing that and adapting it. You don't just set a rule and come back to it five years later. You're constantly seeing if it does have that same value.
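A minimal sketch of that valuation idea, assuming hypothetical action names and made-up counts (only the 10-to-1 ratio between a B2B and a B2C download comes from the conversation). It rolls every tracked action into one value per session, then checks that rescaling the values from 1 and 10 to 10 and 100 leaves the measured lift unchanged.

```python
# Hypothetical relative values per action; only the ratios are meant to matter.
ACTION_VALUES = {"b2c_download": 1.0, "b2b_download": 10.0}


def value_per_session(action_counts, sessions, values=ACTION_VALUES):
    """Aggregate the value of every tracked action into one number per session."""
    total = sum(values[action] * count for action, count in action_counts.items())
    return total / sessions


def relative_lift(variant, control):
    """Relative change of the variant's value per session versus control."""
    return (variant - control) / control


# Made-up counts, for illustration only.
SESSIONS = 10_000
control_counts = {"b2c_download": 400, "b2b_download": 10}
variant_counts = {"b2c_download": 380, "b2b_download": 20}

control = value_per_session(control_counts, SESSIONS)
variant = value_per_session(variant_counts, SESSIONS)
lift = relative_lift(variant, control)

# Rescale every value by the same factor (1 and 10 become 10 and 100).
scaled = {action: value * 10 for action, value in ACTION_VALUES.items()}
scaled_lift = relative_lift(
    value_per_session(variant_counts, SESSIONS, scaled),
    value_per_session(control_counts, SESSIONS, scaled),
)

print(f"lift: {lift:.1%}, lift with rescaled values: {scaled_lift:.1%}")
```

In this made-up case B2C downloads drop while B2B downloads rise, and the single monetized number still gives one answer to "did it go up or down?".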
Brian:
So it doesn't always play out the way it played out in this particular instance, where on the download page, by bringing in another type of conversion, both went up. If both go up, well, that's easier to analyze, I assume. But in the case where one goes up and the other goes down, it's a matter of degree. It's a matter of, well, how much up was up ...
Andrew:
That's the entire point. If I've made it into a monetization, it's not two things I'm comparing, it's one, right? It could be the neutral physical variant. It's one: what is the value of this interaction? Did it go up? Did it go down? Period. It's not, does it help this group or help that group, or does it help this one person hit his goal, or, hey, this one person really, really wants to focus on what he's focusing on this month. It's what is best for the business. And again, one of the biggest problems is people always think they are great at it, and people suck at making rational decisions. So you've got to make it as easy and painless as possible. Always think of it this way: success and failure are determined before you launch, not after. If you don't have a way to act on it, it doesn't matter. It doesn't matter what wins the test, as long as something wins and you can act on it. So you've got to make sure you can do those two things.
Brian:
Got it. Well, while we're on revenue, I guess, and value to the business, I want to talk a little bit about resources, because I think maybe one thing that leads to the idea-validation school of optimization, where it's just have an idea, implement it, run a test, repeat, is the relative simplicity of the design, implementation, and analysis. Can you talk about the team behind this? What did you say, maybe refined down to half a dozen concepts and multiple executions, up to 10 to 12 variations? That's a bigger test than a lot of people are running. How does that come together? Do you need more of a team, or just a different mindset of what ...
Andrew:
You make plans to the resources that you have; you don't get resources to the plan. But let's be clear: the same test is executed by the same resources in almost the same time. So think about this: how much time does it take everyone to agree on the perfect execution, and get sign-off, and do revisions, and focus on the exact execution of every detail of a single design? Whereas if you're focusing on a pool, and I always tell people 80% is better than a hundred, how much time is saved by not doing that? Most of these things aren't reinventing the wheel; you're taking something and making a slight tweak to it. The thing you actually can't get back is time. And when I say time, I mean time for a test to be live. So if I have two weeks that this test has to run,
Andrew:
I want to get the most out of those two weeks, because I can't do other things during that time. And so I make plans based on the resources I have, what traffic allows, and those things. But the piece that everyone thinks is, it's so hard to get a single thing up, and it has to go through this validation because of all the sign-off. There should be as little sign-off as possible on a test, because worst case it wins and you keep refining it. It's live for a day before you start running the next test on it, or a couple of weeks if you want to run a test in between. But the thing that everyone does is they get stuck in this world of, I have to go through all this work to get the single thing up, and it lives forever, so it has to be perfect. And so you're changing just the mentality of how the resources are used, not necessarily the amount of resources. But that being said, even if you have to run fewer tests, as I said before, mathematically it's always in your favor.
Brian:
Right. If you count experiences, you're running fewer tests with more experiences per test. So you're not actually testing less. You're just giving yourself better odds.
Andrew:
It's not like it's linear. It doesn't take 10 times as much effort to run a 10-experience test as it does a two-experience test. True. Well, in that case, it would be [inaudible]. What you're looking at is: what do I do to maximize the time I have, and also to make sure that something can happen as soon as something else is done? Because that's the other problem: we ran a test, we're not thinking about it, and it ends, we execute it, and okay, what are we going to do next? You can never get that time back. So I've launched the test. Great, let's work on the next one. Great, let's work on the next one. Great, let's work on the next one. The grind never ends. A test goes live? That's just the next action in the day. And so if you're constantly doing this, it frees you up.
Andrew:
It lets those resources be used in a way that they find most efficient. It lets people explore new ideas; it lets people do things. I'll be honest, sometimes you're doing the same work over and over again, but at least there's value out of it. So imagine doing that same thing for six months and nothing actually changes: you just made a new color, a new button, a new function. And that's really what happens when you think about the classic product roadmap. That's what that is. Sometimes it works. Everyone has a belief that it's going to work, but as we've seen before, belief is a scratcher ticket, one in 10. Let's not focus on, you know, dumb luck.
Brian:
Okay. That's a surprise. I really didn't see this answer coming, but it does make sense to me. I come from an execution background, and I used to just build tests in JavaScript all day. And so I think of that phase of the test life cycle as being very, very time-constrained, deadline-driven, and, I guess, optimizable. But when you look at the full test life cycle, you're right: approvals and conversations are the bulk of that time. And if you're adding another 10 variants, those don't get any ...
Andrew:
How much simpler is your life if you just code ten to start with and say, kill two? There's no review process back and forth; just kill your two least favorites. Is that really more time? That's a simple way to execute. It takes less than you'll think, and you get to be freed up. Here's the thing that everyone's missing: the designer, the engineer, the product people, the senior executives, they get freed up. They're no longer stuck on, I have to defend this point and validate this point and find ways to prove this point. It's, what's possible? Let's think, let's go outside the box, whether they're right or wrong. But it frees up all that time. It's not using more time. It's just using time differently.
Brian:
All right. Well, so I love this story. Again, it's different from a lot of the stories that I hear, because the typical narrative is: there was this great idea, and then we tested it and it won, or some variation of that story. And in this one, we've heard very little about the actual winning experience. We know that you were on the download page and you brought in the B2B free trial call to action in some format. But the actual details of it, I mean, it was one of 11 experiences that were running, and it's the one that just ...
Andrew:
I could not care less what wins. What wins is the least interesting part of the entire process. What matters is: did I do all the work to get there? And that work is more mental than it is technical. Not to devalue the technical in any way. I know this works, but again, freeing people up to think differently and to use their time differently and use their skills differently, while still getting an output, just makes everyone's life better. And it makes the business grow.
Brian:
As someone who's done a lot of technical work in this field, I personally appreciate the forethought. I don't guess every developer cares per se, but I think on average, if you're going to spend your time building out experiment experiences, you want to think that that matters. You want to think that some forethought went into the experiment design, and that a winner will come out, and that your code will live on, versus just being tasked to ...
Andrew:
Not just that: if you have an idea, we include it. Ideas come from anywhere. It's not just top-down. You're working on six of these and, hey, here's this other idea? Let's just do it, right? There's no difference there. It frees you up. It makes you as much a part of the thought process as the senior executive or the optimization manager or anyone. It just makes it so that everyone's focused on the same outcomes. It's not just, you know, handoff, handoff, handoff.
Brian:
Here's where I would like to take this. If I just stepped into a role where I'm running growth for some sort of SaaS company, and I just walked in the door and it's my first week, I've got to get this stuff off the ground. I don't know where to start. What would you tell me? How would you tell me to approach it? Maybe just thinking in terms of the website; maybe we gloss over things like the team and whatnot. Assume I can get stuff done. But where do I start? I've got ideas, but I'm not just going to start testing ideas.
Andrew:
Well, so let's take a step back. What are the big things that you have to tackle to make a test work? I need to make sure I have an infrastructure to track and test something. I need to make sure it's executable, which means the organization. And then I need to figure out what matters, and I need to make sure people know what matters. So those are the things I'm gonna tackle. I'm going to look at my infrastructure while helping to educate people, and then also try to maximize the amount of work the org does in the short term. What I mean is, nothing teaches us better than doing it. So if I can get a team to even force its way through one or two of these actions, it becomes natural, but at first it just seems so forced. So I used to joke that whenever I come in or I start things, I'm causing a lot of whiplash, because I want people to think differently,
Andrew:
I want people to act differently. So it might be as simple as: hey, I want to see if we can execute on this. How about we run an inclusion/exclusion test, which is, code-wise, almost nothing, like display. But let's see if we can make sure everyone knows: this is what we're trying to accomplish, here's how we're going to measure it, here are the tools we're going to use. Let's execute on it. And that takes so little, because: okay, I think this matters most? Great, let's include it. I don't think this matters? Great, let's include it, right? Everyone has opinions there. You're not saying that this goes away and it's never coming back. You're just saying, in its current execution, this is positive or negative. Let's focus on what matters. Let's get rid of things that don't. And so it's a very simple concept, but it allows people to see all these pieces. And I'm always trying to optimize: can I act on stuff? Can I execute stuff? Can I get a bigger pool? And all three of those I've learned through that process.
Brian:
Okay. So let's say we do have good enough analytics and we are able to execute on this include/exclude test. If we can, let's start to design it here, just to make sure that I can see how this is going to go.
Andrew:
Sure. So every page, I used to joke, I make pastel boxes of them. If you take a big screenshot of an entire webpage and block out sections of the key functional areas, it will literally look like Legos. I used to work for Adobe. I don't know if they still do it, but at the Adobe headquarters, right behind the front door, if you looked back up, it had all the main webpages as blocks and colored pieces. So you figure out what all those big blocks are, and then each experience is just removing one of the blocks, one at a time. So the concepts are really easy for each experience. And then I figure out the most I can do from a data standpoint. Ideally, a test is running two to three weeks, sometimes a little longer, sometimes a little shorter, but never less than about 10 days at the fastest.
Andrew:
But how can I maximize the learning in that time? Can I only test six things? Can I test eight? Can I test nine? As soon as I have that number, it's just picking the ones that make the most sense, and I'm picking them based on what I think matters, but what are the biggest swings, right? If I'm only gonna remove these little tiny sections, that's not gonna have as big an impact as removing the entire promo section, right? So I'm just picking the biggest pieces. And yeah, there's a control; everyone knows that. That part is clear. The only thing that's confusing at that point is you have to think backwards: if I get a positive outcome, it's negative; if I get a negative outcome, it's positive. That's the hardest challenge, honestly. But the other piece of this is: can you get people to align on how to define what is a positive for this group or that group versus what is a positive for the business?
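The exclusion test described here can be sketched in a few lines. This is an illustrative outline only: the block names, the `max_experiences` cap, and the helper functions are hypothetical, and the blocks are assumed to be pre-sorted from biggest expected swing to smallest.

```python
def build_exclusion_experiences(blocks, max_experiences):
    """Return a control plus one experience per removed block.

    `blocks` is assumed to be ordered from biggest expected swing to smallest,
    and `max_experiences` is whatever traffic allows (control included).
    """
    experiences = [{"name": "control", "removed": None, "shown": list(blocks)}]
    for block in blocks[: max_experiences - 1]:
        experiences.append({
            "name": f"without_{block}",
            "removed": block,
            "shown": [b for b in blocks if b != block],
        })
    return experiences


def interpret(removed_block, lift_vs_control):
    """Read the result backwards: a lift from removing a block means the block was hurting."""
    if lift_vs_control > 0:
        return f"{removed_block}: currently a negative (page did better without it)"
    if lift_vs_control < 0:
        return f"{removed_block}: currently a positive (page did worse without it)"
    return f"{removed_block}: no measurable influence"


# Hypothetical page blocks, for illustration only.
PAGE_BLOCKS = ["promo_section", "hero", "feature_list", "testimonials", "footer_links"]

plan = build_exclusion_experiences(PAGE_BLOCKS, max_experiences=4)
for experience in plan:
    print(experience["name"], "->", experience["shown"])
print(interpret("promo_section", lift_vs_control=0.05))
```

The `interpret` helper encodes the "think backwards" step: a positive result for an experience that removes a block says the block, in its current execution, is hurting.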
Andrew:
So that's actually usually the hardest work. It's very rarely a tactical problem. It's almost always an organizational one: defining a success metric that everyone agrees on, or at least valuing actions, right? Because everyone has a difference between what is good for them and what's good for the company, right? Are you trying to make someone happy or trying to grow the company? And it puts you in some very weird spots. At the same time, it allows you to cross teams in a way that no other position can. You've got to be everyone's best friend and everyone's worst enemy. You've got to be the guy that speaks truth to power. Your job is to tell people they're wrong, period, and get them to see that's a good thing. The best-case scenario is you're wrong; worst case, you're right.
Andrew:
And so education starts from before day one. It's part of every conversation. It's the hardest part, but you're educating the executives. You're educating the design team. You're educating the engineers. You're educating your analytics team. You're educating everyone, because you're asking them to go down the same path. You're asking them to never stop. It doesn't end. That test goes live, we designed that test; you know what I'm doing the next second it goes live? Getting the next test ready. It never stops. I'm also educating people on what we'll do based on that result, because when that test ends, I've got to be able to do something. And so what you end up with is a bunch of tests or things that are happening sequentially, and how do you execute on those? That make sense? Yeah. And so the other part is you need to think about how to value actions differently.
Andrew:
So you mentioned everyone has their, you know, ICE is common, or other things. I break it down into the most mathematical place possible: scale, the population, times the number of options and the beta of those options, right? How big a pool do I have, and can I actually get them independent? Divided by cost. Cost is really easy: it's either time, money, or resources, or some combination thereof. It's not, I have to go buy it as is. It's, I need eight dev hours versus 10 dev hours versus 87 dev hours. Remember I talked about functional before, and it comes down to the options. If I'm really confident that red is better than blue and I'm only gonna test red versus blue, it fails this test. But if I test every color, it goes way up. Red happens to be one of them; you might be right. And so it's, how do I maximize those resources? How do I prioritize those resources? The same thought process.
Brian:
Got it. Can you run through the formula one more time? This is for prioritizing different tests that you might run.
Andrew:
I think about any action, any part of the business, as scale times influence. Influence in this case is going to be the number of options and the beta of those options, right? Okay. Over cost.
Brian:
Okay. So the idea loses points if the scale is low, as in, it's only going to be seen by a small subset of visitors, for example. It loses points if you don't have very many options, or if those options aren't very different from one another. Yes. And then it also loses points if it's just going to cost a bunch of time and resources to actually execute.
Andrew:
But remember, that includes things like review process and approval and decisioning and educating. At the same time, your time is not less valuable or more valuable than the engineer's or the designer's or the executive's. I tell everyone they have the same job title: everyone's job title is the janitor. We all clean up messes. And so you're valuing everyone's time in there: how much effort is it gonna take to get this going? You know, so there's no confidence in there. There's no voting, there's no opinions. There's none of that.
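As a rough sketch of the prioritization formula just described (scale times influence, over cost), the snippet below scores two hypothetical test ideas. How "number of options" and "beta" combine into influence is not spelled out in the conversation; multiplying them, and rating beta on a 0 to 1 scale, are assumptions made here for illustration, as are all the numbers.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    scale: float       # population that will see the test, e.g. visitors per test window
    options: int       # number of experiences in the pool
    beta: float        # assumed 0-1 rating of how different the options are from each other
    cost_hours: float  # build time plus review, approval, and education time

    def score(self) -> float:
        """scale x influence / cost, with influence = options x beta (an assumption)."""
        influence = self.options * self.beta
        return self.scale * influence / self.cost_hours


# Made-up numbers, for illustration only.
ideas = [
    Idea("red vs blue button", scale=50_000, options=2, beta=0.2, cost_hours=8),
    Idea("every button color", scale=50_000, options=8, beta=0.6, cost_hours=10),
]

ranked = sorted(ideas, key=Idea.score, reverse=True)
for idea in ranked:
    print(f"{idea.name}: {idea.score():,.0f}")
```

Note that nothing in the score depends on how confident anyone is in an idea; the wider pool outranks the two-option test even though it costs slightly more to build.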
Brian:
Okay. Well, so I'm about to have a very different first week at this startup after this conversation, rather than come up with some ideas and start throwing tests out there and just hope that something wins.
Andrew:
Understand, this is all probability. There's a chance that I buy a lottery ticket and I win the lottery. It doesn't mean it's a good investment strategy. So you're just trying to improve your odds and constantly get the best outcome. Think differently, execute differently, speak differently. It starts from day one and it never ends: you're doing every part of this process differently. And that's why you have to start from day one. You can't suddenly go, hey, you've been doing this thing right for the last three years, but I think I'm gonna try something different. You've got to be able to tackle yourself first and set that groundwork first. And it starts before you walk in that door, and it ends well after you leave that door. You're doing everything you can to make sure that everyone's on that same page.
Brian:
We could go on and on. We'll have to get back together and go on and on. But I think that's a lot for now. That's a lot for me to digest. So thank you so much for your time. If somebody is listening or watching and they want to find you on the internet, where should they go?
Andrew:
Yeah. Thank you again for having me here. You can find a lot of my writing on sites like the ConversionXL (CXL) blog. I have my own personal blog called testingdiscipline.com. I'm active on Twitter under @antfoodz, but really, feel free to reach out. You can find me on LinkedIn, Twitter, or anywhere else. Happy to help you, happy to help anyone in the industry that's trying to get better. So I really appreciate you giving me the time to talk to you and talk to everyone.
Brian:
Thank you so much.