by Selena Blue
While at Optimization Summit 2013 in Boston, multiple marketers said that although they thought they had been testing correctly, the Summit left them thinking otherwise. Some might believe testing two treatments is easy: just create the treatments, set them live and collect the data. However, many pieces need thoughtful construction to achieve a valid and valuable test. To complete a successful A/B test, or any marketing test, planning must be the first step of the process.
So what better way to show attendees how to properly run an online test than to perform one live during Summit? While we normally feature case studies from brand-side marketers, this week we turned a reporter loose on one of our own efforts — the Optimization Summit 2013 live test.
Read on for a behind-the-scenes look at a test you were the target customer for, and perhaps even helped to create.
During the two-day event, MarketingSherpa and MarketingExperiments partnered with Hoover’s, a Dun & Bradstreet company, to run a live A/B split test.
"The purpose of the Optimization Summit live test is to show participants the process of what a live test is all about and how to run one," said Spencer Whiting, Senior Research Manager, MECLABS, and lead researcher of the live test.
The plan was to present the Summit audience with a webpage where a few key variables could be changed and tested. Then, the audience would vote on options for each variable to create one treatment to run against the existing control.
Whiting said the most pressing challenge was the timing issue. The team had a little over 24 hours to select the final treatment, launch the test and reach validity in the results. To pull it off, Whiting and the MECLABS team conducted many steps of careful preparation.
Step #1: Select a page and product to test
The MECLABS researchers worked with the team at Hoover's to select a product page to use for the test. Whiting said you must understand your ideal prospect and what their perceived needs are.
For the Optimization Summit 2013 live test, the team selected a 30-Minute Marketer publication, which has a $47 value, as the lead generation incentive.
"Once that product was chosen, we used the MECLABS sales page as the basis of our control," Whiting explained.
Step #2: Decide testing values and variables through collaboration
"Once we had that, we started collaborating with our team as far as what were the variables and values we work from," Whiting said.
Peer review sessions
For the live test, Whiting presented in two peer review sessions (PRSs) to generate ideas about different variables to test on the selected page. PRS is where MECLABS researchers, data scientists and other teams come together twice a week for internal strategy sessions. These can include getting testing suggestions, presenting test results for conclusion feedback or just some last-minute treatment design advice.
During one PRS, Whiting recruited 10 to 12 researchers to partner up with one other person they don't normally work with on a day-to-day basis. These two-person teams completed analysis on the control and designed wireframes with their suggested changes integrated into potential treatment pages.
Call for ideas on blog
Sometimes marketers can find themselves working in a bubble, especially when a particular methodology or set of best practices guides your team or organization. That's why sometimes going beyond your team can produce the best ideas.
For the live test, the MECLABS team turned to the MarketingExperiments Blog audience for additional ideas. In a blog post in the weeks leading up to the Summit, Daniel Burstein, Director of Editorial Content, MECLABS, provided the audience a little background and a screenshot of the two-step control. The audience then left their test ideas in the comment section
for the chance to win free enrollment in the online, on-demand MECLABS Landing Page Optimization Online Course.
"We got good response from those folks," Whiting said. "That's why we do the blog post participation and the voting — to get people involved in understanding what's going on."
Put it all together
In addition to the resources above, the team also brought in its copywriter to help with headline and copy issues.
The test ideas funneled into four primary variables:
- Headline, which would also include sub-headline values to test
- Call-to-action
- Right column
- Page layout
"Then, once we had all that input, we got it down to seven or eight different layouts, probably a dozen headlines, and a dozen calls to action," Whiting said.
The team narrowed those options down to four or five of each variable to put to the audience vote. Whiting said the team "wanted to give people a real difference in that situation." The live test is more than just showing the audience how to run a test; it's also about engaging the audience and bringing them into the process.
Step #3: Work with IT on quality assurance of treatments
Because of the time limit of the test, the team prepared as much as possible before Optimization Summit ever began.
To create a treatment from scratch after the vote would have been too time-consuming. The solution was to build out the different page options ahead of time, with the layout and right-column content as the two larger parts of the treatment design.
"The development team did an excellent job at getting that stuff up and running. We had four different treatments and layouts set to go with four variables within them. We had them fully tested out before [Summit]," Whiting explained.
IT also completed quality assurance on the potential treatment pages to ensure as many issues as possible were found and corrected before Summit.
"From my experience of seeing people out there doing testing, pages aren't necessarily fully QA'd [quality assured]. There is a lot of detail work from the Dev end to ensure there are no validity threats if the page doesn't work, or if Firefox isn't tracking. There are lots of things like that we just have to be aware of and make sure things are working correctly."
Step #4: Choose treatment options with audience vote
Shortly after 9:00 a.m., Flint McGlaughlin, Managing Director, MECLABS, presented the live test to the Optimization Summit audience.
Attendees were given a testing brief worksheet
with all the variable options listed and background about the test. McGlaughlin encouraged attendees to work in small groups, where they could discuss with other marketers their thoughts on the best option for each variable. Then, using an online survey, via either the Op Summit mobile app or a URL, attendees submitted their vote for each variable. The online voting platform sped the voting process up and made it easier to determine a winner.
"It was interesting seeing the voting. Frankly, between all the different demographics, there weren't that many differences in what people voted on," Whiting said.
What didn't surprise Whiting was each audience-selected option winning by a significant amount. At MECLABS, a particular methodology has been found to work, and he said many of the attendees have been exposed to that methodology, leaving them with similar conclusions about the best option for each variable.
All four selected options won with 50% of the vote:
Variable #1: Layout
- Option 1: 15% of vote
- Option 2: 50% of vote
- Option 3: 15% of vote
- Option 4: 20% of vote
Variable #2: Headlines
- Option 1: 13% of vote
- Option 2: 10% of vote
- Option 3: 50% of vote
- Option 4: 10% of vote
- Option 5: 17% of vote
Variable #3: Call-to-action
- Option 1: 50% of vote
- Option 2: 13% of vote
- Option 3: 10% of vote
- Option 4: 10% of vote
- Option 5: 17% of vote
Variable #4: Right column
- Option 1: 25% of vote
- Option 2: 50% of vote
- Option 3: 25% of vote
Final approval by senior management
While collaboration is at the heart of this and other tests completed at MECLABS, a potential issue does lie in any committee-developed idea. Whiting said McGlaughlin uses the maxim, "a camel is a horse designed by committee," to remind the research team to look at the whole picture when designing a treatment.
The entire page must make sense, and that can be a challenge when voting on elements individually.
Whiting said, "In the way the voting went, there was a possibility that the headline that won and the selected call-to-action would not match up. They would be discordant."
Fortunately, the audience voted well together, and the winning options supported each other well.
However, Whiting explained, "As the research manager, if the vote had gone in a way that just did not work, I would have gone with the strongest call-to-action or headline that won and put the appropriate messaging in for the other content. Obviously we want people's input, but it really comes down to the fact that it's one person's choice."
When testing, you want to take into account all opinions and suggestions, but ultimately, one or two people must decide what will be tested and ensure it all makes sense together.
Step #5: Construct final treatment
After the audience voted, Whiting said, "Basically, all we needed to do at that point was change the headline of the chosen layout because the right-column was already there with the testimonial, and the call-to-action was the stock one that was already on the control."
Development finalized the treatment
and did a final check on it and the control
before launching the test.
Even after taking proper precautions, Whiting said, "Once we got ready to launch [the live test], we had an Internet Explorer issue that we had to correct at the last moment. As far as the quality assurance piece, I think that was probably the biggest challenge involved."
The MECLABS team overcame that challenge, fixed the IE problem and was ready to launch the test just four hours after the vote. This was only possible due to the close working relationship and alignment with the development team, as well as careful planning prior to launch. If the test had been launched without proper QA testing, the results could have been affected and the live test might not have reached validity.
Step #6: Launch the test
After more than three weeks of careful planning, collaboration and preparation, the test launched at 1:00 p.m. on the first day of Summit.
"We drove traffic through three different channels: the MECLABS lists for MarketingSherpa and MarketingExperiments, and also the Hoover's prospect list. We sent out around 300,000 emails. We also sent out a sponsored tweet," Whiting said.
Step #7: Pull data hourly
"We pulled data once an hour, mostly — not in the middle of the night. Most emails we do an aggregate type level of confidence. But for this," Whiting explained, "we looked at hourly [data] to have a little more real life as far as demonstration of daily tracking of confidence."
While aggregate data can give you a good look at the overall results of your test, daily data, or hourly data in the case of the live test, can provide a deeper look. It can also shed light on discrepancies in the results.
"You just have to know that you're looking for it and see what happens," Whiting said. Collecting data daily, or hourly, gives you a clearer view from which to spot validity threats.
"The 6:30 p.m. and 7:30 p.m. data points for [Tuesday night] show a slight variance
in the conversion pattern. This may indicate history effect [a validity threat]," Whiting reported to the Summit audience Wednesday.
While the shift wasn't large enough to affect the aggregate level of confidence, he said an external event, such as dinner, may have affected the data. But, had the shift in results been larger, it could have affected the validity of the test. That's why it's important to meticulously track your data.
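The kind of hourly monitoring Whiting describes can be sketched in a few lines of Python. The counts below are hypothetical (the live test's raw hourly numbers aren't published); the idea is simply to compare each hour's conversion rate against the aggregate and flag large deviations as possible history effects:

```python
# A rough sketch of hourly conversion tracking. These counts are
# hypothetical -- the live test's actual hourly data isn't published.
hourly = [
    # (hour, visitors, leads)
    ("4:30 p.m.", 120, 80),
    ("5:30 p.m.", 110, 75),
    ("6:30 p.m.", 95, 52),   # dip around dinner time
    ("7:30 p.m.", 90, 50),
    ("8:30 p.m.", 100, 68),
]

rates = [(hour, leads / visitors) for hour, visitors, leads in hourly]
overall = sum(l for _, _, l in hourly) / sum(v for _, v, _ in hourly)

# Flag hours whose rate drifts well away from the aggregate -- a crude
# screen for a possible history effect, not a formal statistical test.
THRESHOLD = 0.07  # seven percentage points
flagged = [hour for hour, rate in rates if abs(rate - overall) > THRESHOLD]

for hour, rate in rates:
    note = "  <-- possible history effect" if hour in flagged else ""
    print(f"{hour}: {rate:.1%}{note}")
```

With these made-up numbers, the 6:30 p.m. and 7:30 p.m. dips stand out against the aggregate, which is exactly the kind of pattern Whiting flagged for the Summit audience.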
About 24 hours after launching, the treatment outperformed the control with a statistical level of confidence of 95%:
- Control: 66.0% lead rate
- Treatment: 71.5% lead rate
When revealing the results, Whiting told the audience, "By simplifying the layout and adding value messaging, 8.1% more people said 'yes' to your treatment than the control."
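Those aggregate numbers can be checked with a standard two-proportion z-test. Since the article reports only the lead rates and not the sample sizes, the visitor counts below (1,000 per arm) are assumed purely to make the arithmetic concrete:

```python
import math

# Hypothetical sample sizes -- the article reports only the lead rates
# (66.0% control vs. 71.5% treatment), not how many visitors saw each page.
control_visitors, control_leads = 1000, 660
treatment_visitors, treatment_leads = 1000, 715

p_c = control_leads / control_visitors      # 0.660
p_t = treatment_leads / treatment_visitors  # 0.715

relative_lift = (p_t - p_c) / p_c  # roughly 8.3% with these assumed counts

# Two-proportion z-test with a pooled standard error
p_pool = (control_leads + treatment_leads) / (control_visitors + treatment_visitors)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / control_visitors + 1 / treatment_visitors))
z = (p_t - p_c) / se

# Two-tailed "level of confidence," i.e., 1 minus the p-value
confidence = math.erf(abs(z) / math.sqrt(2))

print(f"relative lift: {relative_lift:.1%}")
print(f"z = {z:.2f}, level of confidence = {confidence:.1%}")
```

With 1,000 visitors per arm, a 66.0% vs. 71.5% split clears the 95% confidence threshold; the same rates on a much smaller sample would not, which is why traffic volume matters so much to reaching validity on a tight timeline.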
The major change in the layout was going from a two-step process to just one. This reduced friction in the process, making it easier for visitors to identify what action they were to take on the page to get their free report.
The headline and sub-headline in the treatment increased the perceived value of the offer. As McGlaughlin said on day 2 of Summit, "A title names something. A headline tells something." The treatment headline communicates the true value of the product rather than just the name of it, which visitors can read elsewhere on the page.
Anxiety was also reduced in two ways: by immediately showing what information was required and providing a glimpse into the report with the pop-up box.
The team also tracked the results by channel for further evaluation.
MECLABS email list:
- Control: 72.9% lead rate
- Treatment: 77.4% lead rate
Dun & Bradstreet email list:
- Control: 60.7% lead rate
- Treatment: 68.2% lead rate
Sponsored tweet:
- Control: 14.29% lead rate
- Treatment: 17.0% lead rate
Both email lists reached a level of confidence above 99%. However, the social media channel only reached 27.2%.
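The low confidence on the social channel is what you would expect from a small sample. Running the same two-proportion z-test arithmetic with small visitor counts (again assumed, chosen only to land near the reported rates) shows how little an observed gap means at that volume:

```python
import math

# Assumed tweet-channel counts, picked only to approximate the reported
# rates (14.29% vs. 17.0%); the real traffic numbers aren't published.
control_visitors, control_leads = 28, 4      # 14.29%
treatment_visitors, treatment_leads = 47, 8  # 17.02%

p_c = control_leads / control_visitors
p_t = treatment_leads / treatment_visitors

p_pool = (control_leads + treatment_leads) / (control_visitors + treatment_visitors)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / control_visitors + 1 / treatment_visitors))
z = (p_t - p_c) / se
confidence = math.erf(abs(z) / math.sqrt(2))

# With so few visitors, the observed gap is far from conclusive.
print(f"z = {z:.2f}, level of confidence = {confidence:.1%}")
```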
"Looking at different traffic channels, the difference between channels was pretty significant. And due to the high trust and low anxiety involved with it, the conversion was very, very high — 70% conversion once opened. That was pretty incredible," Whiting said.
Because those on the MECLABS lists already have an established relationship with MarketingSherpa, the higher conversion level could be attributed to their higher level of trust with the company.
Whiting said that if further steps were taken, the next thing he would want to test is the email and how to drive more traffic to the page.
Creative Samples
- Blog post comment section
- Headline options
- Right column options
- Call-to-action options
- Page layout options
- Testing brief
- Data analysis chart
Sources

MECLABS

Dun & Bradstreet — Optimization Summit 2013 live test partner
Related Resources

MarketingSherpa Lead Gen Summit 2013 — Sept. 30 - Oct. 3, San Francisco

[MarketingSherpa Webinar] Email Optimization: A discussion about how A/B testing generated $500 million in donations — Wednesday, June 19, 2:00 p.m. - 2:30 p.m.

Optimization Summit 2013 Wrap-up: Top 5 takeaways for testing websites, pay-per-click ads and email

Landing Page Optimization: Help improve this page for a chance to win an LPO Online Course

A/B Testing: How a landing page test yielded a 6% increase in leads

Email Optimization and Testing: 800,000 emails sent, 1 major lesson learned