April 01, 2009
How To

How to Refine Your Database: 4 Important Variables to Track for Ultimate Segmentation Strategy

SUMMARY: Data is the lifeblood of a proper email segmentation strategy. But with so many database variables at your fingertips, how do you decide which ones to track?

This article outlines the four most important variables to accommodate in your database structure. It also includes tips on establishing processes to collect and use that information.
Your database holds the key to sending the right email to the right person at the right time, in the right combination with other marketing efforts. Consolidated data from CRM, Web analytics and your email system allows you to perform the right segmentation and make informed campaign decisions.

But even experienced database managers admit that it’s hard to figure out which variables to track in your database. “It’s the nature of the beast; I tend to go overboard and want to examine everything,” said Katie Cole, Director of Analysis & Learning, Quris (now Merkle).

The last thing your CTO wants to do is change the database structure to accommodate even more variables. Your best bet is to anticipate the most important variables to include in each subscriber’s record.

This article, adapted from a chapter in MarketingSherpa’s new Best Practices in Email Marketing Handbook, outlines four major types of database variables to track.

Data Type #1: Endemic Data

Endemic data -- data that is truly unique to a particular record -- is frequently collected at opt-in:
o Contact information
o Demographic and geographic data, such as name, location, title and vertical industry (for B2B marketers), age and income level (for B2C marketers)

Endemic data also includes information that the opt-in volunteers during the opt-in process, often through a preference center, such as:
o Projected date of a buying decision
o Budget
o Brand and format preferences
o Preferred language for communication
o Contact preference and frequency

Endemic data may be collected in several ways:

On the first opt-in page

MarketingSherpa and most landing page experts advise against collecting demographic data on the initial opt-in page; multiple tests have found that asking for more than an email address decreases the opt-in conversion rate. However, some marketers still choose to do so because of their need to obtain qualifying information to make the resulting opt-in useful.

For example, Intrawest Resorts’ Email Marketing Director Randy Cuff told us: “We would not bring into our datamart an email address by itself without a birth year or zip code. Our datamart is based on the individual customer. Without certain pieces, you're not a person -- you're just an email address.”

Quick tip for collecting endemic data: instead of (or in addition to) providing a free-form box for job title (for B2B marketers) or for income (for B2C marketers), offer a drop-down menu or checklist of options. That way, you will be able to sort leads much more easily.

On pages following the opt-in page

Many marketers have found that they can successfully populate their database by asking for demographic information on the Web page immediately following the initial opt-in page. If you can reasonably infer information from the database, avoid inadvertently annoying the responder by asking for the data that should already be available to you.

For example, you don’t need to ask if the responder is a new customer or not. You should already have that data in your database. The only exception would be if you are seeking to learn how responders see themselves. In the example of new vs. returning customer, you’d ask only if you wanted to gauge the responder’s perception of their status as new or returning customers.

Continuous data update

You can gather demographic data throughout the opt-in’s lifetime by instituting touch-points designed to capture additional endemic information.

Touch-points include:
o Preference centers
o Surveys
o Webinar responses
o Inbound customer service
o Outbound telemarketing
o Your sales force

Vericept, a provider of compliance and data loss prevention solutions, found that 50% of webinar attendees answered a quick survey giving feedback on the webinar topic, as well as the attendees’ own budget and goals. They also found that, while an extremely qualified lead could be sent immediately to sales, most leads didn't reveal enough information in their contact data. To get that additional information, Vericept hired a telemarketing team to pre-qualify leads.

Third-party data compilers

You can augment your demographic data by overlaying your database with information purchased from third-party compilers and data services, such as:
o B2B Data: Dun & Bradstreet, Experian, infoGroup (formerly InfoUSA)
o B2C/Lifestyle/Psychographic Data: Acxiom, Experian, Equifax, TransUnion, Claritas

Some data is more easily obtained from third-party sources than other data. For example, SIC (Standard Industrial Classification) code, corporate ownership and number of employees (globally or at a particular location) are commonly appended to B2B data.

The share of your records to which a third party can append the data you want is called the “match rate”. For example, if you had 100,000 records of pet owners, and you found that a third-party data service could match dog ownership to 20,000 records, it would be a match rate of 20%.

You can ask a data compiler what the typical match rate is for a file of your type before you decide to make the investment in the data overlay.
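The match-rate arithmetic above is simple enough to check yourself when a compiler quotes you a number. The following is a minimal Python sketch, not a vendor tool; the function name `match_rate` is our own for illustration, and the figures are the pet-owner example from this article.

```python
def match_rate(matched_records: int, total_records: int) -> float:
    """Return the match rate as a percentage of the records in your file."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    if not 0 <= matched_records <= total_records:
        raise ValueError("matched_records must be between 0 and total_records")
    # Multiply before dividing so whole-number percentages stay exact.
    return 100 * matched_records / total_records


# The pet-owner example: 20,000 of 100,000 records matched on dog ownership.
rate = match_rate(20_000, 100_000)
print(f"Match rate: {rate:.0f}%")
```

Running the same calculation on a small test file before buying a full overlay is one way to compare the quoted "typical" rate with what your own data actually yields.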

If your database is light on endemic information, overlaying your database with third-party information is generally a recommended step prior to performing any kind of regression modeling. Do consult with both your modeler and the database overlay service for costs, timeframes, and availability of ancillary variables.

Data Type #2: Transactional Data

Transactional data is any data pertaining to transactions that your opt-in has had with your brand. It’s worthwhile to note that a transaction does not need to be financial in nature to be considered a transaction.

For example, the process of opting-in produces transactional data, including:
o Date of opt-in
o Date of opt-in confirmation (if double opt-in)
o IP Address of opt-in
o Web Page or other channel of opt-in
o MP3 files of verbal opt-in confirmations (yes, some database builders do keep voice files as proof of opt-in. Archiving opt-in information is also useful if you ever need to prove opt-in status to an ISP.)

*RFM: The mother lode of transactional data*

While financial data is not the only source of transactional data, it’s the most important one, particularly for B2C marketers.

Long before email marketing, there was database marketing. And database marketers know that the higher the RFM, the more valuable the customer.

Be sure that the following information is being captured by your database:
o Recency: How recently did the opt-in transact with/purchase from your brand?
o Frequency: How frequently does the opt-in transact with/purchase from your brand?
o Monetary: What is the value of the customer in financial terms, i.e., how much money has the opt-in spent?
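To make the three RFM variables concrete, here is a minimal Python sketch of how they might be derived from a raw transaction log. The `Transaction` record, its field names, and the sample data are assumptions made for illustration; your own database will have its own structure, and "frequency" is taken here simply as the count of transactions on file.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Transaction:
    customer_id: str  # hypothetical key, e.g. the opt-in's email address
    when: date        # date of the transaction
    amount: float     # money spent; 0.0 for a non-financial transaction


def rfm_summary(transactions, as_of):
    """Return {customer_id: (recency_in_days, frequency, monetary)}."""
    running = {}
    for t in transactions:
        last, count, total = running.get(t.customer_id, (None, 0, 0.0))
        if last is None or t.when > last:
            last = t.when  # keep the most recent transaction date
        running[t.customer_id] = (last, count + 1, total + t.amount)
    return {
        cid: ((as_of - last).days, count, total)
        for cid, (last, count, total) in running.items()
    }


# Made-up sample log, for illustration only.
log = [
    Transaction("a@example.com", date(2009, 3, 20), 40.0),
    Transaction("a@example.com", date(2009, 1, 5), 60.0),
    Transaction("b@example.com", date(2008, 11, 1), 25.0),
]
rfm = rfm_summary(log, as_of=date(2009, 4, 1))
print(rfm["a@example.com"])  # (12, 2, 100.0)
print(rfm["b@example.com"])  # (151, 1, 25.0)
```

Storing the raw transactions and computing RFM on demand, rather than storing only the three totals, leaves you free to change the time window later without altering the database structure.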

Data Type #3: Behavioral Data

As the saying goes, “Actions speak louder than words.” And that’s especially true for your prospects’ and customers’ online and offline actions.

Behavioral data is perhaps the most actionable of all the database information and can come from a tremendous number of sources. If you use a multichannel approach to marketing, you’ll want to define and capture potential behavioral variables carefully.

Examples of behavioral data include:
o Products put into an abandoned shopping cart
o Items reviewed recently and repeatedly
o Clicking on a website
o Opening an email
o Clicking on an email offer (and the type of offer clicked on)
o Clicking on editorial within a newsletter or alert
o Webinar registration and/or attendance
o Live event registration and/or attendance
o Participating in surveys
o Calling customer service
o Interacting with a field rep
o Redeeming a coupon in a retail store

By combining behavioral data with other data, you can develop extremely targeted, micro-segmented campaigns. For example, perhaps you have a segment of customers who click only if they get special offers via email. You may have another contingent of customers who browse your website on a regular basis but are immune to email. You may discover a segment of prospects that will forever remain just that -- prospects.
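The three example segments above can be expressed as simple rules once the behavioral counts sit in each subscriber's record. The Python sketch below is only an illustration: the field names, segment labels, and thresholds (four site visits, 24 months on file) are assumptions of ours, not figures from any case study, and you would tune them against your own file.

```python
from dataclasses import dataclass


@dataclass
class SubscriberActivity:
    offer_clicks: int      # clicks on email offers
    editorial_clicks: int  # clicks on newsletter editorial
    site_visits: int       # website visits in the period examined
    purchases: int         # completed purchases on file
    months_on_file: int    # months since opt-in


def assign_segment(a, min_visits=4, prospect_months=24):
    """Assign one illustrative segment label using hypothetical thresholds."""
    email_clicks = a.offer_clicks + a.editorial_clicks
    if a.purchases > 0 and a.offer_clicks > 0 and a.editorial_clicks == 0:
        return "offer-only clicker"
    if a.purchases > 0 and email_clicks == 0 and a.site_visits >= min_visits:
        return "site browser, email-immune"
    if a.purchases == 0 and a.months_on_file >= prospect_months:
        return "perpetual prospect"
    return "general"


print(assign_segment(SubscriberActivity(3, 0, 1, 2, 10)))   # offer-only clicker
print(assign_segment(SubscriberActivity(0, 0, 9, 1, 18)))   # site browser, email-immune
print(assign_segment(SubscriberActivity(1, 2, 5, 0, 30)))   # perpetual prospect
```

Because the rules are checked in order, each subscriber lands in exactly one segment; anyone who fits none of them falls into a catch-all "general" group.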

--> SherpaTip: Productivity and Micro-segmentation

If you think that the permutations of database information are endless, you are right. But does it pay to market to all those micro-segments differently? The answer is -- it depends, primarily on the payoff generated by the micro-segmentation.

For example, Hewlett-Packard (HP) decided to send different versions of HP's Technology at Work email newsletter to 13 segments of their house file, including recipients in big vertical industries and employees at major HP clients. While response rates rose somewhat, so did editorial costs.

HP conducted eyetracking usability tests, and found that customers didn’t pay nearly as much attention to the newsletter nuances as was anticipated. HP decided to collapse the segmented newsletters back down to the five versions that got the best response. Additionally, instead of investing more in newsletter creative, they decided to switch to segmented demand-generation.

At the other end of the spectrum is Cetaphil skin care. They invented a relationship email campaign so finely targeted that they sent out 400-3,000 *different* versions of their monthly newsletter to consumers.

Although the Cetaphil marketing team planned on sending just two emails per month (a newsletter and a sales alert), the team made a significant investment in collecting and storing a huge amount of information about each participating consumer on the house list. A full-time copywriter was even hired to keep up with the volume of content needed for that degree of customization.

Why did Cetaphil go through this effort and expense? They were able to demonstrate (by testing against control panels) that their hyper-customized, database-driven approach worked to convince prospects to join the Cetaphil Skin Care Club, and that the club’s members were worth an extra $22.19 of brand sales per year.

So, in the end, micro-segment if it’s productive to do so.

Data Type #4: Computed Data

Computed data is the outcome when one or more variables are used to create a new variable. The resulting variable is usually expressed as a difference between two variables, or as a ratio. For example, the variable of “number of miles from a retail store” is computed data; it’s the distance between the customer’s address and the store’s address.

As another example, HP assigns each customer a Customer Category Code based on past purchasing behavior (general store, business employee and academic). HP also computes a projected LifeTime Profit score based on a customer’s past purchases and HP’s investment in the customer.

Another form of computed data is a gender code. You can use tables of common first names to predict gender (although some brands will simply ask for this information at some point in the registration process).
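A gender code of this kind boils down to a table lookup. The Python sketch below uses a tiny, made-up name table purely for illustration; a real implementation would rely on a much larger table of common first names, and the codes "F", "M" and "U" (unknown) are our own convention.

```python
# Tiny illustrative table; a production table would hold thousands of names.
NAME_TO_GENDER = {
    "mary": "F",
    "linda": "F",
    "james": "M",
    "robert": "M",
}


def gender_code(first_name: str) -> str:
    """Return 'F', 'M', or 'U' (unknown) for a first name."""
    key = first_name.strip().lower()
    # Names missing from the table, including ambiguous ones, stay unknown.
    return NAME_TO_GENDER.get(key, "U")


print(gender_code(" Mary "))  # F
print(gender_code("Pat"))     # U
```

Returning "unknown" rather than guessing keeps a predicted code from overriding what the subscriber may later tell you directly in a preference center.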

Computed data can be extremely powerful in predicting outcomes, and modelers commonly use it for regression analysis. Even without complex data analysis, you can use simple ratios and differences to determine when and how to trigger marketing campaigns to segments of your file.

An analysis of MarketingSherpa Case Studies reveals that both B2C and B2B marketers use computed data in making marketing database and campaign-management decisions. For example, in building their regression model, Schwab computed a Web activity variable that was based on the ratio of Web trades to total trades and trade recency. Microstrategy recognized a relationship between the type of offer responded to (white paper vs. webinar vs. demo) and number of days until purchase.

Here are a few other examples of computed/derived data that strongly influence outcomes:

- Member benefit usage analysis for membership retention campaign
o Computation = (# of times member benefits used/# of months of usage) * 100

- Proximity analysis for event invitation campaign
o Computation = distance in miles between prospect location and event location
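Both computations above can be sketched in a few lines of Python. The function names are our own, and the proximity calculation makes two assumptions not stated in the article: that both locations have already been geocoded to latitude and longitude, and that straight-line (great-circle) distance, computed here with the haversine formula, is an acceptable stand-in for travel distance.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def benefit_usage_score(times_used: int, months_of_usage: int) -> float:
    """(# of times member benefits used / # of months of usage) * 100."""
    if months_of_usage <= 0:
        raise ValueError("months_of_usage must be positive")
    return times_used / months_of_usage * 100


def distance_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# A member who used benefits 3 times over 12 months.
print(benefit_usage_score(3, 12))  # 25.0

# Two points one degree of latitude apart, roughly 69 miles.
print(round(distance_miles(0.0, 0.0, 1.0, 0.0), 1))
```

Either value can then be stored in the subscriber's record and compared against a cutoff you choose, for instance to invite only prospects within a set radius of an event.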
