Imagine, for a moment, that everyone on Earth was a customer of yours.
While this dream scenario would leave you richer than Jeff Bezos, Bill Gates and Warren Buffett combined, you’d likely run into a few problems along the way. Among these various problems is the fact that it’d be literally impossible to survey your entire audience base and learn more about your numerous customer segments – let alone your individual customers.
Of course, even if your customer base “only” numbers in the thousands – or even hundreds – generating survey responses and other such feedback from all of your customers is no easy feat.
And even if you somehow do manage to get all of them to respond, you’ll still have to invest a ton of time, money and energy into making sense of all the data you’ve collected.
Luckily, there’s a much easier way to go about the process of gaining deeper insight into your customers’ needs and expectations – no matter how heterogeneous your audience is.
Which is exactly what we’ll dig into throughout this article.
A representative sample is, well, pretty much exactly what it sounds like:
A selection of a population (i.e., your customer base) that can objectively be said to reflect the mindset, views, thoughts, etc. of the population as a whole.
Here are a few rather simple examples:
- If a brand caters specifically to “20-somethings,” a representative sample would be drawn from customers aged 20-29 and would reflect the mix of genders, ethnicities, income and education levels, etc. found among them.
- For a surfboard company that targets individuals living in coastal states, a representative sample would need to include customers living in each of these states (rather than, for example, only those living in California).
- If a B2B company provides services to clients operating within five different industries, representative sampling dictates that surveys be delivered to companies within each of these industries.
Now, as we’ll get to later on in this article, this does not mean that surveys should be distributed equally among each of your customer segments. Rather, your surveys should be distributed in proportion to each segment’s share of your customer base. Again, we’ll get more into what this means in a bit.
But first, we need to discuss why using representative sampling is essential in the first place.
Based on everything we’ve said thus far, you likely have a pretty good idea of why it’s important to use representative sampling when surveying your customer base.
Still, it’s definitely worth clarifying some of the main benefits of doing so.
As we said in the intro, surveying your entire customer base would take a huge investment on your part in terms of time, money and manpower.
And, of course, the larger your audience grows, the more resources you’d need to pour into collecting this data. At some point, doing so wouldn’t just become difficult – it’d be nearly impossible.
Not only that, but, as we’ll dive into momentarily, you’ll also reach a point as your customer base grows in which surveying all (or even most) of your audience simply isn’t necessary. Failure to recognize this can, unfortunately, cause you to invest too many resources into the venture. Again, we’ll revisit this in a bit.
While this probably sounds like a no-brainer, let’s unpack this statement a bit further.
To start, your customer base is almost certainly made up of individuals from many different walks of life. While they all have at least one thing in common (i.e., a need for your product/service), they’re probably quite different in many other ways.
(To be clear, these differences are why you’d be conducting the survey in the first place; if your customers were all the same, and you knew everything about them, there’d be no need to do any of this at all!)
That said, using representative sampling allows you to intentionally select individuals from your various customer segments to deliver your survey to. In doing so, you increase the chances of generating responses and gaining insight from a wide spectrum of individuals – instead of receiving dozens, or even hundreds, of responses that essentially all say the same thing.
Now, there’s a key word in that last paragraph that we absolutely need to emphasize once more: intentionally.
Perhaps you’ve toyed with the idea of simply choosing customers at random to deliver your surveys to. The problem here is that there’s no guarantee that this random sampling will truly be representative of your entire customer base.
As a simple example, if you choose ten individuals at random from a group of ten boys and ten girls, you could easily end up with a group in which one gender is underrepresented (there’s even a very small chance that all ten people you pick are of the same gender).
Using this same example – but with a focus on creating a representative sample – you’d want to randomly pick five individuals from the group of ten boys, then five individuals from the group of ten girls. In doing so, you can be certain that you’ll end up with a representative sample of boys and girls – but will still have chosen each person at random.
Once again, we’ll revisit this a bit later on when we discuss sample selection bias.
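To put some numbers on the boys-and-girls example, here is a minimal Python sketch. The names `boys`, `girls`, `prob_boys_in_simple_random_sample` and `stratified_pick` are made up for illustration. It computes the exact chance that a purely random pick of ten comes out as an even five-and-five split (roughly one in three), the chance that all ten share one gender, and then draws the stratified version instead.

```python
import random
from math import comb

# Hypothetical group from the example: ten boys and ten girls.
boys = [f"boy_{i}" for i in range(1, 11)]
girls = [f"girl_{i}" for i in range(1, 11)]
group = boys + girls


def prob_boys_in_simple_random_sample(k, sample_size=10):
    """Exact chance that a simple random sample of the group holds exactly k boys."""
    ways_with_k_boys = comb(len(boys), k) * comb(len(girls), sample_size - k)
    return ways_with_k_boys / comb(len(group), sample_size)


def stratified_pick(per_group=5, seed=None):
    """Pick per_group members at random from each gender separately."""
    rng = random.Random(seed)
    return rng.sample(boys, per_group) + rng.sample(girls, per_group)


# Chance that pure random selection happens to give a 5/5 split.
p_even_split = prob_boys_in_simple_random_sample(5)
# Chance that all ten picks are of the same gender (all girls or all boys).
p_single_gender = (
    prob_boys_in_simple_random_sample(0) + prob_boys_in_simple_random_sample(10)
)
# The stratified draw is 5/5 every time, yet each person is still chosen at random.
sample = stratified_pick(seed=42)

print(f"Even split under pure random selection: {p_even_split:.1%}")
print(f"All ten of one gender: {p_single_gender:.6%}")
print(sample)
```

In other words, pure random selection gets the balance right only about a third of the time here, while the stratified pick gets it right by construction.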
This goes along with responsiveness, but it goes a bit further:
The more representative of your audience base a sample is, the more reliable the responses will typically be.
The more reliable your customers’ responses are, the more valuable the information you glean from them will be.
The more valuable the information you collect on your customers, the more able you’ll be to make improvements to your processes that actually matter to your entire audience base.
In contrast, if you survey only a single segment of your customer base, or if you select respondents purely at random and some segments end up underrepresented, you run the risk of alienating a large portion of your audience while attempting to cater to a relatively small portion of the whole that may appear larger than it actually is.
To summarize, representative sampling increases your chances of gleaning highly valuable and useful information from individuals of various walks of life, all while keeping the costs of doing so as low as possible.
If you’re even remotely familiar with the study of statistics, you’ve likely heard the term “statistical significance” at some point in your life.
Without diving too deep into the mathematical explanation here, let’s just say that we’re using the term loosely in this article: a sample must be large enough for its results to reflect the population as a whole within an acceptable margin of error. What matters most is the absolute number of responses, rather than the percentage of the population they make up.
For example, a sample of 50 individuals out of a population of 100 covers half the population and would likely provide a reasonably accurate understanding of the whole (a margin of error of roughly ±10 percentage points at a 95% confidence level). A sample of 50 individuals out of a population of 5,000 is less precise (roughly ±14 percentage points), and the only way to narrow that margin is to collect more responses.
That said, you certainly want to be sure that your sample size is large enough to meet the standards of statistical significance.
However, as we alluded to earlier, this doesn’t mean that more is necessarily better. That is, once you’ve determined a significant sample size, there’s no need for you to add more customers to the sample; you’ll already have enough responses to gain an accurate understanding of your entire customer base as it is.
To quickly illustrate this point, you’d need roughly the same sample size (about 600) to accurately represent a population of 500,000 as you would to represent a population of 10,000,000, all other factors being equal.
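If you’d like to see where a figure like 600 can come from, here is a rough sketch using Cochran’s sample size formula with a finite population correction. To be clear, the article doesn’t say which settings sit behind its number; a 95% confidence level (`z=1.96`), a margin of error of 4 percentage points and the most cautious assumption about how answers are split (`p=0.5`) are our own assumptions, chosen because they land at about 600. The function name `required_sample_size` is also just for illustration.

```python
import math


def required_sample_size(population, margin_of_error=0.04, z=1.96, p=0.5):
    """Cochran's formula with a finite population correction.

    margin_of_error, z and p are assumed defaults, not values from the article.
    """
    # Sample size needed for an effectively unlimited population.
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Shrink it slightly when the population itself is limited.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


for population in (1_000, 500_000, 10_000_000):
    print(population, required_sample_size(population))
```

Under these assumptions, the results for 500,000 and 10,000,000 customers both come out at roughly 600, while a much smaller customer base needs noticeably fewer responses.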
Now, assuming you’ve pinpointed a statistically significant sample size, your next order of business will be to ensure that the sample population is made up of individuals from each of your customer segments.
However, as we said earlier, this does not mean that you should necessarily pick an equal number of individuals from each of your segments. Instead, you should aim to pick a number of individuals from each segment that’s representative of that segment in relation to your entire audience.
Going back to our hypothetical surfboard company, it wouldn’t make much sense to survey the same number of people in California and in New Jersey, right? While you’re bound to have a respectable number of customers living in the New Jersey area, this number almost certainly pales in comparison to your numbers in California.
In this scenario, if you were to survey the same number of people in both states, you’d essentially end up placing more value on the responses of your New Jersey customers than your Californians, despite the fact that individuals within California provide you with much more business overall.
With this in mind, then, you want to be sure that the makeup of your sample population is consistent with the makeup of your entire audience. For example, if 65% of your customers are located in California, and 10% are located in New Jersey, you want your sample to consist of 65% Californians and 10% New Jersey residents, and so on up to 100%.
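Here is one way the 65%/10% split above could be turned into actual head counts. It’s only a sketch: the customer counts, the “Other states” bucket and the function name `proportional_allocation` are hypothetical, and handing any leftover seats to the segments with the largest remainders is just one reasonable rounding rule among several.

```python
def proportional_allocation(segment_counts, sample_size):
    """Split sample_size across segments in proportion to their customer counts."""
    total = sum(segment_counts.values())
    quotas, remainders = {}, {}
    for segment, count in segment_counts.items():
        # Whole seats for this segment, plus the remainder left over by the division.
        quotas[segment], remainders[segment] = divmod(sample_size * count, total)
    # Give any seats lost to rounding to the segments with the largest remainders.
    leftover = sample_size - sum(quotas.values())
    for segment in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[segment] += 1
    return quotas


# Hypothetical customer base: 65% California, 10% New Jersey, 25% everywhere else.
customers_per_segment = {"California": 6500, "New Jersey": 1000, "Other states": 2500}
print(proportional_allocation(customers_per_segment, sample_size=600))
```

With a sample of 600, this works out to 390 Californians, 60 New Jersey residents and 150 customers from everywhere else.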
To reiterate, sample size definitely is important up to a certain point. However, it’s more important that the constitution of your sample population is as close to a true representation of your overall customer base as possible.
The original title of this section was “Avoiding Sample Selection Bias,” which would have been a bit misleading.
In a perfect world, we’d be able to anticipate any and all contingent circumstances that may cause our samples to be “contaminated” – and our results to become skewed.
Unfortunately, in the real world, this simply isn’t possible; there will always be something in the way of obtaining a true representative sample of the population.
That said, you definitely do want to take every precaution you possibly can in order to minimize bias within your sample selection.
The first step in doing so is to recognize how and why such bias tends to occur in the first place.
Here, we’ll quickly go through some of the most common ways in which selection bias rears its ugly head and causes sample populations to become skewed.
One quick note before we dive in:
While no one would actively and intentionally want to create a biased sample population, such skewed samples often come about due to actions taken and decisions made by the surveyor – who likely didn’t realize the implications their actions and decisions would have.
With this in mind, let’s take a look at some of the common selection bias “culprits.”
As the name implies, undercoverage bias occurs when certain segments of a population are underrepresented – or inadequately represented – within a selected sample.
Recall our previous example in which a surveyor mistakenly selects the same number of Californians and New Jerseyans for their sample, rather than selecting a representative percentage of each.
A real-world example occurred back in 1936, when The Literary Digest predicted that Alfred Landon would win the presidency over FDR based on survey results that ended up being incredibly biased. The problem arose due to the fact that the magazine had generated its sample population (of 10 million people) from its own subscriber list, as well as a list of people who owned an automobile and those who owned a telephone. Of course, this essentially led to the magazine polling only wealthy individuals at the time, and ignoring those who had been suffering through the Great Depression, the very people FDR had promised to help.
While not always the case, undercoverage bias is typically caused by another type of bias…
Convenience bias occurs when a surveyor takes the “easy route” in selecting a sample population, rather than digging in and ensuring accuracy.
Looking back on the Literary Digest example, it was undoubtedly much easier for the magazine to use the data it did than it would have been for the team to create a true representative sample population. But, by taking the easy way out, the magazine ended up collecting survey data that was flat out inaccurate.
A more modern example would be if a company decided to post a link to a survey on its Facebook page, and not post it elsewhere. Again, it’s pretty easy to go this route – but it essentially means that only the company’s followers on Facebook will be able to respond.
Nonresponse bias can come about in one of two ways:
- Distributing a survey in a way that makes certain segments unlikely to respond
- Assuming how an individual would have responded if they were to have responded
Again, we can go back to the Literary Digest example, here: not only did the company fail to recognize that people who were struggling to make ends meet didn’t have the time, money, or energy to fill out a survey, but it also assumed they would have responded in favor of Landon (just as most of its respondents did).
Another way in which nonresponse bias comes into play is in a “meta” sort of way.
For a rather extreme example, let’s say a company sends out a survey asking people to respond to the question “Have you ever cheated on your taxes and gotten away with it?” While some people (who would answer “yes” to the question) might not respond for fear of being investigated, it’s wrong to assume that all non-respondents would answer “yes” (as some may simply have ignored the survey for unrelated reasons).
While this likely goes without saying, surveyors need to do everything they can to ensure that survey data is collected, stored, and analyzed in a proper manner.
Additionally, surveyors need to confirm that respondents are actually eligible to complete the survey in the first place, and that they indeed belong to a certain segment of a population.
Using our California/New Jersey example, let’s say a resident of California was visiting family in New Jersey, and purchased an item to be sent to their relative’s address during their stay. Though the transaction did occur in New Jersey, this individual realistically belongs to the California customer segment.
Now, a single instance such as the above scenario probably won’t affect the survey all that much. However, multiple instances like this, if left unchecked, certainly will cause the end results to be skewed much further than what is typically accepted.
In other words, it’s in the surveyor’s best interest to ensure that all collected data is as accurate as possible; compromised data that goes unnoticed can cast doubt on an entire data set.
In addition to consciously working to avoid instances such as those mentioned above, there are a few main things you can do in order to minimize the chances of falling prey to sample selection bias.
We’ve talked about the concept of stratified sampling a few times in this article, but haven’t mentioned it by name until now.
Stratified sampling is the process of dividing your population into segments (or “strata”), determining the percentage of your entire population that each segment takes up, and then maintaining this ratio within your sample.
While this does involve prequalifying your sample population, you’re only doing so up to a certain point in order to ensure accuracy. Again, once you determine how many respondents you’ll need to survey from each segment, you’ll then randomly choose this number from each given segment in order to ensure objectivity.
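As a rough sketch of how this might look in practice, the snippet below groups a hypothetical customer list by segment and then randomly draws the same fraction from every segment, which keeps each segment’s share of the sample in line with its share of the customer base. The record layout (`id`, `segment`), the 10% sampling fraction and the function name `stratified_sample` are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(customers, sampling_fraction, seed=None):
    """Randomly draw the same fraction of customers from every segment."""
    rng = random.Random(seed)
    by_segment = defaultdict(list)
    for customer in customers:
        by_segment[customer["segment"]].append(customer)

    sample = []
    for members in by_segment.values():
        # At least one respondent per segment, never more than the segment holds.
        k = min(len(members), max(1, round(len(members) * sampling_fraction)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical customer base of 1,000 people across three segments.
customers = (
    [{"id": f"CA-{i}", "segment": "California"} for i in range(650)]
    + [{"id": f"NJ-{i}", "segment": "New Jersey"} for i in range(100)]
    + [{"id": f"OT-{i}", "segment": "Other states"} for i in range(250)]
)

sample = stratified_sample(customers, sampling_fraction=0.1, seed=7)
print(Counter(customer["segment"] for customer in sample))
```

The segments decide how many people get picked from each group; who gets picked within a group is still left entirely to chance.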
As we said earlier, it may not always be easy to generate responses from your entire sample population.
(For example, it can be tempting to deliver a survey only to customers who have provided their email address, and to ignore those who haven’t.)
But, again, focusing on the “easy to reach” customers completely eliminates the possibility of receiving responses from the “hard to reach” customers.
That said, you’ll want to do as much as you can to ensure that every member of your sample population has an equal opportunity to respond to your surveys. This entails:
- Ensuring they understand the survey’s directions
- Ensuring they can easily access the survey in a way that works best for them
- Ensuring they can respond to and deliver their responses, again in a way that works best for them
While it will certainly take more effort on your end to make all this happen, these efforts will pay off in the form of receiving accurate and reliable data from your respondents.
Though we’ve mentioned objectivity a few times throughout this article, it’s worth calling attention to it once more as we wrap things up.
Not only is it important to remain objective when selecting representative sample populations and delivering surveys, but it’s also essential to do so when collecting and analyzing responses. Failure to do so can, in a retroactive manner, skew your sample population so that it is no longer representative of your entire audience base.
For example, say a surveyor receives some responses via email, and some via physical mail. Whether consciously or not, the surveyor, for a brief moment, assumes that the emailed responses were from younger customers, while the physically mailed responses were from those belonging to older generations. Whether or not this is actually true, the surveyor may unwittingly interpret each set of responses differently based on their initial assumptions.
To prevent such instances from occurring, you’ll want to ensure that nonessential information within all responses remains hidden at all times, and also randomize the order in which you collect and analyze the responses. In doing so, you’ll be better able to read your responses for what they actually mean, rather than what you think they mean.
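If your responses live in a spreadsheet or database, hiding nonessential details and shuffling the order can be done in a few lines. The sketch below is only an illustration: the field names (`channel`, `name`, `email`, `response_id`, `answers`) and the function name `blind_and_shuffle` are hypothetical, and which fields count as “nonessential” will depend on your own survey.

```python
import random


def blind_and_shuffle(responses, hidden_fields=("channel", "name", "email"), seed=None):
    """Return copies of the responses with hidden fields removed, in random order."""
    rng = random.Random(seed)
    blinded = [
        {key: value for key, value in response.items() if key not in hidden_fields}
        for response in responses
    ]
    rng.shuffle(blinded)
    return blinded


# Hypothetical responses received through two different channels.
responses = [
    {"response_id": 1, "channel": "email", "answers": {"satisfaction": 4}},
    {"response_id": 2, "channel": "postal mail", "answers": {"satisfaction": 5}},
    {"response_id": 3, "channel": "email", "answers": {"satisfaction": 2}},
]

for_review = blind_and_shuffle(responses, seed=3)
print(for_review)
```

The original records are left untouched, so the hidden details can still be matched back by `response_id` once the analysis is done.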
Now that you understand just how important it is to create a representative sample of your target audience, you should be well on your way to generating reliable and valid responses that provide your company with valuable information regarding your customers’ needs, desires, and expectations.
Throughout this article I’ve talked about surveying your customers. If you need a way to create surveys, send them to your customers and analyze the results, Fieldboom can help.