Surveys (I): general

In my post of two days ago, I referenced an academic article based on an internet survey conducted by three professors.  In this post, I want to write about the quirks of internet surveying, as far as I know them.  Internet surveying is in its infancy, however, and, as I see it, most of the craft skill involved in it is still kept as trade secrets by the firms doing this cutting-edge survey work.

(My own experience with surveying comes from a couple of years I spent as a business school adjunct, working on a course whose heart was traditional and internet surveying.  I was lucky enough to work with colleagues who had many years of practical experience in surveying, so I learned a lot.  Unfortunately, that’s all in the past.  From an academic point of view, my area–although very popular with students–had several defects:

–we had on average maybe twenty years of actual business experience

–we did more teaching (for much less money) than tenured professors

–we were unique in producing an operating profit for a business school awash in a sea of red ink.

What happened?  As a “cost-cutting” measure, the school discontinued the program and laid us all off.)

Anyway, I think the best way to understand internet surveying is to contrast it with traditional surveying, done by mail, by phone or through in-person interviews.  A thumbnail sketch of the latter is what I’m going to write about in this post.  Tomorrow’s will talk about the internet.  Here goes:

traditional surveys

Traditional surveying is a little more than a century old.  Its model is the government census that countries periodically perform, although surveyors rapidly expanded its use into such diverse areas as political polling, including election-day exit polls, and divining consumer attitudes, either consumers’ general frame of mind or the attributes of specific products they like and dislike.

Researchers assume, with a lot of historical justification, that standard statistical methods can be used to draw reliable quantitative conclusions about the data.

their structure

Every survey starts with information that the surveyor wants to find out about a target population.  Let’s say the trade association for American cereal manufacturers wants to know what people eat for breakfast in the US and how it might get non-cereal eaters to switch to cereal.

There’s a whole subsector of the surveying industry whose job is to turn that desire for information into a specific survey instrument, whose questions are designed to elicit the required information.  A lot of effort goes into designing questions that minimize the chances that respondents will misunderstand them, and crafting answer choices that minimize the possibility that respondents mistakenly pick an option that doesn’t reflect what they think.

Tons of research has been compiled over the years, a lot of it the result of trial and error, about how to do this.  There’s even more about how to follow up and how to persuade people to become respondents.

There are many tricks of the trade.  Other than to point out that sometimes small changes to a question’s phrasing or to a survey’s layout on paper can make a big difference to the answers respondents give, I’m going to skip over this.

steps in conducting the survey

target population

In the cereal survey I mentioned above, the target population is everybody in the US.  But, as the periodic government censuses show, even the government can’t manage to reach everybody.  And who else has the money to try?

sampling frame

Even the government has to select a sampling frame, that is, a collection of members of the target population who actually have a chance of being surveyed.  Our cereal trade group might decide, for example, that it will take the set of people who are listed in all the telephone books in the US as its sampling frame.  Or it could take the set of all people with street addresses.  Or, at the other end of the spectrum, it could decide to purchase the one-time use of the contact lists of a number of newspapers and magazines.

Clearly, the sampling frame and the target population are not the same thing.  Squatters or migrant workers probably won’t have street addresses.  A potentially more serious problem is that a large percentage of Americans under thirty don’t have fixed-line phones, but use cellphones instead.  Among the complications with this group:  interviewers are legally barred from using computer dialing machines to call cellphones, and many people aren’t happy about interviewers using up their minutes.  There are workarounds, but for how long?

The selection of the frame is obviously also bound up with the type of survey you decide to do.  If it’s a phone survey, you’ll only be able to contact people with phones that work–most likely landlines.  If it’s a mail survey, you’re limited to the names and addresses you have access to.

The potential mismatch between the target population and the set of people you can actually reach with a survey instrument is called coverage error. It’s becoming an ever bigger issue, I think.

sample

Let’s say the cereal group decides to do a telephone survey and has access to a database with 50 million phone numbers.  Instead of calling everyone, it will select a random sample from the 50 million.  The sample size can be quite small, say, a thousand or two numbers.  There are well-established conventions for selecting the sample that dictate the minimum size and govern how the individual numbers are picked (usually computer-generated, and checked against phone databases).
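To make the sample-size arithmetic concrete, here’s a minimal Python sketch (my own illustration, not any firm’s actual methodology) of drawing a simple random sample and computing the standard 95% margin of error, which explains why a thousand or two numbers can be enough:

```python
import math
import random

# Stand-in for the real database: 50 million phone numbers,
# represented here as simple integer IDs.
FRAME_SIZE = 50_000_000
SAMPLE_SIZE = 1_500

# Simple random sample without replacement from the frame.
# (Real firms layer stratification and validity checks on top of this.)
rng = random.Random(42)  # fixed seed so the sketch is reproducible
sample_ids = rng.sample(range(FRAME_SIZE), SAMPLE_SIZE)
print(f"drew {len(sample_ids)} numbers from a frame of {FRAME_SIZE:,}")

def margin_of_error(n: int, p: float = 0.5) -> float:
    """95% margin of error for a proportion p estimated from a simple
    random sample of size n; p = 0.5 is the worst case."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

print(f"n = 1,000  ->  +/- {margin_of_error(1_000):.1%}")  # about +/- 3.1%
print(f"n = 2,000  ->  +/- {margin_of_error(2_000):.1%}")  # about +/- 2.2%
```

The point of the arithmetic:  the margin of error depends on the size of the sample, not the size of the frame, which is why sampling 1,500 people out of 50 million isn’t as crazy as it sounds.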

respondents

Not everyone in the sample will respond.  Non-response comes in two flavors:  a refusal to answer a specific question or a refusal to answer the entire survey.  Non-response rates are the lowest for face-to-face interviews, which are also by far the most expensive to administer.  They are higher for telephone interviews and the highest for survey instruments sent through the mail.

Men tend to decline to answer more frequently than women.  City dwellers decline more often than their country cousins.  For many populations, a request from a university or from the government yields higher response rates.

Non-response rates have been steadily rising over the years, however.  In fact, response rates for very recent mail surveys may be as low as 1% or 2%.  Response rates for phone interviews may be 25%-30%.

Nonresponse error is an increasingly serious problem.  If response rates are low, say 10% or 20% (and even these levels may be hard to achieve), you have to at least worry that only the lunatic fringe has responded to your survey and that their responses are in no way indicative of what the sample as a whole is thinking.  Traditionally, this is the single biggest headache for surveyors.
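To see why low response rates are so worrying, here’s a toy Python calculation (all the numbers are invented) showing how an estimate gets distorted when the people who respond differ systematically from the people who don’t:

```python
# Invented numbers: suppose 50% of the full sample eats cereal, but
# cereal eaters are keener to answer a cereal survey (30% respond)
# than non-eaters (10% respond).
sample_size   = 2_000
true_share    = 0.50   # cereal eaters in the sample (assumed)
rr_eaters     = 0.30   # response rate among cereal eaters (assumed)
rr_non_eaters = 0.10   # response rate among non-eaters (assumed)

eaters_in     = sample_size * true_share * rr_eaters            # 300 responses
non_eaters_in = sample_size * (1 - true_share) * rr_non_eaters  # 100 responses
responses     = eaters_in + non_eaters_in

print(f"overall response rate: {responses / sample_size:.0%}")  # 20%
print(f"estimated cereal eaters: {eaters_in / responses:.0%}")  # 75%, vs. a true 50%
```

A respectable-looking 20% response rate, and the estimate is off by 25 percentage points.  That’s nonresponse error.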

post-survey adjustments

Statisticians may adjust the actual responses to make them more meaningful, in either of two ways:

–if a respondent hasn’t answered a particular question, like family income, an estimate based on past experience may be substituted, and

–the survey may be reweighted to adjust for known differences in sub-group response rates, such as the tendency for urban response rates to be lower than rural ones (a sketch of this follows below).
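Here’s what that second adjustment might look like in miniature:  a hypothetical post-stratification reweighting in Python, with made-up population shares and response counts, not any statistician’s actual weighting scheme:

```python
# Assume (hypothetically) the target population is 80% urban / 20% rural,
# but urban people responded at lower rates, so they are under-represented
# among our invented 400 respondents.
population_share = {"urban": 0.80, "rural": 0.20}  # e.g., from census data
respondents      = {"urban": 240,  "rural": 160}   # made-up response counts
total = sum(respondents.values())

# Weight each group so the weighted sample matches the population mix.
weights = {group: population_share[group] / (count / total)
           for group, count in respondents.items()}
print(weights)  # urban respondents count ~1.33x, rural ~0.50x

# Effect on an estimate: suppose (again, invented) 60% of urban and
# 30% of rural respondents eat cereal.
eats_cereal = {"urban": 0.60, "rural": 0.30}
raw      = sum(respondents[g] / total * eats_cereal[g] for g in respondents)
weighted = sum(population_share[g] * eats_cereal[g] for g in respondents)
print(f"raw estimate: {raw:.0%}, reweighted: {weighted:.0%}")  # 48% vs. 54%
```

Note that reweighting can only correct for differences you know about and can measure; it’s no help against the unknown ways respondents differ from non-respondents.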

three types of traditional survey

mail

The bulk of past research has been done on mail surveys.  Respondents have historically been more truthful in mail surveys than in phone or person-to-person interviews.  But partly because of changes in the way people communicate with each other in the post-internet world, and partly because junk mail companies have become increasingly clever at disguising their offerings as “legitimate” mail, response rates have dropped to very low levels.

Paper surveys are a thing of the past for everyone except the government.

phone

Historically, cost aside, the biggest risk with phone surveys has been that people tend to be less than truthful.  For example, if the interviewer asks for the head of the household, the person who answers the phone is likely to say he/she is that person, whether this is true or not.  Also, interviewees tend not to give answers they regard as socially unacceptable.

In today’s world, however, the overwhelming problem with phone interviews is the inability to reach cellphone-armed twenty- and thirty-somethings, as well as households that have switched to cable or other phone providers.

For political polls, this may not be a burning issue, since younger people tend to vote less than older citizens.  There’s also some evidence that young landline users, in political polls anyway, may be an adequate substitute for their untethered peers.

But it would be one for our cereal trade group.

in person

This gathers the most information, but it’s expensive and time-consuming.  It’s harder to train and supervise face-to-face interviewers than telephone workers.  And computer dialing machines can let a phone interviewer race from number to number, while an in-person interviewer has a lot of transit time getting from one interview to the next.

summary

That’s the traditional survey world:  statistically valid conclusions drawn from small samples of sampling frames that, most of the time, represent the target population pretty accurately.  The craft has been practiced for over a century, and most of the kinks have been worked out.

Two problem areas:  declining response rates across all survey types, and, in the case of phone surveys, the worry that the sampling frame of landline users, the meat and potatoes of this kind of survey, may no longer accurately represent the underlying population, which includes a growing number of cellphone-only people.

That’s it for now.  Internet surveying tomorrow.