### Sampling: Who Do You Want to Learn More About?

These days, if we want to know more about someone, we often need look no further than social media or a basic Google search. But what if we want to know more about a group or a population?

Let’s say we are interested in learning how the student body of a school is managing the COVID-19 crisis, and we decide an Internet-based survey is the way to go. How do we go about doing this?

*1) Should we survey the whole school?*

If we want to know about a whole population, we could survey every student in the school, an approach known as a census. But surveying a large population is expensive and time-consuming; even a school of fewer than 1,000 students can be challenging.

Because we are so overloaded with emails and requests to complete surveys for every product we buy and every customer service interaction we have, many of us ignore or delete these requests. Our response rate might be low.

We may also systematically undercount certain groups, and then our results can be misleading. Who might be undercounted? Perhaps those with limited access to a device to complete a survey, especially if they are sharing with siblings. Unstable Internet access would also be a problem. But perhaps the biggest issue might be students who are experiencing homelessness. Even before the pandemic, more than 1.5 million students were homeless in the 2017–2018 school year, __the most recent year for which national data are available__.

If we miss entire groups, we will have a distorted picture of how students are doing during the pandemic, particularly the most vulnerable groups.

*2) Should we conduct a probability sample?*

The gold standard of research is the probability sample, where every member of the population has a known, nonzero chance of being selected to participate (in the simplest version, a simple random sample, everyone has an equal chance), and we can use the results to make generalizations about the population. We would need a technique for making sure participant selection is truly random, such as an algorithm that chooses people from a list of students.
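
As a minimal sketch of what such a selection algorithm might look like, the Python snippet below draws a simple random sample from a list of students using the standard library. The roster, the student ID format, and the function name `draw_simple_random_sample` are made up for illustration; a real roster would have to come from the school.

```python
import random

def draw_simple_random_sample(roster, sample_size, seed=None):
    """Pick sample_size students without replacement, each equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(roster, sample_size)

# Hypothetical roster of 1,000 student IDs (an assumption, not real data).
roster = [f"student_{i:04d}" for i in range(1, 1001)]

selected = draw_simple_random_sample(roster, sample_size=100, seed=42)
print(len(selected), "students selected, for example:", selected[:3])
```

Because the draw is made without replacement, no student can be selected twice, and recording the seed lets someone else reproduce exactly the same selection.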

Of course, people might still ignore the request to participate, and those who might be undercounted in a census might still be undercounted in a probability sample. If we estimate a 20 percent response rate and want at least 100 responses, we would have to invite about 500 students (100 ÷ 0.20 = 500).
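
The arithmetic behind that figure can be sketched in a few lines of Python. The helper name `invitations_needed` is hypothetical, and the response rate is passed as a whole-number percentage so that the calculation stays in integer arithmetic; the 20 percent rate itself is only an estimate, as in the text.

```python
def invitations_needed(target_responses, response_rate_percent):
    """Smallest number of invitations expected to yield the target number of responses."""
    if not 0 < response_rate_percent <= 100:
        raise ValueError("response rate must be between 1 and 100 percent")
    # Ceiling division: round up, since we cannot invite a fraction of a student.
    return (target_responses * 100 + response_rate_percent - 1) // response_rate_percent

needed = invitations_needed(target_responses=100, response_rate_percent=20)
print(needed)  # 500
```

If the real response rate turns out lower than the estimate, the number of completed surveys will fall short, so it can be worth planning with a conservative rate.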

*3) Should we make sure we have representation of various types of students?*

Maybe our school is quite diverse in ways that we think might impact how people are coping with the pandemic. Perhaps there is significant economic diversity and we want to make sure we have representation across the socioeconomic spectrum in our survey.

We could conduct a stratified random sample, where we divide the group by these markers. Of course, we would need to have some way of knowing a student’s economic background. Perhaps in a local community we might have data on median household income by zip code. Or for college students, we might have data on who is receiving a Federal Pell Grant, other sources of financial aid, or no financial aid.

Using the college example, we would draw three separate random samples, one from each group. If we knew that 5 percent of our students received Pell Grants, 50 percent received other forms of financial aid, and 45 percent received no financial aid, we would want about 5, 50, and 45 percent of our sample, respectively, to come from those groups.
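
To make the idea concrete, here is a rough Python sketch of a proportional stratified sample. It assumes a hypothetical school of 1,000 students split 5/50/45 across the three groups; the group labels, student IDs, and the function name `proportional_stratified_sample` are invented for illustration.

```python
import random

def proportional_stratified_sample(strata, total_sample_size, seed=None):
    """Draw a separate random sample from each stratum, sized by its population share."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding means the quotas may not always add up exactly to the total.
        quota = round(total_sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, quota)
    return sample

# Hypothetical strata: 5% Pell Grant, 50% other aid, 45% no aid.
strata = {
    "pell_grant": [f"pell_{i}" for i in range(50)],
    "other_aid": [f"aid_{i}" for i in range(500)],
    "no_aid": [f"none_{i}" for i in range(450)],
}

sample = proportional_stratified_sample(strata, total_sample_size=100, seed=1)
for name, members in sample.items():
    print(name, len(members))
```

With a sample of 100, the smallest group contributes only a handful of students, which is one reason researchers sometimes deliberately oversample small groups and correct for it afterward.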

If we couldn’t match these percentages, we could use statistical weighting in our analysis of the results. This means we apply a formula that counts responses from underrepresented groups somewhat more and responses from overrepresented groups somewhat less, so that the results better mirror each group’s share of the population.
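
One common way to do this, sketched below in Python, is to give each group a weight equal to its share of the population divided by its share of the respondents. This is only one simple weighting scheme, and the respondent counts are invented for illustration; the population shares are the 5, 50, and 45 percent figures from the example above.

```python
def poststratification_weights(population_shares, respondent_counts):
    """Weight for each group = population share / share of respondents."""
    total = sum(respondent_counts.values())
    weights = {}
    for group, count in respondent_counts.items():
        if count == 0:
            # Weighting cannot stand in for a group with no respondents at all.
            raise ValueError(f"no respondents in group {group!r}")
        weights[group] = population_shares[group] / (count / total)
    return weights

population_shares = {"pell_grant": 0.05, "other_aid": 0.50, "no_aid": 0.45}
# Hypothetical survey results: Pell Grant recipients are underrepresented.
respondent_counts = {"pell_grant": 2, "other_aid": 58, "no_aid": 40}

weights = poststratification_weights(population_shares, respondent_counts)
for group, weight in weights.items():
    print(group, round(weight, 2))
```

A weight above 1 means each response from that group counts for more than one response in the analysis, and a weight below 1 means it counts for less. Weighting can rebalance groups that are present in the data, but it cannot recover the views of a group that was missed entirely.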

*4) Should we conduct a convenience sample?*

If all of this sounds too challenging (for example, financial aid status is private and hard to obtain), you might consider a convenience sample. A convenience sample involves taking whoever from the population is willing to participate, and even that isn’t necessarily easy.

How would you go about inviting everyone in the school to participate? Maybe if you could get email addresses for everyone, you could send a link for a survey and see who responds. If possible, that might approach the census solution, but it would still have the major drawbacks of limited participation. And it can be hard to get email addresses unless you are part of the school’s administration.

What about a social media post, or asking friends and classmates to share the survey with their friends and classmates? You might get a fair number of responses this way. But keep in mind that you could not generalize your results to the whole school. Those who choose to participate may come from a group very similar to yours (your friends and classmates), so other perspectives would be left out of the results.

This is all to say that getting good samples is challenging! But even a convenience sample can give us some information. We just have to be clear on its limitations.

