
Chapter 5. Sampling

Introduction

Most Americans will experience unemployment at some point in their lives. Sarah Damaske (2021) was interested in learning about how men and women experience unemployment differently. To answer this question, she interviewed unemployed people. After conducting a “pilot study” with twenty interviewees, she realized she was also interested in finding out how working-class and middle-class persons experienced unemployment differently. She found one hundred persons through local unemployment offices. She purposefully selected a roughly equal number of men and women and working-class and middle-class persons for the study. This would allow her to make the kinds of comparisons she was interested in. She further refined her selection of persons to interview:

I decided that I needed to be able to focus my attention on gender and class; therefore, I interviewed only people born between 1962 and 1987 (ages 28–52, the prime working and child-rearing years), those who worked full-time before their job loss, those who experienced an involuntary job loss during the past year, and those who did not lose a job for cause (e.g., were not fired because of their behavior at work). (244)

The people she ultimately interviewed compose her sample. They represent (“sample”) the larger population of the involuntarily unemployed. This “theoretically informed stratified sampling design” allowed Damaske “to achieve relatively equal distribution of participation across gender and class,” but it came with some limitations. For one, the unemployment centers were located in primarily White areas of the country, so there were very few persons of color interviewed. Qualitative researchers must make these kinds of decisions all the time—who to include and who not to include. There is never an absolutely correct decision, as the choice is linked to the particular research question posed by the particular researcher, although some sampling choices are more compelling than others. In this case, Damaske made the choice to foreground both gender and class rather than compare all middle-class men and women or women of color from different class positions or just talk to White men. She leaves the door open for other researchers to sample differently. Because science is a collective enterprise, it is likely that someone will be inspired to conduct a study similar to Damaske’s but with an entirely different sample.

This chapter is all about sampling. After you have developed a research question and have a general idea of how you will collect data (observations or interviews), how do you go about actually finding people and sites to study? Although there is no “correct number” of people to interview, the sample should follow the research question and research design. You might remember studying sampling in a quantitative research course. Sampling is important here too, but it works a bit differently. Unlike quantitative research, qualitative research involves nonprobability sampling. This chapter explains why this is so and what qualities instead make a good sample for qualitative research.

Quick Terms Refresher

  • The population is the entire group that you want to draw conclusions about.
  • The sample is the specific group of individuals that you will collect data from.
  • The sampling frame is the actual list of individuals that the sample will be drawn from. Ideally, it should include the entire target population (and nobody who is not part of that population).
  • The sample size is how many individuals (or units) are included in your sample.

The “Who” of Your Research Study

After you have turned your general research interest into an actual research question and identified an approach you want to take to answer that question, you will need to specify the people you will be interviewing or observing. In most qualitative research, the objects of your study will indeed be people. In some cases, however, your objects might be content left by people (e.g., diaries, yearbooks, photographs) or documents (official or unofficial) or even institutions (e.g., schools, medical centers) and locations (e.g., nation-states, cities). Chances are, whatever “people, places, or things” are the objects of your study, you will not really be able to talk to, observe, or follow every single individual/object of the entire population of interest. You will need to create a sample of the population. Sampling in qualitative research has different purposes and goals than sampling in quantitative research. Sampling in both allows you to say something of interest about a population without having to include the entire population in your sample.

We begin this chapter with the case of a population of interest composed of actual people. After we have a better understanding of populations and samples that involve real people, we’ll discuss sampling in other types of qualitative research, such as archival research, content analysis, and case studies. We’ll then move to a larger discussion about the difference between sampling in qualitative research generally versus quantitative research, then we’ll move on to the idea of “theoretical” generalizability, and finally, we’ll conclude with some practical tips on the correct “number” to include in one’s sample.

Sampling People

To help think through samples, let’s imagine we want to know more about “vaccine hesitancy.” We’ve all lived through 2020 and 2021, and we know that a sizable number of people in the United States (and elsewhere) were slow to accept vaccines, even when these were freely available. By some accounts, about one-third of Americans initially refused vaccination. Why is this so? Well, as I write this in the summer of 2021, we know that some people actively refused the vaccination, thinking it was harmful or part of a government plot. Others were simply lazy or dismissed the necessity. And still others were worried about harmful side effects. The general population of interest here (all adult Americans who were not vaccinated by August 2021) may be as many as eighty million people. We clearly cannot talk to all of them. So we will have to narrow the number to something manageable. How can we do this?


First, we have to think about our actual research question and the form of research we are conducting. I am going to begin with a quantitative research question. Quantitative research questions tend to be simpler to visualize, at least when we are first starting out doing social science research. So let us say we want to know what percentage of each kind of resistance is out there and how race or class or gender affects vaccine hesitancy. Again, we don’t have the ability to talk to everyone. But harnessing what we know about normal probability distributions (see quantitative methods for more on this), we can find this out through a sample that represents the general population. We can’t really address these particular questions if we only talk to White women who go to college with us. And if you are really trying to generalize the specific findings of your sample to the larger population, you will have to employ probability sampling, a sampling technique in which the researcher sets a few criteria and then chooses members of the population at random. Why randomly? If truly random, all the members have an equal opportunity to be a part of the sample, and thus we avoid the problem of having only our friends and neighbors (who may be very different from other people in the population) in the study. Mathematically, there is going to be a certain number that will be large enough to allow us to generalize our particular findings from our sample population to the population at large. It might surprise you how small that number can be. Election polls of no more than one thousand people are routinely used to predict actual election outcomes of millions of people. Below that number, however, you will not be able to make generalizations. Talking to five people at random is simply not enough people to predict a presidential election.
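To see why one thousand respondents can be enough, consider the standard margin-of-error formula for an estimated proportion (a basic result of sampling theory, not specific to this chapter):

$$\text{MOE} = z\sqrt{\frac{p(1-p)}{n}} \approx 1.96\sqrt{\frac{0.5 \times 0.5}{1000}} \approx 0.031$$

A simple random sample of 1,000 pins an estimated proportion down to about ±3 percentage points at 95 percent confidence; plugging in n = 5 instead gives roughly ±44 points, which is why five random interviews cannot predict a presidential election.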

In order to answer quantitative research questions of causality, one must employ probability sampling. Quantitative researchers try to generalize their findings to a larger population. Samples are designed with that in mind. Qualitative researchers ask very different questions, though. Qualitative research questions are not about “how many” of a certain group do X (in this case, what percentage of the unvaccinated hesitate out of concern about safety rather than reject vaccination on political grounds). Qualitative research employs nonprobability sampling. By definition, not everyone has an equal opportunity to be included in the sample. The researcher might select White women they go to college with to provide insight into racial and gender dynamics at play. Whatever is found by doing so will not be generalizable to everyone who has not been vaccinated, or even all White women who have not been vaccinated, or even all White women who have not been vaccinated who are in this particular college. That is not the point of qualitative research at all. This is a really important distinction, so I will repeat it in bold: Qualitative researchers are not trying to statistically generalize specific findings to a larger population. They have not failed when their sample cannot be generalized, as that is not the point at all.

In the previous paragraph, I said it would be perfectly acceptable for a qualitative researcher to interview five White women with whom she goes to college about their vaccine hesitancy “to provide insight into racial and gender dynamics at play.” The key word here is “insight.” Rather than use a sample as a stand-in for the general population, as quantitative researchers do, the qualitative researcher uses the sample to gain insight into a process or phenomenon. The qualitative researcher is not going to be content with simply asking each of the women to state her reason for not being vaccinated and then draw conclusions that, because one in five of these women were concerned about their health, one in five of all people were also concerned about their health. That would be, frankly, a very poor study indeed. Rather, the qualitative researcher might sit down with each of the women and conduct a lengthy interview about what the vaccine means to her, why she is hesitant, how she manages her hesitancy (how she explains it to her friends), what she thinks about others who are unvaccinated, what she thinks of those who have been vaccinated, and what she knows or thinks she knows about COVID-19. The researcher might include specific interview questions about the college context, about their status as White women, about the political beliefs they hold about racism in the US, and about how their own political affiliations may or may not provide narrative scripts about “protective whiteness.” There are many interesting things to ask and learn about and many things to discover. Where a quantitative researcher begins with clear parameters to set their population and guide their sample selection process, the qualitative researcher is discovering new parameters, making it impossible to engage in probability sampling.

Looking at it this way, sampling for qualitative researchers needs to be more strategic, more theoretically informed. What persons can be interviewed or observed that would provide maximum insight into what is still unknown? In other words, qualitative researchers think through what cases they could learn the most from, and those are the cases selected to study: “What would be ‘bias’ in statistical sampling, and therefore a weakness, becomes intended focus in qualitative sampling, and therefore a strength. The logic and power of purposeful sampling lie in selecting information-rich cases for study in depth. Information-rich cases are those from which one can learn a great deal about issues of central importance to the purpose of the inquiry, thus the term purposeful sampling” (Patton 2002:230; emphases in the original).

Before selecting your sample, though, it is important to clearly identify the general population of interest. You need to know this before you can determine the sample. In our example case, it is “adult Americans who have not yet been vaccinated.” Depending on the specific qualitative research question, however, it might be “adult Americans who have been vaccinated for political reasons” or even “college students who have not been vaccinated.” What insights are you seeking? Do you want to know how politics is affecting vaccination? Or do you want to understand how people manage being an outlier in a particular setting (unvaccinated where vaccinations are heavily encouraged if not required)? More clearly stated, your population should align with your research question. Think back to the opening story about Damaske’s work studying the unemployed. She drew her sample narrowly to address the particular questions she was interested in pursuing. Knowing your questions or, at a minimum, why you are interested in the topic will allow you to draw the best sample possible to achieve insight.

Once you have your population in mind, how do you go about getting people to agree to be in your sample? In qualitative research, it is permissible to find people by convenience. Just ask for people who fit your sample criteria and see who shows up. Or reach out to friends and colleagues and see if they know anyone who fits. Don’t let the name convenience sampling mislead you; this is not exactly “easy,” and it is certainly a valid form of sampling in qualitative research. The more unknowns you have about what you will find, the more convenience sampling makes sense. If you don’t know how race or class or political affiliation might matter, and your population is unvaccinated college students, you can construct a sample of college students by placing an advertisement in the student paper or posting a flyer on a notice board. Whoever answers is your sample. That is what is meant by a convenience sample. A common variation of convenience sampling is snowball sampling. This is particularly useful if your target population is hard to find. Let’s say you posted a flyer about your study and only two college students responded. You could then ask those two students for referrals. They tell their friends, and those friends tell other friends, and, like a snowball, your sample gets bigger and bigger.
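To make the snowball dynamic concrete, here is a toy simulation (not from the chapter; the names, referral counts, and agreement rate are all invented for illustration) showing how two initial volunteers can grow into a full sample through waves of referrals:

```python
import random

def snowball_recruit(seeds, target=20, max_referrals=3, agree_prob=0.5, seed=42):
    """Toy model of snowball recruitment: each enrolled participant names up
    to max_referrals acquaintances, each of whom agrees to participate with
    probability agree_prob. Recruitment proceeds wave by wave until the
    target is met or the referrals dry up."""
    rng = random.Random(seed)
    enrolled = list(seeds)
    wave = list(seeds)
    while wave and len(enrolled) < target:
        next_wave = []
        for person in wave:
            for i in range(rng.randint(0, max_referrals)):
                if rng.random() < agree_prob and len(enrolled) < target:
                    recruit = f"{person}.r{i}"  # hypothetical ID for the referral
                    enrolled.append(recruit)
                    next_wave.append(recruit)
        wave = next_wave
    return enrolled

sample = snowball_recruit(["student_A", "student_B"])
print(len(sample), sample[:5])
```

Note that a sample built this way clusters around the initial volunteers’ social networks, which is exactly why it cannot be treated as representative.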

Researcher Note

Gaining Access: When Your Friend Is Your Research Subject

My early experience with qualitative research was rather unique. At that time, I needed to do a project that required me to interview first-generation college students, and my friends, with whom I had been sharing a dorm for two years, just perfectly fell into the sample category. Thus, I just asked them and easily “gained my access” to the research subject; I know them, we are friends, and I am part of them. I am an insider. I also thought, “Well, since I am part of the group, I can easily understand their language and norms, I can capture their honesty, read their nonverbal cues well, and will get more information, as they will be more open to me because they trust me.” All in all, easy access with rich information. But, gosh, I did not realize that my status as an insider came with a price! When structuring the interview questions, I began to realize that rather than focusing on the unique experiences of my friends, I mostly based the questions on my own experiences, assuming we have similar if not the same experiences. I began to struggle with my objectivity and even questioned my role; am I doing this as part of the group or as a researcher? I came to know later that my status as an insider or my “positionality” may impact my research. It not only shapes the process of data collection but might heavily influence my interpretation of the data. I came to realize that although my inside status came with a lot of benefits (especially for access), it could also bring some drawbacks.

—Dede Setiono, PhD student focusing on international development and environmental policy, Oregon State University

The more you know about what you might find, the more strategic you can be. If you wanted to compare how politically conservative and politically liberal college students explained their vaccine hesitancy, for example, you might construct a sample purposively, finding an equal number of both types of students so that you can make those comparisons in your analysis. This is what Damaske (2021) did. You could still use convenience or snowball sampling as a way of recruitment. Post a flyer at the conservative student club and then ask for referrals from the one student who agrees to be interviewed. As with convenience sampling, there are variations of purposive sampling as well as other names used (e.g., judgment, quota, stratified, criterion, theoretical). Try not to get bogged down in the nomenclature; instead, focus on identifying the general population that matches your research question and then using a sampling method that is most likely to provide insight, given the types of questions you have.
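In practice, a purposive design like this amounts to screening volunteers against explicit criteria and filling each comparison group to a quota. A minimal sketch, in which the pool, group labels, and quota are invented for illustration:

```python
# Hypothetical pool of volunteers: (name, political_lean) tuples.
pool = [
    ("Ava", "conservative"), ("Ben", "liberal"), ("Cam", "conservative"),
    ("Dee", "liberal"), ("Eli", "liberal"), ("Fern", "conservative"),
]

def purposive_sample(pool, groups, quota_per_group):
    """Fill each comparison group up to its quota, preserving sign-up order."""
    sample = []
    for group in groups:
        members = [person for person in pool if person[1] == group]
        sample.extend(members[:quota_per_group])
    return sample

# Two comparison groups of equal size, as in the vaccine-hesitancy example.
print(purposive_sample(pool, groups=("conservative", "liberal"), quota_per_group=2))
```

The selection logic is trivial; the analytic work lies in choosing criteria that match your research question.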

There are all kinds of ways of being strategic with sampling in qualitative research. Here are a few of my favorite techniques for maximizing insight:

  • Consider using “extreme” or “deviant” cases. Maybe your college houses a prominent anti-vaxxer who has written about and demonstrated against the college’s policy on vaccines. You could learn a lot from that single case (depending on your research question, of course).
  • Consider “intensity”: people and cases and circumstances where your questions are more likely to feature prominently (but not extremely or deviantly). For example, you could compare those who volunteer at local Republican and Democratic election headquarters during an election season in a study on why party matters. Those who volunteer are more likely to have something to say than those who are more apathetic.
  • Maximize variation, as with the case of “politically liberal” versus “politically conservative,” or include an array of social locations (young vs. old; Northwest vs. Southeast region). This kind of heterogeneity sampling can capture and describe the central themes that cut across the variations: any common patterns that emerge, even in this wildly mismatched sample, are probably important to note!
  • Rather than maximize the variation, you could select a small homogeneous sample to describe some particular subgroup in depth. Focus groups are often the best form of data collection for homogeneity sampling.
  • Think about which cases are “critical” or politically important—ones that “if it happens here, it would happen anywhere” or a case that is politically sensitive, as with the single “blue” (Democratic) county in a “red” (Republican) state. In both, you are choosing a site that would yield the most information and have the greatest impact on the development of knowledge.
  • On the other hand, sometimes you want to select the “typical”—the typical college student, for example. You are not trying to generalize from the typical case but to illustrate aspects that may be typical of this case or group. When selecting for typicality, be clear with yourself about why the typical matches your research questions (and who might be excluded or marginalized in doing so).
  • Finally, it is often a good idea to look for disconfirming cases: if you are at the stage where you have a hypothesis (of sorts), you might select those who do not fit your hypothesis—you will surely learn something important there. They may be “exceptions that prove the rule” or exceptions that force you to alter your findings in order to make sense of these additional cases.

In addition to all these sampling variations, there is the theoretical approach taken by grounded theorists in which the researcher samples comparative people (or events) on the basis of their potential to represent important theoretical constructs. The sample, one can say, is by definition representative of the phenomenon of interest. It accompanies the constant comparative method of analysis. In the words of the founders of Grounded Theory, “Theoretical sampling is sampling on the basis of the emerging concepts, with the aim being to explore the dimensional range or varied conditions along which the properties of the concepts vary” (Strauss and Corbin 1998:73).

When Your Population is Not Composed of People

I think it is easiest for most people to think of populations and samples in terms of people, but sometimes our units of analysis are not actually people. They could be places or institutions. Even so, you might still want to talk to people or observe the actions of people to understand those places or institutions. Or not! In the case of content analyses (see chapter 17), you won’t even have people involved at all but rather documents or films or photographs or news clippings. Everything we have covered about sampling applies to other units of analysis too. Let’s work through some examples.

Case Studies

When constructing a case study, it is helpful to think of your cases as sample populations in the same way that we considered people above. If, for example, you are comparing campus climates for diversity, your overall population may be “four-year college campuses in the US,” and from there you might decide to study three college campuses as your sample. Which three? Will you use purposeful sampling (perhaps [1] selecting three colleges in Oregon that are different sizes or [2] selecting three colleges across the US located in different political cultures or [3] varying the three colleges by racial makeup of the student body)? Or will you select three colleges at random or simply out of convenience? There are justifiable reasons for all approaches.

As with people, there are different ways of maximizing insight in your sample selection. Think about the following rationales: typical, diverse, extreme, deviant, influential, crucial, or even embodying a particular “pathway” (Gerring 2008). When choosing a case or particular research site, Rubin (2021) suggests you bear in mind, first, what you are leaving out by selecting this particular case/site; second, what you might be overemphasizing by studying this case/site and not another; and, finally, whether you truly need to worry about either of those things—“that is, what are the sources of bias and how bad are they for what you are trying to do?” (89).

Once you have selected your cases, you may still want to include interviews with specific people or observations at particular sites within those cases. Then you go through possible sampling approaches all over again to determine which people will be contacted.

Content: Documents, Narrative Accounts, And So On

Although not often discussed as sampling, your selection of documents and other units to use in various content/historical analyses is subject to similar considerations. When you are asking quantitative-type questions (percentages and proportionalities of a general population), you will want to follow probabilistic sampling. For example, I created a random sample of accounts posted on the website studentloanjustice.org to delineate the types of problems people were having with student debt (Hurst 2007). Even though my data was qualitative (narratives of student debt), I was actually asking a quantitative-type research question, so it was important that my sample was representative of the larger population (debtors who posted on the website). On the other hand, when you are asking qualitative-type questions, the selection process should be very different. In that case, use nonprobabilistic techniques, either convenience (where you are really new to this data and do not have the ability to set comparative criteria or even know what a deviant case would be) or some variant of purposive sampling. Let’s say you were interested in the visual representation of women in media published in the 1950s. You could select a national magazine like Time for a “typical” representation (and for its convenience, as all issues are freely available on the web and easy to search). Or you could compare one magazine known for its feminist content versus one known for antifeminist content. The point is, sample selection is important even when you are not interviewing or observing people.
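For the quantitative-type question described above, the random-draw step itself is mechanical once the population of documents is assembled. A minimal sketch, with an invented corpus and sample size rather than the actual figures from Hurst (2007):

```python
import random

# Hypothetical corpus of scraped narrative accounts.
accounts = [f"account_{i:03d}" for i in range(500)]

random.seed(2007)                         # make the draw reproducible
sampled = random.sample(accounts, k=100)  # simple random sample, drawn without replacement
print(sampled[:5])
```

Because every account has an equal chance of selection, proportions computed on the sample (e.g., the share of accounts describing default) can be treated as estimates for the full set of posted accounts.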

Goals of Qualitative Sampling versus Goals of Quantitative Sampling

We have already discussed some of the differences in the goals of quantitative and qualitative sampling above, but it is worth further discussion. The quantitative researcher seeks a sample that is representative of the population of interest so that they may properly generalize the results (e.g., if 80 percent of first-gen students in the sample were concerned with costs of college, then we can say there is a strong likelihood that 80 percent of first-gen students nationally are concerned with costs of college). The qualitative researcher does not seek to generalize in this way. They may want a representative sample because they are interested in typical responses or behaviors of the population of interest, but they may very well not want a representative sample at all. They might want an “extreme” or deviant case to highlight what could go wrong with a particular situation, or maybe they want to examine just one case as a way of understanding what elements might be of interest in further research. When thinking of your sample, you will have to know why you are selecting the units, and this relates back to your research question or sets of questions. It has nothing to do with having a representative sample to generalize results. You may be tempted—or it may be suggested to you by a quantitatively minded member of your committee—to create as large and representative a sample as you possibly can to earn credibility from quantitative researchers. Ignore this temptation or suggestion. The only thing you should be considering is what sample will best bring insight into the questions guiding your research. This has implications for the number of people (or units) in your study as well, which is the topic of the next section.

What is the Correct “Number” to Sample?

Because we are not trying to create a generalizable representative sample, the guidelines for the “number” of people to interview or news stories to code are also a bit more nebulous. There are some brilliant, insightful studies out there with an n of 1 (meaning one person or one account used as the entire set of data). This is particularly so in the case of autoethnography, a variation of ethnographic research that uses the researcher’s own subject position and experiences as the basis of data collection and analysis. But it is true for all forms of qualitative research. There are no hard-and-fast rules here. The number to include is what is relevant and insightful to your particular study.

That said, humans do not thrive under such ambiguity, and there are a few helpful suggestions that can be made. First, many qualitative researchers talk about “saturation” as the end point for data collection. You stop adding participants when you are no longer getting any new information (or so very little that the cost of adding another interview subject or spending another day in the field exceeds any likely benefits to the research). The term saturation was first used in this context by Glaser and Strauss (1967), the founders of Grounded Theory. Here is their explanation: “The criterion for judging when to stop sampling the different groups pertinent to a category is the category’s theoretical saturation. Saturation means that no additional data are being found whereby the sociologist can develop properties of the category. As he [or she] sees similar instances over and over again, the researcher becomes empirically confident that a category is saturated. [They go] out of [their] way to look for groups that stretch diversity of data as far as possible, just to make certain that saturation is based on the widest possible range of data on the category” (61).

It makes sense that the term was developed by grounded theorists, since this approach is rather more open-ended than other approaches used by qualitative researchers. With so much left open, having a guideline of “stop collecting data when you don’t find anything new” is reasonable. However, saturation can’t help much when first setting out your sample. How do you know how many people to contact to interview? What number will you put down in your institutional review board (IRB) protocol (see chapter 8)? You may guess how many people or units it will take to reach saturation, but there really is no way to know in advance. The best you can do is think about your population and your questions and look at what others have done with similar populations and questions.

Here are some suggestions to use as a starting point: For phenomenological studies, try to interview at least ten people for each major category or group of people. If you are comparing male-identified, female-identified, and gender-neutral college students in a study on gender regimes in social clubs, that means you might want to design a sample of thirty students, ten from each group. This is the minimum suggested number. Damaske’s (2021) sample of one hundred allows room for up to twenty-five participants in each of four “buckets” (e.g., working-class*female, working-class*male, middle-class*female, middle-class*male). If there is more than one comparative group (e.g., you are comparing students attending three different colleges, and you are comparing White and Black students in each), you can sometimes reduce the number for each group in your sample to five for, in this case, thirty total students. But that is really the bare minimum you will want to use. A lot of people will not trust you with only “five” cases in a bucket. Lareau (2021:24) advises a minimum of seven or nine for each bucket (or “cell,” in her words). The point is to think about what your analyses might look like and how comfortable you will be with a certain number of persons fitting each category.

Because qualitative research takes so much time and effort, it is rare for a beginning researcher to include more than thirty to fifty people or units in the study. You may not be able to conduct all the comparisons you might want simply because you cannot manage a larger sample. In that case, the limits of who you can reach or what you can include may influence you to rethink an original overcomplicated research design. Rather than include students from every racial group on a campus, for example, you might want to sample strategically, thinking about which comparisons offer the most contrast (and thus insight), possibly excluding majority-race (White) students entirely, and simply using previous literature to fill in gaps in our understanding. For example, one of my former students was interested in discovering how race and class worked at a predominantly White institution (PWI). Due to time constraints, she simplified her study from an original sample frame of middle-class and working-class domestic Black and international African students (four buckets) to a sample frame of domestic Black and international African students (two buckets), allowing the complexities of class to come through individual accounts rather than from part of the sample frame. She wisely decided not to include White students in the sample, as her focus was on how minoritized students navigated the PWI. She was able to successfully complete her project and develop insights from the data with fewer than twenty interviewees. [1]

But what if you had unlimited time and resources? Would it always be better to interview more people or include more accounts, documents, and units of analysis? No! Your sample size should reflect your research question and the goals you have set yourself. Larger numbers can sometimes work against your goals. If, for example, you want to help bring out individual stories of success against the odds, adding more people to the analysis can end up drowning out those individual stories. Sometimes, the perfect size really is one (or three, or five). It really depends on what you are trying to discover and achieve in your study. Furthermore, studies of one hundred or more (people, documents, accounts, etc.) can sometimes be mistaken for quantitative research. Inevitably, the large sample size will push the researcher into simplifying the data numerically. And readers will begin to expect generalizability from such a large sample.

To summarize, “There are no rules for sample size in qualitative inquiry. Sample size depends on what you want to know, the purpose of the inquiry, what’s at stake, what will be useful, what will have credibility, and what can be done with available time and resources” (Patton 2002:244).

How did you find/construct a sample?

Since qualitative researchers work with comparatively small sample sizes, getting your sample right is rather important. Yet it is also difficult to accomplish. For instance, a key question you need to ask yourself is whether you want a homogeneous or heterogeneous sample. In other words, do you want to include people in your study who are by and large the same, or do you want to have diversity in your sample?

For many years, I have studied the experiences of students who were the first in their families to attend university. There is a rather large number of sampling decisions I need to consider before starting the study. (1) Should I only talk to first-in-family students, or should I have a comparison group of students who are not first-in-family? (2) Do I need to strive for a gender distribution that matches undergraduate enrollment patterns? (3) Should I include participants that reflect diversity in gender identity and sexuality? (4) How about racial diversity? First-in-family status is strongly related to certain ethnic and racial identities. (5) And how about areas of study?

As you can see, if I wanted to accommodate all these differences and get enough study participants in each category, I would quickly end up with a sample size of hundreds, which is not feasible in most qualitative research. In the end, for me, the most important decision was to maximize the voices of first-in-family students, which meant that I only included them in my sample. As for the other categories, I figured it was going to be hard enough to find first-in-family students, so I started recruiting with an open mind and an understanding that I may have to accept a lack of gender, sexuality, or racial diversity and then not be able to say anything about these issues. But I would definitely be able to speak about the experiences of being first-in-family.

—Wolfgang Lehmann, author of “Habitus Transformation and Hidden Injuries”

Examples of “Sample” Sections in Journal Articles

Think about some of the studies you have read in college, especially those with rich stories and accounts about people’s lives. Do you know how the people were selected to be the focus of those stories? If the account was published by an academic press (e.g., University of California Press or Princeton University Press) or in an academic journal, chances are that the author included a description of their sample selection. You can usually find these in a methodological appendix (book) or a section on “research methods” (article).

Here are two examples from recent books and one example from a recent article:

Example 1. In It’s Not like I’m Poor: How Working Families Make Ends Meet in a Post-welfare World, the research team employed a mixed methods approach to understand how parents use the earned income tax credit, a refundable tax credit designed to provide relief for low- to moderate-income working people (Halpern-Meekin et al. 2015). At the end of their book, their first appendix is “Introduction to Boston and the Research Project.” After describing the context of the study, they include the following description of their sample selection:

In June 2007, we drew 120 names at random from the roughly 332 surveys we gathered between February and April. Within each racial and ethnic group, we aimed for one-third married couples with children and two-thirds unmarried parents. We sent each of these families a letter informing them of the opportunity to participate in the in-depth portion of our study and then began calling the home and cell phone numbers they provided us on the surveys and knocking on the doors of the addresses they provided.…In the end, we interviewed 115 of the 120 families originally selected for the in-depth interview sample (the remaining five families declined to participate). (22)

Was their sample selection based on convenience or purpose? Why do you think it was important for them to tell you that five families declined to be interviewed? There is actually a trick here, as the names were pulled randomly from a survey whose sample design was probabilistic. Why is this important to know? What can we say about the representativeness or the uniqueness of whatever findings are reported here?

Example 2. In When Diversity Drops, Park (2013) examines the impact of decreasing campus diversity on the lives of college students. She does this through a case study of one student club, the InterVarsity Christian Fellowship (IVCF), at one university (“California University,” a pseudonym). Here is her description:

I supplemented participant observation with individual in-depth interviews with sixty IVCF associates, including thirty-four current students, eight former and current staff members, eleven alumni, and seven regional or national staff members. The racial/ethnic breakdown was twenty-five Asian Americans (41.6 percent), one Armenian (1.6 percent), twelve people who were black (20.0 percent), eight Latino/as (13.3 percent), three South Asian Americans (5.0 percent), and eleven people who were white (18.3 percent). Twenty-nine were men, and thirty-one were women. Looking back, I note that the higher number of Asian Americans reflected both the group’s racial/ethnic composition and my relative ease about approaching them for interviews. (156)

How can you tell this is a convenience sample? What else do you note about the sample selection from this description?

Example 3. The last example is taken from an article published in the journal Research in Higher Education. Published articles tend to be more formal than books, at least when it comes to the presentation of qualitative research. In this article, Lawson (2021) is seeking to understand why female-identified college students drop out of majors that are dominated by male-identified students (e.g., engineering, computer science, music theory). Here is the entire relevant section of the article:

Method

Participants

Data were collected as part of a larger study designed to better understand the daily experiences of women in MDMs [male-dominated majors].…Participants included 120 students from a midsize, Midwestern University. This sample included 40 women and 40 men from MDMs—defined as any major where at least 2/3 of students are men at both the university and nationally—and 40 women from GNMs—defined as any major where 40–60% of students are women at both the university and nationally.…

Procedure

A multi-faceted approach was used to recruit participants; participants were sent targeted emails (obtained based on participants’ reported gender and major listings), campus-wide emails sent through the University’s Communication Center, flyers, and in-class presentations. Recruitment materials stated that the research focused on the daily experiences of college students, including classroom experiences, stressors, positive experiences, departmental contexts, and career aspirations. Interested participants were directed to email the study coordinator to verify eligibility (at least 18 years old, man/woman in MDM or woman in GNM, access to a smartphone). Sixteen interested individuals were not eligible for the study due to the gender/major combination. (482ff.)

What method of sample selection was used by Lawson? Why is it important to define “MDM” at the outset? How does this definition relate to sampling? Why were interested participants directed to the study coordinator to verify eligibility?

Final Words

I have found that students often find it difficult to be specific enough when defining and choosing their sample. It might help to think about your sample design and sample recruitment like a cookbook. You want all the details there so that someone else can pick up your study and conduct it as you intended. That person could be yourself, but this analogy might work better if you have someone else in mind. When I am writing down recipes, I often think of my sister and try to convey the details she would need to duplicate the dish. We share a grandmother whose recipes are full of handwritten notes in the margins, in spidery ink, that tell us what bowl to use when or where things could go wrong. Describe your sample clearly, convey the steps required accurately, and then add any other details that will help keep you on track and remind you why you have chosen to limit possible interviewees to those of a certain age or class or location. Imagine actually going out and getting your sample (making your dish). Do you have all the necessary details to get started?

Table 5.1. Sampling Types and Strategies

Probabilistic sampling (used primarily in quantitative research):

  • Simple random: Each member of the population has an equal chance of being selected.
  • Stratified: The sample is split into strata; members of each stratum are selected in proportion to the population at large.

Non-probabilistic sampling (used primarily in qualitative research):

  • Convenience: Simply includes the individuals who happen to be most accessible to the researcher.
  • Snowball: Used to recruit participants via other participants; the number of people you have access to “snowballs” as you get in contact with more people.
  • Purposive: Involves the researcher using their expertise to select a sample that is most useful to the purposes of the research. An effective purposive sample must have clear criteria and a rationale for inclusion.
  • Quota: Set quotas to ensure that the sample represents certain characteristics in proportion to their prevalence in the population.
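As a quick illustration of the stratified strategy in the table, here is a sketch of proportional allocation; the strata, their sizes, and the total n are invented for the example:

```python
import random

def stratified_sample(strata, total_n, seed=1):
    """Draw from each stratum in proportion to its share of the population.
    strata: dict mapping stratum name -> list of members."""
    rng = random.Random(seed)
    population = sum(len(members) for members in strata.values())
    sample = []
    for name, members in strata.items():
        n_stratum = round(total_n * len(members) / population)
        sample.extend(rng.sample(members, n_stratum))  # random draw within the stratum
    return sample

strata = {
    "first_gen": [f"fg_{i}" for i in range(300)],
    "continuing_gen": [f"cg_{i}" for i in range(700)],
}
# A total sample of 100 yields roughly 30 first-gen and 70 continuing-gen members.
print(len(stratified_sample(strata, total_n=100)))
```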

Further Readings

Fusch, Patricia I., and Lawrence R. Ness. 2015. “Are We There Yet? Data Saturation in Qualitative Research.” Qualitative Report 20(9):1408–1416.

Saunders, Benjamin, Julius Sim, Tom Kingstone, Shula Baker, Jackie Waterfield, Bernadette Bartlam, Heather Burroughs, and Clare Jinks. 2018. “Saturation in Qualitative Research: Exploring Its Conceptualization and Operationalization.” Quality & Quantity 52(4):1893–1907.

  • Rubin (2021) suggests a minimum of twenty interviews (but safer with thirty) for an interview-based study and a minimum of three to six months in the field for ethnographic studies. For a content-based study, she suggests between five hundred and one thousand documents, although some will be “very small” (243–244).

Glossary

Sampling: The process of selecting people or other units of analysis to represent a larger population. In quantitative research, this representation is taken quite literally, as statistically representative. In qualitative research, in contrast, sample selection is often made based on potential to generate insight about a particular topic or phenomenon.

Sampling frame: The actual list of individuals that the sample will be drawn from. Ideally, it should include the entire target population (and nobody who is not part of that population). Sampling frames can differ from the larger population when specific exclusions are inherent, as in the case of pulling names randomly from voter registration rolls where not everyone is a registered voter. This difference in frame and population can undercut the generalizability of quantitative results.

Sample: The specific group of individuals that you will collect data from. Contrast population.

Population: The large group of interest to the researcher. Although it will likely be impossible to design a study that incorporates or reaches all members of the population of interest, this should be clearly defined at the outset of a study so that a reasonable sample of the population can be taken. For example, if one is studying working-class college students, the sample may include twenty such students attending a particular college, while the population is “working-class college students.” In quantitative research, clearly defining the general population of interest is a necessary step in generalizing results from a sample. In qualitative research, defining the population is conceptually important for clarity.

Probability sampling: A sampling strategy in which the sample is chosen to represent (numerically) the larger population from which it is drawn by random selection. Each person in the population has an equal chance of making it into the sample. This is often done through a lottery or other chance mechanisms (e.g., a random selection of every twelfth name on an alphabetical list of voters). Also known as random sampling.

Convenience sampling: The selection of research participants or other data sources based on availability or accessibility, in contrast to purposive sampling.

Snowball sampling: A sample generated non-randomly by asking participants to help recruit more participants, the idea being that a person who fits your sampling criteria probably knows other people with similar criteria.

Themes: Broad codes that are assigned to the main issues emerging in the data; identifying themes is often part of initial coding.

Disconfirming cases: A form of case selection focusing on examples that do not fit the emerging patterns. This allows the researcher to evaluate rival explanations or to define the limitations of their research findings. While disconfirming cases are found (not sought out), researchers should expand their analysis or rethink their theories to include/explain them.

Grounded Theory: A methodological tradition of inquiry and approach to analyzing qualitative data in which theories emerge from a rigorous and systematic process of induction. This approach was pioneered by the sociologists Glaser and Strauss (1967). The elements of theory generated from comparative analysis of data are, first, conceptual categories and their properties and, second, hypotheses or generalized relations among the categories and their properties – “The constant comparing of many groups draws the [researcher’s] attention to their many similarities and differences. Considering these leads [the researcher] to generate abstract categories and their properties, which, since they emerge from the data, will clearly be important to a theory explaining the kind of behavior under observation” (36).

Random sample: The result of probability sampling, in which a sample is chosen to represent (numerically) the larger population from which it is drawn by random selection. Each person in the population has an equal chance of making it into the random sample. This is often done through a lottery or other chance mechanisms (e.g., the random selection of every twelfth name on an alphabetical list of voters). This is typically not required in qualitative research but rather essential for the generalizability of quantitative research.

Deviant case: A form of case selection or purposeful sampling in which cases that are unusual or special in some way are chosen to highlight processes or to illuminate gaps in our knowledge of a phenomenon. See also extreme case.

Saturation: The point at which you can conclude data collection because every person you are interviewing, the interaction you are observing, or content you are analyzing merely confirms what you have already noted. Achieving saturation is often used as the justification for the final sample size.

Generalizability: The accuracy with which results or findings can be transferred to situations or people other than those originally studied. Qualitative studies generally are unable to use (and are uninterested in) statistical generalizability, where the sample population is said to be able to predict or stand in for a larger population of interest. Instead, qualitative researchers often discuss “theoretical generalizability,” in which the findings of a particular study can shed light on processes and mechanisms that may be at play in other settings. See also statistical generalization and theoretical generalization.

Recruitment materials: A term used by IRBs to denote all materials aimed at recruiting participants into a research study (including printed advertisements, scripts, audio or video tapes, or websites). Copies of this material are required in research protocols submitted to the IRB.

Introduction to Qualitative Research Methods Copyright © 2023 by Allison Hurst is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License , except where otherwise noted.



A simple method to assess and report thematic saturation in qualitative research

Greg Guest (Q42 Research, Research Triangle Park, North Carolina, United States of America) and Emily Namey (Global Health, Population, and Nutrition, FHI 360, Durham, North Carolina, United States of America)

Published: May 5, 2020
https://doi.org/10.1371/journal.pone.0232076
Abstract

Data saturation is the most commonly employed concept for estimating sample sizes in qualitative research. Over the past 20 years, scholars using both empirical research and mathematical/statistical models have made significant contributions to the question: How many qualitative interviews are enough? This body of work has advanced the evidence base for sample size estimation in qualitative inquiry during the design phase of a study, prior to data collection, but it does not provide qualitative researchers with a simple and reliable way to determine the adequacy of sample sizes during and/or after data collection. Using the principle of saturation as a foundation, we describe and validate a simple-to-apply method for assessing and reporting on saturation in the context of inductive thematic analyses. Following a review of the empirical research on data saturation and sample size estimation in qualitative research, we propose an alternative way to evaluate saturation that overcomes the shortcomings and challenges associated with existing methods identified in our review. Our approach includes three primary elements in its calculation and assessment: Base Size, Run Length, and New Information Threshold. We additionally propose a more flexible approach to reporting saturation. To validate our method, we use a bootstrapping technique on three existing thematically coded qualitative datasets generated from in-depth interviews. Results from this analysis indicate the method we propose to assess and report on saturation is feasible and congruent with findings from earlier studies.

Citation: Guest G, Namey E, Chen M (2020) A simple method to assess and report thematic saturation in qualitative research. PLoS ONE 15(5): e0232076. https://doi.org/10.1371/journal.pone.0232076


Copyright: © 2020 Guest et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Introduction

Data saturation is the conceptual yardstick for estimating and assessing qualitative sample sizes. During the past two decades, scholars have conducted empirical research and developed mathematical/statistical models designed to estimate the likely number of qualitative interviews needed to reach saturation for a given study. Although this body of work has advanced the evidence base for sample size estimation during the design phase of a qualitative study, it does not provide a method to determine saturation, and the adequacy of sample sizes, during and/or after data collection. As Morse pointed out more than 20 years ago, “saturation is an important component of rigor. It is present in all qualitative research but, unfortunately, it is evident mainly by declaration” [1]. In this paper we present a method to assess and report on saturation that enables qualitative researchers to speak about, and provide some evidence for, saturation in a way that goes beyond simple declaration.

To provide the foundation for this approach, we define saturation and then review the work to date on estimating saturation and sample sizes for in-depth interviews. We follow this with an overview of the few empirically-based methods that have been put forward to operationalize and measure saturation and identify challenges of applying these approaches to real-life research contexts, particularly those that use inductive thematic analyses. We subsequently propose an alternative way of evaluating saturation and offer a relatively easy-to-use method of assessing and reporting on it during or after an inductive thematic analysis. We test and validate our method using a bootstrapping technique on three distinctly different qualitative datasets.

The method we propose is designed for qualitative data collection techniques that aim to generate narratives, i.e., focus groups and one-on-one interviews that use open-ended questioning with inductive probing (though we have only attempted to validate the method on individual interview data). Our method also specifically applies to contexts in which an inductive thematic analysis [2–4] is used, where emergent themes are discovered in the data and then transformed into codes.

A brief history of saturation and qualitative sample size estimation

How many qualitative interviews are enough? Across academic disciplines, and for about the past five decades, the answer to this question has usually revolved around reaching saturation [1, 5–9]. The concept of saturation was first introduced into the field of qualitative research as “theoretical saturation” by Glaser and Strauss in their 1967 book The Discovery of Grounded Theory [10]. They defined the term as the point at which “no additional data are being found whereby the [researcher] can develop properties of the category” (pg. 61). Their definition was specifically intended for the practice of building and testing theoretical models using qualitative data and refers to the point at which the theoretical model being developed stabilizes. Many qualitative data analyses, however, do not use the specific grounded theory method, but rather a more general inductive thematic analysis. Over time, the broader term “data saturation” has become increasingly adopted to reflect a wider application of the term and concept. In this broader sense, saturation is often described as the point in data collection and analysis when new incoming data produces little or no new information to address the research question [4, 9, 11–13].

Interestingly, empirical research on saturation began with efforts to determine when one might expect it to be reached. Though “interviewing until saturation” was recognized as a best practice, it was not a sufficient description of sample size. In most research contexts, sample size specification and justification is required by funders, ethics committees, and other reviewers before a study is implemented [14, 15]. Applied qualitative researchers faced the question: How do I estimate how many interviews I’ll need before I head into the field?

Empirical research to address this issue began appearing in the literature in the early 2000s. Morgan et al. [16] conducted a pioneering methodological study using data collected on environmental risks. They found that the first five to six interviews produced the majority of new information in the dataset, and that little new information was gained as the sample size approached 20 interviews. Across four datasets, approximately 80% to 92% of all concepts identified within the dataset were noted within the first 10 interviews. Similarly, Guest et al. [9] conducted a stepwise inductive thematic analysis of 60 in-depth interviews among female sex workers in West Africa and discovered that 70% of all 114 identified themes turned up in the first six interviews, and 92% were identified within the first 12 interviews. Subsequent studies by Francis et al. and Namey et al. [17, 18] reported similar findings. Building on these earlier studies, Hagaman and Wutich [19] calculated saturation within a cross-cultural study and found that fewer than 16 interviews were enough to reach data saturation at each of the four sites but that 20–40 interviews were necessary to identify cross-cultural meta-themes across sites.

Using a meta-analytic approach, Galvin [ 20 ] reviewed and statistically analyzed—using binomial logic—54 qualitative studies. He found the probability of identifying a concept (theme) among a sample of six individuals is greater than 99% if that concept is shared among 55% of the larger study population. Employing this same logic, Fugard and Potts [ 21 ] developed a quantitative tool to estimate sample sizes needed for thematic analyses of qualitative data. Their calculation incorporates: (1) the estimated prevalence of a theme within the population, (2) the number of desired instances of that theme, and (3) the desired power for a study. Their tool estimates, for example, that to have 80% power to detect two instances of a theme with a 10% prevalence in a population, 29 participants would be required. Note that their model assumes a random sample.
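To make the binomial logic concrete, both worked numbers above can be checked in a few lines of Python (a sketch using only the standard library; the helper function and its name are ours, not Fugard and Potts’ tool):

    from math import comb

    def p_at_least(k: int, n: int, p: float) -> float:
        """P(X >= k) for X ~ Binomial(n, p): the chance that a theme with
        population prevalence p appears at least k times among n randomly
        sampled participants."""
        return 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

    # Galvin's example: a theme shared by 55% of the population is nearly
    # certain to surface at least once among six participants.
    print(p_at_least(1, 6, 0.55))    # ~0.992, i.e., greater than 99%

    # Fugard & Potts' example: the smallest n giving 80% power to observe
    # at least two instances of a theme with 10% prevalence.
    n = 1
    while p_at_least(2, n, 0.10) < 0.80:
        n += 1
    print(n)                         # 29

As the text notes, this logic presupposes a random sample, which is precisely the assumption the method proposed below avoids.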

The above studies are foundational in the field of qualitative sample size estimation. They provide empirically based guidance for approximating how many qualitative interviews might be needed for a given study and serve a role analogous to power calculations in quantitative research design (albeit in some cases without the math and degree of precision). And, like power calculations, they are moot once data collection begins. Estimates are based on specified assumptions and expectations regarding various elements of a particular study. As all researchers know, reality often presents surprises. Though a study may be powered to certain parameters (quantitative) or have a sample size based on empirical guidance (qualitative), after data collection is completed the resulting data may not conform to either.

Not surprisingly, researchers have recently begun asking two follow-up questions about data saturation that go beyond estimation: How can we better operationalize the concept of saturation? And how do we know if we have reached saturation?

Operationalizing and assessing saturation

The range of empirical work on saturation in qualitative research and detail on the operationalization and assessment metrics used in data-driven studies that address saturation are summarized in Table 1 . In reviewing these studies to inform the development of our approach to assessing saturation, we identified three limitations to the broad application of saturation assessment processes which we sought to overcome: lack of comparability of metrics, reliance on probability theory or random sampling, and retrospective assessment dependent on having a fully coded/analyzed dataset. We discuss each limitation briefly before introducing our alternative approach.

[Table 1: https://doi.org/10.1371/journal.pone.0232076.t001]

Lack of comparability in metrics.

Current operationalizations of saturation vary widely in the criteria used to arrive at a binary determination of saturation having been reached or not reached (e.g., Francis et al. [ 17 ] and Coenen et al. [ 22 ]). Given how different the approaches are, in terms of units of analysis and strictness of saturation thresholds, it is difficult to know how much confidence to place in a conclusion about whether saturation was reached. Unlike quantitative researchers using statistical analysis methods, who have established options such as confidence levels to report, qualitative researchers have no agreed-upon metrics for interpreting the strength of their saturation findings. The method we propose offers researchers a choice among levels of assessment criteria, together with a common way of describing those criteria, so that readers can interpret conclusions regarding saturation with more or less confidence depending on the strictness of the criteria used.

Reliance on probability theory and/or the assumption of a random sample.

Basing assessments of saturation on probabilistic assumptions (e.g., Lowe et al. [ 26 ], Fugard & Potts [ 21 ], Galvin [ 20 ]) ignores the fact that most qualitative research employs non-probabilistic, purposive sampling suited to the nature and objectives of qualitative inquiry [ 28 ]. Even in cases where random sampling is employed, the open-ended nature of qualitative inquiry does not lend itself well to probability theory or statistical inference to a larger population, because response categories are not structured and so are not mutually exclusive. The expression of Theme A is not necessarily to the exclusion of Theme B, nor does the absence of the expression of Theme A necessarily indicate Not-A. Further, from a logistical standpoint, many qualitative researchers have neither the expertise nor the time required to perform complicated statistical tests on their datasets. Our approach involves only simple arithmetic and calculation of percentages.

Retrospective assessment dependent on having a fully coded/analyzed dataset.

Methods that calculate saturation based on the proportion of new themes relative to the overall number of themes in a dataset (e.g., Guest et al. [ 9 ], Hennink et al. [ 23 ]) are limited by the total number of interviews conducted: the denominator represents the total number of themes in the fully-analyzed dataset and is fixed, while the number of themes in the numerator gets closer to the denominator with every new interview considered, thus eventually reaching 100% saturation. Saturation will inevitably occur in a retrospectively-assessed, fully-analyzed, fixed-size dataset. The method we outline eliminates this problem by using a subset of data items in the denominator instead of the entire dataset, facilitating better prospective assessment of saturation and offering the advantage of allowing researchers to stop before reaching a pre-specified number of interviews. (Under our approach, however, a measure of percent saturation as defined by these authors will not be available.)

An alternative approach and method to calculating and reporting saturation

For the purposes of our assessment, saturation refers to the point during data analysis at which incoming data points (interviews) produce little or no new useful information relative to the study objectives. Our approach to operationalizing this definition of saturation consists of three distinct elements: the base size, the run length, and the relative amount of incoming new information, or the new information threshold.

Base size.

When assessing saturation, incoming information is weighed against the information already obtained. Base size refers to how we circumscribe the body of information already identified in a dataset to subsequently use as a denominator (similar to Francis et al.’s initial analysis sample). In other words, what is the minimum number of data collection events (i.e., interviews) we should review/analyze to calculate the amount of information already gained? We know that if we use all of the data collection events as our base size, we can reach saturation by default, as there are no more data to consider. We also know from previous studies [ 9 , 16 , 29 ] that most novel information in a qualitative dataset is generated early in the process, and generally follows an asymptotic curve, with a relatively sharp decline in new information occurring after just a small number of data collection/analysis events. For this reason, we have chosen to test 4, 5, and 6 interviews as base sizes from which to calculate the total number of unique themes to be used in the denominator of the saturation ratio. The unit of analysis for base size is the data collection event; the items of analysis are unique codes representing themes.

Run length.

A run can be defined as a set of consecutive events or observations, in this case interviews. The run length is the number of interviews within which we look for, and calculate, new information. The number of new themes found in the run defines the numerator in the saturation ratio. Hagaman and Wutich (2017) and Francis et al. (2010), for example, consider runs of three data collection events each time they (re)assess the number of new themes for the numerator, whereas Coenen et al. (2012) include only two events in their data runs. For our analyses we provide both options for run lengths in our calculations–two events and three events–to afford researchers more flexibility. Note that in our analyses, successive runs overlap: each set of interviews shifts to the right, or “forward” in time, by one event. Fig 1 shows the process and how base size and run length relate to one another. Here again the unit of analysis is the data collection event; the items of analysis are unique codes.

[Fig 1: https://doi.org/10.1371/journal.pone.0232076.g001]

New information threshold.

Once the units of analysis for the numerator and denominator are determined, the proportional calculation is simple. But the next question is a purely subjective one: What level of paucity of new information should we accept as indicative of saturation? We propose that furnishing researchers with options, rather than a prescriptive threshold, is a more realistic, transparent, and accurate practice. We therefore initially propose two levels of new information that represent the proportion of new information we would accept as evidence that saturation has been reached at a given point in data collection: ≤5% new information and no (0%) new information.

These new information thresholds can be used as benchmarks, similar to how a p-value of <0.05 or <0.01 is used to determine whether enough evidence exists to reject a null hypothesis in statistical analysis. As in statistical analysis—but absent the probability theory—there is no guarantee that saturation is in fact reached when these thresholds are met. But they do provide a transparent way of presenting data saturation assessments that other researchers can subsequently interpret. The lower the new information threshold, the less likely it is that an important number of themes remain undiscovered in later interviews if data collection stops once the threshold is reached. Taken together, the concepts of base size, run length, and new information threshold allow researchers to choose how stringently they wish to apply the saturation concept, and the level of confidence they might have that data saturation was attained for a given sample ( Fig 2 ).

[Fig 2: https://doi.org/10.1371/journal.pone.0232076.g002]
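Before turning to the worked example below, here is how the three elements combine computationally: a minimal Python sketch of the saturation check (the function and variable names are ours, for illustration; this is not the authors’ code):

    def assess_saturation(base_total, new_per_interview, base_size=4,
                          run_length=2, threshold=0.05):
        """Prospective saturation check.

        base_total        -- unique themes found in the first base_size interviews
        new_per_interview -- new (first-appearance) themes contributed by each
                             subsequent interview, in interview order

        Slides an overlapping run of run_length interviews forward one
        interview at a time and stops when the run's new themes fall to at
        or below threshold of the base total. Returns (saturation_point,
        ratio), where saturation_point is the last interview before the
        qualifying run, or None if the threshold is never met.
        """
        for start in range(len(new_per_interview) - run_length + 1):
            run_new = sum(new_per_interview[start:start + run_length])
            ratio = run_new / base_total
            if ratio <= threshold:
                return base_size + start, ratio  # report as "<point> +run_length"
        return None

The worked example that follows can be reproduced with a single call to this function.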

The advantages of the method we propose are several:

  • It does not assume or require a random sample, nor prior knowledge of theme prevalence.
  • Calculation is simple. It can be done quickly and with no statistical expertise.
  • Metrics can be used prospectively during the data collection and analysis process to ascertain when saturation is reached (providing the possibility of conducting fewer data collection events than planned).
  • Metrics can be used retrospectively , after data collection and analysis are complete, to report on the adequacy of the sample to reach thematic saturation.
  • Options for each metric can be specified prior to analysis or reported after data analysis.
  • The metrics are flexible. Researchers have options for how they describe saturation and can also use the term with more transparency and precision.
  • Saturation is conceptualized as a relative measure. This neutralizes differences in the level of coding granularity among researchers, as the method affects both numerator and denominator.

Application of the approach

An example of prospective data saturation calculation.

Let’s consider a step-by-step example of how this process works, using a hypothetical dataset to illustrate the approach. We will prospectively calculate saturation using a base size of 4 interviews and a run length of 2 interviews. For this example, we have selected a new information threshold of ≤5% to indicate that we have reached adequate saturation. [The data used for each step are included in Fig 3, along with indication of the base, runs, and saturation points.]

[Fig 3: https://doi.org/10.1371/journal.pone.0232076.g003]

STEP 1: Find the number of unique themes for the base.

We start by looking at the first four interviews conducted and summing the number of unique themes identified within this group. The resulting sum, 37, is the denominator in our equation.

[Table: https://doi.org/10.1371/journal.pone.0232076.t002]

STEP 2: Find the number of unique themes for the first run.

In this example, we’re using a run length of two, so we include data for the next two interviews after the base set, i.e., interviews 5 and 6. After reviewing those interviews, let’s say we identified four new themes in interview 5 and three new themes in interview 6. The number of new themes in this first run is seven.

[Table: https://doi.org/10.1371/journal.pone.0232076.t003]

STEP 3: Calculate the saturation ratio.

Divide the number of new themes in this run (seven) by the number of unique themes in the base set (37). The quotient reveals 19% new information. This does not meet our ≤5% threshold, so we continue.

[Table: https://doi.org/10.1371/journal.pone.0232076.t004]

STEP 4: Find the number of new unique themes for the next run in the series.

For the next run we add the new themes for the next two interviews, 6 and 7 (note the overlap of interview 6), resulting in a sum of four.

[Table: https://doi.org/10.1371/journal.pone.0232076.t005]

STEP 5: Update the saturation ratio.

Take the number of new themes in the latest run (four) and divide by the number of themes in the base set (37). This renders a quotient of 11%, still above our ≤5% threshold. We continue to the next run.

[Table: https://doi.org/10.1371/journal.pone.0232076.t006]

STEP 6: Find the number of new unique themes for the next run in the series.

For this third run we add the number of new themes identified within interviews 7 and 8.

[Table: https://doi.org/10.1371/journal.pone.0232076.t007]

STEP 7: Update the saturation ratio.

Take the number of new themes in the latest run (one) and divide by the number of themes in the base set (37). This yields a quotient of approximately 3%.

[Table: https://doi.org/10.1371/journal.pone.0232076.t008]

At this point the proportion of new information added by the last run is below the ≤5% threshold we established, so we stop after the 8th interview, with a good sense that the amount of new information has diminished to a level where we can say saturation has been reached, based on our subjective metric of ≤5%. Since the last two interviews did not add substantially to the body of information collected, we would say that saturation was reached at interview 6 (each of the next two interviews was completed to see how much new information would be generated and whether this would fall below the set threshold). We annotate these two extra interviews (indicative of run length) by appending a superscript “+2” to the interview number, indicating that a total of eight interviews were completed. In writing up our saturation assessment, then, we would say that using a base size of 4 we reached the ≤5% new information threshold at 6 +2 interviews.
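Continuing the sketch introduced above, the entire walk-through reduces to one call (numbers taken from the hypothetical Fig 3 data):

    # Steps 1-7: 37 base themes (interviews 1-4), then 4, 3, 1, and 0 new
    # themes from interviews 5-8.
    point, ratio = assess_saturation(37, [4, 3, 1, 0], base_size=4,
                                     run_length=2, threshold=0.05)
    print(point, round(ratio, 3))   # 6 0.027 -> reported as "6 +2"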

If we wanted to be more conservative, and more confident in our conclusion of reaching saturation in this example, we could adjust two parameters of our assessment. We could increase the run length to 3 (or an even larger number), and/or we could set a more stringent new information threshold of no new information. If we consider the hypothetical dataset used here (see Fig 3) and keep the run length of 2, the 0% new information threshold would have been reached at interview 10 +2.

One may still raise two logical questions after reviewing the example process above. The first is “How do we know that we’re not missing important information by capping our sample at n when saturation is indicated?” Put another way, if we had conducted, say, five more interviews would we have gotten additional and important data? The honest answer to this is that we don’t know, and we can never know unless we conduct those five extra interviews, and then five more after that and so on. That is where we rely on the empirical research that shows the rate at which new information emerges decreases over time and that the most common and salient themes are generated early, assuming that we keep the interview questions, sample characteristics, and other study parameters relatively consistent. To further illustrate how saturation may have been affected by doing additional interviews, we include 20 interviews in Fig 3 . The interviews following Interview 12, though yielding four additional themes, remained at or below the ≤5% new information threshold.

The second question is related to the first and pertains to possible order effects. Would the theme identification pattern in a dataset of 20 interviews look the same if interviews #10 through #20 were conducted first? Could new themes start emerging later in the data collection process? Though it is possible an important theme will emerge later in the process/dataset, the empirical studies referenced above demonstrate that the most prevalent, high-level themes are identified very early on in data collection, within about six interviews. But, to further check this, we use a bootstrapping technique on three actual datasets to corroborate findings from these earlier studies and to assess the distributional properties of our proposed metrics. These bootstrap findings give us information on how saturation may be reached at different stopping points as new themes are discovered in new interviews and when the interviews are ordered randomly in different replications of the sample of interviews.

Sample datasets.

We selected three existing qualitative datasets to which we applied the bootstrapping method. Although the datasets were all generated from individual interviews analyzed using an inductive thematic analysis approach, the studies from which they were drawn differed with respect to study population, topics of inquiry, sample heterogeneity, interviewer, and structure of data collection instrument, as described below.

Dataset 1. This study included 40 individual interviews with African American men in the Southeast US about their health seeking behaviors [ 29 ]. The interview guide contained 13 main questions, each with scripted sub-questions. Inductive probing was employed throughout all interviews. The inductive thematic analysis included 11 of the 13 questions and generated 93 unique codes. The study sample was highly homogenous.

Dataset 2. The second dataset consists of 48 individual interviews conducted with (mostly white) mothers in the Southeast US about medical risk and research during pregnancy [ 30 ]. The interview guide contained 13 main questions, each with scripted sub-questions. Inductive probing was employed throughout all interviews. Of note, the 48 interviews were conducted, 12 each, using different modes of data collection: in-person, by video (Skype-like platform), email (asynchronous), or text chat (synchronous). The qualitative thematic analysis included 10 of these questions and generated 85 unique codes.

Dataset 3. This study included 60 interviews with women at higher risk of HIV acquisition—30 participants in Kenya and 30 in South Africa [ 31 ]. The interview was a follow-up qualitative inquiry into women’s responses on a quantitative survey. Though there were 14 questions on the guide, only data from three questions were included in the thematic analysis referenced here. Those three questions generated 55 codes. Participants from the two sites were demographically similar with the exceptions of education and marital status: substantially more women in the Kenya sample were married and living with their partners (63% versus 3%), and Kenyan participants were less likely to have completed at least some secondary education. All interviews were conducted in a local language.

Data from all three studies were digitally recorded and transcribed using a transcription protocol [ 32 ]; transcripts were translated to English for Dataset 3. Transcripts were imported into NVivo [ 33 ] to facilitate coding and analysis. All three datasets were analyzed using a systematic inductive thematic approach [ 2 ], and all codes were explicitly defined in a codebook following a standard template [ 34 ]. For Datasets 1 & 2, two analysts coded each transcript independently and compared code application after each transcript. Discrepancies in code application were resolved through discussion, resulting in consensus-coded documents. For Dataset 3, two coders conducted this type of inter-coder reliability assessment on 20% of the interviews (a standard, more efficient approach than double-coding all interviews [ 2 ]). All three studies were reviewed and approved by the FHI 360 Protection of Human Subjects Committee; the study which produced Dataset 3 was also reviewed and approved by local IRBs in Kenya and South Africa.

Bootstrapping method.

While these three studies offer diverse and analytically rigorous cases, on their own they provide limited generalizability. To approximate population-level statistics and broaden our validation exercise, we drew empirical bootstrap samples from each of the datasets described above. The bootstrap method is a resampling technique that uses the variability within a sample to estimate the sampling distribution of metrics (in this case saturation metrics) empirically [ 35 ]. This is done by randomly resampling from the sample with replacement (i.e., an item may be selected more than once in a resample) many times in a way that mimics the original sampling scheme. For each qualitative dataset, we generated 10,000 resamples from the original sample. In addition, we randomly ordered the selected transcripts in each resample to offset any order effect on how/when new codes are discovered. For each resample, we calculated the proportion of new themes found in run lengths of two or three new events relative to a base size of four, five or six interviews. We then identified the number of transcripts needed to meet a new information threshold of ≤5% or 0%. Based on these thresholds from 10,000 resamples, for each dataset we computed the median and the 5th and 95th percentiles for the number of interviews required to reach each new information threshold across different base sizes and run lengths. The 5th and 95th percentiles provide a nonparametric 90% confidence interval for the number of transcripts needed to reach saturation as defined at these new information thresholds.
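A minimal sketch of this resampling procedure in Python (standard library only; the function names are ours, and edge cases such as resamples that never meet the threshold are handled more simply than a production implementation would require):

    import random
    import statistics

    def new_theme_counts(theme_sets):
        """First-appearance theme counts per transcript, in the given order."""
        seen, counts = set(), []
        for themes in theme_sets:
            fresh = themes - seen
            counts.append(len(fresh))
            seen |= fresh
        return counts

    def bootstrap_stops(theme_sets, n_resamples=10_000, base_size=4,
                        run_length=2, threshold=0.05):
        """Distribution of the number of interviews needed to reach the
        new-information threshold, over resamples drawn with replacement."""
        stops = []
        for _ in range(n_resamples):
            resample = random.choices(theme_sets, k=len(theme_sets))
            random.shuffle(resample)              # offset any order effect
            counts = new_theme_counts(resample)
            base_total = sum(counts[:base_size])  # unique themes in the base
            for start in range(base_size, len(counts) - run_length + 1):
                run_new = sum(counts[start:start + run_length])
                if base_total and run_new / base_total <= threshold:
                    stops.append(start)           # interviews before the run
                    break
        cuts = statistics.quantiles(stops, n=20)  # cut points at 5%, 10%, ..., 95%
        return statistics.median(stops), cuts[0], cuts[-1]

Each transcript is represented simply as the set of theme codes applied to it; random.choices resamples with replacement, and the shuffle randomizes interview order within each resample, mirroring the procedure described above.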

Since we had available the total number of codes identified in each dataset, we carried out one additional calculation to relate the median number of interviews needed to reach a new information threshold to retrospectively assessed degrees of saturation for the entire dataset. Once the number of interviews to reach a new information threshold was determined for each run of a dataset, we divided the number of unique themes identified up to that point by the total number of unique themes. This provided a percent, or degree, of saturation for each run of the data, which was then used to generate a median and 5th and 95th percentiles for the degree of saturation reached. This can then be compared across base sizes, run lengths, and new information thresholds. [Note that we include this as a further way to understand and validate the proposed approach for calculating saturation, rather than as part of the proposed process.]
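Under the same sketch, this retrospective degree of saturation is a short additional computation (again our naming, reusing new_theme_counts from above):

    def degree_of_saturation(theme_sets, stop):
        """Share of all unique themes in the full dataset that had already
        surfaced within the first `stop` transcripts."""
        counts = new_theme_counts(theme_sets)
        return sum(counts[:stop]) / sum(counts)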

The results from the bootstrapping analyses are presented by dataset in Tables 2, 3 and 4. Each table presents the median and percentiles of the bootstrap distribution using base sizes of 4, 5, or 6 and run lengths of 2 and 3, at new information thresholds of ≤5% and no new information.

[Table 2: https://doi.org/10.1371/journal.pone.0232076.t009]

[Table 3: https://doi.org/10.1371/journal.pone.0232076.t010]

[Table 4: https://doi.org/10.1371/journal.pone.0232076.t011]

Note that, as described in the example above, the number of interviews in the run length is not included in the number of interviews to reach the given new information threshold, so the total number of events needed to assess having reached the threshold is two or three more interviews than the given median, depending on the run length of choice. This is indicated by a superscript +2 or +3.

For Dataset 1 ( Table 2 ), at the ≤5% new information threshold, the median number of interviews needed to reach a drop-off in new information was consistent across all base sizes. At a run length of two interviews, the median number of interviews required before a drop in new information was observed was six. This means that relative to the total number of unique codes identified in the first four, five, or six interviews, the amount of new information contributed by interviews 7 and 8 was less than or equal to 5% of the total. At a run length of three interviews, the median number of interviews required before a drop in new information was observed was seven. This means that relative to the total number of unique codes identified in the first four, five, or six interviews, the amount of new information contributed by interviews 8, 9, and 10 was less than or equal to 5% of the total. Across base sizes, for a run length of two, we would say that saturation was indicated at 6 +2 , while for a run length of three we would say saturation was observed at 7 +3 , both at the ≤5% new information level. Using the total number of themes in the dataset retrospectively, the number of themes evident across 6–7 interviews corresponded with a median degree of saturation of 78% to 82%.

At the 0% new information threshold, the median number of interviews needed to indicate saturation was again consistent across base sizes, varying only by run length. The median numbers of interviews required were 11 +2 and 14 +3. In other words, at run length 2, it took 11 interviews, plus two more to confirm that no new information was contributed. At run length 3 it was 14 interviews plus three more to confirm no new information. The number of themes evident across 11–14 interviews corresponded with a median degree of saturation of 87% to 89%.

The results for Dataset 2 were nearly identical to Dataset 1 ( Table 3 ). Saturation was indicated at 6 interviews at a run length of 2 (6 +2 ) and 7–8 interviews at run length 3 (7 +3 or 8 +3 ). The number of themes evident across 6–8 interviews corresponded with a median degree of saturation of 79% to 82%. At the 0% new information threshold saturation was indicated at the same points as in Dataset 1: 11 +2 and 14 +3 , consistent across all base sizes. In other words, no new information was observed after a median of 11 interviews using a run-length of 2, nor after 14 interviews using a run length of 3. Here again, despite a different total number of themes in the overall dataset, the number of new themes evident across 11–14 interviews corresponded with a median degree of saturation of 87% to 89%.

Dataset 3 ( Table 4 ) contained more variation in the sample than the others, which was reflected in a slightly higher median number of interviews and a lower degree of saturation. At the ≤5% new information threshold, the median number of interviews required to reach saturation at a run length of 2 was 8–9 (higher for base size 4). At a run length of 3, the median number of required interviews was 11–12 (again higher for base size 4). The number of new themes evident across 8–12 interviews corresponded with a median degree of saturation of 62% to 71%. At the 0% new information threshold, saturation was indicated at 12 +2 and 16 +3 , consistent across base sizes. The number of new themes evident across 12–16 interviews corresponded with a median degree of saturation of 69% to 76%.

In this paper we present a way of assessing thematic saturation in inductive analysis of qualitative interviews. We describe how this method circumvents many of the limitations associated with other ways of conceptualizing, assessing and reporting on saturation within an in-depth interview context. The process can be applied either prospectively, during the data collection and analysis process, or retrospectively, after data collection and analysis are complete. A key advantage is that the metrics are flexible, affording researchers the ability to choose different degrees of rigor by selecting different run lengths and/or new information thresholds. Similarly, the method allows for different options–and greater clarity and transparency–in describing and reporting on saturation.

Based on the bootstrapping analyses we can draw several conclusions. The first is that the results are within the range of what we would have expected based on previous empirical studies. Using the ≤5% new information threshold, our findings indicate that typically 6–7 interviews will capture the majority of themes in a homogenous sample (6 interviews to reach 80% saturation). Our analyses also show that at the higher end of the range for this option (95th percentile) 11–12 interviews might be needed, tracking with existing literature indicating 12 interviews are typically needed to reach higher degrees of saturation.

We can also draw other lessons to inform application of this process:

  • Base size appears to have almost no effect on the outcome. This is important from an efficiency perspective. If our findings hold true in other contexts, it suggests that using a default base size of four interviews is sufficient. In practical terms, this implies that saturation should initially be assessed after six interviews (four in the base, and two in the run). If analyzing data in real time, the results of this initial assessment can then determine whether or not more interviews are needed.
  • Run length has an effect on the outcome, as one would expect: the longer the run length, the greater the number of interviews required to reach saturation. The run length effect is smallest, indeed very minimal, when employing the ≤5% new information threshold. The practical implication of this finding is that researchers can choose a longer run length–e.g., three interviews (or more)–to generate a more conservative assessment of saturation.
  • The new information threshold selected affects the point at which saturation is indicated, as one would expect. The lower the new information threshold–and therefore the more conservative the allowance for recognizing new information–the more interviews are needed to achieve saturation. From an applied standpoint this finding is important in that researchers can feel confident that choosing a more stringent new information threshold–e.g., 0%—will result in a more conservative assessment of saturation, if so desired.

There are, of course, still limitations to this approach. It was developed with applied inductive thematic analyses in mind–those for which the research is designed to answer a relatively narrow question about a specific real-world issue or problem–and the datasets used in the bootstrapping analyses were generated and analyzed within this framework. The applicability of this approach for qualitative research with a different epistemological or phenomenological perspective is yet untested. Another potential limitation of this method relates to codebook structure. When conducting an inductive thematic analysis, researchers must decide on an appropriate codebook organizational scheme (see Hennink et al. [ 23 ] for discussion on this as it relates to saturation). We tested our method on single-tier codebooks, but qualitative researchers often create hierarchical codebooks. A two-tier structure with primary (“parent”) codes and constituent secondary (“child”) codes is a common form, but researchers may also want to identify and look for higher-level, meta-themes (e.g., Hagaman and Wutich [ 19 ]). For any method of assessing saturation, including ours, researchers need to decide at which level they will identify and include themes/codes. For inductive thematic analyses this is a subjective decision that depends on the degree of coding granularity necessary for a particular analytic objective, and how the research team wants to discuss saturation when reporting study findings. That said, a researcher could, with this approach, run and report on saturation analyses of two or more codebooks that contain differing levels of coding granularity.

Tran and colleagues [ 24 ] accurately point out that determining the point of saturation is a difficult endeavor, because “researchers have information on only what they have found” (pg. 17). They further argue that the stopping point for an inductive study is typically determined by the “judgement and experience of researchers”. We acknowledge and agree with these assertions.

Selecting and interpreting levels of rigor, precision, and confidence is a subjective enterprise. What a quantitative researcher accepts, for example, as a large enough effect size or a small enough p-value is a subjective determination and based on convention in a particular field of study. The same can be said for how a researcher chooses to report and interpret statistical findings. P-values can be expressed either in absolute terms (e.g., p = .043) or in several commonly used increments (e.g., p < .05, p < .01, etc.). Likewise, while an odds ratio of 1.2 may be statistically significant, whether or not it’s meaningful in a real-world sense is entirely open to interpretation.

We are advocating for similar flexibility and transparency in assessing and reporting on thematic saturation. We have provided researchers with a method to easily calculate saturation during or after data collection. This method also enables researchers to select different levels of the constituent elements in the process–i.e., Base Size, Run Length and New Information Threshold–based on how confident they wish to be that their interpretations and conclusions are based on a dataset that reached thematic saturation. We hope researchers find this method useful, and that others build on our work by empirically testing the method on different types of datasets drawn from diverse study populations and contexts.

Supporting information

S1 Datasets. Datasets used in bootstrapping analyses.

https://doi.org/10.1371/journal.pone.0232076.s001

Acknowledgments

We would like to thank Betsy Tolley for reviewing an earlier draft of this work and Alissa Bernholc for programming support.

  • 2. Guest G, MacQueen K, Namey E. Applied Thematic Analysis. Thousand Oaks, CA: Sage; 2012.
  • 3. Miles MB, Huberman A.M., Saldana J. Qualitative Data Analysis: A Methods Sourcebook. 3 ed. Thousand Oaks, CA: Sage; 2014.
  • 4. Bernard HR, & Ryan G. W. Analyzing qualitative data: Systematic approaches. Thousand Oaks, CA: Sage; 2010.
  • 10. Glaser B, Strauss A. The Discovery of Grounded Theory: Strategies for Qualitative Research. New York, NY: Aldine; 1967.
  • 11. Given LM. 100 Questions (and Answers) about Qualitative Research. Thousand Oaks, CA: Sage; 2016.
  • 12. Birks M, Mills J. Grounded Theory: A Practical Guide. 2 ed. London: Sage; 2015.
  • 13. Olshansky EF. Generating theory using grounded theory methodology. In: de Chesnay M, editor. Nursing Research Using Grounded Theory: Qualitative Designs and Methods in Nursing. New York: Springer; 2015. p. 19–28.
  • 14. Cheek J. An untold story: doing funded qualitative research. In: Denzin N, Lincoln Y, editors. Handbook for Qualitative Research. Thousand Oaks, CA: Sage Publications; 2000. p. 401–20.
  • 15. Charmaz K. Constructing Grounded Theory, 2nd ed. Thousand Oaks, CA: Sage; 2014.
  • 16. Morgan M, Fischoff B, Bostrom A, Atman C. Risk Communication: A Mental Models Approach. New York, NY: Cambridge University Press; 2002.
  • 28. Patton M. Qualitative research & evaluation methods: integrating theory and practice. 4th ed. Thousand Oaks, CA: Sage; 2015.
  • 33. QSR. NVivo qualitative data analysis software, version 10. 2012.
  • 34. MacQueen K, McLellan-Lemal E, Bartholow K, Milstein B. Team-based codebook development: structure, process, and agreement. In: Guest G, MacQueen K, editors. Handbook for Team-based Qualitative Research. Lanham, MD: AltaMira Press; 2008. p. 119–36.
  • 35. Lavrakas PJ, editor. Encyclopedia of Survey Research Methods. Thousand Oaks, CA: Sage; 2008.

Qualitative study design: Sampling


As part of your research, you will need to identify “who” you need to recruit or work with to answer your research question(s). Often this population will be quite large (such as nurses or doctors across Victoria), or it may be difficult to access (such as people with mental health conditions). Sampling is a way of choosing a smaller group of your population to research, so that you can then generalise the results across the larger population.

There are several ways that you can sample. Time, money, and difficulty or ease in reaching your target population will shape your sampling decisions. While there are no hard and fast rules around how many people you should involve in your research, some researchers estimate between 10 and 50 participants as being sufficient depending on your type of research and research question (Creswell & Creswell, 2018). Other study designs may require you to continue gathering data until you are no longer discovering new information ("theoretical saturation") or your data is sufficient to answer your question ("data saturation").

Why is it important to think about sampling?

It is important to match your sample as far as possible to the broader population that you wish to generalise to. The extent to which your findings can be applied to settings or people outside of who you have researched ("generalisability") can be influenced by your sample and sampling approach. For example, if you have interviewed homeless people in hospital with mental health conditions, you may not be able to generalise the results of this to every person in Australia with a mental health condition, or every person who is homeless, or every person who is in hospital. Your sampling approach will vary depending on what you are researching, but you might use a non-probability or probability (or randomised) approach.

Non-probability sampling approaches

Non-probability sampling is not randomised, meaning that some members of your population will have a higher chance of being included in your study than others. If you wanted to interview homeless people with mental health conditions in hospital and chose only homeless people with mental health conditions at your local hospital, this would be an example of convenience sampling: you have recruited participants who are close to hand. Other times, you may ask your participants if they can recommend other people who may be interested in the study; this is an example of snowball sampling. Lastly, you might want to ask Chief Executive Officers at rural hospitals how they support their staff’s mental health; this is an example of purposive sampling.

Examples of non-probability sampling include:

  • Purposive (judgemental)
  • Snowball
  • Convenience

Probability (Randomised) sampling

Probability sampling methods are also called randomised sampling. They are generally preferred in research because this approach means that every person in a population has a chance of being selected. Truly randomised sampling is very complex; even a simple random sample requires a random number generator to select participants from a sampling frame (a list of the accessible population). For example, if you were to draw a probability sample of homeless people in hospital with a mental health condition, you would need to develop a list of all people matching these criteria, allocate each person a number, and then use a random number generator to find your sample pool. For this reason, while probability sampling is preferred, it may not always be feasible to draw a probability sample.
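For illustration, once such a sampling frame exists, drawing the sample itself takes only a few lines of Python (a sketch with a made-up frame; the names and sizes are hypothetical):

    import random

    # Hypothetical sampling frame: every person meeting the criteria, listed once.
    frame = [f"person_{i:03d}" for i in range(1, 201)]

    random.seed(42)                      # fixed seed for a reproducible draw
    sample = random.sample(frame, k=20)  # simple random sample, without replacement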

Things to remember:

  • Sampling involves selecting a small subsection of your population to generalise back to a larger population
  • Your sampling approach (probability or non-probability) will reflect how you will recruit your participants, and how generalisable your results are to the wider population
  • How many participants you include in your study will vary based on your research design, research question, and sampling approach

Further reading:

Babbie, E. (2008). The basics of social research (4th ed). Belmont: Thomson Wadsworth

Creswell, J.W. & Creswell, J.D. (2018). Research design: Qualitative, quantitative and mixed methods approaches (5th ed). Thousand Oaks: SAGE

Salkind, N.J. (2010) Encyclopedia of research design. Thousand Oaks: SAGE Publications

Vasileiou, K., Barnett, J., Thorpe, S., & Young, T. (2018). Characterising and justifying sample size sufficiency in interview-based studies: systematic analysis of qualitative health research over a 15-year period. BMC Medical Research Methodology, 18(148)


InterQ Research

What’s in a Number? Understanding the Right Sample Size for Qualitative Research

  • May 3, 2019

By Julia Schaefer

Numbers matter less in qualitative research than in quantitative research.

It’s about quality, not quantity. So what’s in a number?

When thinking about sample size, it’s really important to ensure that you understand your target and have recruited the right people for the study. Whether your company is targeting moms from the Midwest with household incomes of $70k+, or teens who use Facebook for more than 8 hours a week, it’s crucial to understand the goals and objectives of the study and how the right target can help answer your essential research questions.

Tip #1: Right Size for Qualitative Research

A high-quality panel includes much more than just members who are pulled from a general population. The right respondents for the study will have met all the criteria line-items identified from quantitative research studies and check the boxes that the client has identified through their own research. Only participants who match the audience specifications and background relevance expressed by the client should be actively recruited.

Tip #2: No Two Studies Are Alike

Choosing an appropriate study design is an important factor to consider when determining which sample size to use. There are various methods that can be used to gather insightful data, but not all methods may be applicable to your study and your project goal. In-depth interviews , focus groups , and ethnographic research are the most common methods used in qualitative market research. Each method can provide unique information and certain methods are more relevant than others. The types of questions being studied play an equally important role in deciding on a sample size.

Tip #3: Principle of Saturation and Diminishing Returns

Understanding which type of qualitative study to use is very important. Your study should have a large enough sample size to uncover a variety of opinions, but the sample size should be limited at the point of saturation.

Saturation occurs when adding more participants to the study does not result in obtaining additional perspectives or information. One can say there is a point of diminishing returns with larger samples, as it leads to more data but doesn’t necessarily lead to more information. A sample size should be large enough to sufficiently describe the phenomenon of interest, and address the research question at hand. However, a large sample size risks having repetitive and redundant data.

The objective of qualitative research is to reduce discovery failure, while quantitative research aims to reduce estimation error. Because qualitative research works to obtain diverse opinions about a client’s product, service, or project from a sample, saturated (redundant) data does little to benefit the project findings. As part of the analysis framework, one respondent’s opinion is enough to generate a code.

The Magic Number? Between 15-30

Based on research conducted on this issue, if you are building similar segments within the population, InterQ’s recommendation for in-depth interviews is a sample size of 15–30. In some cases, a minimum of 10 is sufficient, assuming there has been integrity in the recruiting process. Provided the recruiting process is rigorous, studies have noted that a sample size as small as 10 can be extremely fruitful and still yield strong results.




Sample Size for Interview in Qualitative Research in Social Sciences: A Guide to Novice Researchers

  • September 2022
  • Research in Educational Policy and Management 4(1): 42–50

Wasihun Bekele and Fikire Yohannes, Mizan-Tepi University


Sample Size Policy for Qualitative Studies Using In-Depth Interviews

  • Published: 12 September 2012
  • Archives of Sexual Behavior, Volume 41, pages 1319–1320 (2012)

Shari L. Dworkin

In recent years, there has been an increase in submissions to the Journal that draw on qualitative research methods. This increase is welcome and indicates not only the interdisciplinarity embraced by the Journal (Zucker, 2002 ) but also its commitment to a wide array of methodologies.

For those who do select qualitative methods and use grounded theory and in-depth interviews in particular, there appear to be a lot of questions that authors have had recently about how to write a rigorous Method section. This topic will be addressed in a subsequent Editorial. At this time, however, the most common question we receive is: “How large does my sample size have to be?” and hence I would like to take this opportunity to answer this question by discussing relevant debates and then the policy of the Archives of Sexual Behavior.

The sample size used in qualitative research methods is often smaller than that used in quantitative research methods. This is because qualitative research methods are often concerned with garnering an in-depth understanding of a phenomenon or are focused on meaning (and heterogeneities in meaning)—which are often centered on the how and why of a particular issue, process, situation, subculture, scene or set of social interactions. In-depth interview work is not as concerned with making generalizations to a larger population of interest and does not tend to rely on hypothesis testing but rather is more inductive and emergent in its process. As such, the aim of grounded theory and in-depth interviews is to create “categories from the data and then to analyze relationships between categories” while attending to how the “lived experience” of research participants can be understood (Charmaz, 1990, p. 1162).

There are several debates concerning what sample size is the right size for such endeavors. Most scholars argue that the concept of saturation is the most important factor to think about when mulling over sample size decisions in qualitative research (Mason, 2010). Saturation is defined by many as the point at which the data collection process no longer offers any new or relevant data. Another way to state this is that conceptual categories in a research project can be considered saturated “when gathering fresh data no longer sparks new theoretical insights, nor reveals new properties of your core theoretical categories” (Charmaz, 2006, p. 113). Saturation depends on many factors and not all of them are under the researcher’s control. Some of these include: How homogenous or heterogeneous is the population being studied? What are the selection criteria? How much money is in the budget to carry out the study? Are there key stratifiers (e.g., conceptual, demographic) that are critical for an in-depth understanding of the topic being examined? What is the timeline that the researcher faces? How experienced is the researcher in being able to even determine when she or he has actually reached saturation (Charmaz, 2006)? Is the author carrying out theoretical sampling and is, therefore, concerned with ensuring depth on relevant concepts and examining a range of concepts and characteristics that are deemed critical for emergent findings (Glaser & Strauss, 1967; Strauss & Corbin, 1994, 2007)?

While some experts in qualitative research avoid the topic of “how many” interviews “are enough,” there is indeed variability in what is suggested as a minimum. An extremely large number of articles, book chapters, and books offer guidance and suggest anywhere from 5 to 50 participants as adequate. All of these pieces of work engage in nuanced debates when responding to the question of “how many” and frequently respond with a vague (and, actually, reasonable) “it depends.” Numerous factors are said to be important, including “the quality of data, the scope of the study, the nature of the topic, the amount of useful information obtained from each participant, the use of shadowed data, and the qualitative method and study designed used” (Morse, 2000, p. 1). Others argue that the “how many” question can be the wrong question and that the rigor of the method “depends upon developing the range of relevant conceptual categories, saturating (filling, supporting, and providing repeated evidence for) those categories,” and fully explaining the data (Charmaz, 1990). Indeed, there have been countless conferences and conference sessions on these debates, reports written, and myriad publications are available as well (for a compilation of debates, see Baker & Edwards, 2012).

Taking all of these perspectives into account, the Archives of Sexual Behavior is putting forward a policy for authors in order to have more clarity on what is expected in terms of sample size for studies drawing on grounded theory and in-depth interviews. The policy of the Archives of Sexual Behavior will be that it adheres to the recommendation that 25–30 participants is the minimum sample size required to reach saturation and redundancy in grounded theory studies that use in-depth interviews. This number is considered adequate for publications in journals because it (1) may allow for thorough examination of the characteristics that address the research questions and to distinguish conceptual categories of interest, (2) maximizes the possibility that enough data have been collected to clarify relationships between conceptual categories and identify variation in processes, and (3) maximizes the chances that negative cases and hypothetical negative cases have been explored in the data (Charmaz, 2006; Morse, 1994, 1995).

The Journal does not want to paradoxically and rigidly quantify sample size when the endeavor at hand is qualitative in nature and the debates on this matter are complex. However, we are providing this practical guidance. We want to ensure that more of our submissions have an adequate sample size so as to get closer to reaching the goal of saturation and redundancy across relevant characteristics and concepts. The current recommendation that is being put forward does not include any comment on other qualitative methodologies, such as content and textual analysis, participant observation, focus groups, case studies, clinical cases or mixed quantitative–qualitative methods. The current recommendation also does not apply to phenomenological studies or life history approaches. The current guidance is intended to offer one clear and consistent standard for research projects that use grounded theory and draw on in-depth interviews.

Editor’s note: Dr. Dworkin is an Associate Editor of the Journal and is responsible for qualitative submissions.

Baker, S. E., & Edwards, R. (2012). How many qualitative interviews is enough? National Centre for Research Methods. Available at: http://eprints.ncrm.ac.uk/2273/.

Charmaz, K. (1990). ‘Discovering’ chronic illness: Using grounded theory. Social Science and Medicine, 30 , 1161–1172.


Charmaz, K. (2006). Constructing grounded theory: A practical guide through qualitative analysis . London: Sage Publications.


Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative research . Chicago: Aldine Publishing Co.

Mason, M. (2010). Sample size and saturation in PhD studies using qualitative interviews. Forum: Qualitative Social Research, 11 (3) [Article No. 8].

Morse, J. M. (1994). Designing funded qualitative research. In N. Denzin & Y. Lincoln (Eds.), Handbook of qualitative research (pp. 220–235). Thousand Oaks, CA: Sage Publications.

Morse, J. M. (1995). The significance of saturation. Qualitative Health Research, 5 , 147–149.


Morse, J. M. (2000). Determining sample size. Qualitative Health Research, 10 , 3–5.

Strauss, A. L., & Corbin, J. M. (1994). Grounded theory methodology. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 273–285). Thousand Oaks, CA: Sage Publications.

Strauss, A. L., & Corbin, J. M. (2007). Basics of qualitative research: Techniques and procedures for developing grounded theory . Thousand Oaks, CA: Sage Publications.

Zucker, K. J. (2002). From the Editor’s desk: Receiving the torch in the era of sexology’s renaissance. Archives of Sexual Behavior, 31 , 1–6.


Dworkin, S.L. Sample Size Policy for Qualitative Studies Using In-Depth Interviews. Arch Sex Behav 41 , 1319–1320 (2012). https://doi.org/10.1007/s10508-012-0016-6


Qualitative Researcher Dr Kriukow


How to choose the right sample size for a qualitative study… and convince your supervisor that you know what you’re doing.


The question of how many participants are enough for a qualitative interview study is, in my opinion, one of the most difficult questions to find an answer to in the literature. In fact, many authors who set out to find specific guidelines on the ideal sample size in qualitative research have concluded that such guidelines are “virtually non-existent” (Guest, Bunce and Johnson, 2006: 59). This is particularly unfortunate given that, as a student planning to undertake research, one of the first things you will be asked to do is to indicate, and justify, the number of participants in your planned study (this includes your PhD proposal, in which you are expected to give as much detail of the study as possible).

If you then turn to the literature, hoping to find advice from some of the great minds in research methodology, you are likely to find them evading the question and often hiding behind the term “saturation”, which refers to the point at which gathering new data no longer provides any new theoretical insights into the studied phenomenon. The concept of saturation is itself controversial, not least because the longer you explore, analyse and reflect on your data, the more likely you are to keep finding something “new” in it; nevertheless, it has become the guiding concept for establishing sample size in many qualitative studies. As Guest, Bunce and Johnson (2006) rightly point out, however:

“although the idea of saturation is helpful at the conceptual level, it provides little practical guidance for estimating sample sizes for robust research prior to data collection”

     (Guest, Bunce and Johnson, 2006: 59)

In other words: how in the world are we supposed to know, prior to the study, when we will reach saturation?

My advice is to use the available literature on the point of saturation to justify your decision regarding the sample size. I did this for my PhD study, as I was growing frustrated that I really had to justify my decision to include 20 participants, even though I had read dozens of reports in which this number, or a smaller one, was common (“are you going to interview 20 participants just because others did?”). I simply felt that this would be enough, and my common sense (which, as I learnt throughout my PhD, was the last thing anyone would care about) was telling me the same thing. In order to support my decision with the literature, however, and considering that there are hardly any guidelines for establishing sample size, I decided to try to reach some sort of conclusion as to how many participants are enough to reach saturation, and to use it as my main argument for establishing the size of the sample.

So what does the literature tell us about this? Just as there is no single answer as to what sample size is sufficient in general, there is no single answer to the question of what sample size is sufficient to reach theoretical saturation. Factors such as the heterogeneity of the studied population, the scope of the study and the adopted methods and their application (e.g. the length of the interviews) are, however, believed to play a central role in achieving it (cf. Baker and Edwards, 2012; Guest, Bunce and Johnson, 2006; Mason, 2010). Mason’s (2010) analysis of 560 PhD studies that adopted a qualitative interview as their main method revealed that the most common sample sizes in qualitative research lie between 15 and 50 participants, with 20 being the average sample size in grounded theory studies (which was also the type of study I was undertaking). Guest, Bunce and Johnson (2006) used data from their own study of 60 qualitative interviews to conclude that 88% of the codes they developed during the full analysis had already been created by the time 12 interviews had been conducted.
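To make these findings concrete, here is a minimal sketch (in Python) of how cumulative code coverage can be tracked across interviews. The per-interview code sets below are invented for illustration; only the 88%-by-twelve-interviews figure above comes from Guest, Bunce and Johnson (2006).

```python
# Illustrative sketch: cumulative code coverage across interviews.
# The code sets are invented; Guest, Bunce and Johnson (2006) report
# roughly 88% coverage by interview 12 in their own 60-interview data.

codes_per_interview = [
    {"access", "cost", "trust"},         # interview 1
    {"access", "stigma"},                # interview 2
    {"cost", "family", "trust"},         # interview 3
    {"family", "transport"},             # interview 4
    {"access", "transport", "privacy"},  # interview 5
    {"stigma", "privacy"},               # interview 6
]

all_codes = set().union(*codes_per_interview)  # the final codebook, in hindsight

seen = set()
for i, codes in enumerate(codes_per_interview, start=1):
    newly_found = codes - seen
    seen |= codes
    share = len(seen) / len(all_codes)
    print(f"interview {i}: {len(newly_found)} new code(s), "
          f"{share:.0%} of final codebook covered")
```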

These findings helped me argue that my initial sample size was going to be 20. “Given the detailed design of the study, which includes triangulation of the data and methods”, I argued, “I believe that this number will enable me to make valid judgements about the general trends emerging in the data”. I also stated that I was planning to recruit more participants should saturation not occur.

I hope that this article will help you in your quest to determine the sample size for your study, and that it gives you an idea of how to argue that it is a well-thought-through decision. Do remember, however, that 20 participants may be enough for one study and not enough, or too many, for another. The point of this article was not to argue that 20 participants is a universally right number for a qualitative study. Rather, it was to point out that there is no such universally right number, that you are not the only one struggling to find guidelines on interview sample size, and that the concept of saturation is one possible principle to guide you in deciding how many participants to recruit.

If you have any questions regarding this topic, comment below or send me a message through my Facebook page .

  • UPDATE – see my Facebook page for my response to the question about the relevance of “saturation” for Phenomenological research

References:

Baker, S. E., & Edwards, R. (Eds.) (2012). How many qualitative interviews is enough? Expert voices and early career reflections on sampling and cases in qualitative research. National Centre for Research Methods, 1–42.

Guest, G., Bunce, A., & Johnson, L. (2006). How many interviews are enough? An experiment with data saturation and variability. Field Methods, 18(1), 59–82.

Mason, M. (2010). Sample size and saturation in PhD studies using qualitative interviews. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research, 11(3).

Jarek Kriukow


How many interviews should you conduct for your qualitative research?


How many qualitative interviews should you conduct for your market research? We have built an interactive tool (available on the original page) to answer this complex question: you select one of 4 scenarios and it automatically calculates the number of qualitative interviews you need to conduct. The tool asks four questions:

  • Why do you want to carry out qualitative interviews? (find ideas for a new product or service; submit an idea and get feedback; test a new product or service; analyze the current uses and behaviors of customers)
  • Can the same product/service be used in completely different ways by different types of customers, for example due to socio-demographic, cultural or industry-related aspects? (If you don’t know or are not sure, answer “yes”.)
  • Is it a “disruptive innovation”, i.e. a product or service so innovative that it completely changes the dynamics of a market (e.g. cell phone vs. landline phone, autonomous cars vs. conventional cars, digital book vs. paper book)? This type of innovation is rare; if you are not sure, answer “no”.
  • Will the interviews be conducted face-to-face (in the same room) or remotely (by videoconference, e.g. Skype or Teams, or by telephone)?

Based on the four answers, the tool advises a minimum number of interviews. This number was determined on the basis of scientific research taking the saturation principle into account.
If you want to know more about qualitative methodologies, please read our guide or contact us. We conduct qualitative and quantitative market research throughout Europe.
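IntoTheMinds does not publish the lookup values behind this calculator, so the sketch below is purely hypothetical: a Python outline of how a rule-based tool of this kind could map the four answers to a recommended minimum number of interviews. Every number in it is a placeholder of ours, not theirs.

```python
# Purely hypothetical sketch of a rule-based interview calculator.
# All numeric values are placeholders; the actual lookup table is
# embedded in IntoTheMinds' interactive tool and is not public.

BASE_MINIMUM = {                  # placeholder minimums per research goal
    "find_ideas": 15,
    "get_feedback": 10,
    "test_product": 12,
    "analyze_behaviors": 15,
}

def recommended_interviews(goal, heterogeneous_users, disruptive, remote):
    n = BASE_MINIMUM[goal]
    if heterogeneous_users:       # distinct segments each need coverage
        n += 5
    if disruptive:                # no existing usage to anchor on
        n += 5
    if remote:                    # remote interviews often yield less depth
        n += 3
    return n

print(recommended_interviews("test_product",
                             heterogeneous_users=True,
                             disruptive=False,
                             remote=True))  # -> 20 with these placeholders
```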

How to determine the size of a qualitative sample?

The ideal size of a qualitative sample is a hotly debated issue, and it isn’t easy to get a clear answer. Most experts hide behind the saturation principle, which is based on the idea that the number of qualitative interviews is not known in advance: it is only when new interviews do not reveal anything new compared to the previous ones that it is reasonable to stop the fieldwork phase.

While this approach may be acceptable in an academic context, it is not applicable in the business world. Indeed, a qualitative project in a business requires defining a precise perimeter beforehand and committing to the number of qualitative interviews to be conducted.


What does science say about qualitative sample sizes?

To design our calculator, we relied on scientific results and academic articles.

Let’s start by recalling that qualitative research does not seek statistical representativeness. Qualitative research aims to uncover as many “themes” related to a subject as possible. These themes make it possible to formulate hypotheses that can then be verified in a quantitative phase.

Research that has focused on the number of qualitative interviews remains rare.

Dworkin (2012) points out that most authors suggest sample sizes of 5 to 50. Such a wide range leaves a lot of room for error and does not, in advance, yield a reasonable estimate. She also reminds us that in qualitative research of the “grounded theory” type, 25 to 30 participants is a minimum to reach saturation.

To anticipate the number of qualitative interviews necessary, Dworkin therefore proposes looking at the factors that influence saturation on the one hand, and the constitution of the qualitative sample on the other:

  • Is the study population homogeneous or heterogeneous?
  • What are the criteria for selecting participants?
  • What is the budget for the qualitative research?
  • What is the time frame for the qualitative research?
  • Are there any variables (key “stratifiers”) that play a decisive role in understanding the topic?
  • Is the researcher able to determine when saturation is reached?

Practical advice for determining your qualitative sample size

  • Identify the segments of your market: they may respond to different dynamics that would require you to include sufficient respondents for each of these groups.
  • Identify key stratifiers through a literature review.
  • Ask yourself honestly about the difficulty of recruiting respondents, and about your capacity to conduct the interviews and analyze the results.
  • Analyze the results as you go along, using a coding matrix to determine when saturation is reached (a minimal stopping-rule sketch follows this list).
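As promised above, here is a minimal stopping-rule sketch in Python. It declares saturation once a chosen number of consecutive interviews adds no new codes to the matrix; the patience threshold of three is an illustrative choice of ours, not a published standard.

```python
# Hypothetical stopping rule for qualitative fieldwork: stop once
# `patience` consecutive interviews contribute no new codes.
# The default of 3 is illustrative, not a published standard.

def saturation_reached(code_sets, patience=3):
    """Return the 1-based index of the interview at which saturation
    is declared, or None if more interviews are still needed."""
    seen = set()
    quiet_streak = 0
    for i, codes in enumerate(code_sets, start=1):
        if codes - seen:              # at least one genuinely new code
            quiet_streak = 0
        else:
            quiet_streak += 1
            if quiet_streak >= patience:
                return i
        seen |= codes
    return None

interviews = [{"a", "b"}, {"b", "c"}, {"c"}, {"a"}, {"b", "c"}]
print(saturation_reached(interviews))  # -> 5: interviews 3-5 added nothing
```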

Marshall et al. (2013) analyzed the number of interviews conducted in qualitative research on information systems (IS).

The authors distinguish several designs of qualitative research: “grounded theory,” “single case study,” and “multiple case study.” In market research, the “grounded theory” approach is hardly applicable because we are instead looking to validate a concrete application within a precise framework. Case studies (“use cases”) are therefore more representative of what is done in market research. The second quartile (i.e., the median; see the table below) shows the values found by Marshall et al. (2013):

  • 23 qualitative interviews for single case studies and 40 for multiple case studies
  • 24 interviewees for single case studies and 39 for multiple case studies
  • 28 hours of interviews for single case studies and 38.8 for multiple case studies

[Table: Qualitative sample sizes by type of research (from Marshall et al., 2013)]

Morse (2000) proposes different variables that influence the number of qualitative interviews to be conducted:

  • The scope of the research: is it a specific or broader problem?
  • The subject matter: will the interviews be easy or difficult to conduct? Is the topic concrete or abstract?
  • Will respondents have all the information?
  • Data quality: what is the intellectual level of the respondents? Do the respondents know the subject matter? How much time is available for the interview, and how well can the respondents concentrate?
  • Research design: Qualitative interviews applied to market research often fall into the “use case” category.

[Figure: Number of research studies by qualitative sample size (Mason, 2010)]

Mason (2010) studied 2,533 doctoral dissertations using a qualitative approach and classified each qualitative sample according to the nature of the research. On average, the research was based on 31 qualitative interviews, with a median of 28. There were notable differences, however: the smallest study relied on a single interview, while the largest was based on 98 interviews.

The different types of qualitative approaches

The very nature of qualitative research helps determine sample size.

Ethnography and ethnoscience

For Morse (1994), 30 to 50 interviews are sufficient. Bernard (2000) notes that most research uses samples of 30 to 60 interviews.

Grounded Theory

Creswell (1998) recommends 20 to 30 qualitative interviews, while Morse (1994) recommends 30 to 50.

Phenomenology

For Creswell (1998), 5 to 25 interviews are ideal. Morse (1994) indicates that at least 6 interviews should be conducted.

Qualitative research in general

Bertaux (1981, p.35) suggests that the smallest acceptable qualitative sample size is 15 interviews.

Bernard, H. R. (2000). Social research methods. Thousand Oaks, CA: Sage.

Bertaux, D. (1981). From the life-history approach to the transformation of sociological practice. In D. Bertaux (Ed.), Biography and society: The life history approach in the social sciences (pp. 29–45). Beverly Hills, CA: Sage.

Creswell, J. (1998). Qualitative inquiry and research design: Choosing among five traditions. Thousand Oaks, CA: Sage.

Dworkin, S. L. (2012). Sample size policy for qualitative studies using in-depth interviews. Archives of Sexual Behavior, 41, 1319–1320.

Marshall, B., Cardon, P., Poddar, A., & Fontenot, R. (2013). Does sample size matter in qualitative research? A review of qualitative interviews in IS research. Journal of Computer Information Systems, 54(1), 11–22.

Mason, M. (2010). Sample size and saturation in PhD studies using qualitative interviews. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research, 11(3).

Morse, J. M. (1994). Designing funded qualitative research. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 220–235). Thousand Oaks, CA: Sage.


What is the ideal Sample Size in Qualitative Research?

Presented by InterQ Research LLC

If we were to assemble a list of “most asked questions” that we receive from new clients, it’s this:

What is the ideal sample size in qualitative research? It’s a great question. A fantastic one. Because panel size does matter, though perhaps not as much as it does in quantitative research, where we’re aiming for a statistically meaningful number. Let’s explore this whole issue of panel size and what you should be looking for from participant panels when conducting qualitative research.

First off, look at quality versus quantity

Most likely, your company is looking for market research on a very specific audience type: B2B decision makers in human resources; moms who live in the Midwest and have household incomes of $70k+; teens who use Facebook more than 8 hours a week. Specificity is a great thing, and without fail, every client we work with has a good grasp of their audience type. In qualitative panels, therefore, our first objective is to ensure that we’re recruiting people who meet every criterion we identify through quantitative research, along with the criteria our clients have pinpointed through their own research. Panel quality (having the right members in the panel) is far more important than simply pulling from a general population that falls within broad parameters. So first and foremost, we focus on recruiting the right respondents who match our audience specifications.

Study design in qualitative research

The type of qualitative study chosen is also one of the most important factors to consider when choosing sample size. In-depth interviews, focus groups, and ethnographic research are the most common methods used in qualitative market research, and the types of questions being studied are just as important a consideration as the sample size chosen for each of these methods. One of the most important principles to keep in mind, in all of these study designs, is the principle of saturation.

The objective of qualitative research (as compared to quantitative research) is to lessen discovery failure; in quantitative research, the objective is to reduce estimation error. Here’s where the principle of saturation comes in: with saturation, the collection of new data no longer gives the researcher any additional insights into the issue being investigated. Qualitative research seeks to uncover a diversity of opinions from the sample, and one person’s opinion is enough to generate a code (part of the analysis framework). There is a point of diminishing returns with larger samples: more data does not necessarily lead to more information; it simply leads to the same information being repeated (saturation). The goal, therefore, is to have a sample large enough to uncover a range of opinions, but to cap it at the number where we’re getting saturation and repetitive data.

So … is there a magical number to aim for in qualitative research?

So now we’re back to our original question:

What is the ideal sample size in qualitative research?

We’ll answer it this time. Based on academic studies of this very issue, 30 seems to be an ideal sample size for the most comprehensive view, but studies can have as few as 10 total participants and still yield extremely fruitful, and applicable, results. (This goes back to excellence in recruiting.)

Our general recommendation for in-depth interviews is a sample size of 30 if we’re building a study that includes similar segments within the population. The minimum can be 10, but again, this assumes that recruiting has preserved the integrity of the population being sampled.


How to use and assess qualitative research methods

Loraine Busetto

1 Department of Neurology, Heidelberg University Hospital, Im Neuenheimer Feld 400, 69120 Heidelberg, Germany

Wolfgang Wick

2 Clinical Cooperation Unit Neuro-Oncology, German Cancer Research Center, Heidelberg, Germany

Christoph Gumbinger


This paper aims to provide an overview of the use and assessment of qualitative research methods in the health sciences. Qualitative research can be defined as the study of the nature of phenomena and is especially appropriate for answering questions of why something is (not) observed, assessing complex multi-component interventions, and focussing on intervention improvement. The most common methods of data collection are document study, (non-) participant observations, semi-structured interviews and focus groups. For data analysis, field-notes and audio-recordings are transcribed into protocols and transcripts, and coded using qualitative data management software. Criteria such as checklists, reflexivity, sampling strategies, piloting, co-coding, member-checking and stakeholder involvement can be used to enhance and assess the quality of the research conducted. Using qualitative in addition to quantitative designs will equip us with better tools to address a greater range of research problems, and to fill in blind spots in current neurological research and practice.

The aim of this paper is to provide an overview of qualitative research methods, including hands-on information on how they can be used, reported and assessed. This article is intended for beginning qualitative researchers in the health sciences as well as experienced quantitative researchers who wish to broaden their understanding of qualitative research.

What is qualitative research?

Qualitative research is defined as “the study of the nature of phenomena”, including “their quality, different manifestations, the context in which they appear or the perspectives from which they can be perceived” , but excluding “their range, frequency and place in an objectively determined chain of cause and effect” [ 1 ]. This formal definition can be complemented with a more pragmatic rule of thumb: qualitative research generally includes data in form of words rather than numbers [ 2 ].

Why conduct qualitative research?

Because some research questions cannot be answered using (only) quantitative methods. For example, one Australian study addressed the issue of why patients from Aboriginal communities often present late or not at all to specialist services offered by tertiary care hospitals. Using qualitative interviews with patients and staff, it found one of the most significant access barriers to be transportation problems, including some towns and communities simply not having a bus service to the hospital [ 3 ]. A quantitative study could have measured the number of patients over time or even looked at possible explanatory factors – but only those previously known or suspected to be of relevance. To discover reasons for observed patterns, especially the invisible or surprising ones, qualitative designs are needed.

While qualitative research is common in other fields, it is still relatively underrepresented in health services research. The latter field is more traditionally rooted in the evidence-based-medicine paradigm, as seen in " research that involves testing the effectiveness of various strategies to achieve changes in clinical practice, preferably applying randomised controlled trial study designs (...) " [ 4 ]. This focus on quantitative research and specifically randomised controlled trials (RCT) is visible in the idea of a hierarchy of research evidence which assumes that some research designs are objectively better than others, and that choosing a "lesser" design is only acceptable when the better ones are not practically or ethically feasible [ 5 , 6 ]. Others, however, argue that an objective hierarchy does not exist, and that, instead, the research design and methods should be chosen to fit the specific research question at hand – "questions before methods" [ 2 , 7 – 9 ]. This means that even when an RCT is possible, some research problems require a different design that is better suited to addressing them. Arguing in JAMA, Berwick uses the example of rapid response teams in hospitals, which he describes as " a complex, multicomponent intervention – essentially a process of social change" susceptible to a range of different context factors including leadership or organisation history. According to him, "[in] such complex terrain, the RCT is an impoverished way to learn. Critics who use it as a truth standard in this context are incorrect" [ 8 ] . Instead of limiting oneself to RCTs, Berwick recommends embracing a wider range of methods , including qualitative ones, which for "these specific applications, (...) are not compromises in learning how to improve; they are superior" [ 8 ].

Research problems that can be approached particularly well using qualitative methods include assessing complex multi-component interventions or systems (of change), addressing questions beyond “what works”, towards “what works for whom when, how and why”, and focussing on intervention improvement rather than accreditation [ 7 , 9 – 12 ]. Using qualitative methods can also help shed light on the “softer” side of medical treatment. For example, while quantitative trials can measure the costs and benefits of neuro-oncological treatment in terms of survival rates or adverse effects, qualitative research can help provide a better understanding of patient or caregiver stress, visibility of illness or out-of-pocket expenses.

How to conduct qualitative research?

Given that qualitative research is characterised by flexibility, openness and responsivity to context, the steps of data collection and analysis are not as separate and consecutive as they tend to be in quantitative research [13, 14]. As Fossey puts it: “sampling, data collection, analysis and interpretation are related to each other in a cyclical (iterative) manner, rather than following one after another in a stepwise approach” [15]. The researcher can make educated decisions with regard to the choice of method, how they are implemented, and to which and how many units they are applied [13]. As shown in Fig. 1, this can involve several back-and-forth steps between data collection and analysis where new insights and experiences can lead to adaptation and expansion of the original plan. Some insights may also necessitate a revision of the research question and/or the research design as a whole. The process ends when saturation is achieved, i.e. when no relevant new information can be found (see also below: sampling and saturation). For reasons of transparency, it is essential for all decisions as well as the underlying reasoning to be well-documented.

[Fig. 1: Iterative research process]

While it is not always explicitly addressed, qualitative methods reflect a different underlying research paradigm than quantitative research (e.g. constructivism or interpretivism as opposed to positivism). The choice of methods can be based on the respective underlying substantive theory or theoretical framework used by the researcher [ 2 ].

Data collection

The methods of qualitative data collection most commonly used in health research are document study, observations, semi-structured interviews and focus groups [ 1 , 14 , 16 , 17 ].

Document study

Document study (also called document analysis) refers to the review by the researcher of written materials [ 14 ]. These can include personal and non-personal documents such as archives, annual reports, guidelines, policy documents, diaries or letters.

Observations

Observations are particularly useful to gain insights into a certain setting and actual behaviour – as opposed to reported behaviour or opinions [ 13 ]. Qualitative observations can be either participant or non-participant in nature. In participant observations, the observer is part of the observed setting, for example a nurse working in an intensive care unit [ 18 ]. In non-participant observations, the observer is “on the outside looking in”, i.e. present in but not part of the situation, trying not to influence the setting by their presence. Observations can be planned (e.g. for 3 h during the day or night shift) or ad hoc (e.g. as soon as a stroke patient arrives at the emergency room). During the observation, the observer takes notes on everything or certain pre-determined parts of what is happening around them, for example focusing on physician-patient interactions or communication between different professional groups. Written notes can be taken during or after the observations, depending on feasibility (which is usually lower during participant observations) and acceptability (e.g. when the observer is perceived to be judging the observed). Afterwards, these field notes are transcribed into observation protocols. If more than one observer was involved, field notes are taken independently, but notes can be consolidated into one protocol after discussions. Advantages of conducting observations include minimising the distance between the researcher and the researched, the potential discovery of topics that the researcher did not realise were relevant and gaining deeper insights into the real-world dimensions of the research problem at hand [ 18 ].

Semi-structured interviews

Hijmans & Kuyper describe qualitative interviews as “an exchange with an informal character, a conversation with a goal” [19]. Interviews are used to gain insights into a person’s subjective experiences, opinions and motivations – as opposed to facts or behaviours [13]. Interviews can be distinguished by the degree to which they are structured (i.e. a questionnaire), open (e.g. free conversation or autobiographical interviews) or semi-structured [2, 13]. Semi-structured interviews are characterized by open-ended questions and the use of an interview guide (or topic guide/list) in which the broad areas of interest, sometimes including sub-questions, are defined [19]. The pre-defined topics in the interview guide can be derived from the literature, previous research or a preliminary method of data collection, e.g. document study or observations. The topic list is usually adapted and improved at the start of the data collection process as the interviewer learns more about the field [20]. Across interviews, the focus on the different (blocks of) questions may differ, and some questions may be skipped altogether (e.g. if the interviewee is not able or willing to answer them, or out of concern for the total length of the interview) [20]. Qualitative interviews are usually not conducted in written format, as this impedes the interactive component of the method [20]. In comparison to written surveys, qualitative interviews have the advantage of being interactive and allowing unexpected topics to emerge and be taken up by the researcher. This can also help overcome a provider- or researcher-centred bias often found in written surveys, which, by nature, can only measure what is already known or expected to be of relevance to the researcher. Interviews can be audio- or video-taped; sometimes it is only feasible or acceptable for the interviewer to take written notes [14, 16, 20].

Focus groups

Focus groups are group interviews to explore participants’ expertise and experiences, including explorations of how and why people behave in certain ways [ 1 ]. Focus groups usually consist of 6–8 people and are led by an experienced moderator following a topic guide or “script” [ 21 ]. They can involve an observer who takes note of the non-verbal aspects of the situation, possibly using an observation guide [ 21 ]. Depending on researchers’ and participants’ preferences, the discussions can be audio- or video-taped and transcribed afterwards [ 21 ]. Focus groups are useful for bringing together homogeneous (to a lesser extent heterogeneous) groups of participants with relevant expertise and experience on a given topic on which they can share detailed information [ 21 ]. Focus groups are a relatively easy, fast and inexpensive method to gain access to information on interactions in a given group, i.e. “the sharing and comparing” among participants [ 21 ]. Disadvantages include less control over the process and a lesser extent to which each individual may participate. Moreover, focus group moderators need experience, as do those tasked with the analysis of the resulting data. Focus groups can be less appropriate for discussing sensitive topics that participants might be reluctant to disclose in a group setting [ 13 ]. Moreover, attention must be paid to the emergence of “groupthink” as well as possible power dynamics within the group, e.g. when patients are awed or intimidated by health professionals.

Choosing the “right” method

As explained above, the school of thought underlying qualitative research assumes no objective hierarchy of evidence and methods. This means that each choice of single or combined methods has to be based on the research question that needs to be answered and a critical assessment with regard to whether or to what extent the chosen method can accomplish this – i.e. the “fit” between question and method [ 14 ]. It is necessary for these decisions to be documented when they are being made, and to be critically discussed when reporting methods and results.

Let us assume that our research aim is to examine the (clinical) processes around acute endovascular treatment (EVT), from the patient’s arrival at the emergency room to recanalization, with the aim to identify possible causes for delay and/or other causes for sub-optimal treatment outcome. As a first step, we could conduct a document study of the relevant standard operating procedures (SOPs) for this phase of care – are they up-to-date and in line with current guidelines? Do they contain any mistakes, irregularities or uncertainties that could cause delays or other problems? Regardless of the answers to these questions, the results have to be interpreted based on what they are: a written outline of what care processes in this hospital should look like. If we want to know what they actually look like in practice, we can conduct observations of the processes described in the SOPs. These results can (and should) be analysed in themselves, but also in comparison to the results of the document analysis, especially as regards relevant discrepancies. Do the SOPs outline specific tests for which no equipment can be observed or tasks to be performed by specialized nurses who are not present during the observation? It might also be possible that the written SOP is outdated, but the actual care provided is in line with current best practice. In order to find out why these discrepancies exist, it can be useful to conduct interviews. Are the physicians simply not aware of the SOPs (because their existence is limited to the hospital’s intranet) or do they actively disagree with them or does the infrastructure make it impossible to provide the care as described? Another rationale for adding interviews is that some situations (or all of their possible variations for different patient groups or the day, night or weekend shift) cannot practically or ethically be observed. In this case, it is possible to ask those involved to report on their actions – being aware that this is not the same as the actual observation. A senior physician’s or hospital manager’s description of certain situations might differ from a nurse’s or junior physician’s one, maybe because they intentionally misrepresent facts or maybe because different aspects of the process are visible or important to them. In some cases, it can also be relevant to consider to whom the interviewee is disclosing this information – someone they trust, someone they are otherwise not connected to, or someone they suspect or are aware of being in a potentially “dangerous” power relationship to them. Lastly, a focus group could be conducted with representatives of the relevant professional groups to explore how and why exactly they provide care around EVT. The discussion might reveal discrepancies (between SOPs and actual care or between different physicians) and motivations to the researchers as well as to the focus group members that they might not have been aware of themselves. For the focus group to deliver relevant information, attention has to be paid to its composition and conduct, for example, to make sure that all participants feel safe to disclose sensitive or potentially problematic information or that the discussion is not dominated by (senior) physicians only. The resulting combination of data collection methods is shown in Fig.  2 .

[Fig. 2: Possible combination of data collection methods]

Attributions for icons: “Book” by Serhii Smirnov, “Interview” by Adrien Coquet, FR, “Magnifying Glass” by anggun, ID, “Business communication” by Vectors Market; all from the Noun Project

The combination of multiple data source as described for this example can be referred to as “triangulation”, in which multiple measurements are carried out from different angles to achieve a more comprehensive understanding of the phenomenon under study [ 22 , 23 ].

Data analysis

To analyse the data collected through observations, interviews and focus groups, these need to be transcribed into protocols and transcripts (see Fig. 3). Interviews and focus groups can be transcribed verbatim, with or without annotations for behaviour (e.g. laughing, crying, pausing) and with or without phonetic transcription of dialects and filler words, depending on what is expected or known to be relevant for the analysis. In the next step, the protocols and transcripts are coded, that is, marked (or tagged, labelled) with one or more short descriptors of the content of a sentence or paragraph [2, 15, 23]. Jansen describes coding as “connecting the raw data with ‘theoretical’ terms” [20]. In a more practical sense, coding makes raw data sortable. This makes it possible to extract and examine all segments describing, say, a tele-neurology consultation from multiple data sources (e.g. SOPs, emergency room observations, staff and patient interviews). In a process of synthesis and abstraction, the codes are then grouped, summarised and/or categorised [15, 20]. The end product of the coding or analysis process is a descriptive theory of the behavioural pattern under investigation [20]. The coding process is performed using qualitative data management software, the most common tools being NVivo, MaxQDA and ATLAS.ti. It should be noted that these are data management tools which support the analysis performed by the researcher(s) [14].

[Fig. 3: From data collection to data analysis]

Attributions for icons: see Fig. 2; also “Speech to text” by Trevor Dsouza, “Field Notes” by Mike O’Brien, US, “Voice Record” by ProSymbols, US, “Inspection” by Made, AU, and “Cloud” by Graphic Tigers; all from the Noun Project
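As a toy illustration of how coding makes raw data sortable, the Python sketch below tags invented segments from the EVT example with codes and retrieves every segment carrying a given code across data sources. Dedicated software such as NVivo, MaxQDA or ATLAS.ti does the same at scale.

```python
# Toy illustration of coding: each segment of raw data is tagged with
# one or more codes, making the material sortable and retrievable.
# Segments, sources and codes are invented for this example.

segments = [
    {"source": "SOP", "text": "Tele-neurology consult within 10 min.",
     "codes": {"tele-neurology", "timing"}},
    {"source": "ER observation", "text": "Consult started after 25 min.",
     "codes": {"tele-neurology", "delay"}},
    {"source": "staff interview", "text": "The video cart is often in use elsewhere.",
     "codes": {"tele-neurology", "equipment"}},
    {"source": "patient interview", "text": "I waited a long time in the hallway.",
     "codes": {"delay"}},
]

def segments_with_code(code):
    """Extract all segments tagged with a given code, across sources."""
    return [s for s in segments if code in s["codes"]]

for s in segments_with_code("tele-neurology"):
    print(f'{s["source"]}: {s["text"]}')
```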

How to report qualitative research?

Protocols of qualitative research can be published separately and in advance of the study results. However, the aim is not the same as in RCT protocols, i.e. to pre-define and set in stone the research questions and primary or secondary endpoints. Rather, it is a way to describe the research methods in detail, which might not be possible in the results paper given journals’ word limits. Qualitative research papers are usually longer than their quantitative counterparts to allow for deep understanding and so-called “thick description”. In the methods section, the focus is on transparency of the methods used, including why, how and by whom they were implemented in the specific study setting, so as to enable a discussion of whether and how this may have influenced data collection, analysis and interpretation. The results section usually starts with a paragraph outlining the main findings, followed by more detailed descriptions of, for example, the commonalities, discrepancies or exceptions per category [ 20 ]. Here it is important to support main findings by relevant quotations, which may add information, context, emphasis or real-life examples [ 20 , 23 ]. It is subject to debate in the field whether it is relevant to state the exact number or percentage of respondents supporting a certain statement (e.g. “Five interviewees expressed negative feelings towards XYZ”) [ 21 ].

How to combine qualitative with quantitative research?

Qualitative methods can be combined with other methods in multi- or mixed methods designs, which “[employ] two or more different methods [ …] within the same study or research program rather than confining the research to one single method” [ 24 ]. Reasons for combining methods can be diverse, including triangulation for corroboration of findings, complementarity for illustration and clarification of results, expansion to extend the breadth and range of the study, explanation of (unexpected) results generated with one method with the help of another, or offsetting the weakness of one method with the strength of another [ 1 , 17 , 24 – 26 ]. The resulting designs can be classified according to when, why and how the different quantitative and/or qualitative data strands are combined. The three most common types of mixed method designs are the convergent parallel design , the explanatory sequential design and the exploratory sequential design. The designs with examples are shown in Fig.  4 .

[Fig. 4: Three common mixed methods designs]

In the convergent parallel design, a qualitative study is conducted in parallel to and independently of a quantitative study, and the results of both studies are compared and combined at the stage of interpretation of results. Using the above example of EVT provision, this could entail setting up a quantitative EVT registry to measure process times and patient outcomes in parallel to conducting the qualitative research outlined above, and then comparing results. Amongst other things, this would make it possible to assess whether interview respondents’ subjective impressions of patients receiving good care match modified Rankin Scores at follow-up, or whether observed delays in care provision are exceptions or the rule when compared to door-to-needle times as documented in the registry. In the explanatory sequential design, a quantitative study is carried out first, followed by a qualitative study to help explain the results from the quantitative study. This would be an appropriate design if the registry alone had revealed relevant delays in door-to-needle times and the qualitative study were used to understand where and why these occurred, and how they could be improved. In the exploratory sequential design, the qualitative study is carried out first and its results help inform and build the quantitative study in the next step [26]. If the qualitative study around EVT provision had shown a high level of dissatisfaction among the staff members involved, a quantitative questionnaire investigating staff satisfaction could be set up in the next step, informed by the qualitative study as to the topics on which dissatisfaction had been expressed. Amongst other things, the questionnaire design would make it possible to widen the reach of the research to more respondents from different (types of) hospitals, regions, countries or settings, and to conduct sub-group analyses for different professional groups.

How to assess qualitative research?

A variety of assessment criteria and lists have been developed for qualitative research, ranging in their focus and comprehensiveness [ 14 , 17 , 27 ]. However, none of these has been elevated to the “gold standard” in the field. In the following, we therefore focus on a set of commonly used assessment criteria that, from a practical standpoint, a researcher can look for when assessing a qualitative research report or paper.

Checklists

Assessors should check the authors’ use of and adherence to the relevant reporting checklists (e.g. the Standards for Reporting Qualitative Research (SRQR)) to make sure all items that are relevant for this type of research are addressed [23, 28]. Discussions of quantitative measures in addition to or instead of these qualitative criteria can be a sign of lower quality of the research (paper). Providing and adhering to a checklist for qualitative research contributes to an important quality criterion for qualitative research, namely transparency [15, 17, 23].

Reflexivity

While methodological transparency and complete reporting are relevant for all types of research, some additional criteria must be taken into account for qualitative research. This includes what is called reflexivity, i.e. sensitivity to the relationship between the researcher and the researched, including how contact was established and maintained, and the background and experience of the researcher(s) involved in data collection and analysis. Depending on the research question and the population to be researched, this can be limited to professional experience, but it may also include gender, age or ethnicity [17, 27]. These details are relevant because, in qualitative research, as opposed to quantitative research, the researcher as a person cannot be isolated from the research process [23]. It may influence the conversation when an interviewed patient speaks to an interviewer who is a physician, or when an interviewee is asked to discuss a gynaecological procedure with a male interviewer, and therefore the reader must be made aware of these details [19].

Sampling and saturation

The aim of qualitative sampling is for all variants of the objects of observation that are deemed relevant for the study to be present in the sample, “to see the issue and its meanings from as many angles as possible” [1, 16, 19, 20, 27], and to ensure “information-richness” [15]. An iterative sampling approach is advised, in which data collection (e.g. five interviews) is followed by data analysis, followed by more data collection to find variants that are lacking in the current sample. This process continues until no new (relevant) information can be found and further sampling becomes redundant – which is called saturation [1, 15]. In other words: qualitative data collection finds its end point not a priori, but when the research team determines that saturation has been reached [29, 30].

This is also the reason why most qualitative studies use deliberate instead of random sampling strategies. This is generally referred to as “ purposive sampling” , in which researchers pre-define which types of participants or cases they need to include so as to cover all variations that are expected to be of relevance, based on the literature, previous experience or theory (i.e. theoretical sampling) [ 14 , 20 ]. Other types of purposive sampling include (but are not limited to) maximum variation sampling, critical case sampling or extreme or deviant case sampling [ 2 ]. In the above EVT example, a purposive sample could include all relevant professional groups and/or all relevant stakeholders (patients, relatives) and/or all relevant times of observation (day, night and weekend shift).
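As an illustrative sketch of such a purposive design (the dimensions follow the EVT example in the text; the specific values are ours), the following Python snippet enumerates the sampling grid and tracks which cells still lack participants.

```python
# Illustrative purposive sampling grid for the EVT example: enumerate
# combinations of respondent group and observation time that the study
# should cover, then track which cells have been filled. Values are
# invented for illustration.
from itertools import product

groups = ["neurologist", "radiologist", "nurse", "patient", "relative"]
shifts = ["day", "night", "weekend"]

grid = {cell: 0 for cell in product(groups, shifts)}  # participants per cell

# As recruitment proceeds, increment the matching cell, e.g.:
grid[("nurse", "night")] += 1

uncovered = [cell for cell, n in grid.items() if n == 0]
print(f"{len(uncovered)} of {len(grid)} cells still need participants")
```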

Assessors of qualitative research should check whether the considerations underlying the sampling strategy were sound and whether or how researchers tried to adapt and improve their strategies in stepwise or cyclical approaches between data collection and analysis to achieve saturation [ 14 ].

Piloting

Good qualitative research is iterative in nature, i.e. it goes back and forth between data collection and analysis, revising and improving the approach where necessary. One example of this is pilot interviews, where different aspects of the interview (especially the interview guide, but also, for example, the site of the interview or whether the interview can be audio-recorded) are tested with a small number of respondents, evaluated and revised [19]. In doing so, the interviewer learns which wording or types of questions work best, or what the best length is for an interview with patients who have trouble concentrating for an extended time. Of course, the same reasoning applies to observations or focus groups, which can also be piloted.

Co-coding

Ideally, coding should be performed by at least two researchers, especially at the beginning of the coding process, when a common approach must be defined, including the establishment of a useful coding list (or tree) and a common meaning of individual codes [23]. An initial sub-set or all transcripts can be coded independently by the coders and then compared and consolidated after regular discussions in the research team. This is to make sure that codes are applied consistently to the research data.

Member checking

Member checking, also called respondent validation , refers to the practice of checking back with study respondents to see if the research is in line with their views [ 14 , 27 ]. This can happen after data collection or analysis or when first results are available [ 23 ]. For example, interviewees can be provided with (summaries of) their transcripts and asked whether they believe this to be a complete representation of their views or whether they would like to clarify or elaborate on their responses [ 17 ]. Respondents’ feedback on these issues then becomes part of the data collection and analysis [ 27 ].

Stakeholder involvement

In those niches where qualitative approaches have been able to evolve and grow, a new trend has seen the inclusion of patients and their representatives not only as study participants (i.e. “members”, see above) but as consultants to and active participants in the broader research process [ 31 – 33 ]. The underlying assumption is that patients and other stakeholders hold unique perspectives and experiences that add value beyond their own single story, making the research more relevant and beneficial to researchers, study participants and (future) patients alike [ 34 , 35 ]. Using the example of patients on or nearing dialysis, a recent scoping review found that 80% of clinical research did not address the top 10 research priorities identified by patients and caregivers [ 32 , 36 ]. In this sense, the involvement of the relevant stakeholders, especially patients and relatives, is increasingly being seen as a quality indicator in and of itself.

How not to assess qualitative research

The above overview does not include certain items that are routine in assessments of quantitative research. What follows is a non-exhaustive, non-representative, experience-based list of the quantitative criteria often applied to the assessment of qualitative research, as well as an explanation of the limited usefulness of these endeavours.

Protocol adherence

Given the openness and flexibility of qualitative research, it should not be assessed by how well it adheres to pre-determined and fixed strategies – in other words: its rigidity. Instead, the assessor should look for signs of adaptation and refinement based on lessons learned from earlier steps in the research process.

Sample size

For the reasons explained above, qualitative research does not require specific sample sizes, nor does it require that the sample size be determined a priori [ 1 , 14 , 27 , 37 – 39 ]. Sample size can only be a useful quality indicator when related to the research purpose, the chosen methodology and the composition of the sample, i.e. who was included and why.

Randomisation

While some authors argue that randomisation can be used in qualitative research, this is not commonly the case, as neither its feasibility nor its necessity or usefulness has been convincingly established for qualitative research [ 13 , 27 ]. Relevant disadvantages include the negative impact of a too large sample size as well as the possibility (or probability) of selecting “ quiet, uncooperative or inarticulate individuals ” [ 17 ]. Qualitative studies do not use control groups, either.

Interrater reliability, variability and other “objectivity checks”

The concept of “interrater reliability” is sometimes used in qualitative research to assess the extent to which the coding overlaps between two co-coders. However, it is not clear what this measure tells us about the quality of the analysis [23]. Such scores can therefore be included in qualitative research reports, preferably with some additional information on what the score means for the analysis, but they are not a requirement. Relatedly, it is not relevant for the quality or “objectivity” of qualitative research to separate those who recruited the study participants from those who collected and analysed the data. Experience even shows that it might be better to have the same person or team perform all of these tasks [20]. First, when researchers introduce themselves during recruitment, this can enhance trust when the interview takes place days or weeks later with the same researcher. Second, when the audio-recording is transcribed for analysis, the researcher who conducted the interviews will usually remember the interviewee and the specific interview situation during data analysis. This can be helpful in providing additional context for the interpretation of data, e.g. on whether something might have been meant as a joke [18].
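For readers who do encounter such scores, here is a minimal sketch of the two most common measures, using invented ratings: simple percent agreement and Cohen’s kappa, computed as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance.

```python
# Minimal sketch: percent agreement and Cohen's kappa for two coders
# assigning one code per segment. The ratings are invented.
from collections import Counter

coder_a = ["delay", "delay", "equipment", "timing", "delay", "timing"]
coder_b = ["delay", "timing", "equipment", "timing", "delay", "delay"]

n = len(coder_a)
p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n  # observed agreement

# Chance agreement: product of the coders' marginal probabilities per code.
freq_a, freq_b = Counter(coder_a), Counter(coder_b)
p_e = sum((freq_a[c] / n) * (freq_b[c] / n)
          for c in freq_a.keys() | freq_b.keys())

kappa = (p_o - p_e) / (1 - p_e)
print(f"agreement {p_o:.2f}, kappa {kappa:.2f}")  # agreement 0.67, kappa 0.45
```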

Not being quantitative research

The mere fact that a study is qualitative rather than quantitative should not be used as an assessment criterion if that criterion is applied irrespective of the research problem at hand. Similarly, qualitative research should not be required to be combined with quantitative research per se – unless mixed methods research is judged as inherently better than single-method research, in which case the same criterion should also be applied to quantitative studies without a qualitative component.

The main take-away points of this paper are summarised in Table 1. We aimed to show that, if conducted well, qualitative research can answer specific research questions that cannot be adequately answered using (only) quantitative designs. Seeing qualitative and quantitative methods as equal will help us become more aware and critical of the “fit” between the research problem and our chosen methods: I can conduct an RCT to determine the reasons for transportation delays of acute stroke patients – but should I? It also provides us with a greater range of tools to tackle a greater range of research problems more appropriately and successfully, filling in the blind spots on one half of the methodological spectrum to better address the whole complexity of neurological research and practice.

Take-away points (Table 1)

Typical qualitative research questions:

• Assessing complex multi-component interventions or systems (of change)

• What works for whom, when, how and why?

• Focussing on intervention improvement

Common data collection methods:

• Document study

• Observations (participant or non-participant)

• Interviews (especially semi-structured)

• Focus groups

Data processing and analysis:

• Transcription of audio-recordings and field notes into transcripts and protocols

• Coding of protocols

• Using qualitative data management software

Combinations of quantitative and/or qualitative methods (mixed methods), e.g.:

• Convergent parallel design: quali and quanti in parallel

• Sequential explanatory design: quanti followed by quali

• Sequential exploratory design: quali followed by quanti

Helpful quality indicators:

• Checklists

• Reflexivity

• Sampling strategies

• Piloting

• Co-coding

• Member checking

• Stakeholder involvement

Criteria of limited usefulness for assessing qualitative research:

• Protocol adherence

• Sample size

• Randomisation

• Interrater reliability, variability and other “objectivity checks”

• Not being quantitative research

Abbreviations

EVT – Endovascular treatment
RCT – Randomised controlled trial
SOP – Standard operating procedure
SRQR – Standards for Reporting Qualitative Research

Authors’ contributions

LB drafted the manuscript; WW and CG revised the manuscript; all authors approved the final version.

Funding

No external funding.

Competing interests

The authors declare no competing interests.



Determining Sample Size: How Many Survey Participants Do You Need?


In the U.S., there is a Presidential election every four years. In election years, there is a steady stream of polls in the months leading up to the election announcing which candidates are up and which are down in the horse race of popular opinion.

If you have ever wondered what makes these polls accurate and how each poll decides how many voters to talk to, then you have thought like a researcher who seeks to know how many participants they need in order to obtain statistically significant survey results.

Statistically significant results are those in which the researchers have confidence their findings are not due to chance. Obtaining statistically significant results depends on the researchers’ sample size (how many people they gather data from) and the overall size of the population they wish to understand (voters in the U.S., for example).

Calculating sample sizes can be difficult even for expert researchers. Here, we show you how to calculate sample size for a variety of different research designs.

Before jumping into the details, it is worth noting that formal sample size calculations are often based on the premise that researchers are conducting a representative survey with probability-based sampling techniques. Probability-based sampling ensures that every member of the population being studied has an equal chance of participating in the study and respondents are selected at random.

For a variety of reasons, probability sampling is not feasible for most behavioral studies conducted in industry and academia. As a result, we outline the steps required to calculate sample sizes for probability-based surveys and then extend our discussion to calculating sample sizes for non-probability surveys (i.e., controlled samples) and experiments.

How to Calculate a Statistically Significant Sample Size in Research

Determining how many people you need to sample in a survey study can be difficult. How difficult? Look at this formula for sample size.

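The source page renders the formula as an image that does not survive extraction. A commonly cited version – an assumption on our part rather than a verbatim reproduction – is Cochran’s formula with a finite population correction:

$$ n \;=\; \frac{z^2\, p(1-p)/e^2}{1 + z^2\, p(1-p)/(e^2 N)} $$

where z is the z-score for the chosen confidence level, p is the expected proportion (0.5 is the most conservative choice), e is the margin of error, and N is the population size.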

No one wants to work through something like that just to know how many people they should sample. Fortunately, there are several sample size calculators online that simplify knowing how many people to collect data from.

Determining Sample Size for Probability-Based Surveys and Polling Studies

Even if you use a sample size calculator, however, you still need to know some important details about your study. Specifically, you need to know:

  • What is the population size in my research?

Population size is the total number of people in the group you are trying to study. If, for example, you were conducting a poll asking U.S. voters about Presidential candidates, then your population of interest would be all eligible voters in the U.S.—more than 200 million people.

Determining the size of the population you’re interested in will often require some background research. For instance, if your company sells digital marketing services and you’re interested in surveying potential customers, it isn’t easy to determine the size of your population. Everyone who is currently engaged in digital marketing may be a potential customer. In situations like these, you can often use industry data or other information to arrive at a reasonable estimate for your population size.

  • What margin of error should you use?

Margin of error is a percentage that tells you how much the results from your sample may deviate from the views of the overall population. The smaller your margin of error, the closer your data reflect the opinion of the population at a given confidence level.

Generally speaking, the more people you gather data from, the smaller your margin of error. However, because it is almost never feasible to collect data from everyone in the population, some margin of error is necessary in most studies.

  • What is your survey’s confidence level?

The confidence level (often loosely called the significance level in survey research) is a percentage that tells you how confident you can be that the true population value lies within your margin of error. So, for example, if you are asking people whether they support a candidate for President, the confidence level tells you how likely it is that the level of support for the candidate in the population (i.e., people not in your sample) falls within the margin of error found in your sample.

Common confidence levels in survey research are 90%, 95%, and 99%.

Once you know the values above, you can plug them into a sample size formula or, more conveniently, an online calculator to determine your sample size.
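As a rough illustration of what such calculators do under the hood, the sketch below implements the finite-population formula shown earlier in Python; the function name and the z-score lookup table are our own, and real calculators may use slightly different variants.

```python
# Sketch of a survey sample size calculator (Cochran's formula with a
# finite population correction) -- not any specific vendor's method.
import math

Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}  # confidence level -> z

def sample_size(population: int, margin_of_error: float,
                confidence: float = 0.95, p: float = 0.5) -> int:
    """Respondents needed to estimate a proportion at the given margin
    of error and confidence level; p = 0.5 is the most conservative."""
    z = Z_SCORES[confidence]
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2    # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))  # finite correction

print(sample_size(population=1_000, margin_of_error=0.05))    # ~278
print(sample_size(population=100_000, margin_of_error=0.03))  # ~1,056
```

These figures land close to, though not exactly on, the published table below; small discrepancies of this kind usually come down to rounding conventions or slightly different formula variants.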

The table below displays the necessary sample size for different population sizes and margins of error. As you can see, even when a population is large, researchers can often understand the entire group with about 1,000 respondents.

| Population Size | Sample Size (±3% Margin of Error) | Sample Size (±5% Margin of Error) | Sample Size (±10% Margin of Error) |
|---|---|---|---|
| 500 | 345 | 220 | 80 |
| 1,000 | 525 | 285 | 90 |
| 3,000 | 810 | 350 | 100 |
| 5,000 | 910 | 370 | 100 |
| 10,000 | 1,000 | 385 | 100 |
| 100,000+ | 1,100 | 400 | 100 |

  • How Many People Should I Invite to My Study?

Sample size calculations tell you how many people you need to complete your survey. What they do not tell you, however, is how many people you need to invite to your survey. To find that number, you need to consider the response rate.

For example, if you are conducting a study of customer satisfaction and you know from previous experience that only about 30% of the people you contact will actually respond to your survey, then you can determine how many people you should invite to the survey to wind up with your desired sample size.

All you have to do is take the number of respondents you need, divide by your expected response rate, and multiply by 100. For example, if you need 500 customers to respond to your survey and you know the response rate is 30%, you should invite about 1,666 people to your study (500/30*100 = 1,666).
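The same arithmetic in code, as a trivial sketch (the function name is ours):

```python
import math

def invitations_needed(completes_needed: int, response_rate_pct: float) -> int:
    """Invitations required to reach a target number of completed surveys,
    given an expected response rate expressed as a percentage."""
    return math.ceil(completes_needed / response_rate_pct * 100)

print(invitations_needed(500, 30))  # 1,667 (rounding up; the text truncates to 1,666)
```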

Determining Sample Size for Controlled Surveys

Sample size formulas are based on probability sampling techniques—methods that randomly select people from the population to participate in a survey. For most market surveys and academic studies, however, researchers do not use probability sampling methods. Instead, they use a mix of convenience and purposive sampling methods that we refer to as controlled sampling.

When surveys and descriptive studies are based on controlled sampling methods, how should researchers calculate sample size?

When the study’s aim is to measure the frequency of something or to describe people’s behavior, we recommend following the calculations made for probability sampling. This often translates to a sample of about 1,000 to 2,000 people. When a study’s aim is to investigate a correlational relationship, however, we recommend sampling between 500 and 1,000 people. More participants in a study will always be better, but these numbers are a useful rule of thumb for researchers seeking to find out how many participants they need to sample.

Determining Sample Size for Experiments

If you look online, you will find many sources with information for calculating sample size when conducting a survey, but fewer resources for calculating sample size when conducting an experiment. Experiments involve randomly assigning people to different conditions and manipulating variables in order to determine a cause-and-effect relationship. The reason why sample size calculators for experiments are hard to find is simple: experiments are complex, and sample size calculations depend on several factors.

The guidance we offer here is to help researchers calculate sample size for some of the simplest and most common experimental designs: t-tests, A/B tests, and chi-square tests.

How to Calculate Sample Size for Simple Experiments

Many businesses today rely on A/B tests. Especially in the digital environment, A/B tests provide an efficient way to learn what kinds of features, messages, and displays cause people to spend more time or money on a website or an app.

For example, one common use of A/B testing is marketing emails. A marketing manager might create two versions of an email, randomly send one version to half of the company’s customers and the other version to the remaining half, and then measure which email generates more sales.

In many cases, researchers may know they want to conduct an A/B test but be unsure how many people they need in their sample to obtain statistically significant results. In order to begin a sample size calculation, you need to know three things.

1. The significance level.

The significance level represents how sure you want to be that your results are not due to chance. A significance level of .05 is a good starting point, but you may adjust this number up or down depending on the aim of your study.

2. Your desired power.

Statistical tests are only useful when they have enough power to detect an effect if one actually exists. Most researchers aim for 80% power—meaning their tests are sensitive enough to detect an effect 8 out of 10 times if one exists.

3. The minimum effect size you are interested in.

The final piece of information you need is the minimum effect size, or difference between groups, you are interested in. Sometimes there may be a difference between groups, but if the difference is so small that it makes little practical difference to your business, it probably isn’t worth investigating. Determining the minimum effect size you are interested in requires some thought about your goals and the potential impact on your business. 

Once you have decided on the factors above, you can use a sample size calculator to determine how many people you need in each of your study’s conditions.

An Example Sample Size Calculation for an A/B Test

Let’s say a marketing team wants to test two different email campaigns. They set their significance level at .05 and their power at 80%. In addition, the team determines that the minimum response rate difference between groups that they are interested in is 7.5%. Plugging these numbers into a sample size calculator reveals that the team needs 693 people in each condition of their study, for a total of 1,386.
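The article does not say which calculator it used, but one way to approximate the result – under the assumption that response rates sit near 50%, so that a 7.5-point difference corresponds to a standardized effect of roughly 0.075 / 0.5 = 0.15 – is a two-sample power analysis with statsmodels:

```python
# Approximate reproduction of the A/B example via a two-sample power
# analysis. Assumes baseline response rates near 50%, so a 7.5-point
# difference maps to a standardized effect size of about 0.15.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.15,        # minimum difference of interest, standardized
    alpha=0.05,              # significance level
    power=0.80,              # desired statistical power
    alternative="two-sided",
)
print(round(n_per_group))  # ~699 per condition, close to the article's 693
```

The small gap between 699 and 693 reflects the approximation; an exact two-proportion calculation would also need the baseline response rate, which the example does not state.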

Sending an email out to 1,386 people who are already on your contact list doesn’t cost too much. But for many other studies, each respondent you recruit will cost money. For this reason, it is important to strongly consider what the minimum effect size of interest is when planning a study.    

What If I Don’t Know What Size Difference to Expect?

When you don’t know what size difference to expect among groups, you can default to one of a few rules of thumb. First, use the effect size of minimum practical significance. By deciding what the minimum difference is between groups that would be meaningful, you can avoid spending resources investigating things that are likely to have little consequence for your business.

A second rule of thumb that is particularly relevant for researchers in academia is to assume an effect size of d = .4. A d of .4 is considered by some to be the smallest effect size that begins to have practical relevance. And fortunately, with this effect size and just two conditions, researchers need about 100 people per condition.
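Running the same power solver with d = .4 (a sketch, not the article’s own calculation) reproduces this round number:

```python
from statsmodels.stats.power import TTestIndPower

# d = 0.4, alpha = .05, power = 80% -> roughly 100 participants per condition
print(round(TTestIndPower().solve_power(effect_size=0.4, alpha=0.05, power=0.80)))  # ~99
```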

After you know how many people to recruit for your study, the next step is finding your participants. By using CloudResearch’s Prime Panels or MTurk Toolkit, you can gain access to more than 50 million people worldwide in addition to user-friendly tools designed to make running your study easy. We can help you find your sample regardless of what your study entails. Need people from a narrow demographic group? Looking to collect data from thousands of people? Do you need people who are willing to engage in a long or complicated study? Our team has the knowledge and expertise to match you with the right group of participants for your study. Get in touch with us today and learn what we can do for you.




