Qualitative case study data analysis: an example from practice

Affiliation.

  • 1 School of Nursing and Midwifery, National University of Ireland, Galway, Republic of Ireland.
  • PMID: 25976531
  • DOI: 10.7748/nr.22.5.8.e1307

Aim: To illustrate an approach to data analysis in qualitative case study methodology.

Background: There is often little detail in case study research about how data were analysed. However, it is important that comprehensive analysis procedures are used because there are often large sets of data from multiple sources of evidence. Furthermore, the ability to describe in detail how the analysis was conducted ensures rigour in reporting qualitative research.

Data sources: The research example used is a multiple case study that explored the role of the clinical skills laboratory in preparing students for the real world of practice. Data analysis was conducted using a framework guided by the four stages of analysis outlined by Morse (1994): comprehending, synthesising, theorising and recontextualising. The specific strategies for analysis in these stages centred on the work of Miles and Huberman (1994), which has been successfully used in case study research. The data were managed using NVivo software.

Review methods: Literature examining qualitative data analysis was reviewed and strategies illustrated by the case study example provided.

Discussion: Each stage of the analysis framework is described with illustration from the research example for the purpose of highlighting the benefits of a systematic approach to handling large data sets from multiple sources.

Conclusion: By providing an example of how each stage of the analysis was conducted, it is hoped that researchers will be able to consider the benefits of such an approach to their own case study analysis.

Implications for research/practice: This paper illustrates specific strategies that can be employed when conducting data analysis in case study research and other qualitative research designs.

Keywords: Case study data analysis; case study research methodology; clinical skills research; qualitative case study methodology; qualitative data analysis; qualitative research.

  • Case-Control Studies*
  • Data Interpretation, Statistical*
  • Nursing Research / methods*
  • Qualitative Research*
  • Research Design


What Is a Case Study? | Definition, Examples & Methods

Published on May 8, 2019 by Shona McCombes. Revised on November 20, 2023.

A case study is a detailed study of a specific subject, such as a person, group, place, event, organization, or phenomenon. Case studies are commonly used in social, educational, clinical, and business research.

A case study research design usually involves qualitative methods, but quantitative methods are sometimes also used. Case studies are good for describing, comparing, evaluating and understanding different aspects of a research problem.

Table of contents

  • When to do a case study
  • Step 1: Select a case
  • Step 2: Build a theoretical framework
  • Step 3: Collect your data
  • Step 4: Describe and analyze the case

A case study is an appropriate research design when you want to gain concrete, contextual, in-depth knowledge about a specific real-world subject. It allows you to explore the key characteristics, meanings, and implications of the case.

Case studies are often a good choice in a thesis or dissertation. They keep your project focused and manageable when you don’t have the time or resources to do large-scale research.

You might use just one complex case study where you explore a single subject in depth, or conduct multiple case studies to compare and illuminate different aspects of your research problem.


Once you have developed your problem statement and research questions, you should be ready to choose the specific case that you want to focus on. A good case study should have the potential to:

  • Provide new or unexpected insights into the subject
  • Challenge or complicate existing assumptions and theories
  • Propose practical courses of action to resolve a problem
  • Open up new directions for future research

Tip: If your research is more practical in nature and aims to simultaneously investigate an issue as you solve it, consider conducting action research instead.

Unlike quantitative or experimental research, a strong case study does not require a random or representative sample. In fact, case studies often deliberately focus on unusual, neglected, or outlying cases which may shed new light on the research problem.

Example of an outlying case study: In the 1960s, the town of Roseto, Pennsylvania was discovered to have extremely low rates of heart disease compared to the US average. It became an important case study for understanding previously neglected causes of heart disease.

However, you can also choose a more common or representative case to exemplify a particular category, experience or phenomenon.

Example of a representative case study: In the 1920s, two sociologists used Muncie, Indiana as a case study of a typical American city that supposedly exemplified the changing culture of the US at the time.

While case studies focus more on concrete details than general theories, they should usually have some connection with theory in the field. This way the case study is not just an isolated description, but is integrated into existing knowledge about the topic. It might aim to:

  • Exemplify a theory by showing how it explains the case under investigation
  • Expand on a theory by uncovering new concepts and ideas that need to be incorporated
  • Challenge a theory by exploring an outlier case that doesn’t fit with established assumptions

To ensure that your analysis of the case has a solid academic grounding, you should conduct a literature review of sources related to the topic and develop a theoretical framework. This means identifying key concepts and theories to guide your analysis and interpretation.

There are many different research methods you can use to collect data on your subject. Case studies tend to focus on qualitative data using methods such as interviews, observations, and analysis of primary and secondary sources (e.g., newspaper articles, photographs, official records). Sometimes a case study will also collect quantitative data.

Example of a mixed methods case study: For a case study of a wind farm development in a rural area, you could collect quantitative data on employment rates and business revenue, collect qualitative data on local people’s perceptions and experiences, and analyze local and national media coverage of the development.

The aim is to gain as thorough an understanding as possible of the case and its context.

In writing up the case study, you need to bring together all the relevant aspects to give as complete a picture as possible of the subject.

How you report your findings depends on the type of research you are doing. Some case studies are structured like a standard scientific paper or thesis, with separate sections or chapters for the methods, results and discussion.

Others are written in a more narrative style, aiming to explore the case from various angles and analyze its meanings and implications (for example, by using textual analysis or discourse analysis).

In all cases, though, make sure to give contextual details about the case, connect it back to the literature and theory, and discuss how it fits into wider patterns or debates.


Cite this Scribbr article


McCombes, S. (2023, November 20). What Is a Case Study? | Definition, Examples & Methods. Scribbr. Retrieved April 10, 2024, from https://www.scribbr.com/methodology/case-study/


Case Study Research in Software Engineering: Guidelines and Examples by Per Runeson, Martin Höst, Austen Rainer, Björn Regnell


DATA ANALYSIS AND INTERPRETATION

5.1 INTRODUCTION

Once data has been collected, the focus shifts to analysis. In this phase, the data is used to understand what actually happened in the studied case: the researcher works through the details of the case and seeks patterns in the data. Some analysis inevitably takes place during data collection as well, for example when the data is studied or when an interview is transcribed. The understanding gained in those earlier phases is valid and important, but this chapter focuses on the separate phase that starts after the data has been collected.

Data analysis is conducted differently for quantitative and qualitative data. Sections 5.2–5.5 describe how to analyze qualitative data and how to assess the validity of this type of analysis. In Section 5.6, a short introduction to quantitative analysis methods is given. Since quantitative analysis is covered extensively in textbooks on statistical analysis, and case study research to a large extent relies on qualitative data, this section is kept short.

5.2 ANALYSIS OF DATA IN FLEXIBLE RESEARCH

5.2.1 Introduction

As case study research is a flexible research method, qualitative data analysis methods are commonly used [176]. The basic objective of the analysis is, as in any other analysis, to derive conclusions from the data, keeping a clear chain of evidence. The chain of evidence means that a reader ...



Case Study | Definition, Examples & Methods

Published on 5 May 2022 by Shona McCombes. Revised on 30 January 2023.

A case study is a detailed study of a specific subject, such as a person, group, place, event, organisation, or phenomenon. Case studies are commonly used in social, educational, clinical, and business research.

A case study research design usually involves qualitative methods, but quantitative methods are sometimes also used. Case studies are good for describing, comparing, evaluating, and understanding different aspects of a research problem.

Table of contents

  • When to do a case study
  • Step 1: Select a case
  • Step 2: Build a theoretical framework
  • Step 3: Collect your data
  • Step 4: Describe and analyse the case

A case study is an appropriate research design when you want to gain concrete, contextual, in-depth knowledge about a specific real-world subject. It allows you to explore the key characteristics, meanings, and implications of the case.

Case studies are often a good choice in a thesis or dissertation. They keep your project focused and manageable when you don’t have the time or resources to do large-scale research.

You might use just one complex case study where you explore a single subject in depth, or conduct multiple case studies to compare and illuminate different aspects of your research problem.


Once you have developed your problem statement and research questions, you should be ready to choose the specific case that you want to focus on. A good case study should have the potential to:

  • Provide new or unexpected insights into the subject
  • Challenge or complicate existing assumptions and theories
  • Propose practical courses of action to resolve a problem
  • Open up new directions for future research

Unlike quantitative or experimental research, a strong case study does not require a random or representative sample. In fact, case studies often deliberately focus on unusual, neglected, or outlying cases which may shed new light on the research problem.

If you find yourself aiming to simultaneously investigate and solve an issue, consider conducting action research. As its name suggests, action research conducts research and takes action at the same time, and is highly iterative and flexible.

However, you can also choose a more common or representative case to exemplify a particular category, experience, or phenomenon.

While case studies focus more on concrete details than general theories, they should usually have some connection with theory in the field. This way the case study is not just an isolated description, but is integrated into existing knowledge about the topic. It might aim to:

  • Exemplify a theory by showing how it explains the case under investigation
  • Expand on a theory by uncovering new concepts and ideas that need to be incorporated
  • Challenge a theory by exploring an outlier case that doesn’t fit with established assumptions

To ensure that your analysis of the case has a solid academic grounding, you should conduct a literature review of sources related to the topic and develop a theoretical framework. This means identifying key concepts and theories to guide your analysis and interpretation.

There are many different research methods you can use to collect data on your subject. Case studies tend to focus on qualitative data using methods such as interviews, observations, and analysis of primary and secondary sources (e.g., newspaper articles, photographs, official records). Sometimes a case study will also collect quantitative data.

The aim is to gain as thorough an understanding as possible of the case and its context.

In writing up the case study, you need to bring together all the relevant aspects to give as complete a picture as possible of the subject.

How you report your findings depends on the type of research you are doing. Some case studies are structured like a standard scientific paper or thesis, with separate sections or chapters for the methods, results, and discussion.

Others are written in a more narrative style, aiming to explore the case from various angles and analyse its meanings and implications (for example, by using textual analysis or discourse analysis).

In all cases, though, make sure to give contextual details about the case, connect it back to the literature and theory, and discuss how it fits into wider patterns or debates.

Cite this Scribbr article


McCombes, S. (2023, January 30). Case Study | Definition, Examples & Methods. Scribbr. Retrieved 9 April 2024, from https://www.scribbr.co.uk/research-methods/case-studies/


Data Analytics Case Study Guide (Updated for 2024)


What Are Data Analytics Case Study Interviews?

When you’re trying to land a data analyst job, the last hurdle standing in your way is often the data analytics case study interview.

One reason they’re so challenging is that case studies don’t typically have a right or wrong answer.

Instead, case study interviews require you to come up with a hypothesis for an analytics question and then produce data to support or validate your hypothesis. In other words, it’s not just about your technical skills; you’re also being tested on creative problem-solving and your ability to communicate with stakeholders.

This article provides an overview of how to answer data analytics case study interview questions. You can find an in-depth course in the data analytics learning path.

How to Solve Data Analytics Case Questions


With data analyst case questions, you will need to answer two key questions:

  • What metrics should I propose?
  • How do I write a SQL query to get the metrics I need?

In short, to ace a data analytics case interview, you not only need to brush up on case questions, but you also should be adept at writing all types of SQL queries and have strong data sense.

These questions are especially challenging if you don’t have a framework for answering them. To help you prepare, we created this step-by-step guide to answering data analytics case questions.

We show you how to use a framework to answer case questions, provide example analytics questions, and help you understand the difference between analytics case studies and product metrics case studies.

Data Analytics Cases vs Product Metrics Questions

Product case questions sometimes get lumped in with data analytics cases.

Ultimately, the type of case question you are asked will depend on the role. For example, product analysts will likely face more product-oriented questions.

Product metrics cases tend to focus on a hypothetical situation. You might be asked to:

Investigate Metrics - One of the most common types will ask you to investigate a metric, usually one that’s going up or down. For example, “Why are Facebook friend requests falling by 10 percent?”

Measure Product/Feature Success - A lot of analytics cases revolve around the measurement of product success and feature changes. For example, “We want to add X feature to product Y. What metrics would you track to make sure that’s a good idea?”

With product data cases, the key difference is that you may or may not be required to write the SQL query to find the metric.

Instead, these interviews are more theoretical and are designed to assess your product sense and ability to think about analytics problems from a product perspective. Product metrics questions may also show up in the data analyst interview, but likely only for product data analyst roles.


Data Analytics Case Study Question: Sample Solution


Let’s start with an example data analytics case question:

You’re given a table that represents search results from searches on Facebook. The query column is the search term, the position column is the position at which each search result appeared, and the rating column is the human rating from 1 to 5, where 5 is high relevance and 1 is low relevance.

Each row in the search_events table represents a single search, with the has_clicked column representing if a user clicked on a result or not. We have a hypothesis that the CTR is dependent on the search result rating.

Write a query to return data to support or disprove this hypothesis.

search_results table:

search_events table

Step 1: With Data Analytics Case Studies, Start by Making Assumptions

Hint: Start by making assumptions and thinking out loud. With this question, focus on coming up with a metric to support the hypothesis. If the question is unclear or if you think you need more information, be sure to ask.

Answer. The hypothesis is that CTR is dependent on search result rating. Therefore, we want to focus on the CTR metric, and we can assume:

  • If CTR is high when search result ratings are high, and CTR is low when the search result ratings are low, then the hypothesis is correct.
  • If CTR is low when the search ratings are high, or there is no proven correlation between the two, then our hypothesis is not proven.

Step 2: Provide a Solution for the Case Question

Hint: Walk the interviewer through your reasoning. Talking about the decisions you make and why you’re making them shows off your problem-solving approach.

Answer. One way we can investigate the hypothesis is to look at the results split into different search rating buckets. For example, if we measure the CTR for results rated at 1, then those rated at 2, and so on, we can identify if an increase in rating is correlated with an increase in CTR.

First, I’d write a query to get the number of results for each query in each bucket. We want to look at the distribution of results that are less than a rating threshold, which will help us see the relationship between search rating and CTR.

This CTE aggregates the number of results that are less than a certain rating threshold. Later, we can use this to see the percentage that are in each bucket. If we re-join to the search_events table, we can calculate the CTR by then grouping by each bucket.
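The query itself is not reproduced here, so the following is only a sketch of what such a CTE and re-join might look like, run through Python's built-in sqlite3 module. The column names query, position, rating and has_clicked come from the question; the result_id and search_id columns, the join on query, the bucket labels, the sample rows, and the reading of "less than a rating threshold" as `<=` (ratings run from 1 to 5) are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE search_results (query TEXT, result_id INTEGER, position INTEGER, rating INTEGER);
CREATE TABLE search_events (search_id INTEGER, query TEXT, has_clicked INTEGER);

-- Made-up sample rows, for illustration only.
INSERT INTO search_results VALUES
    ('cat', 1, 1, 1), ('cat', 2, 2, 1),
    ('dog', 3, 1, 2), ('dog', 4, 2, 1),
    ('fish', 5, 1, 5), ('fish', 6, 2, 4);
INSERT INTO search_events VALUES
    (1, 'cat', 0), (2, 'cat', 0),
    (3, 'dog', 0), (4, 'dog', 1),
    (5, 'fish', 1), (6, 'fish', 1);
""")

QUERY = """
WITH ratings AS (
    -- Per query: how many results sit at or below each rating threshold.
    SELECT
        query,
        SUM(CASE WHEN rating <= 1 THEN 1 ELSE 0 END) AS num_rating_le_1,
        SUM(CASE WHEN rating <= 2 THEN 1 ELSE 0 END) AS num_rating_le_2,
        SUM(CASE WHEN rating <= 3 THEN 1 ELSE 0 END) AS num_rating_le_3,
        COUNT(*) AS total_results
    FROM search_results
    GROUP BY query
)
SELECT
    -- A query lands in a bucket when ALL of its results are under the threshold,
    -- i.e. total minus the count under the threshold equals 0.
    CASE
        WHEN r.total_results - r.num_rating_le_1 = 0 THEN 'all results <= 1'
        WHEN r.total_results - r.num_rating_le_2 = 0 THEN 'all results <= 2'
        WHEN r.total_results - r.num_rating_le_3 = 0 THEN 'all results <= 3'
        ELSE 'some result > 3'
    END AS rating_bucket,
    1.0 * SUM(e.has_clicked) / COUNT(*) AS ctr
FROM search_events AS e
JOIN ratings AS r ON r.query = e.query
GROUP BY rating_bucket
ORDER BY rating_bucket
"""

rows = conn.execute(QUERY).fetchall()
for bucket, ctr in rows:
    print(f"{bucket}: CTR = {ctr:.2f}")
```

With these invented rows, CTR rises from the lowest-rated bucket to the highest, which is the pattern that would support the hypothesis; real data may of course show something different.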

Step 3: Use Analysis to Back Up Your Solution

Hint: Be prepared to justify your solution. Interviewers will follow up with questions about your reasoning, and ask why you make certain assumptions.

Answer. Using a CASE WHEN statement, I assigned each query to a ratings bucket by checking whether all of its search results fell under a rating threshold of 1, 2, or 3: I subtracted the number of results within the bucket from the total number of results and checked whether the difference was 0.

I did that to get away from averages in our bucketing system. Outliers would make it more difficult to measure the effect of bad ratings. For example, if a query had one result rated 1 and another rated 5, the average would be 3, whereas in my solution a query with all of its results under 1, 2, or 3 tells us that it actually has bad ratings.

Product Data Case Question: Sample Solution


In product metrics interviews, you’ll likely be asked about analytics, but the discussion will be more theoretical. You’ll propose a solution to a problem, and supply the metrics you’ll use to investigate or solve it. You may or may not be required to write a SQL query to get those metrics.

We’ll start with an example product metrics case study question:

Let’s say you work for a social media company that has just done a launch in a new city. Looking at weekly metrics, you see a slow decrease in the average number of comments per user from January to March in this city.

The company has been consistently growing new users in the city from January to March.

What are some reasons why the average number of comments per user would be decreasing and what metrics would you look into?

Step 1: Ask Clarifying Questions Specific to the Case

Hint: This question is very vague. It’s all hypothetical, so we don’t know very much about users, what the product is, and how people might be interacting. Be sure you ask questions upfront about the product.

Answer: Before I jump into an answer, I’d like to ask a few questions:

  • Who uses this social network? How do they interact with each other?
  • Have there been any performance issues that might be causing the problem?
  • What are the goals of this particular launch?
  • Have there been any changes to the comment features in recent weeks?

For the sake of this example, let’s say we learn that it’s a social network similar to Facebook with a young audience, and the goals of the launch are to grow the user base. Also, there have been no performance issues and the commenting feature hasn’t been changed since launch.

Step 2: Use the Case Question to Make Assumptions

Hint: Look for clues in the question. For example, this case gives you a metric, “average number of comments per user.” Consider if the clue might be helpful in your solution. But be careful, sometimes questions are designed to throw you off track.

Answer: From the question, we can hypothesize a little bit. For example, we know that user count is increasing linearly. That means two things:

  • The decreasing comments issue isn’t a result of a declining user base.
  • The cause isn’t loss of platform.

We can also model out the data to help us get a better picture of the average number of comments per user metric:

  • January: 10000 users, 30000 comments, 3 comments/user
  • February: 20000 users, 50000 comments, 2.5 comments/user
  • March: 30000 users, 60000 comments, 2 comments/user
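The arithmetic behind these figures can be checked in a few lines of Python. This is just a sketch using the illustrative numbers above; the variable names are assumptions made for the example.

```python
# Illustrative monthly figures from the example above: (users, comments).
monthly = {
    "January": (10_000, 30_000),
    "February": (20_000, 50_000),
    "March": (30_000, 60_000),
}

# Average comments per user = total comments / total users for the month.
comments_per_user = {
    month: comments / users for month, (users, comments) in monthly.items()
}

for month, value in comments_per_user.items():
    print(f"{month}: {value:.1f} comments/user")
```

Total comments rise every month in this model, but users grow faster, so the ratio falls.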

One thing to note: Although this is an interesting metric, I’m not sure if it will help us solve this question. For one, average comments per user doesn’t account for churn. We might assume that during the three-month period users are churning off the platform. Let’s say the churn rate is 25% in January, 20% in February and 15% in March.

Step 3: Make a Hypothesis About the Data

Hint: Don’t worry too much about making a correct hypothesis. Instead, interviewers want to get a sense of your product intuition and see that you’re on the right track. Also, be prepared to explain how you would test your hypothesis.

Answer. I would say that average comments per user isn’t a great metric to use, because it doesn’t reveal insights into what’s really causing this issue.

That’s because it doesn’t account for active users, which are the users who are actually commenting. A better metric to investigate would be retained users and monthly active users.

What I suspect is causing the issue is that active users are commenting frequently and are responsible for the increase in comments month-to-month. New users, on the other hand, aren’t as engaged and aren’t commenting as often.

Step 4: Provide Metrics and Data Analysis

Hint: Within your solution, include key metrics that you’d like to investigate that will help you measure success.

Answer: I’d say there are a few ways we could investigate the cause of this problem, but the one I’d be most interested in would be the engagement of monthly active users.

If the growth in comments is coming from active users, that would help us understand how we’re doing at retaining users. Plus, it will also show if new users are less engaged and commenting less frequently.

One way that we could dig into this would be to segment users by their onboarding date, which would help us to visualize engagement and see how engaged some of our longest-retained users are.
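Segmenting by onboarding date could be sketched as follows. The user records, field layout, and numbers are entirely hypothetical and exist only to show the shape of the calculation: group users by the month they joined, then compare average comments per user across cohorts and against the overall average.

```python
from collections import defaultdict

# Hypothetical records: (user_id, onboarding_month, comments_in_march).
users = [
    ("u1", "January", 6),
    ("u2", "January", 4),
    ("u3", "February", 2),
    ("u4", "February", 2),
    ("u5", "March", 1),
    ("u6", "March", 0),
]

# Group comment counts by the month each user joined.
cohorts = defaultdict(list)
for _user_id, onboarding_month, comments in users:
    cohorts[onboarding_month].append(comments)

# Average comments per user within each onboarding cohort.
cohort_avg = {month: sum(c) / len(c) for month, c in cohorts.items()}

# Overall average across all users, for comparison.
overall_avg = sum(c for _, _, c in users) / len(users)

for month, avg in cohort_avg.items():
    print(f"{month} cohort: {avg:.1f} comments/user")
print(f"Overall: {overall_avg:.1f} comments/user")
```

In this made-up data the earliest cohort comments far more than the newest one, so the overall average hides the difference that the cohort view exposes.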

If engagement of new users is the issue, that will give us some options in terms of strategies for addressing the problem. For example, we could test new onboarding or commenting features designed to generate engagement.

Step 5: Propose a Solution for the Case Question

Hint: In the majority of cases, your initial assumptions might be incorrect, or the interviewer might throw you a curveball. Be prepared to make new hypotheses or discuss the pitfalls of your analysis.

Answer. If the cause wasn’t due to a lack of engagement among new users, then I’d want to investigate active users. One potential cause would be active users commenting less. In that case, we’d know that our earliest users were churning out, and that engagement among new users was potentially growing.

Again, I think we’d want to focus on user engagement since the onboarding date. That would help us understand if we were seeing higher levels of churn among active users, and we could start to identify some solutions there.

Tip: Use a Framework to Solve Data Analytics Case Questions

Analytics case questions can be challenging, but they’re much more challenging if you don’t use a framework. Without a framework, it’s easier to get lost in your answer, get stuck, and lose the confidence of your interviewer. Find helpful frameworks for data analytics questions in our data analytics learning path and our product metrics learning path.

Once you have the framework down, what’s the best way to practice? Mock interviews with our coaches are very effective, as you’ll get feedback and helpful tips as you answer. You can also learn a lot by practicing P2P mock interviews with other Interview Query students. No data analytics background? Check out how to become a data analyst without a degree.

Finally, if you’re looking for sample data analytics case questions and other types of interview questions, see our guide on the top data analyst interview questions.

Organizing Your Social Sciences Research Assignments


Definition and Introduction

Case analysis is a problem-based teaching and learning method that involves critically analyzing complex scenarios within an organizational setting for the purpose of placing the student in a “real world” situation and applying reflection and critical thinking skills to contemplate appropriate solutions, decisions, or recommended courses of action. It is considered a more effective teaching technique than in-class role playing or simulation activities. The analytical process is often guided by questions provided by the instructor that ask students to contemplate relationships between the facts and critical incidents described in the case.

Cases generally include both descriptive and statistical elements and rely on students applying abductive reasoning to develop and argue for preferred or best outcomes [i.e., case scenarios rarely have a single correct or perfect answer based on the evidence provided]. Rather than emphasizing theories or concepts, case analysis assignments emphasize building a bridge of relevancy between abstract thinking and practical application and, by so doing, teach the value of both within a specific area of professional practice.

Given this, the purpose of a case analysis paper is to present a structured and logically organized format for analyzing the case situation. It can be assigned to students individually or as a small group assignment, and it may include an in-class presentation component. Case analysis is predominately taught in economics and business-related courses, but it is also a method of teaching and learning found in other applied social sciences disciplines, such as social work, public relations, education, journalism, and public administration.

Ellet, William. The Case Study Handbook: A Student's Guide . Revised Edition. Boston, MA: Harvard Business School Publishing, 2018; Christoph Rasche and Achim Seisreiner. Guidelines for Business Case Analysis . University of Potsdam; Writing a Case Analysis . Writing Center, Baruch College; Volpe, Guglielmo. "Case Teaching in Economics: History, Practice and Evidence." Cogent Economics and Finance 3 (December 2015). doi:https://doi.org/10.1080/23322039.2015.1120977.

How to Approach Writing a Case Analysis Paper

The organization and structure of a case analysis paper can vary depending on the organizational setting, the situation, and how your professor wants you to approach the assignment. Nevertheless, preparing to write a case analysis paper involves several important steps. As Hawes notes, a case analysis assignment “...is useful in developing the ability to get to the heart of a problem, analyze it thoroughly, and to indicate the appropriate solution as well as how it should be implemented” [p.48]. This statement encapsulates how you should approach preparing to write a case analysis paper.

Before you begin to write your paper, consider the following analytical procedures:

  • Review the case to get an overview of the situation. A case can be only a few pages in length; however, it is most often very lengthy and contains a significant amount of detailed background information and statistics, with multilayered descriptions of the scenario, the roles and behaviors of various stakeholder groups, and situational events. Therefore, a quick reading of the case will help you gain an overall sense of the situation and illuminate the types of issues and problems that you will need to address in your paper. If your professor has provided questions intended to help frame your analysis, use them to guide your initial reading of the case.
  • Read the case thoroughly. After gaining a general overview of the case, carefully read the content again with the purpose of understanding key circumstances, events, and behaviors among stakeholder groups. Look for information or data that appears contradictory, extraneous, or misleading. At this point, you should be taking notes as you read because this will help you develop a general outline of your paper. The aim is to obtain a complete understanding of the situation so that you can begin contemplating tentative answers to any questions your professor has provided or, if none were provided, developing answers to your own questions about the case scenario and its connection to the course readings, lectures, and class discussions.
  • Determine key stakeholder groups, issues, and events and the relationships they all have to each other. As you analyze the content, pay particular attention to identifying individuals, groups, or organizations described in the case and identify evidence of any problems or issues of concern that impact the situation in a negative way. Other things to look for include identifying any assumptions being made by or about each stakeholder, potential biased explanations or actions, explicit demands or ultimatums, and the underlying concerns that motivate these behaviors among stakeholders. The goal at this stage is to develop a comprehensive understanding of the situational and behavioral dynamics of the case and the explicit and implicit consequences of each of these actions.
  • Identify the core problems . The next step in most case analysis assignments is to discern what the core [i.e., most damaging, detrimental, injurious] problems are within the organizational setting and to determine their implications. The purpose at this stage of preparing to write your analysis paper is to distinguish between the symptoms of core problems and the core problems themselves and to decide which of these must be addressed immediately and which problems do not appear critical but may escalate over time. Identify evidence from the case to support your decisions by determining what information or data is essential to addressing the core problems and what information is not relevant or is misleading.
  • Explore alternative solutions . As noted, case analysis scenarios rarely have only one correct answer. Therefore, it is important to keep in mind that the process of analyzing the case and diagnosing core problems, while based on evidence, is a subjective process open to various avenues of interpretation. This means that you must consider alternative solutions or courses of action by critically examining strengths and weaknesses, risk factors, and the differences between short and long-term solutions. For each possible solution or course of action, consider the consequences they may have related to their implementation and how these recommendations might lead to new problems. Also, consider thinking about your recommended solutions or courses of action in relation to issues of fairness, equity, and inclusion.
  • Decide on a final set of recommendations . The last stage in preparing to write a case analysis paper is to assert an opinion or viewpoint about the recommendations needed to help resolve the core problems as you see them and to make a persuasive argument for supporting this point of view. Prepare a clear rationale for your recommendations based on examining each element of your analysis. Anticipate possible obstacles that could derail their implementation. Consider any counter-arguments that could be made concerning the validity of your recommended actions. Finally, describe a set of criteria and measurable indicators that could be applied to evaluating the effectiveness of your implementation plan.

Use these steps as the framework for writing your paper. Remember that the more detailed you are in taking notes as you critically examine each element of the case, the more information you will have to draw from when you begin to write. This will save you time.

NOTE : If the process of preparing to write a case analysis paper is assigned as a student group project, consider having each member of the group analyze a specific element of the case, including drafting answers to the corresponding questions used by your professor to frame the analysis. This will help make the analytical process more efficient and ensure that the distribution of work is equitable. This can also facilitate who is responsible for drafting each part of the final case analysis paper and, if applicable, the in-class presentation.

Framework for Case Analysis . College of Management. University of Massachusetts; Hawes, Jon M. "Teaching is Not Telling: The Case Method as a Form of Interactive Learning." Journal for Advancement of Marketing Education 5 (Winter 2004): 47-54; Rasche, Christoph and Achim Seisreiner. Guidelines for Business Case Analysis . University of Potsdam; Writing a Case Study Analysis . University of Arizona Global Campus Writing Center; Van Ness, Raymond K. A Guide to Case Analysis . School of Business. State University of New York, Albany; Writing a Case Analysis . Business School, University of New South Wales.

Structure and Writing Style

A case analysis paper should be detailed, concise, persuasive, clearly written, and professional in tone and in the use of language. As with other forms of college-level academic writing, declarative statements that convey information, provide a fact, or offer an explanation or any recommended courses of action should be based on evidence. If allowed by your professor, any external sources used to support your analysis, such as course readings, should be properly cited under a list of references. The organization and structure of case analysis papers can vary depending on your professor’s preferred format, but the structure generally follows the steps used for analyzing the case.

Introduction

The introduction should provide a succinct but thorough descriptive overview of the main facts, issues, and core problems of the case. The introduction should also include a brief summary of the most relevant details about the situation and organizational setting. This includes defining the theoretical framework or conceptual model on which any questions used to frame your analysis are based.

Following the rules of most college-level research papers, the introduction should then inform the reader how the paper will be organized. This includes describing the major sections of the paper and the order in which they will be presented. Unless you are told to do so by your professor, you do not need to preview your final recommendations in the introduction. Unlike most college-level research papers, the introduction does not include a statement about the significance of your findings because a case analysis assignment does not involve contributing new knowledge about a research problem.

Background Analysis

Background analysis can vary depending on any guiding questions provided by your professor and the underlying concept or theory that the case is based upon. In general, however, this section of your paper should focus on:

  • Providing an overarching analysis of problems identified from the case scenario, including identifying events that stakeholders find challenging or troublesome,
  • Identifying assumptions made by each stakeholder and any apparent biases they may exhibit,
  • Describing any demands or claims made by or forced upon key stakeholders, and
  • Highlighting any issues of concern or complaints expressed by stakeholders in response to those demands or claims.

These aspects of the case are often in the form of behavioral responses expressed by individuals or groups within the organizational setting. However, note that problems in a case situation can also be reflected in data [or the lack thereof] and in the decision-making, operational, cultural, or institutional structure of the organization. Additionally, demands or claims can be either internal or external to the organization [e.g., a case analysis involving a president considering arms sales to Saudi Arabia could include managing internal demands from White House advisors as well as demands from members of Congress].

Throughout this section, present all relevant evidence from the case that supports your analysis. Do not simply claim there is a problem, an assumption, a demand, or a concern; tell the reader what part of the case informed how you identified these background elements.

Identification of Problems

In most case analysis assignments, there are problems, and then there are problems . Each problem can reflect a multitude of underlying symptoms that are detrimental to the interests of the organization. The purpose of identifying problems is to teach students how to differentiate between problems that vary in severity, impact, and relative importance. Given this, problems can be described in three general forms: those that must be addressed immediately, those that should be addressed but the impact is not severe, and those that do not require immediate attention and can be set aside for the time being.

All of the problems you identify from the case should be identified in this section of your paper, with a description based on evidence explaining the problem variances. If the assignment asks you to conduct research to further support your assessment of the problems, include this in your explanation. Remember to cite those sources in a list of references. Use specific evidence from the case and apply appropriate concepts, theories, and models discussed in class or in relevant course readings to highlight and explain the key problems [or problem] that you believe must be solved immediately and describe the underlying symptoms and why they are so critical.

Alternative Solutions

This section is where you provide specific, realistic, and evidence-based solutions to the problems you have identified and make recommendations about how to alleviate the underlying symptomatic conditions impacting the organizational setting. For each solution, you must explain why it was chosen and provide clear evidence to support your reasoning. This can include, for example, course readings and class discussions as well as research resources, such as books, journal articles, research reports, or government documents. In some cases, your professor may encourage you to include personal, anecdotal experiences as evidence to support why you chose a particular solution or set of solutions. Using anecdotal evidence helps promote reflective thinking about the process of determining what qualifies as a core problem and relevant solution.

Throughout this part of the paper, keep in mind the entire array of problems that must be addressed and describe in detail the solutions that might be implemented to resolve these problems.

Recommended Courses of Action

In some case analysis assignments, your professor may ask you to combine the alternative solutions section with your recommended courses of action. However, it is important to know the difference between the two. A solution refers to the answer to a problem. A course of action refers to a procedure or deliberate sequence of activities adopted to proactively confront a situation, often in the context of accomplishing a goal. In this context, proposed courses of action are based on your analysis of alternative solutions. Your description and justification for pursuing each course of action should represent the overall plan for implementing your recommendations.

For each course of action, you need to explain the rationale for your recommendation in a way that confronts challenges, explains risks, and anticipates any counter-arguments from stakeholders. Do this by considering the strengths and weaknesses of each course of action framed in relation to how the action is expected to resolve the core problems presented, the possible ways the action may affect remaining problems, and how the recommended action will be perceived by each stakeholder.

You should also describe the criteria needed to measure how well the implementation of these actions is working and explain which individuals or groups are responsible for ensuring your recommendations are successful. In addition, always consider the law of unintended consequences. Outline difficulties that may arise in implementing each course of action and describe how implementing the proposed courses of action [either individually or collectively] may lead to new problems [both large and small].

Throughout this section, you must consider the costs and benefits of recommending your courses of action in relation to uncertainties or missing information and the negative consequences of success.

Conclusion

The conclusion should be brief and introspective. Unlike a research paper, the conclusion in a case analysis paper does not include a summary of key findings and their significance, a statement about how the study contributed to existing knowledge, or an indication of opportunities for future research.

Begin by synthesizing the core problems presented in the case and the relevance of your recommended solutions. This can include an explanation of what you have learned about the case in the context of your answers to the questions provided by your professor. The conclusion is also where you link what you learned from analyzing the case with the course readings or class discussions. This can further demonstrate your understanding of the relationships between the practical case situation and the theoretical and abstract content of assigned readings and other course content.

Problems to Avoid

The literature on case analysis assignments often includes examples of difficulties students have with applying methods of critical analysis and effectively reporting the results of their assessment of the situation. A common reason cited by scholars is that the application of this type of teaching and learning method is limited to applied fields of social and behavioral sciences and, as a result, writing a case analysis paper can be unfamiliar to most students entering college.

After you have drafted your paper, proofread the narrative flow and revise any of these common errors:

  • Unnecessary detail in the background section . The background section should highlight the essential elements of the case based on your analysis. Focus on summarizing the facts and highlighting the key factors that become relevant in the other sections of the paper by eliminating any unnecessary information.
  • Analysis relies too much on opinion . Your analysis is interpretive, but the narrative must be connected clearly to evidence from the case and any models and theories discussed in class or in course readings. Any positions or arguments you make should be supported by evidence.
  • Analysis does not focus on the most important elements of the case . Your paper should provide a thorough overview of the case. However, the analysis should focus on providing evidence about what you identify are the key events, stakeholders, issues, and problems. Emphasize what you identify as the most critical aspects of the case to be developed throughout your analysis. Be thorough but succinct.
  • Writing is too descriptive . A paper with too much descriptive information detracts from your analysis of the complexities of the case situation. Questions about what happened, where, when, and by whom should only be included as essential information leading to your examination of questions related to why, how, and for what purpose.
  • Inadequate definition of a core problem and associated symptoms. A common error found in case analysis papers is recommending a solution or course of action without adequately defining or demonstrating that you understand the problem. Make sure you have clearly described the problem and its impact and scope within the organizational setting. Ensure that you have adequately described the root causes when describing the symptoms of the problem.
  • Recommendations lack specificity. Identify any use of vague statements and indeterminate terminology, such as “a particular experience” or “a large increase to the budget.” These statements cannot be measured and, as a result, there is no way to evaluate their successful implementation. Provide specific data and use direct language in describing recommended actions.
  • Unrealistic, exaggerated, or unattainable recommendations. Review your recommendations to ensure that they are based on the situational facts of the case. Your recommended solutions and courses of action must be based on realistic assumptions and fit within the constraints of the situation. Also note that the case scenario has already happened; therefore, any speculation or arguments about what could have occurred if the circumstances were different should be revised or eliminated.

Bee, Lian Song et al. "Business Students' Perspectives on Case Method Coaching for Problem-Based Learning: Impacts on Student Engagement and Learning Performance in Higher Education." Education & Training 64 (2022): 416-432; The Case Analysis. Fred Meijer Center for Writing and Michigan Authors. Grand Valley State University; Georgallis, Panikos and Kayleigh Bruijn. "Sustainability Teaching Using Case-Based Debates." Journal of International Education in Business 15 (2022): 147-163; Hawes, Jon M. "Teaching is Not Telling: The Case Method as a Form of Interactive Learning." Journal for Advancement of Marketing Education 5 (Winter 2004): 47-54; Dean, Kathy Lund and Charles J. Fornaciari. "How to Create and Use Experiential Case-Based Exercises in a Management Classroom." Journal of Management Education 26 (October 2002): 586-603; Klebba, Joanne M. and Janet G. Hamilton. "Structured Case Analysis: Developing Critical Thinking Skills in a Marketing Case Course." Journal of Marketing Education 29 (August 2007): 132-137, 139; Klein, Norman. "The Case Discussion Method Revisited: Some Questions about Student Skills." Exchange: The Organizational Behavior Teaching Journal 6 (November 1981): 30-32; Mukherjee, Arup. "Effective Use of In-Class Mini Case Analysis for Discovery Learning in an Undergraduate MIS Course." The Journal of Computer Information Systems 40 (Spring 2000): 15-23; Pessoa, Silvia et al. "Scaffolding the Case Analysis in an Organizational Behavior Course: Making Analytical Language Explicit." Journal of Management Education 46 (2022): 226-251; Ramsey, V. J. and L. D. Dodge. "Case Analysis: A Structured Approach." Exchange: The Organizational Behavior Teaching Journal 6 (November 1981): 27-29; Schweitzer, Karen. "How to Write and Format a Business Case Study." ThoughtCo. https://www.thoughtco.com/how-to-write-and-format-a-business-case-study-466324 (accessed December 5, 2022); Reddy, C. D. "Teaching Research Methodology: Everything's a Case." Electronic Journal of Business Research Methods 18 (December 2020): 178-188; Volpe, Guglielmo. "Case Teaching in Economics: History, Practice and Evidence." Cogent Economics and Finance 3 (December 2015). doi:https://doi.org/10.1080/23322039.2015.1120977.

Writing Tip

Case Study and Case Analysis Are Not the Same!

Confusion often exists between what it means to write a paper that uses a case study research design and writing a paper that analyzes a case; they are two different types of approaches to learning in the social and behavioral sciences. Professors as well as educational researchers contribute to this confusion because they often use the term "case study" when describing the subject of analysis for a case analysis paper. But you are not studying a case for the purpose of generating a comprehensive, multi-faceted understanding of a research problem. Rather, you are critically analyzing a specific scenario to argue logically for recommended solutions and courses of action that lead to optimal outcomes applicable to professional practice.

To avoid any confusion, here are twelve characteristics that delineate the differences between writing a paper using the case study research method and writing a case analysis paper:

  • Case study is a method of in-depth research and rigorous inquiry ; case analysis is a reliable method of teaching and learning . A case study is a modality of research that investigates a phenomenon for the purpose of creating new knowledge, solving a problem, or testing a hypothesis using empirical evidence derived from the case being studied. Often, the results are used to generalize about a larger population or within a wider context. The writing adheres to the traditional standards of a scholarly research study. A case analysis is a pedagogical tool used to teach students how to reflect and think critically about a practical, real-life problem in an organizational setting.
  • The researcher is responsible for identifying the case to study; a case analysis is assigned by your professor . As the researcher, you choose the case study to investigate in support of obtaining new knowledge and understanding about the research problem. The case in a case analysis assignment is almost always provided, and sometimes written, by your professor and either given to every student in class to analyze individually or to a small group of students, or students select a case to analyze from a predetermined list.
  • A case study is indeterminate and boundless; a case analysis is predetermined and confined. A case study can be almost anything [see item 10 below] as long as it relates directly to examining the research problem. This relationship is the only limit to what a researcher can choose as the subject of their case study. The content of a case analysis is determined by your professor and its parameters are well-defined and limited to elucidating insights of practical value applied to practice.
  • Case study is fact-based and describes actual events or situations; case analysis can be entirely fictional or adapted from an actual situation . The entire content of a case study must be grounded in reality to be a valid subject of investigation in an empirical research study. A case analysis only needs to set the stage for critically examining a situation in practice and, therefore, can be entirely fictional or adapted, all or in-part, from an actual situation.
  • Research using a case study method must adhere to principles of intellectual honesty and academic integrity; a case analysis scenario can include misleading or false information . A case study paper must report research objectively and factually to ensure that any findings are understood to be logically correct and trustworthy. A case analysis scenario may include misleading or false information intended to deliberately distract from the central issues of the case. The purpose is to teach students how to sort through conflicting or useless information in order to come up with the preferred solution. Any use of misleading or false information in academic research is considered unethical.
  • Case study is linked to a research problem; case analysis is linked to a practical situation or scenario. In the social sciences, the subject of an investigation is most often framed as a problem that must be researched in order to generate new knowledge leading to a solution. Case analysis narratives are grounded in real life scenarios for the purpose of examining the realities of decision-making behavior and processes within organizational settings. Case analysis assignments include a problem or set of problems to be analyzed. However, the goal is centered around the act of identifying and evaluating courses of action leading to best possible outcomes.
  • The purpose of a case study is to create new knowledge through research; the purpose of a case analysis is to teach new understanding . Case studies are a choice of methodological design intended to create new knowledge about resolving a research problem. A case analysis is a mode of teaching and learning intended to create new understanding and an awareness of uncertainty applied to practice through acts of critical thinking and reflection.
  • A case study seeks to identify the best possible solution to a research problem; case analysis can have an indeterminate set of solutions or outcomes. Your role in studying a case is to discover the most logical, evidence-based ways to address a research problem. A case analysis assignment rarely has a single correct answer because one of the goals is to force students to confront the real life dynamics of uncertainty, ambiguity, and missing or conflicting information within professional practice. Under these conditions, a perfect outcome or solution almost never exists.
  • Case study is unbounded and relies on gathering external information; case analysis is a self-contained subject of analysis . The scope of a case study chosen as a method of research is bounded. However, the researcher is free to gather whatever information and data is necessary to investigate its relevance to understanding the research problem. For a case analysis assignment, your professor will often ask you to examine solutions or recommended courses of action based solely on facts and information from the case.
  • Case study can be a person, place, object, issue, event, condition, or phenomenon; a case analysis is a carefully constructed synopsis of events, situations, and behaviors . The research problem dictates the type of case being studied and, therefore, the design can encompass almost anything tangible as long as it fulfills the objective of generating new knowledge and understanding. A case analysis is in the form of a narrative containing descriptions of facts, situations, processes, rules, and behaviors within a particular setting and under a specific set of circumstances.
  • Case study can represent an open-ended subject of inquiry; a case analysis is a narrative about something that has happened in the past . A case study is not restricted by time and can encompass an event or issue with no temporal limit or end. For example, the current war in Ukraine can be used as a case study of how medical personnel help civilians during a large military conflict, even though circumstances around this event are still evolving. A case analysis can be used to elicit critical thinking about current or future situations in practice, but the case itself is a narrative about something finite and that has taken place in the past.
  • Multiple case studies can be used in a research study; case analysis involves examining a single scenario . Case study research can use two or more cases to examine a problem, often for the purpose of conducting a comparative investigation intended to discover hidden relationships, document emerging trends, or determine variations among different examples. A case analysis assignment typically describes a stand-alone, self-contained situation and any comparisons among cases are conducted during in-class discussions and/or student presentations.

The Case Analysis. Fred Meijer Center for Writing and Michigan Authors. Grand Valley State University; Mills, Albert J., Gabrielle Durepos, and Elden Wiebe, editors. Encyclopedia of Case Study Research. Thousand Oaks, CA: SAGE Publications, 2010; Ramsey, V. J. and L. D. Dodge. "Case Analysis: A Structured Approach." Exchange: The Organizational Behavior Teaching Journal 6 (November 1981): 27-29; Yin, Robert K. Case Study Research and Applications: Design and Methods. 6th edition. Thousand Oaks, CA: Sage, 2017; Crowe, Sarah et al. “The Case Study Approach.” BMC Medical Research Methodology 11 (2011). doi: 10.1186/1471-2288-11-100; Yin, Robert K. Case Study Research: Design and Methods. 4th edition. Thousand Oaks, CA: Sage Publishing; 1994.

  • Last Updated: Mar 6, 2024 1:00 PM
  • URL: https://libguides.usc.edu/writingguide/assignments

10 Real World Data Science Case Studies Projects with Example

Top 10 Data Science Case Studies Projects with Examples and Solutions in Python to inspire your data science learning in 2023.

Data science has been a trending buzzword in recent times. With wide applications in sectors like healthcare, education, retail, transportation, media, and banking, data science is at the core of pretty much every industry out there. The possibilities are endless, from the analysis of fraud in the finance sector to the personalization of recommendations in eCommerce businesses. We have developed ten exciting data science case studies to explain how data science is leveraged across various industries to make smarter decisions and develop innovative personalized products tailored to specific customers.

Table of Contents

  • Data science case studies in retail
  • Data science case study examples in entertainment industry
  • Data analytics case study examples in travel industry
  • Case studies for data analytics in social media
  • Real world data science projects in healthcare
  • Data analytics case studies in oil and gas
  • What is a case study in data science
  • How do you prepare a data science case study
  • 10 most interesting data science case studies with examples

So, without much ado, let's get started with data science business case studies!

With humble beginnings as a simple discount retailer, Walmart today operates approximately 10,500 stores and clubs in 24 countries, along with eCommerce websites, and employs around 2.2 million people around the globe. For the fiscal year ended January 31, 2021, Walmart's total revenue was $559 billion, showing a growth of $35 billion with the expansion of the eCommerce sector. Walmart is a data-driven company that works on the principle of 'Everyday low cost' for its consumers. To achieve this goal, they heavily depend on the advances of their data science and analytics department for research and development, also known as Walmart Labs. Walmart is home to the world's largest private cloud, which can manage 2.5 petabytes of data every hour! To analyze this humongous amount of data, Walmart has created 'Data Café,' a state-of-the-art analytics hub located within its Bentonville, Arkansas headquarters. The Walmart Labs team heavily invests in building and managing technologies like cloud, data, DevOps, infrastructure, and security.


Walmart is experiencing massive digital growth as the world's largest retailer. Walmart has been leveraging big data and advances in data science to build solutions that enhance, optimize, and customize the shopping experience and serve its customers better. At Walmart Labs, data scientists focus on creating data-driven solutions that power the efficiency and effectiveness of complex supply chain management processes. Here are some of the applications of data science at Walmart:

i) Personalized Customer Shopping Experience

Walmart analyzes customer preferences and shopping patterns to optimize the stocking and display of merchandise in its stores. Analysis of big data also helps the company understand new item sales, decide which products to discontinue, and track the performance of brands.

ii) Order Sourcing and On-Time Delivery Promise

Millions of customers view items on Walmart.com, and Walmart provides each customer a real-time estimated delivery date for the items purchased. Walmart runs a backend algorithm that estimates this based on the distance between the customer and the fulfillment center, inventory levels, and shipping methods available. The supply chain management system determines the optimum fulfillment center based on distance and inventory levels for every order. It also has to decide on the shipping method to minimize transportation costs while meeting the promised delivery date.
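To make the sourcing step concrete, here is a minimal sketch of picking a fulfillment center by distance and inventory. It is an illustration under assumptions, not Walmart's algorithm: the `FulfillmentCenter` class, the `pick_center` helper, the coordinates, and the stock figures are all hypothetical, and shipping method and cost are left out.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class FulfillmentCenter:
    name: str
    lat: float
    lon: float
    stock: int  # units of the ordered item on hand (made-up numbers)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def pick_center(centers, cust_lat, cust_lon, quantity):
    """Return the nearest center that can fill the order, or None."""
    in_stock = [c for c in centers if c.stock >= quantity]
    if not in_stock:
        return None
    return min(in_stock, key=lambda c: haversine_km(c.lat, c.lon, cust_lat, cust_lon))


centers = [
    FulfillmentCenter("north", 41.88, -87.63, stock=0),   # closest, but out of stock
    FulfillmentCenter("south", 32.78, -96.80, stock=12),
    FulfillmentCenter("west", 34.05, -118.24, stock=40),
]
chosen = pick_center(centers, cust_lat=39.10, cust_lon=-94.58, quantity=2)
print(chosen.name if chosen else "no center can fill the order")
```

A production system would also weigh shipping methods and transportation cost against the promised delivery date, as described above; this sketch only filters on inventory and then minimizes distance.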


iii) Packing Optimization 

Packing optimization, also known as box recommendation, is a daily occurrence in the shipping of items in retail and eCommerce businesses. Whenever the items of an order, or of multiple orders placed by the same customer, are picked from the shelf and are ready for packing, Walmart's recommender system determines, within a fixed amount of time, the best-sized box to hold all the ordered items with a minimum of in-box space wasted. This is the Bin Packing Problem, a classic NP-hard problem familiar to data scientists.
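As a rough illustration of box recommendation, the sketch below picks the smallest box that passes two cheap checks: each item fits on its own, and the combined item volume does not exceed the box volume. The box catalogue and item sizes are hypothetical. Because exact 3D packing is NP-hard, these checks are necessary but not sufficient, so treat this as a simplified heuristic rather than the method Walmart uses.

```python
from math import prod

# Hypothetical box catalogue: name -> inner dimensions in cm.
BOXES = {
    "small": (20, 15, 10),
    "medium": (30, 25, 15),
    "large": (50, 40, 30),
}


def recommend_box(items, boxes=BOXES):
    """Pick the smallest box (by volume) that passes two cheap checks.

    items: list of (length, width, height) tuples in cm.
    Returns (box_name, wasted_volume) or None if nothing qualifies.
    """
    total_volume = sum(prod(dims) for dims in items)
    for name, box in sorted(boxes.items(), key=lambda kv: prod(kv[1])):
        box_sorted = sorted(box)
        # Check 1: every item fits on its own in some axis-aligned rotation.
        each_fits = all(
            all(i <= b for i, b in zip(sorted(dims), box_sorted)) for dims in items
        )
        # Check 2: the combined item volume does not exceed the box volume.
        if each_fits and total_volume <= prod(box):
            return name, prod(box) - total_volume
    return None


order = [(18, 12, 5), (10, 10, 10), (25, 5, 5)]
print(recommend_box(order))
```

Here the 25 cm item rules out the small box even though the total volume would fit, so the medium box is returned along with the unused volume.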

Here is a link to a sales prediction data science case study to help you understand the applications of data science in the real world. The Walmart Sales Forecasting Project uses historical sales data for 45 Walmart stores located in different regions. Each store contains many departments, and you must build a model to project the sales for each department in each store. This data science case study aims to create a predictive model to predict those sales. You can also try your hand at the Inventory Demand Forecasting Data Science Project to develop a machine learning model that forecasts inventory demand accurately based on historical sales data.


Amazon is an American multinational technology company based in Seattle, USA. It started as an online bookseller, but today it focuses on eCommerce, cloud computing, digital streaming, and artificial intelligence. It hosts an estimated 1,000,000,000 gigabytes of data across more than 1,400,000 servers. Through constant innovation in data science and big data, Amazon stays ahead in understanding its customers. Here are a few data analytics case study examples at Amazon:

i) Recommendation Systems

Data science models help Amazon understand customers' needs and recommend products before the customer even searches for them; these models use collaborative filtering. Amazon uses data from 152 million customer purchases to help users decide which products to buy. The company generates 35% of its annual sales through its recommendation-based systems (RBS).
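For readers new to collaborative filtering, here is a small item-based sketch using cosine similarity. The users, items, and ratings are made up, and the scoring rule (similarity-weighted sum of the user's own ratings) is one common textbook variant, not a description of Amazon's production system.

```python
from math import sqrt

# Hypothetical purchase data: user -> {item: rating}.
ratings = {
    "ana": {"kettle": 5, "teapot": 4, "mug": 5},
    "ben": {"kettle": 4, "teapot": 5},
    "cy": {"mug": 4, "novel": 5},
    "dee": {"kettle": 5, "mug": 4},
}


def item_vectors(ratings):
    """Invert user->item ratings into item->user vectors."""
    vectors = {}
    for user, items in ratings.items():
        for item, score in items.items():
            vectors.setdefault(item, {})[user] = score
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(user, ratings, top_n=2):
    """Score unseen items by similarity-weighted ratings of items the user has."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, cand_vec in vectors.items():
        if candidate in seen:
            continue
        scores[candidate] = sum(
            cosine(cand_vec, vectors[item]) * score for item, score in seen.items()
        )
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Drop items with no overlap at all with the user's history.
    return [item for item, score in ranked[:top_n] if score > 0]


print(recommend("ben", ratings))
```

In this toy data, "ben" bought a kettle and a teapot, and other kettle and teapot buyers also bought a mug, so the mug is recommended while the unrelated novel is filtered out.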

Here is a Recommender System Project to help you build a recommendation system using collaborative filtering. 

ii) Retail Price Optimization

Amazon product prices are optimized using a predictive model that determines the best price so that users do not refuse to buy a product because of its price. The model determines the optimal price by considering the customer's likelihood of purchasing the product and how the price will affect the customer's future buying patterns. The price of a product is determined according to your activity on the website, competitors' pricing, product availability, item preferences, order history, expected profit margin, and other factors.
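The idea of trading off purchase likelihood against margin can be sketched in a few lines. The logistic demand curve, the `sensitivity` value, the unit cost, and the candidate price grid below are all assumptions chosen for illustration; a real pricing model would learn the purchase probability from data and use many more signals.

```python
from math import exp


def purchase_probability(price, reference_price, sensitivity=0.15):
    """Assumed logistic demand curve: probability falls as price rises."""
    return 1.0 / (1.0 + exp(sensitivity * (price - reference_price)))


def best_price(candidates, unit_cost, reference_price):
    """Return the candidate price with the highest expected profit per visitor."""
    def expected_profit(price):
        return (price - unit_cost) * purchase_probability(price, reference_price)

    return max(candidates, key=expected_profit)


# Candidate prices from 20.00 to 50.00 in steps of 0.50 (hypothetical grid).
candidates = [round(20 + 0.5 * i, 2) for i in range(61)]
price = best_price(candidates, unit_cost=18.0, reference_price=35.0)
print(price, round(purchase_probability(price, 35.0), 3))
```

A grid search is used here because it stays correct for any demand curve you plug in; with a smooth curve you could use a numerical optimizer instead.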

Check Out this Retail Price Optimization Project to build a Dynamic Pricing Model.

iii) Fraud Detection

Being a significant eCommerce business, Amazon remains at high risk of retail fraud. As a preemptive measure, the company collects historical and real-time data for every order. It uses Machine learning algorithms to find transactions with a higher probability of being fraudulent. This proactive measure has helped the company restrict clients with an excessive number of returns of products.

You can look at this Credit Card Fraud Detection Project to implement a fraud detection model to classify fraudulent credit card transactions.


Let us explore data analytics case study examples in the entertainment industry.


Netflix started as a DVD rental service in 1997 and has since expanded into the streaming business. Headquartered in Los Gatos, California, Netflix is the largest content streaming company in the world. Currently, Netflix has over 208 million paid subscribers worldwide, and with streaming supported on thousands of smart devices, around 3 billion hours are watched every month. The secret to this massive growth and popularity is Netflix's advanced use of data analytics and recommendation systems to provide personalized and relevant content recommendations to its users. Netflix collects data from over 100 billion events every day. Here are a few examples of data analysis case studies applied at Netflix:

i) Personalized Recommendation System

Netflix uses over 1,300 recommendation clusters based on consumer viewing preferences to provide a personalized experience. The data that Netflix collects from its users includes viewing time, platform searches for keywords, and metadata related to content abandonment, such as pause time, rewinds, and rewatches. Using this data, Netflix can predict what a viewer is likely to watch and give the user a personalized watchlist. Some of the algorithms used by the Netflix recommendation system are Personalized Video Ranking, the Trending Now ranker, and the Continue Watching Now ranker.

ii) Content Development using Data Analytics

Netflix uses data science to analyze the behavior and patterns of its users and recognize the themes and categories that the masses prefer to watch. This data is used to produce shows like The Umbrella Academy, Orange Is the New Black, and The Queen's Gambit. These shows may seem like huge risks, but they are backed by data analytics that assured Netflix they would succeed with its audience. Data analytics is helping Netflix come up with content that its viewers want to watch even before they know they want to watch it.

iii) Marketing Analytics for Campaigns

Netflix uses data analytics to find the right time to launch shows and ad campaigns to have maximum impact on the target audience. Marketing analytics helps create different trailers and thumbnails for different groups of viewers. For example, the House of Cards Season 5 trailer featuring a giant American flag was launched during the American presidential elections, as it would resonate well with the audience.

Here is a Customer Segmentation Project using association rule mining to understand the primary grouping of customers based on various parameters.


In a world where purchasing music is a thing of the past and streaming is the current trend, Spotify has emerged as one of the most popular streaming platforms. With 320 million monthly users, around 4 billion playlists, and approximately 2 million podcasts, Spotify leads the pack among well-known streaming platforms like Apple Music, Wynk, Songza, and Amazon Music. The success of Spotify has depended mainly on data analytics. By analyzing massive volumes of listener data, Spotify provides real-time and personalized services to its listeners. Most of Spotify's revenue comes from paid premium subscriptions. Here are some examples of case studies on data analytics used by Spotify to provide enhanced services to its listeners:

i) Personalization of Content using Recommendation Systems

Spotify uses Bart, or Bayesian Additive Regression Trees, to generate music recommendations for its listeners in real time. Bart ignores any song a user listens to for less than 30 seconds. The model is retrained every day to provide updated recommendations. A new patent granted to Spotify covers an AI application that identifies a user's musical tastes based on audio signals, gender, age, and accent to make better music recommendations.

Spotify creates daily playlists for its listeners called 'Daily Mixes,' based on their taste profiles. These contain songs the user has added to their playlists or songs by artists the user has included in their playlists. They also include new artists and songs that the user might be unfamiliar with but that might improve the playlist. Similar to these are the weekly 'Release Radar' playlists, which contain newly released songs by artists that the listener follows or has liked before.

ii) Targeted Marketing through Customer Segmentation

Beyond enhancing personalized song recommendations, Spotify uses its massive dataset of user data for targeted ad campaigns and personalized service recommendations. Spotify uses ML models to analyze listeners' behavior and group them based on music preferences, age, gender, ethnicity, etc. These insights help the company create ad campaigns for a specific target audience. One of its well-known ad campaigns was the meme-inspired ads for potential target customers, which was a huge success globally.

iii) CNNs for Classification of Songs and Audio Tracks

Spotify builds audio models to evaluate songs and tracks, which helps develop better playlists and recommendations for its users. These models allow Spotify to filter new tracks based on their lyrics and rhythms and recommend them to users who like similar tracks (collaborative filtering). Spotify also uses NLP (natural language processing) to scan articles and blogs and analyze the words used to describe songs and artists. These analytical insights can help group and identify similar artists and songs and leverage them to build playlists.

Here is a Music Recommender System Project for you to start learning. We have listed another music recommendations dataset for you to use for your projects: Dataset1. You can use this dataset of Spotify metadata to classify songs based on artist, mood, and liveliness. Plot histograms and heatmaps to get a better understanding of the dataset. Use classification algorithms like logistic regression and SVM, along with dimensionality reduction techniques like principal component analysis, to generate valuable insights from the dataset.


Below you will find case studies for data analytics in the travel and tourism industry.

Airbnb was born in 2007 in San Francisco and has since grown to 4 million hosts and 5.6 million listings worldwide, welcoming more than 1 billion guest arrivals in almost every country across the globe. Airbnb is active in every country on the planet except Iran, Sudan, Syria, and North Korea. That is around 97.95% of the world. Treating data as the voice of its customers, Airbnb uses its large volume of customer reviews and host inputs to understand trends across communities and rate user experiences, and it uses these analytics to make informed decisions and build a better business model. The data scientists at Airbnb are developing exciting new solutions to boost the business and find the best mapping between its customers and hosts. Airbnb's data servers serve approximately 10 million requests a day and process around one million search queries. The result is personalized service that aims to create a perfect match between guests and hosts for a supreme customer experience.

i) Recommendation Systems and Search Ranking Algorithms

Airbnb helps people find 'local experiences' in a place with the help of search algorithms that make searches and listings precise. Airbnb uses a 'listing quality score' to find homes based on their proximity to the searched location and on previous guest reviews. Airbnb uses deep neural networks to build models that take the guest's earlier stays and area information into account to find a perfect match. The search algorithms are optimized based on guest and host preferences, rankings, pricing, and availability to understand users' needs and provide the best match possible.

ii) Natural Language Processing for Review Analysis

Airbnb characterizes data as the voice of its customers. Customer and host reviews give direct insight into the experience, which star ratings alone cannot capture quantitatively. Hence, Airbnb uses natural language processing to understand reviews and the sentiments behind them. The NLP models are developed using convolutional neural networks.

Practice this Sentiment Analysis Project for analyzing product reviews to understand the basic concepts of natural language processing.

iii) Smart Pricing using Predictive Analytics

Many Airbnb hosts use the service as a supplementary income. The vacation homes and guest houses rented to customers raise local community earnings, as Airbnb guests stay 2.4 times longer and spend approximately 2.3 times as much money as hotel guests. These earnings have a significant positive impact on the local neighborhood community. Airbnb uses predictive analytics to predict listing prices and help hosts set a competitive, optimal price. The overall profitability of an Airbnb host depends on factors like the time invested by the host and their responsiveness to changing demand across seasons. The factors that affect real-time smart pricing are the location of the listing, proximity to transport options, season, and amenities available in the neighborhood of the listing.

Here is a Price Prediction Project to help you understand the concept of predictive analysis, which is widely used in case studies for data analytics.

Uber is the biggest global taxi service provider. As of December 2018, Uber had 91 million monthly active consumers and 3.8 million drivers, and it completed 14 million trips each day. Uber uses data analytics and big data-driven technologies to optimize its business processes and provide enhanced customer service. The data science team at Uber constantly explores futuristic technologies to provide better service. Machine learning and data analytics help Uber make data-driven decisions that enable benefits like ride-sharing, dynamic price surges, better customer support, and demand forecasting. Here are some of the real world data science projects used by Uber:

i) Dynamic Pricing for Price Surges and Demand Forecasting

Uber prices change at peak hours based on demand. Uber uses surge pricing to encourage more cab drivers to sign up with the company and meet the demand from passengers. When prices increase, the driver and the passenger are both informed about the surge in price. Uber uses a patented predictive model for surge pricing called 'Geosurge'. It is based on the demand for rides and the location.
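A toy version of demand-based surge pricing helps show the mechanics. This is not Geosurge, whose details are not described here; the rule below (price multiplier equals the ratio of open requests to idle drivers, clamped between 1.0 and a cap) and the zone counts are assumptions for illustration.

```python
def surge_multiplier(open_requests, available_drivers, cap=3.0):
    """Toy rule: scale price with the demand/supply ratio, between 1.0 and cap."""
    if available_drivers == 0:
        # No supply at all: charge the cap if anyone is waiting.
        return cap if open_requests > 0 else 1.0
    ratio = open_requests / available_drivers
    return round(min(cap, max(1.0, ratio)), 2)


# Hypothetical snapshot: zone -> (open ride requests, idle drivers).
zones = {"airport": (90, 60), "downtown": (80, 100), "stadium": (300, 0)}
multipliers = {zone: surge_multiplier(*counts) for zone, counts in zones.items()}
print(multipliers)
```

Computing the multiplier per zone reflects the point above that surge depends on both demand and location; a predictive model would replace the current counts with forecasts.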

ii) One-Click Chat

Uber has developed a machine learning and natural language processing solution called one-click chat, or OCC, for coordination between drivers and users. This feature anticipates responses to commonly asked questions, making it easy for drivers to respond to customer messages. Drivers can reply with the click of just one button. One-click chat is built on Uber's machine learning platform, Michelangelo, to perform NLP on rider chat messages and generate appropriate responses to them.

iii) Customer Retention

Failure to meet the customer demand for cabs could lead to users opting for other services. Uber uses machine learning models to bridge this demand-supply gap. By using prediction models to forecast the demand in any location, Uber retains its customers. Uber also uses a tier-based reward system, which segments customers into different levels based on usage. The higher the level the user achieves, the better the perks. Uber also provides personalized destination suggestions based on the history of the user and their frequently traveled destinations.

You can take a look at this Python Chatbot Project and build a simple chatbot application to better understand the techniques used for natural language processing. You can also practice building a demand forecasting model with this project using time series analysis. You can look at this project, which uses time series forecasting and clustering on a dataset containing geospatial data to forecast customer demand for Ola rides.


7) LinkedIn 

LinkedIn is the largest professional social networking site with nearly 800 million members in more than 200 countries worldwide. Almost 40% of the users access LinkedIn daily, clocking around 1 billion interactions per month. The data science team at LinkedIn works with this massive pool of data to generate insights to build strategies, apply algorithms and statistical inferences to optimize engineering solutions, and help the company achieve its goals. Here are some of the real world data science projects at LinkedIn:

i) LinkedIn Recruiter Implements Search Algorithms and Recommendation Systems

LinkedIn Recruiter helps recruiters build and manage a talent pool to optimize the chances of hiring candidates successfully. This sophisticated product is built on search and recommendation engines. LinkedIn Recruiter handles complex queries and filters on a large, constantly growing dataset, and the results delivered have to be relevant and specific. The initial search model was based on linear regression but was eventually upgraded to gradient boosted decision trees to capture non-linear correlations in the dataset. In addition to these models, LinkedIn Recruiter also uses a generalized linear mixed model (GLMix) to improve the results of prediction problems and give personalized results.

ii) Recommendation Systems Personalized for News Feed

The LinkedIn news feed is the heart and soul of the professional community. A member's newsfeed is a place to discover conversations among connections, career news, posts, suggestions, photos, and videos. Every time a member visits LinkedIn, machine learning algorithms identify the best exchanges to be displayed on the feed by sorting through posts and ranking the most relevant results on top. The algorithms help LinkedIn understand member preferences and help provide personalized news feeds. The algorithms used include logistic regression, gradient boosted decision trees and neural networks for recommendation systems.

iii) CNNs to Detect Inappropriate Content

Providing a safe, professional space where people can trust and express themselves has been a critical goal at LinkedIn. LinkedIn has invested heavily in building solutions to detect fake accounts and abusive behavior on its platform. Any form of spam, harassment, or inappropriate content is immediately flagged and taken down. These can range from profanity to advertisements for illegal services. LinkedIn uses a machine learning model based on convolutional neural networks. This classifier trains on a dataset containing accounts labeled as either "inappropriate" or "appropriate." The inappropriate list consists of accounts with content containing "blocklisted" phrases or words, plus a small portion of manually reviewed accounts reported by the user community.
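The labeling scheme described above (blocklisted phrases plus a small set of manually reviewed accounts) can be sketched as follows. The blocklist, the account IDs, the profile texts, and the `label_account` helper are hypothetical and only show how such a training set might be assembled; the CNN classifier itself is out of scope here.

```python
import re

# Hypothetical blocklist and reviewer decisions, for illustration only.
BLOCKLIST = {"free followers", "escort", "cheap pills"}
MANUALLY_FLAGGED = {"acct_4"}


def label_account(account_id, profile_text):
    """Return 'inappropriate' or 'appropriate' for one account."""
    # Normalize case and whitespace so spacing tricks do not dodge the blocklist.
    text = re.sub(r"\s+", " ", profile_text.lower())
    # Word boundaries avoid flagging harmless words that merely contain a phrase.
    hit = any(re.search(rf"\b{re.escape(p)}\b", text) for p in BLOCKLIST)
    if hit or account_id in MANUALLY_FLAGGED:
        return "inappropriate"
    return "appropriate"


accounts = {
    "acct_1": "Data engineer, ten years in retail analytics",
    "acct_2": "Get FREE   followers today!!!",
    "acct_3": "Escorting visitors around the museum since 2015",
    "acct_4": "Totally normal recruiter",
}
training_set = [(text, label_account(acct, text)) for acct, text in accounts.items()]
for text, label in training_set:
    print(f"{label:>13}  {text}")
```

Labels produced this way are noisy, which is one reason to train a classifier on them instead of using the blocklist directly: the model can generalize to phrasing the list does not cover.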

Here is a Text Classification Project to help you understand NLP basics for text classification. You can find a news recommendation system dataset to help you build a personalized news recommender system. You can also use this dataset to build a classifier using logistic regression, Naive Bayes, or Neural networks to classify toxic comments.


Pfizer is a multinational pharmaceutical company headquartered in New York, USA. It is one of the largest pharmaceutical companies globally, known for developing a wide range of medicines and vaccines in disciplines like immunology, oncology, cardiology, and neurology. Pfizer became a household name in 2020 when it was the first to have a COVID-19 vaccine authorized by the FDA. In early November 2021, the CDC approved the Pfizer vaccine for kids aged 5 to 11. Pfizer has been using machine learning and artificial intelligence to develop drugs and streamline trials, which played a massive role in developing and deploying the COVID-19 vaccine. Here are a few data analytics case studies by Pfizer:

i) Identifying Patients for Clinical Trials

Artificial intelligence and machine learning are used to streamline and optimize clinical trials to increase their efficiency. Natural language processing and exploratory data analysis of patient records can help identify suitable patients for clinical trials, including patients with distinct symptoms. These techniques can also help examine interactions among potential trial members' specific biomarkers and predict drug interactions and side effects, which helps avoid complications. Pfizer's AI implementation helped rapidly identify signals within the noise of millions of data points across its 44,000-participant COVID-19 clinical trial.

ii) Supply Chain and Manufacturing

Data science and machine learning techniques help pharmaceutical companies better forecast demand for vaccines and drugs and distribute them efficiently. Machine learning models can help identify efficient supply systems by automating and optimizing the production steps. These will help supply drugs customized to small pools of patients in specific gene pools. Pfizer uses machine learning to predict the maintenance cost of the equipment it uses. Predictive maintenance using AI is the next big step for pharmaceutical companies to reduce costs.

iii) Drug Development

Computer simulations of proteins, tests of their interactions, and yield analysis help researchers develop and test drugs more efficiently. In 2016, Watson Health and Pfizer announced a collaboration to utilize IBM Watson for Drug Discovery to help accelerate Pfizer's research in immuno-oncology, an approach to cancer treatment that uses the body's immune system to help fight cancer. Deep learning models have been used recently for bioactivity and synthesis prediction for drugs and vaccines, in addition to molecular design. Deep learning has been a revolutionary technique for drug discovery, as it factors in everything from new applications of medications to possible toxic reactions, which can save millions in drug trials.

You can create a machine learning model to predict molecular activity to help design medicine using this dataset. You may build a CNN or a deep neural network for this data analyst case study project.


9) Shell Data Analyst Case Study Project

Shell is a global group of energy and petrochemical companies with over 80,000 employees in around 70 countries. Shell uses advanced technologies and innovations to help build a sustainable energy future. Shell is going through a significant transition, as the world needs more and cleaner energy solutions, and aims to be a clean energy company by 2050. This requires substantial changes in the way energy is used. Digital technologies, including AI and machine learning, play an essential role in this transformation. They enable efficient exploration and energy production, more reliable manufacturing, more nimble trading, and a personalized customer experience. Using AI in various phases of the organization will help Shell achieve this goal and stay competitive in the market. Here are a few data analytics case studies in the petrochemical industry:

i) Precision Drilling

Shell is involved in the entire oil and gas supply chain, ranging from mining hydrocarbons to refining the fuel to retailing it to customers. Recently, Shell has used reinforcement learning to control the drilling equipment used in mining. Reinforcement learning works on a reward-based system based on the outcome of the AI model. The algorithm is designed to guide the drills as they move through the subsurface, based on historical data from drilling records. This data includes information such as the size of drill bits, temperatures, pressures, and knowledge of seismic activity. The model helps the human operator understand the environment better, leading to better and faster results with minimal damage to the machinery used.

ii) Efficient Charging Terminals

Due to climate change, governments have encouraged people to switch to electric vehicles to reduce carbon dioxide emissions. However, the lack of public charging terminals has deterred people from switching to electric cars. Shell uses AI to monitor and predict the demand for terminals to provide efficient supply. Multiple vehicles charging from a single terminal may create a considerable grid load, and predictions of demand can help make this process more efficient.

iii) Monitoring Service and Charging Stations

Another Shell initiative, trialed in Thailand and Singapore, is the use of computer vision cameras that watch for potentially hazardous activities, like lighting cigarettes in the vicinity of the pumps while refueling. The model is built to process the content of the captured images and label and classify it. The algorithm can then alert the staff and hence reduce the risk of fires. The model can be trained further to detect rash driving or theft in the future.

Here is a project to help you understand multiclass image classification. You can use the Hourly Energy Consumption Dataset to build an energy consumption prediction model. You can use time series with XGBoost to develop your model.

10) Zomato Case Study on Data Analytics

Zomato was founded in 2010 and is currently one of the most well-known food tech companies. Zomato offers services like restaurant discovery, home delivery, online table reservation, online payments for dining, etc. Zomato partners with restaurants to provide tools to acquire more customers while also providing delivery services and easy procurement of ingredients and kitchen supplies. Currently, Zomato has over 2 lakh restaurant partners and around 1 lakh delivery partners. Zomato has closed over ten crore delivery orders to date. Zomato uses ML and AI to boost its business growth, drawing on the massive amount of data collected over the years from food orders and user consumption patterns. Here are a few examples of data analyst case study projects developed by the data scientists at Zomato:

i) Personalized Recommendation System for Homepage

Zomato uses data analytics to create personalized homepages for its users. Zomato uses data science to provide order personalization, like giving recommendations to the customers for specific cuisines, locations, prices, brands, etc. Restaurant recommendations are made based on a customer's past purchases, browsing history, and what other similar customers in the vicinity are ordering. This personalized recommendation system has led to a 15% improvement in order conversions and click-through rates for Zomato. 

You can use the Restaurant Recommendation Dataset to build a restaurant recommendation system to predict what restaurants customers are most likely to order from, given the customer location, restaurant information, and customer order history.

ii) Analyzing Customer Sentiment

Zomato uses natural language processing and machine learning to understand customer sentiments using social media posts and customer reviews. These help the company gauge the inclination of its customer base towards the brand. Deep learning models analyze the sentiments of various brand mentions on social networking sites like Twitter, Instagram, LinkedIn, and Facebook. These analytics give insights to the company, which help it build the brand and understand the target audience.

iii) Predicting Food Preparation Time (FPT)

Food preparation time is an essential variable in the estimated delivery time of an order placed by a customer using Zomato. The food preparation time depends on numerous factors like the number of dishes ordered, time of the day, footfall in the restaurant, day of the week, etc. Accurate prediction of the food preparation time leads to a better prediction of the estimated delivery time, which makes delivery partners less likely to breach it. Zomato uses a bidirectional LSTM-based deep learning model that considers all these features and provides the food preparation time for each order in real time.

Data scientists are companies' secret weapons when analyzing customer sentiments and behavior and leveraging them to drive conversion, loyalty, and profits. These 10 data science case study projects with examples and solutions show you how various organizations use data science technologies to succeed and be at the top of their field! To summarize, data science has not only accelerated the performance of companies but has also made it possible to manage and sustain their performance with ease.

FAQs on Data Analysis Case Studies

What is a case study in data science?

A case study in data science is an in-depth analysis of a real-world problem using data-driven approaches. It involves collecting, cleaning, and analyzing data to extract insights and solve challenges, offering practical insights into how data science techniques can address complex issues across various industries.

How do you prepare a data science case study?

To create a data science case study, identify a relevant problem, define objectives, and gather suitable data. Clean and preprocess data, perform exploratory data analysis, and apply appropriate algorithms for analysis. Summarize findings, visualize results, and provide actionable recommendations, showcasing the problem-solving potential of data science techniques.


About the Author


ProjectPro is the only online platform designed to help professionals gain practical, hands-on experience in big data, data engineering, data science, and machine learning related technologies. Having over 270+ reusable project templates in data science and big data with step-by-step walkthroughs,


The Convergence Blog

The Convergence - an online community space that's dedicated to empowering operators in the data industry by providing news and education about evergreen strategies, late-breaking data & AI developments, and free or low-cost upskilling resources that you need to thrive as a leader in the data & AI space.

Data Analysis Case Study: Learn From Humana’s Automated Data Analysis Project

Lillian Pierson, P.E.

Lillian Pierson, P.E.

Playback speed:

Got data? Great! Looking for that perfect data analysis case study to help you get started using it? You’re in the right place.

If you’ve ever struggled to decide what to do next with your data projects, to actually find meaning in the data, or even to decide what kind of data to collect, then KEEP READING…

Deep down, you know what needs to happen. You need to initiate and execute a data strategy that really moves the needle for your organization. One that produces seriously awesome business results.

But how? You’re in the right place to find out.

As a data strategist who has worked with 10 percent of Fortune 100 companies, today I’m sharing with you a case study that demonstrates just how real businesses are making real wins with data analysis. 

In the post below, we’ll look at:

  • A shining data success story;
  • What went on ‘under-the-hood’ to support that successful data project; and
  • The exact data technologies used by the vendor, to take this project from pure strategy to pure success

If you prefer to watch this information rather than read it, it’s captured in this video: https://youtu.be/xMwZObIqvLQ

3 Action Items You Need To Take

To actually use the data analysis case study you’re about to get – you need to take 3 main steps. Those are:

  • Reflect upon your organization as it is today (I left you some prompts below – to help you get started)
  • Review winning data case collections (starting with the one I’m sharing here) and identify 5 that seem the most promising for your organization given its current set-up
  • Assess your organization AND those 5 winning case collections. Based on that assessment, select the “QUICK WIN” data use case that offers your organization the most bang for its buck

Step 1: Reflect Upon Your Organization

Whenever you evaluate data case collections to decide if they’re a good fit for your organization, the first thing you need to do is organize your thoughts with respect to your organization as it is today.

Before moving into the data analysis case study, STOP and ANSWER THE FOLLOWING QUESTIONS – just to remind yourself:

  • What is the business vision for our organization?
  • What industries do we primarily support?
  • What data technologies do we already have up and running, that we could use to generate even more value?
  • What team members do we have to support a new data project? And what are their data skillsets like?
  • What type of data are we mostly looking to generate value from? Structured? Semi-Structured? Un-structured? Real-time data? Huge data sets? What are our data resources like?

Jot down some notes while you’re here. Then keep them in mind as you read on to find out how one company, Humana, used its data to achieve a 28 percent increase in customer satisfaction, along with a 63 percent increase in employee engagement! (That’s such a seriously impressive outcome, right?!)

Step 2: Review Data Case Studies

Here we are, already at step 2. It’s time for you to start reviewing data analysis case studies (starting with the one I’m sharing below). Identify 5 that seem the most promising for your organization given its current set-up.

Humana’s Automated Data Analysis Case Study

The key thing to note here is that the approach to creating a successful data program varies from industry to industry.

Let’s start with one to demonstrate the kind of value you can glean from these kinds of success stories.

Humana has provided health insurance to Americans for over 50 years. It is a service company focused on fulfilling the needs of its customers. A great deal of Humana’s success as a company rides on customer satisfaction, and the frontline of that battle for customers’ hearts and minds is Humana’s customer service center.

Call centers are hard to get right. A lot of emotions can arise during a customer service call, especially one relating to health and health insurance. Sometimes people are frustrated. At times, they’re upset. Also, there are times the customer service representative becomes aggravated, and the overall tone and progression of the phone call goes downhill. This is of course very bad for customer satisfaction.

Humana wanted to find a way to use artificial intelligence to monitor their phone calls and help their agents do a better job connecting with their customers in order to improve customer satisfaction (and thus, customer retention rates & profits per customer).

In light of their business need, Humana worked with a company called Cogito, which specializes in voice analytics technology.

Cogito offers a piece of AI technology called Cogito Dialogue. It’s been trained to identify certain conversational cues as a way of helping call center representatives and supervisors stay actively engaged in a call with a customer.

The AI listens to cues like the customer’s voice pitch. If it’s rising, or if the call representative and the customer talk over each other, then the dialogue tool will send out electronic alerts to the agent during the call.

Humana fed the dialogue tool customer service data from 10,000 calls and allowed it to analyze cues such as keywords, interruptions, and pauses. These cues were then linked with specific outcomes. For example, if the representative is receiving a particular type of cue, they are likely to get a specific customer satisfaction result.
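To make the idea of linking cues with outcomes a little more concrete, here is a minimal sketch in Python. It is not Cogito's method, which the case study does not describe in detail. It simply assumes each call has been reduced to a set of cue labels plus a satisfaction score, and averages the score per cue. The function name, the cue labels, and the scoring scale are all hypothetical.

```python
from collections import defaultdict


def satisfaction_by_cue(calls):
    """Average the satisfaction score over all calls in which each cue occurred.

    `calls` is an iterable of (cues, score) pairs, where `cues` is a collection
    of cue labels observed in one call and `score` is that call's satisfaction
    result. Both the labels and the score scale are assumptions for this sketch.
    """
    totals = defaultdict(lambda: [0.0, 0])  # cue -> [sum of scores, call count]
    for cues, score in calls:
        for cue in set(cues):  # count each cue at most once per call
            totals[cue][0] += score
            totals[cue][1] += 1
    return {cue: total / count for cue, (total, count) in totals.items()}
```

A real system would work from far richer features than cue labels, but even this simple grouping shows how a cue can be associated with a typical customer satisfaction result.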

The Outcome

Customers were happier, and customer service representatives were more engaged.

This automated solution for data analysis has now been deployed in 200 Humana call centers and the company plans to roll it out to 100 percent of its centers in the future.

The initiative was so successful, Humana has been able to focus on next steps in its data program. The company now plans to begin predicting the type of calls that are likely to go unresolved, so they can send those calls over to management before they become frustrating to the customer and customer service representative alike.

What does this mean for you and your business?

Well, if you’re looking for new ways to generate value by improving the quantity and quality of the decision support that you’re providing to your customer service personnel, then this may be a perfect example of how you can do so.

Humana’s Business Use Cases

Humana’s data analysis case study includes two key business use cases:

  • Analyzing customer sentiment; and
  • Suggesting actions to customer service representatives.

Analyzing Customer Sentiment

First things first, before you go ahead and collect data, you need to ask yourself who and what is involved in making things happen within the business.

In the case of Humana, the actors were:

  • The health insurance system itself
  • The customer, and
  • The customer service representative

As you can see in the use case diagram above, the relational aspect is pretty simple. You have a customer service representative and a customer. They are both producing audio data, and that audio data is being fed into the system.

Humana focused on collecting the key data points, shown in the image below, from their customer service operations.

By collecting data about speech style, pitch, silence, stress in customers’ voices, length of call, speed of customers’ speech, intonation, articulation, and representatives’ manner of speaking, Humana was able to analyze customer sentiment and introduce techniques for improved customer satisfaction.

Having strategically defined these data points, the Cogito technology was able to generate reports about customer sentiment during the calls.

Suggesting Actions to Customer Service Representatives

The second use case for the Humana data program follows on from the data gathered in the first case.

In Humana’s case, Cogito generated a host of call analyses and reports about key call issues.

In the second business use case, Cogito was able to suggest actions to customer service representatives, in real time, to make use of incoming data and help improve customer satisfaction on the spot.

The technology Humana used provided suggestions via text message to the customer service representative, offering the following types of feedback:

  • The tone of voice is too tense
  • The speed of speaking is high
  • The customer representative and customer are speaking at the same time

These alerts allowed the Humana customer service representatives to alter their approach immediately, improving the quality of the interaction and, subsequently, the customer satisfaction.
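If you want a feel for how alerts like these could be triggered, here is a rough rule-based sketch in Python. To be clear, this is an illustration and not how Cogito's models work: the `CallWindow` fields, the threshold values, and the `alerts_for` function are all hypothetical, and only the three alert messages come from the list above.

```python
from dataclasses import dataclass


@dataclass
class CallWindow:
    """Measurements over a short slice of a live call (hypothetical fields)."""
    agent_pitch_hz: float        # average pitch of the representative's voice
    agent_words_per_min: float   # representative's speaking rate
    overlap_seconds: float       # time both parties were speaking at once


# Illustrative thresholds only; real values would have to be learned from data.
PITCH_LIMIT_HZ = 220.0
SPEED_LIMIT_WPM = 180.0
OVERLAP_LIMIT_S = 1.0


def alerts_for(window: CallWindow) -> list[str]:
    """Return the feedback messages that apply to this slice of the call."""
    alerts = []
    if window.agent_pitch_hz > PITCH_LIMIT_HZ:
        alerts.append("The tone of voice is too tense")
    if window.agent_words_per_min > SPEED_LIMIT_WPM:
        alerts.append("The speed of speaking is high")
    if window.overlap_seconds > OVERLAP_LIMIT_S:
        alerts.append(
            "The customer representative and customer are speaking at the same time"
        )
    return alerts
```

In this sketch the function would be called on each new slice of audio features as it arrives, so the representative sees feedback while the call is still in progress.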

The preconditions for success in this use case were:

  • The call-related data must be collected and stored
  • The AI models must be in place to generate analysis on the data points that are recorded during the calls

Evidence of success can subsequently be found in a system that offers real-time suggestions for courses of action that the customer service representative can take to improve customer satisfaction.

Thanks to this data-intensive business use case, Humana was able to increase customer satisfaction, improve customer retention rates, and drive profits per customer.

The Technology That Supports This Data Analysis Case Study

I promised to dip into the tech side of things. This is especially for those of you who are interested in the ins and outs of how projects like this one are actually rolled out.

Here’s a little rundown of the main technologies we discovered when we investigated how Cogito runs in support of its clients like Humana.

  • For cloud data management Cogito uses AWS, specifically the Athena product
  • For on-premise big data management, the company used Apache HDFS – the distributed file system for storing big data
  • They utilize MapReduce, for processing their data
  • And Cogito also has traditional systems and relational database management systems such as PostgreSQL
  • In terms of analytics and data visualization tools, Cogito makes use of Tableau
  • And for its machine learning technology, these use cases required people with knowledge in Python, R, and SQL, as well as deep learning (Cogito uses the PyTorch library and the TensorFlow library)

These data science skill sets support the effective computing, deep learning, and natural language processing applications employed by Humana for this use case.

If you’re looking to hire people to help with your own data initiative, then people with those skills listed above, and with experience in these specific technologies, would be a huge help.

Step 3: Select The “Quick Win” Data Use Case

Still there? Great!

It’s time to close the loop.

Remember those notes you took before you reviewed the study? I want you to STOP here and assess. Does this Humana case study seem applicable and promising as a solution, given your organization’s current set-up?

YES ▶ Excellent!

Earmark it and continue exploring other winning data use cases until you’ve identified 5 that seem like great fits for your business’s needs. Evaluate those against your organization’s needs, and select the very best fit to be your “quick win” data use case. Develop your data strategy around that.

NO, Lillian – It’s not applicable. ▶ No problem.

Discard the information and continue exploring the winning data use cases we’ve categorized for you according to business function and industry. Save time by drilling down into the business function you know your business really needs help with now. Identify 5 winning data use cases that seem like great fits for your business’s needs. Evaluate those against your organization’s needs, and select the very best fit to be your “quick win” data use case. Develop your data strategy around that data use case.


Data analytics case study data files

Inventory analysis case study data files:

  • Beginning Inventory
  • Purchase Prices
  • Vendor Invoices
  • Ending Inventory

Inventory analysis case study instructor files:

  • Instructor guide
  • Phase 1 - Data Collection and Preparation
  • Phase 2 - Data Discovery and Visualization
  • Phase 3 - Introduction to Statistical Analysis


Top 10 Real-World Data Science Case Studies

Aditya Sharma

Aditya is a content writer with 5+ years of experience writing for various industries including Marketing, SaaS, B2B, IT, and Edtech among others. You can find him watching anime or playing games when he’s not writing.

Frequently Asked Questions

Real-world data science case studies differ significantly from academic examples. While academic exercises often feature clean, well-structured data and simplified scenarios, real-world projects tackle messy, diverse data sources with practical constraints and genuine business objectives. These case studies reflect the complexities data scientists face when translating data into actionable insights in the corporate world.

Real-world data science projects come with common challenges. Data quality issues, including missing or inaccurate data, can hinder analysis. Domain expertise gaps may result in misinterpretation of results. Resource constraints might limit project scope or access to necessary tools and talent. Ethical considerations, like privacy and bias, demand careful handling.

Lastly, as data and business needs evolve, data science projects must adapt and stay relevant, posing an ongoing challenge.

Real-world data science case studies play a crucial role in helping companies make informed decisions. By analyzing their own data, businesses gain valuable insights into customer behavior, market trends, and operational efficiencies.

These insights empower data-driven strategies, aiding in more effective resource allocation, product development, and marketing efforts. Ultimately, case studies bridge the gap between data science and business decision-making, enhancing a company's ability to thrive in a competitive landscape.

Key takeaways from these case studies for organizations include the importance of cultivating a data-driven culture that values evidence-based decision-making. Investing in robust data infrastructure is essential to support data initiatives. Close collaboration between data scientists and domain experts ensures that insights align with business goals.

Finally, continuous monitoring and refinement of data solutions are critical for maintaining relevance and effectiveness in a dynamic business environment. Embracing these principles can lead to tangible benefits and sustainable success in real-world data science endeavors.

Data science is a powerful driver of innovation and problem-solving across diverse industries. By harnessing data, organizations can uncover hidden patterns, automate repetitive tasks, optimize operations, and make informed decisions.

In healthcare, for example, data-driven diagnostics and treatment plans improve patient outcomes. In finance, predictive analytics enhances risk management. In transportation, route optimization reduces costs and emissions. Data science empowers industries to innovate and solve complex challenges in ways that were previously unimaginable.


Qualitative case study data analysis: an example from practice

  • Catherine Houghton, Lecturer, School of Nursing and Midwifery, National University of Ireland, Galway, Republic of Ireland
  • Kathy Murphy, Professor of Nursing, National University of Ireland, Galway, Ireland
  • David Shaw, Lecturer, Open University, Milton Keynes, UK
  • Dympna Casey, Senior Lecturer, National University of Ireland, Galway, Ireland

Aim: To illustrate an approach to data analysis in qualitative case study methodology.

Background: There is often little detail in case study research about how data were analysed. However, it is important that comprehensive analysis procedures are used because there are often large sets of data from multiple sources of evidence. Furthermore, the ability to describe in detail how the analysis was conducted ensures rigour in reporting qualitative research.

Data sources: The research example used is a multiple case study that explored the role of the clinical skills laboratory in preparing students for the real world of practice. Data analysis was conducted using a framework guided by the four stages of analysis outlined by Morse (1994): comprehending, synthesising, theorising and recontextualising. The specific strategies for analysis in these stages centred on the work of Miles and Huberman (1994), which has been successfully used in case study research. The data were managed using NVivo software.

Review methods: Literature examining qualitative data analysis was reviewed and strategies illustrated by the case study example provided.

Discussion: Each stage of the analysis framework is described with illustration from the research example for the purpose of highlighting the benefits of a systematic approach to handling large data sets from multiple sources.

Conclusion: By providing an example of how each stage of the analysis was conducted, it is hoped that researchers will be able to consider the benefits of such an approach to their own case study analysis.

Implications for research/practice: This paper illustrates specific strategies that can be employed when conducting data analysis in case study research and other qualitative research designs.

Nurse Researcher. 22, 5, 8-12. doi: 10.7748/nr.22.5.8.e1307

This article has been subject to double blind peer review

Conflict of interest: None declared

Received: 02 February 2014

Accepted: 16 April 2014

Keywords: case study data analysis, case study research methodology, clinical skills research, qualitative case study methodology, qualitative data analysis, qualitative research


15 May 2015 / Vol 22 issue 5


MetaboAnalyst 6.0: towards a unified platform for metabolomics data processing, analysis and interpretation

Zhiqiang Pang, Yao Lu, Guangyan Zhou, Fiona Hui, Lei Xu, Charles Viau, Aliya F Spigelman, Patrick E MacDonald, David S Wishart, Shuzhao Li, Jianguo Xia, MetaboAnalyst 6.0: towards a unified platform for metabolomics data processing, analysis and interpretation, Nucleic Acids Research, 2024, gkae253, https://doi.org/10.1093/nar/gkae253

We introduce MetaboAnalyst version 6.0 as a unified platform for processing, analyzing, and interpreting data from targeted as well as untargeted metabolomics studies using liquid chromatography–mass spectrometry (LC–MS). The two main objectives in developing version 6.0 are to support tandem MS (MS2) data processing and annotation, as well as to support the analysis of data from exposomics studies and related experiments. Key features of MetaboAnalyst 6.0 include: (i) a significantly enhanced Spectra Processing module with support for MS2 data and the asari algorithm; (ii) an MS2 Peak Annotation module based on comprehensive MS2 reference databases with fragment-level annotation; (iii) a new Statistical Analysis module dedicated to handling complex study designs with multiple factors or phenotypic descriptors; (iv) a Causal Analysis module for estimating metabolite–phenotype causal relations based on two-sample Mendelian randomization; and (v) a Dose-Response Analysis module for benchmark dose calculations. In addition, we have also improved MetaboAnalyst's visualization functions, updated its compound database and metabolite sets, and significantly expanded its pathway analysis support to around 130 species. MetaboAnalyst 6.0 is freely available at https://www.metaboanalyst.ca.


Metabolomics involves the comprehensive study of all small molecules in a biological system. It has diverse applications ranging from basic biochemical research to clinical investigation of diseases, food safety assessment, environmental monitoring, etc. (1–5). User-friendly and easily accessible bioinformatics tools are essential to deal with the complex data produced from metabolomics studies. MetaboAnalyst is a user-friendly, web-based platform developed to provide comprehensive support for metabolomics data analysis (6–10). The early versions (1.0–3.0) focused primarily on supporting statistical and functional analysis of targeted metabolomics data. Increasing support for untargeted metabolomics data from liquid chromatography–mass spectrometry (LC–MS) experiments has been gradually introduced in more recent versions of MetaboAnalyst. For instance, version 4.0 implemented a new module to support functional analysis directly from LC–MS peaks, while version 5.0 added an auto-optimized LC–MS spectral processing module that works seamlessly with the functional analysis module. A detailed protocol on how to use different modules for comprehensive analysis of untargeted metabolomics data was published in 2022 (11). According to Google Analytics, the MetaboAnalyst web server has processed over 2 million jobs, including 33 000 spectral processing jobs over the past 12 months. Many of these jobs are associated with untargeted metabolomics and exposomics studies.

Untargeted metabolomics data generated from high-resolution LC–MS instruments are typically characterized by thousands of peaks with unknown chemical identities. To assist with compound identification, tandem MS (called MS/MS or MS2) spectra are often collected from pooled QC samples during the experiments (12). The two commonly used MS2 methods are data-dependent acquisition (DDA) and data-independent acquisition (DIA), with sequential window acquisition of all theoretical mass spectra (SWATH) being a promising special case of the latter. DDA data usually have clear associations between the precursor ions and the corresponding MS2 spectra, while DIA data generally require deconvolution of the MS2 data to reconstruct associations with their precursor ions (13). Incorporating MS2 processing and annotation into untargeted metabolomics workflows can greatly improve compound annotations and functional interpretation.

Exposomics is an emerging field centered on profiling the complete set of exposures individuals encounter across their lifespan, which often involves MS analysis of chemical mixtures traditionally rooted in toxicology and public health (4). Untargeted LC–MS based metabolomics is increasingly applied to exposomics and toxicology studies. Exposomics data from human cohorts are often associated with complex phenotypic data due to their observational nature. This requires more sophisticated data analysis and visualization methods that can take multiple factors or covariates into consideration. Exposomics studies typically produce long lists of potential biomarkers that are significantly associated with phenotypes of interest. Identification of causal links from this large number of metabolite-phenotype relations is a natural next step. It has become possible recently with the availability of many metabolomic genome-wide association studies (mGWAS) that link metabolites and genotypes (14–16). By integrating mGWAS data with comparable GWAS data that associate genotypes with various phenotypes (17), we can now estimate causal relationships between a metabolite and a phenotype of interest through Mendelian randomization (MR) (18). Dose-response experiments are often performed to further quantify cause-and-effect relationships. The experiments are often conducted at multiple dose levels using in vitro assays or animal models to calculate dose-response curves for risk assessment of chemical exposures (19–21).
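As an illustration of how summary statistics from an mGWAS and a phenotype GWAS can be combined in two-sample MR, the following Python sketch computes the per-variant Wald ratio and the fixed-effect inverse-variance weighted (IVW) estimate. These are standard textbook estimators and are shown here only as a minimal example; the text above does not state which estimators MetaboAnalyst uses, and the function names and input layout are assumptions made for this sketch.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Causal estimate from a single genetic variant.

    beta_exposure: variant-metabolite effect (from the mGWAS)
    beta_outcome:  variant-phenotype effect (from the phenotype GWAS)
    se_outcome:    standard error of beta_outcome
    The standard error uses the first-order approximation, which ignores
    uncertainty in beta_exposure.
    """
    estimate = beta_outcome / beta_exposure
    std_error = se_outcome / abs(beta_exposure)
    return estimate, std_error


def ivw_estimate(instruments):
    """Fixed-effect IVW estimate across several variants.

    `instruments` is a list of (beta_exposure, beta_outcome, se_outcome)
    tuples, one per variant. This is equivalent to a weighted average of the
    Wald ratios with weights beta_exposure**2 / se_outcome**2.
    """
    numerator = sum(bx * by / se ** 2 for bx, by, se in instruments)
    denominator = sum(bx ** 2 / se ** 2 for bx, by, se in instruments)
    return numerator / denominator, math.sqrt(1.0 / denominator)
```

The sketch omits steps that matter in practice, such as instrument selection, allele harmonization between the two studies, and sensitivity analyses for pleiotropy.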

To address these emerging needs from both the metabolomics and exposomics communities, we have developed MetaboAnalyst version 6.0. This version includes many key features:

  • A significantly enhanced spectra processing workflow with the addition of the asari algorithm for LC–MS spectra processing (22), as well as support for MS2 (DDA or SWATH-DIA) data processing.
  • A new module for MS2 spectral database searching for compound identification and results visualization.
  • A new module for causal analysis between metabolites and phenotypes of interest based on two-sample MR (2SMR).
  • A new module for dose-response analysis including dose-response curve fitting and benchmark dose (BMD) calculation.
  • A new module for statistical analysis with complex metadata.
  • A number of other important updates including: improved functional analysis of untargeted metabolomics data by integrating MS2-based compound identification; updated compound database, pathways and metabolite sets; as well as improved data visualization support across multiple modules.

MetaboAnalyst 6.0 is freely accessible at https://www.metaboanalyst.ca, with comprehensive documentation and updated tutorials. To better engage with our users, a dedicated user forum (https://omicsforum.ca) has been operational since May 2022. To date, this forum contains >4000 posts on ∼700 topics related to different aspects of using MetaboAnalyst.

MetaboAnalyst 6.0 accepts a total of five different data types across various modules encompassing spectra processing, statistical analysis, functional analysis, meta-analysis, and integration with other omics data. Once the data are uploaded, all analysis steps are conducted within a consistent framework including data integrity checks, parameter customization, and results visualization (Figure 1). Some of the key features in MetaboAnalyst 6.0 are described below.

MetaboAnalyst 6.0 workflow for targeted and untargeted metabolomics data. Multiple data input types are accepted. Untargeted metabolomics inputs require extra steps for spectra processing and peak annotation. The result table can be used for statistical and functional analysis within a consistent workflow in the same manner as for targeted metabolomics data.

LC–MS spectra processing remains an active research topic in the field of untargeted metabolomics. Many powerful tools have been developed over time, including XCMS ( 23 ), MZmine ( 24 ), MS-DIAL ( 13 ) and asari ( 22 ). In addition to using different peak detection algorithms, most tools require manual parameter tuning to ensure good results. As a consequence, results often vary significantly between tools and parameter settings ( 25 ). To mitigate this issue, MetaboAnalyst 5.0 introduced an auto-optimized LC–MS processing pipeline to minimize parameter-related effects ( 10 , 26 ). The asari software introduced a set of quality metrics, the concepts of mass tracks and composite mass tracks, and a new algorithmic design to minimize errors in feature correspondence. It requires minimal parameter tuning while achieving much faster computational performance ( 22 ). The asari algorithm is now available in the LC–MS spectra processing options, alongside the traditional approaches.

MS2 spectra processing and metabolite identification are important components of untargeted metabolomics. It is now recognized that MS2 spectral deconvolution is necessary to achieve high-quality compound identification results for both DDA and SWATH-DIA data ( 27–29 ). MetaboAnalyst 6.0 offers an efficient, auto-optimized pipeline for MS2 spectral deconvolution. The DDA data deconvolution method is derived from the DecoID algorithm ( 28 ), which employs a database-dependent regression model to deconvolve contaminated spectra. The SWATH-DIA data deconvolution algorithm is based on the DecoMetDIA method ( 29 ), with the core algorithm re-implemented using an Rcpp/C++ framework to achieve high performance. When replicate MS2 spectra are provided, an extra step is performed to generate consensus spectra across replicates. The consensus spectra are searched against MetaboAnalyst's curated MS2 reference databases for compound identification based on dot product ( 28 ) or spectral entropy ( 30 ) similarity scores. The complete pipelines for DDA and SWATH-DIA are available from the Spectra Processing [LC–MS w/wo MS2] module.
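The two similarity scores mentioned above can be illustrated with a minimal Python sketch. It is not MetaboAnalyst's implementation: the peak alignment (greedy matching within a fixed m/z tolerance), the `tol` default and all function names are assumptions made for illustration, and the entropy score is the plain unweighted form, omitting any intensity re-weighting or noise filtering that a production implementation may apply.

```python
import math


def normalize(peaks):
    """Scale intensities to sum to 1. `peaks` is a non-empty list of (m/z, intensity)."""
    total = sum(intensity for _, intensity in peaks)
    return [(mz, intensity / total) for mz, intensity in peaks]


def align(query, reference, tol=0.01):
    """Pair each query peak with the closest unused reference peak within `tol` Da.

    Returns (query_intensity, reference_intensity) pairs; a peak without a
    partner is paired with 0.0. The tolerance is an illustrative assumption.
    """
    pairs, used = [], set()
    for mz_q, int_q in query:
        best = None
        for k, (mz_r, _) in enumerate(reference):
            if k in used or abs(mz_q - mz_r) > tol:
                continue
            if best is None or abs(mz_q - mz_r) < abs(mz_q - reference[best][0]):
                best = k
        if best is None:
            pairs.append((int_q, 0.0))
        else:
            used.add(best)
            pairs.append((int_q, reference[best][1]))
    pairs.extend((0.0, int_r) for k, (_, int_r) in enumerate(reference) if k not in used)
    return pairs


def dot_product_similarity(query, reference, tol=0.01):
    """Cosine of the angle between the two aligned intensity vectors."""
    pairs = align(normalize(query), normalize(reference), tol)
    numerator = sum(a * b for a, b in pairs)
    denominator = math.sqrt(sum(a * a for a, _ in pairs)) * math.sqrt(sum(b * b for _, b in pairs))
    return numerator / denominator if denominator else 0.0


def shannon_entropy(intensities):
    """Shannon entropy (natural log) of intensities that sum to 1."""
    return -sum(p * math.log(p) for p in intensities if p > 0)


def entropy_similarity(query, reference, tol=0.01):
    """Unweighted entropy similarity: 1 - (2*S_mix - S_query - S_ref) / ln(4)."""
    pairs = align(normalize(query), normalize(reference), tol)
    s_query = shannon_entropy(a for a, _ in pairs)
    s_ref = shannon_entropy(b for _, b in pairs)
    s_mix = shannon_entropy((a + b) / 2 for a, b in pairs)  # 1:1 mixed spectrum
    return 1 - (2 * s_mix - s_query - s_ref) / math.log(4)
```

Both scores range from 0 (no shared peaks) to 1 (identical spectra), which makes them directly comparable when ranking candidate matches.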

Raw spectra must be saved in common open formats and uploaded individually as separate zip files. LC–MS spectra are mandatory, while MS2 spectra are optional. Upon data uploading, MetaboAnalyst 6.0 first validates the status of the MS files. For SWATH-DIA data, the SWATH window design is automatically extracted from the spectra. If the related information is missing, users will be prompted to manually enter the window design. On the parameter setting page, users can choose the auto-optimized centWave algorithm ( 26 ) or the asari algorithm for LC–MS data processing. If MS2 data are included, spectra deconvolution, consensus generation and database searching will be performed using the identified MS features as the target list. Once the spectra processing is complete, users can explore both MS and MS2 data processing results (Figure 2A-B ) and download the files or go directly to the Functional Analysis module.

Example outputs from MetaboAnalyst 6.0. (A) Integrated 3D PCA score and loading plots summarizing the raw spectra processing results. (B) An interactive mirror plot showing the MS2 matching result. Matched fragments are marked with a red diamond. (C) Functional analysis results with the top four significant pathways labelled. (D) A forest plot comparing the effect sizes calculated based on individual SNPs (black) or using all SNPs by different MR methods (red). (E) Bar plots of the dose response curve fitting results showing how many times each model type was identified as the best fit. (F) A dose-response curve fitting result showing each of the concentration values (black points), the fitted curve (solid blue line), and the estimated benchmark dose (solid red line) with its lower and upper 95% confidence intervals (dashed red lines), respectively.

MS2 data can be acquired independently from MS data acquisition. To accommodate this scenario and offer compatibility with MS2 spectra results from other popular tools such as MS-DIAL, we have added a Peak Annotation [MS2-DDA/DIA] module to allow users to directly upload MS2 spectra for database searching. Users can enter a single MS2 spectrum or upload an MSP or MGF file containing multiple MS2 spectra. For single spectrum searching, users must specify the m/z value of the precursor ion. However, for batch searching based on an MSP file, users do not need to specify the precursors’ m/z values. To ensure timely completion of database searching, the public server processes only 20 spectra for each submission (the first 20 spectra by default). Users can manually specify which spectra to search. After conducting this pilot analysis with 20 spectra, users can download the R command history and use our MetaboAnalystR package to annotate all MS2 spectra ( 26 ).
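To make the batch input concrete, the following Python sketch reads spectra from MGF-formatted text and keeps only the first 20, mirroring the default submission limit described above. It is an illustrative reader, not the parser used by MetaboAnalyst: the function names and the returned dictionary layout are hypothetical, and only the basic MGF conventions are handled (`BEGIN IONS`/`END IONS` blocks, `KEY=VALUE` headers such as `PEPMASS`, and whitespace-separated m/z and intensity pairs).

```python
def parse_mgf(text, limit=20):
    """Return up to `limit` spectra from MGF text (pass limit=None for all).

    Each spectrum is a dict with "params" (header fields) and "peaks"
    (a list of (m/z, intensity) tuples).
    """
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
                current = None
                if limit is not None and len(spectra) >= limit:
                    break
        elif current is not None:
            if "=" in line:
                # Header line, e.g. PEPMASS=150.5 1000
                key, value = line.split("=", 1)
                current["params"][key.strip().upper()] = value.strip()
            else:
                parts = line.split()
                try:
                    current["peaks"].append((float(parts[0]), float(parts[1])))
                except (IndexError, ValueError):
                    continue  # skip malformed peak lines
    return spectra


def precursor_mz(spectrum):
    """Precursor m/z taken from the PEPMASS field, or None if it is absent."""
    pepmass = spectrum["params"].get("PEPMASS")
    return float(pepmass.split()[0]) if pepmass else None
```

Because the precursor m/z travels with each spectrum in the file, a batch reader of this kind does not need it to be supplied separately, in contrast to single-spectrum searching.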

Multiple databases are available for compound identification. Database searching can be performed based on regular reference MS2 spectra and/or their corresponding neutral loss spectra. The results are visually summarized as mirror plots based on the matching scores (Figure 2B). Users can interactively explore the MS2 database matching results. The molecular formulas for the MS2 peaks in the reference database spectra are predicted using the BUDDY program ( 31 ). Users can download the complete compound identification table together with the mirror plots.
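A neutral loss spectrum re-expresses each fragment as the mass difference between the precursor ion and that fragment. The short Python sketch below shows this transformation under simple assumptions; the function name and the `min_loss` cut-off are hypothetical, and the details of the METLIN-derived algorithm used to build the reference database are not reproduced here.

```python
def neutral_loss_spectrum(precursor_mz, peaks, min_loss=0.0):
    """Convert (m/z, intensity) fragment peaks into (neutral loss, intensity) pairs.

    Fragments at or above the precursor m/z (loss <= min_loss) are dropped,
    since they do not correspond to a positive neutral loss.
    """
    losses = [
        (precursor_mz - mz, intensity)
        for mz, intensity in peaks
        if precursor_mz - mz > min_loss
    ]
    return sorted(losses)
```

Two structurally related compounds with different precursor masses can share few fragment m/z values yet show the same neutral losses, which is why searching both representations can be complementary.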

Understanding the causal relationships between metabolites and phenotypes is of great interest in both metabolomics and exposomics. GWAS have established links between genetic variants (e.g. single nucleotide polymorphisms, or SNPs) and various phenotypes ( 32 ), while recent mGWAS provide connections between genotypes and metabolites or metabolite concentration changes. Together, these resources make it possible to estimate causal relationships between metabolites and a phenotype of interest. If a metabolite is causal for a given disease, genetic variants that influence the levels of that metabolite, either directly through affecting related enzymes or indirectly through influencing lifestyle choices (such as dietary habits), should result in a higher risk of the disease. These causal effects can be estimated through Mendelian randomization (MR) analysis ( 18 ). MR relies on the principle that genetic variants are randomly distributed across populations, similar to how treatments are randomly assigned in clinical trials. By leveraging this random allocation, MR can evaluate whether a relationship between a metabolite and a phenotype is causal, while reducing the impact of confounding factors and reverse causality that often plague observational studies.

MR analysis in MetaboAnalyst is based on the 2SMR approach (using the TwoSampleMR and MRInstruments R packages), which enables application of MR methods using summary statistics from non-overlapping individuals ( 17 , 33 ). Users should first select an exposure (i.e. a metabolite) and an outcome (i.e. a disease) of interest. Based on the selections, the program searches for potential instrumental variables (i.e. SNPs) that are associated with both the metabolite, using our large collection of recent mGWAS studies ( 14 ), and the disease, using the OpenGWAS database ( 17 ). The next step is to perform SNP filtering and harmonization to identify independent SNPs through linkage disequilibrium (LD) clumping ( 34 ). When SNPs are absent in the GWAS database, proxy SNPs are identified using LD. In addition, it is critical to harmonize SNPs to make sure that the effect sizes of the SNPs on both the exposure and the outcome refer to the same reference alleles. The last step before conducting MR analysis is to exclude SNPs affecting multiple metabolites to reduce horizontal pleiotropy, which occurs when a genetic variant influences the outcome through pathways other than the exposure of interest ( 35 ). MetaboAnalyst's MR analysis page provides diverse statistical methods (currently 12), each of which has its own strengths and limitations. For instance, the weighted median method is robust to the violation of MR assumptions by some of the genetic variants, while the Egger regression method is more robust to horizontal pleiotropy. Users can point their mouse over the corresponding question marks beside each method to learn more details.
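The harmonization and estimation steps can be sketched in a few lines of Python. This is a simplified illustration and not the TwoSampleMR implementation: the dictionary keys and function names are hypothetical, palindromic and strand-flipped SNPs are not handled, and only two standard estimators are shown (the per-SNP Wald ratio and the fixed-effect inverse-variance weighted, IVW, estimate).

```python
import math


def harmonize(snps):
    """Express outcome effects on the exposure's effect allele.

    Each SNP is a dict with ea_exp/oa_exp/beta_exp (exposure) and
    ea_out/oa_out/beta_out/se_out (outcome). SNPs whose allele pairs
    cannot be reconciled by a simple swap are dropped.
    """
    kept = []
    for snp in snps:
        exposure = (snp["ea_exp"], snp["oa_exp"])
        outcome = (snp["ea_out"], snp["oa_out"])
        if exposure == outcome:
            kept.append(dict(snp))
        elif exposure == outcome[::-1]:
            # Effect and other allele are swapped: flip the sign of the outcome effect.
            flipped = dict(snp)
            flipped["beta_out"] = -snp["beta_out"]
            flipped["ea_out"], flipped["oa_out"] = exposure
            kept.append(flipped)
    return kept


def wald_ratio(snp):
    """Single-SNP causal estimate and its first-order standard error."""
    estimate = snp["beta_out"] / snp["beta_exp"]
    std_error = snp["se_out"] / abs(snp["beta_exp"])
    return estimate, std_error


def ivw_estimate(snps):
    """Fixed-effect IVW estimate combining all harmonized SNPs."""
    weight = sum(s["beta_exp"] ** 2 / s["se_out"] ** 2 for s in snps)
    numerator = sum(s["beta_exp"] * s["beta_out"] / s["se_out"] ** 2 for s in snps)
    return numerator / weight, math.sqrt(1.0 / weight)
```

A forest plot such as the one in Figure 2D compares exactly these two kinds of quantities: effect sizes from individual SNPs and combined estimates across all SNPs.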

Dose–response analysis is commonly used in toxicology and pharmacology for understanding how varying concentrations of a chemical can impact a biological system. It plays a pivotal role in risk assessment of chemical exposures ( 36 ). A key output of dose-response analysis is the benchmark dose (BMD), the minimum dose of a substance that produces a clear, low level health risk relative to the control group ( 37 ). Chemicals identified from exposomics are often followed up by dose–response studies to understand their mechanism of action or adverse outcome pathways ( 21 , 38 , 39 ).

A dose–response experiment design includes a control group (dose = 0) and at least three different dose groups, typically with the same number of replicates in each group. The data should be formatted as a CSV file with the dose information included as the second row or column. The analysis workflow consists of four main steps: (i) data upload, integrity checking, processing and normalization; (ii) differential analysis to select features that vary with dose levels; (iii) curve fitting on the intensity or concentration values of the selected features against a suite of linear and non-linear models; and (iv) computing BMD values for each feature. The algorithm for dose–response analysis was adapted from the algorithm we developed for transcriptomics BMD analysis ( 40 , 41 ).
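Steps (iii) and (iv) can be illustrated for a single feature with the Python sketch below. It makes several simplifying assumptions that are not taken from MetaboAnalyst: only a straight-line model is fitted (the real workflow compares a suite of linear and non-linear models), the benchmark response is defined as a shift of one control standard deviation from the fitted baseline, and the function names and the `sd_factor` parameter are hypothetical.

```python
import statistics


def fit_linear(doses, responses):
    """Ordinary least squares fit of response = intercept + slope * dose."""
    mean_x = statistics.fmean(doses)
    mean_y = statistics.fmean(responses)
    sxx = sum((x - mean_x) ** 2 for x in doses)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(doses, responses))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def benchmark_dose(doses, responses, sd_factor=1.0):
    """Dose at which the fitted line departs from its baseline by
    `sd_factor` control standard deviations (an assumed benchmark response).

    Returns None when no dose trend is fitted or when the BMD lies beyond
    the highest tested dose.
    """
    control = [y for x, y in zip(doses, responses) if x == 0]
    if len(control) < 2:
        raise ValueError("at least two control (dose = 0) replicates are required")
    _, slope = fit_linear(doses, responses)
    if slope == 0:
        return None
    bmd = sd_factor * statistics.stdev(control) / abs(slope)
    return bmd if bmd <= max(doses) else None
```

For example, with a control standard deviation of 1 and a fitted slope of 2 response units per dose unit, this definition gives a BMD of 0.5 dose units.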

Compound database

The compound database has been updated based on HMDB 5.0 ( 42 ), with particular efforts made to synchronize with the IDs of other databases such as KEGG ( 43 ) and PubChem ( 44 ) to improve cross-references during compound mapping and pathway analysis. The compound database was expanded by ∼4000 compounds (after removing ∼10 000 deprecated HMDB entries and adding ∼14 000 new entries).

MS2 reference spectra database

A total of 12 MS2 reference databases were collected and curated from public resources, including the HMDB experimental MS2 database ( 42 ), the HMDB predicted MS2 database ( 42 ), Global Natural Product Social Molecular Networking (GNPS) database ( 45 ), MoNA ( 46 ), MassBank ( 46 ), MINEs ( 47 ), LipidBlast ( 48 ), RIKEN ( 49 ), ReSpect ( 50 ), BMDMS ( 51 ), VaniyaNP ( 46 ) and the MS-DIAL database (v4.90) ( 52 ). The complete MS2 reference database currently comprises 10 420 215 MS2 records from 1 551 012 unique compounds. We also created a neutral loss spectra database calculated based on the algorithm implemented by the METLIN neutral loss database ( 53 ). The molecular formulas of all MS2 fragments were pre-calculated using BUDDY ( 31 ).

Pathway and metabolite set libraries

The KEGG pathway libraries have been updated to a recent version (12/20/2023) via the KEGG API. Based on user feedback, the pathway analysis for both targeted and untargeted metabolomics data now supports ∼130 species (up from 28 species in version 5.0), including many new mammals, plants, insects, fungi and bacteria. We also updated the metabolite set libraries based on HMDB 5.0, MarkerDB ( 54 ), as well as manual curation. For instance, a total of 62 metabolite sets associated with dietary and chemical exposures were added during this process. The metabolite set library also incorporates ∼3700 pathways downloaded from RaMP-DB ( 55 ).

Statistical analysis with complex metadata

The Statistical Analysis [metadata table] module in MetaboAnalyst 6.0 now provides a comprehensive suite of methods for analyzing and visualizing metabolomics data in relation to various metadata, be it discrete or continuous. Users can quickly assess the correlation patterns among different experimental factors using the metadata overview heatmaps or interactive PCA visualization. The interactive heatmap visualization coupled with hierarchical clustering allows users to easily explore feature abundance variations across different samples and metadata variables. The statistical methods in this module include both univariate linear models with covariate adjustment as well as multivariate methods such as ANOVA Simultaneous Component Analysis ( 56 , 57 ). Random forest is offered for classification with consideration of different metadata variables of interest. More details about this module can be found in our recently published protocol ( 11 ).

Enhanced functional analysis for untargeted metabolomics

Functional analysis of untargeted metabolomics data was first introduced in MetaboAnalyst 4.0, based on mummichog and Gene Set Enrichment Analysis (GSEA) ( 58 ). It was further enhanced in MetaboAnalyst 5.0 by incorporating retention time into the calculation of empirical compounds. MetaboAnalyst 6.0 now allows users to upload an LC–MS peak list along with a corresponding MS2-based compound list to filter out unrealistic empirical compounds, further improving the accuracy of functional analysis ( 59 ).

Enhanced data visualization support

We have enhanced the quality of the interactive and synchronized 3D plots across the dimensionality reduction methods (PCA, PLS-DA, sPLS-DA) used in MetaboAnalyst based on the powerful three.js library ( https://threejs.org/ ). New features include customizable backgrounds, data point annotations and confidence ellipsoids (Figure 2A ). We have also implemented interactive plots for clustering heatmaps in the Statistical Analysis modules to better support visual exploration of large data matrices typical in untargeted metabolomics. Both mouse-over and zoom-in functionalities are supported to allow users to examine specific features or patterns of interest. In addition to these enhancements, we also updated the visualization for KEGG’s global metabolic network ( 43 ).

To illustrate the utility of the new features of MetaboAnalyst 6.0, we used a metabolomics dataset collected in-house to study glucose-induced insulin secretion in isolated human islets. The dataset contains five samples of high-glucose (16.7 mM) exposures and five samples of low-glucose (2.8 mM) exposures, both for 30 min, together with five quality control (QC) samples. The LC–MS spectra were collected using our Q-Exactive Orbitrap platform (Thermo Scientific, Waltham, MA, USA), together with three SWATH-DIA acquisitions from the pooled QC. The spectra were first centroided and converted into mzML format using ProteoWizard ( 60 , 61 ) and uploaded to MetaboAnalyst 6.0. LC–MS spectra processing was performed using the asari algorithm. All detected MS1 features were used as a target list for MS2 deconvolution and database searching. A total of 27 209 MS1 features were detected, with 4959 of them identified with at least one potential named chemical identity. Functional analysis using the mummichog algorithm indicated that compounds showing significant changes between the high-glucose and low-glucose groups were involved in the Carnitine shuttle, Caffeine, Tryptophan, and Coenzyme A metabolism pathways (Figure 2C ). These pathways have been consistently identified in previous studies ( 62–65 ). Finally, we performed a causal analysis on the association between one of the significant metabolites identified, L-cystathionine, and type 2 diabetes (GWAS ID: finn-b-E4_DM2). The default parameters were used for both SNP filtering and harmonization, as well as MR analysis. Based on these results, a significantly altered cystathionine level was found to have a causal effect on type 2 diabetes (Figure 2D ), which aligns well with a study published recently ( 66 ).
This case study highlights how MetaboAnalyst 6.0 allows users to investigate the chemical identities of MS peaks and to elucidate associations between metabolites and phenotypes, unveiling previously unknown functional insights. To showcase the dose-response analysis module, we used a published dataset collected from BT549 breast cancer cells treated with four different doses of etomoxir ( 21 ). Figure 2E summarizes the results from dose-response modeling. Figure 2F shows an example feature-level BMD calculated based on the fitted curve. The workflow is included as a series of tutorials on our website.

Several web-based tools have been developed to address various aspects of metabolomics data processing, statistical analysis, functional interpretation, and results visualization. Table 1 compares the main features of MetaboAnalyst 6.0 with other popular tools including the previous version, XCMS online ( 23 ), GNPS ( 45 ), Workflow4Metabolomics (W4M) ( 67 ) and MetExplore ( 68 ). For raw data processing, MetaboAnalyst primarily focuses on supporting LC–MS data, whereas W4M also supports GC–MS and NMR raw data processing, and GNPS emphasizes MS2-based compound identification via molecular networks. In comparison, MetaboAnalyst provides an auto-optimized workflow along with an additional algorithm (asari) for efficient LC–MS spectra processing, together with more extensive MS2 spectra libraries for compound identification. In terms of statistical analysis, MetaboAnalyst 6.0 has introduced new modules for dealing with complex metadata, causal analysis and dose–response analysis, while maintaining all other functionalities. MetaboAnalyst contains unique features for enrichment and pathway analysis, and these strengths were further improved in version 6.0 with the addition of new functions and support for more species. For network analysis and integration, MetExplore specializes in metabolic network visualization and integration with other omics. These features are addressed by our companion tool, OmicsNet ( 69 ). Overall, MetaboAnalyst 6.0 continues to be the most comprehensive tool for metabolomics data processing, analysis and interpretation.

Comparison of MetaboAnalyst 6.0 with its previous version and other common web-based metabolomics tools. Symbols used for feature evaluations with ‘√’ for present, ‘-’ for absent, and ‘+’ for a more quantitative assessment (more ‘+’ indicate better support)

• XCMS online: https://xcmsonline.scripps.edu/ .

• GNPS: https://gnps.ucsd.edu/ .

• Workflow4Metabolomics (W4M): https://workflow4metabolomics.org/ .

• MetExplore: https://metexplore.toulouse.inra.fr/metexplore2/

By incorporating a new MS2 data processing workflow, MetaboAnalyst 6.0 now offers a web-based, end-to-end platform for metabolomics data analysis. The workflow spans from raw MS spectra processing to compound identification to functional analysis. A key motivation in developing version 6.0 was to support the data analysis needs emerging from exposomics and follow-up validation studies. The new statistical analysis module specifically takes complex metadata into account to better identify robust associations. From these associations, users can perform causal analysis based on 2SMR to narrow down candidate compounds. The remaining compounds can be validated through dose-response studies based on in vitro or animal models. Our case study highlights the streamlined analysis workflow from raw spectra processing to compound annotation, to functional interpretation, and finally to causal insights. In conclusion, MetaboAnalyst 6.0 is a user-friendly platform for comprehensive analysis of metabolomics data and helps address emerging needs from recent exposomics research. For future directions, we will continue to improve metabolome annotations, better integrate with other omics data, and explore new ways to interact with users via generative artificial intelligence technologies ( 70–73 ).

MetaboAnalyst 6.0 is freely available at https://www.metaboanalyst.ca . No log in required.

Human islets for research were provided by the Alberta Diabetes Institute IsletCore at the University of Alberta in Edmonton with the assistance of the Human Organ Procurement and Exchange (HOPE) program, Trillium Gift of Life Network (TGLN), and other Canadian organ procurement organizations. Islet isolation was approved by the Human Research Ethics Board at the University of Alberta (Pro00013094). All donors’ families gave informed consent for the use of pancreatic tissue in research.

This research was funded by Genome Canada, Canadian Foundation for Innovation (CFI), US National Institutes of Health (U01 CA235493), Canadian Institutes of Health Research (CIHR), Juvenile Diabetes Research Foundation (JDRF), Natural Sciences and Engineering Research Council of Canada (NSERC), and Diabetes Canada. Funding for open access charge: NSERC.

Conflict of interest statement . J. Xia is the founder of XiaLab Analytics.

Lloyd-Price   J. , Arze   C. , Ananthakrishnan   A.N. , Schirmer   M. , Avila-Pacheco   J. , Poon   T.W. , Andrews   E. , Ajami   N.J. , Bonham   K.S. , Brislawn   C.J.  et al. .   Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases . Nature . 2019 ; 569 : 655 – 662 .

Utpott   M. , Rodrigues   E. , Rios   A.O. , Mercali   G.D. , Flores   S.H.   Metabolomics: an analytical technique for food processing evaluation . Food Chem.   2022 ; 366 : 130685 .

Wishart   D.S.   Metabolomics for investigating physiological and pathophysiological processes . Physiol. Rev.   2019 ; 99 : 1819 – 1875 .

Vermeulen   R. , Schymanski   E.L. , Barabasi   A.L. , Miller   G.W.   The exposome and health: where chemistry meets biology . Science . 2020 ; 367 : 392 – 396 .

Danzi   F. , Pacchiana   R. , Mafficini   A. , Scupoli   M.T. , Scarpa   A. , Donadelli   M. , Fiore   A.   To metabolomics and beyond: a technological portfolio to investigate cancer metabolism . Signal. Transduct. Target. Ther.   2023 ; 8 : 137 .

Xia   J. , Psychogios   N. , Young   N. , Wishart   D.S.   MetaboAnalyst: a web server for metabolomic data analysis and interpretation . Nucleic Acids Res.   2009 ; 37 : W652 – W660 .

Xia   J. , Mandal   R. , Sinelnikov   I.V. , Broadhurst   D. , Wishart   D.S.   MetaboAnalyst 2.0—A comprehensive server for metabolomic data analysis . Nucleic Acids Res.   2012 ; 40 : W127 – W133 .

Xia   J. , Sinelnikov   I.V. , Han   B. , Wishart   D.S.   MetaboAnalyst 3.0—Making metabolomics more meaningful . Nucleic Acids Res.   2015 ; 43 : W251 – W257 .

Chong   J. , Soufan   O. , Li   C. , Caraus   I. , Li   S. , Bourque   G. , Wishart   D.S. , Xia   J.   MetaboAnalyst 4.0: towards more transparent and integrative metabolomics analysis . Nucleic Acids Res.   2018 ; 46 : W486 – W494 .

Pang   Z. , Chong   J. , Zhou   G. , de Lima Morais   D.A. , Chang   L. , Barrette   M. , Gauthier   C. , Jacques   P.-É. , Li   S. , Xia   J.   MetaboAnalyst 5.0: narrowing the gap between raw spectra and functional insights . Nucleic Acids Res.   2021 ; 49 : W388 – W396 .

Pang   Z. , Zhou   G. , Ewald   J. , Chang   L. , Hacariz   O. , Basu   N. , Xia   J.   Using MetaboAnalyst 5.0 for LC–HRMS spectra processing, multi-omics integration and covariate adjustment of global metabolomics data . Nat. Protoc.   2022 ; 17 : 1735 – 1761 .

Frigerio   G. , Moruzzi   C. , Mercadante   R. , Schymanski   E.L. , Fustinoni   S.   Development and application of an LC–MS/MS untargeted exposomics method with a separated pooled quality control strategy . Molecules . 2022 ; 27 : 2580 .

Tsugawa   H. , Cajka   T. , Kind   T. , Ma   Y. , Higgins   B. , Ikeda   K. , Kanazawa   M. , VanderGheynst   J. , Fiehn   O. , Arita   M.   MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis . Nat. Methods . 2015 ; 12 : 523 – 526 .

Chang   L. , Zhou   G. , Xia   J.   mGWAS-Explorer 2.0: causal analysis and interpretation of metabolite-phenotype associations . Metabolites . 2023 ; 13 : 826 .

Shin   S.Y. , Fauman   E.B. , Petersen   A.K. , Krumsiek   J. , Santos   R. , Huang   J. , Arnold   M. , Erte   I. , Forgetta   V. , Yang   T.P.  et al. .   An atlas of genetic influences on human blood metabolites . Nat. Genet.   2014 ; 46 : 543 – 550 .

Chen   Y. , Lu   T. , Pettersson-Kymmer   U. , Stewart   I.D. , Butler-Laporte   G. , Nakanishi   T. , Cerani   A. , Liang   K.Y.H. , Yoshiji   S. , Willett   J.D.S.  et al. .   Genomic atlas of the plasma metabolome prioritizes metabolites implicated in human diseases . Nat. Genet.   2023 ; 55 : 44 – 53 .

Hemani   G. , Zheng   J. , Elsworth   B. , Wade   K.H. , Haberland   V. , Baird   D. , Laurin   C. , Burgess   S. , Bowden   J. , Langdon   R.  et al. .   The MR-Base platform supports systematic causal inference across the human phenome . eLife . 2018 ; 7 : e34408 .

Sanderson   E. , Glymour   M.M. , Holmes   M.V. , Kang   H. , Morrison   J. , Munafò   M.R. , Palmer   T. , Schooling   C.M. , Wallace   C. , Zhao   Q.  et al. .   Mendelian randomization . Nat. Rev. Methods Primers . 2022 ; 2 : 6 .

Zhao   H. , Liu   M. , Lv   Y. , Fang   M.   Dose-response metabolomics and pathway sensitivity to map molecular cartography of bisphenol A exposure . Environ. Int.   2022 ; 158 : 106893 .

Thomas   R.S. , Wesselkamper   S.C. , Wang   N.C.Y. , Zhao   Q.J. , Petersen   D.D. , Lambert   J.C. , Cote   I. , Yang   L. , Healy   E. , Black   M.B.  et al. .   Temporal concordance between apical and transcriptional points of departure for chemical risk assessment . Toxicol. Sci.   2013 ; 134 : 180 – 194 .

Yao   C.-H. , Wang   L. , Stancliffe   E. , Sindelar   M. , Cho   K. , Yin   W. , Wang   Y. , Patti   G.J.   Dose-response metabolomics to understand biochemical mechanisms and off-target drug effects with the TOXcms software . Anal. Chem.   2020 ; 92 : 1856 – 1864 .

Li   S. , Siddiqa   A. , Thapa   M. , Chi   Y. , Zheng   S.   Trackable and scalable LC–MS metabolomics data processing using asari . Nat. Commun.   2023 ; 14 : 4113 .

Tautenhahn   R. , Patti   G.J. , Rinehart   D. , Siuzdak   G.   XCMS Online: a web-based platform to process untargeted metabolomic data . Anal. Chem.   2012 ; 84 : 5035 – 5039 .

Schmid   R. , Heuckeroth   S. , Korf   A. , Smirnov   A. , Myers   O. , Dyrlund   T.S. , Bushuiev   R. , Murray   K.J. , Hoffmann   N. , Lu   M.  et al. .   Integrative analysis of multimodal mass spectrometry data in MZmine 3 . Nat. Biotechnol.   2023 ; 41 : 447 – 449 .

Myers   O.D. , Sumner   S.J. , Li   S. , Barnes   S. , Du   X.   Detailed investigation and comparison of the XCMS and MZmine 2 chromatogram construction and chromatographic peak detection methods for preprocessing mass spectrometry metabolomics data . Anal. Chem.   2017 ; 89 : 8689 – 8695 .

Pang   Z. , Chong   J. , Li   S. , Xia   J.   MetaboAnalystR 3.0: toward an optimized workflow for global metabolomics . Metabolites . 2020 ; 10 : 186 .

Xing   S. , Yu   H. , Liu   M. , Jia   Q. , Sun   Z. , Fang   M. , Huan   T.   Recognizing contamination fragment ions in liquid chromatography–Tandem mass spectrometry data . J. Am. Soc. Mass. Spectrom.   2021 ; 32 : 2296 – 2305 .

Stancliffe   E. , Schwaiger-Haber   M. , Sindelar   M. , Patti   G.J.   DecoID improves identification rates in metabolomics through database-assisted MS/MS deconvolution . Nat. Methods . 2021 ; 18 : 779 – 787 .

Yin   Y. , Wang   R. , Cai   Y. , Wang   Z. , Zhu   Z.-J.   DecoMetDIA: deconvolution of multiplexed MS/MS spectra for metabolite identification in SWATH-MS-based untargeted metabolomics . Anal. Chem.   2019 ; 91 : 11897 – 11904 .

Li   Y. , Kind   T. , Folz   J. , Vaniya   A. , Mehta   S.S. , Fiehn   O.   Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification . Nat. Methods . 2021 ; 18 : 1524 – 1531 .

Xing   S. , Shen   S. , Xu   B. , Li   X. , Huan   T.   BUDDY: molecular formula discovery via bottom-up MS/MS interrogation . Nat. Methods . 2023 ; 20 : 881 – 890 .

Sollis   E. , Mosaku   A. , Abid   A. , Buniello   A. , Cerezo   M. , Gil   L. , Groza   T. , Gunes   O. , Hall   P. , Hayhurst   J.  et al. .   The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource . Nucleic Acids Res.   2023 ; 51 : D977 – D985 .

Hemani   G. , Tilling   K. , Davey Smith   G.   Orienting the causal relationship between imprecisely measured traits using GWAS summary data . PLoS Genet.   2017 ; 13 : e1007081 .

Marees   A.T. , de Kluiver   H. , Stringer   S. , Vorspan   F. , Curis   E. , Marie-Claire   C. , Derks   E.M.   A tutorial on conducting genome-wide association studies: quality control and statistical analysis . Int. J. Methods Psychiatr. Res.   2018 ; 27 : e1608 .

de Leeuw   C. , Savage   J. , Bucur   I.G. , Heskes   T. , Posthuma   D.   Understanding the assumptions underlying Mendelian randomization . Eur. J. Hum. Genet.   2022 ; 30 : 653 – 660 .

Altshuler   B.   Modeling of dose-response relationships . Environ. Health Perspect.   1981 ; 42 : 23 – 27 .

Thomas   R.S. , Wesselkamper   S.C. , Wang   N.C. , Zhao   Q.J. , Petersen   D.D. , Lambert   J.C. , Cote   I. , Yang   L. , Healy   E. , Black   M.B.  et al. .   Temporal concordance between apical and transcriptional points of departure for chemical risk assessment . Toxicol. Sci.   2013 ; 134 : 180 – 194 .

Kleensang   A. , Maertens   A. , Rosenberg   M. , Fitzpatrick   S. , Lamb   J. , Auerbach   S. , Brennan   R. , Crofton   K.M. , Gordon   B. , Fornace   A.J.  Jr  et al. .   Pathways of toxicity . ALTEX . 2014 ; 31 : 53 – 61 .

Ewald   J. , Soufan   O. , Xia   J. , Basu   N.   FastBMD: an online tool for rapid benchmark dose–response analysis of transcriptomics data . Bioinformatics . 2020 ; 37 : 1035 – 1036 .

Ewald   J. , Zhou   G. , Lu   Y. , Xia   J.   Using ExpressAnalyst for comprehensive gene expression analysis in model and non-model organisms . Curr Protoc . 2023 ; 3 : e922 .

Wishart   D.S. , Guo   A. , Oler   E. , Wang   F. , Anjum   A. , Peters   H. , Dizon   R. , Sayeeda   Z. , Tian   S. , Lee   B.L.  et al. .   HMDB 5.0: the Human Metabolome Database for 2022 . Nucleic Acids Res.   2021 ; 50 : D622 – D631 .

Kanehisa   M. , Furumichi   M. , Sato   Y. , Kawashima   M. , Ishiguro-Watanabe   M.   KEGG for taxonomy-based analysis of pathways and genomes . Nucleic Acids Res.   2022 ; 51 : D587 – D592 .

Kim   S.   Exploring chemical information in PubChem . Curr. Protoc.   2021 ; 1 : e217 .

Aron   A.T. , Gentry   E.C. , McPhail   K.L. , Nothias   L.-F. , Nothias-Esposito   M. , Bouslimani   A. , Petras   D. , Gauglitz   J.M. , Sikora   N. , Vargas   F.  et al. .   Reproducible molecular networking of untargeted mass spectrometry data using GNPS . Nat. Protoc.   2020 ; 15 : 1954 – 1991 .

Horai   H. , Arita   M. , Kanaya   S. , Nihei   Y. , Ikeda   T. , Suwa   K. , Ojima   Y. , Tanaka   K. , Tanaka   S. , Aoshima   K.  et al. .   MassBank: a public repository for sharing mass spectral data for life sciences . J. Mass Spectrom.   2010 ; 45 : 703 – 714 .

Jeffryes   J.G. , Colastani   R.L. , Elbadawi-Sidhu   M. , Kind   T. , Niehaus   T.D. , Broadbelt   L.J. , Hanson   A.D. , Fiehn   O. , Tyo   K.E. , Henry   C.S.   MINEs: open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics . J Cheminform . 2015 ; 7 : 44 .

Kind   T. , Liu   K.-H. , Lee   D.Y. , DeFelice   B. , Meissen   J.K. , Fiehn   O.   LipidBlast in silico tandem mass spectrometry database for lipid identification . Nat. Methods . 2013 ; 10 : 755 – 758 .

Tsugawa   H. , Nakabayashi   R. , Mori   T. , Yamada   Y. , Takahashi   M. , Rai   A. , Sugiyama   R. , Yamamoto   H. , Nakaya   T. , Yamazaki   M.  et al. .   A cheminformatics approach to characterize metabolomes in stable-isotope-labeled organisms . Nat. Methods . 2019 ; 16 : 295 – 298 .

Sawada   Y. , Nakabayashi   R. , Yamada   Y. , Suzuki   M. , Sato   M. , Sakata   A. , Akiyama   K. , Sakurai   T. , Matsuda   F. , Aoki   T.  et al. .   RIKEN tandem mass spectral database (ReSpect) for phytochemicals: a plant-specific MS/MS-based data resource and database . Phytochemistry . 2012 ; 82 : 38 – 45 .

Lee   S. , Hwang   S. , Seo   M. , Shin   K.B. , Kim   K.H. , Park   G.W. , Kim   J.Y. , Yoo   J.S. , No   K.T.   BMDMS-NP: a comprehensive ESI-MS/MS spectral library of natural compounds . Phytochemistry . 2020 ; 177 : 112427 .

Tsugawa   H. , Ikeda   K. , Takahashi   M. , Satoh   A. , Mori   Y. , Uchino   H. , Okahashi   N. , Yamada   Y. , Tada   I. , Bonini   P.  et al. .   A lipidome atlas in MS-DIAL 4 . Nat. Biotechnol.   2020 ; 38 : 1159 – 1163 .

Aisporna   A. , Benton   H.P. , Chen   A. , Derks   R.J.E. , Galano   J.M. , Giera   M. , Siuzdak   G.   Neutral loss mass spectral data enhances molecular similarity analysis in METLIN . J. Am. Soc. Mass. Spectrom.   2022 ; 33 : 530 – 534 .

Wishart   D.S. , Bartok   B. , Oler   E. , Liang   K.Y.H. , Budinski   Z. , Berjanskii   M. , Guo   A. , Cao   X. , Wilson   M.   MarkerDB: an online database of molecular biomarkers . Nucleic Acids Res.   2021 ; 49 : D1259 – D1267 .

Braisted   J. , Patt   A. , Tindall   C. , Sheils   T. , Neyra   J. , Spencer   K. , Eicher   T. , Mathé   E.A.   RaMP-DB 2.0: a renovated knowledgebase for deriving biological and chemical insight from metabolites, proteins, and genes . Bioinformatics . 2023 ; 39 : btac726 .

Smilde   A.K. , Jansen   J.J. , Hoefsloot   H.C. , Lamers   R.J. , van der Greef   J. , Timmerman   M.E.   ANOVA-simultaneous component analysis (ASCA): a new tool for analyzing designed metabolomics data . Bioinformatics . 2005 ; 21 : 3043 – 3048 .

Ritchie   M.E. , Phipson   B. , Wu   D. , Hu   Y. , Law   C.W. , Shi   W. , Smyth   G.K.   limma powers differential expression analyses for RNA-sequencing and microarray studies . Nucleic Acids Res.   2015 ; 43 : e47 .

Li   S. , Park   Y. , Duraisingham   S. , Strobel   F.H. , Khan   N. , Soltow   Q.A. , Jones   D.P. , Pulendran   B.   Predicting network activity from high throughput metabolomics . PLoS Comput. Biol.   2013 ; 9 : e1003123 .

Lu   Y. , Pang   Z. , Xia   J.   Comprehensive investigation of pathway enrichment methods for functional interpretation of LC–MS global metabolomics data . Brief. Bioinform.   2023 ; 24 : bbac553 .

Chambers   M.C. , Maclean   B. , Burke   R. , Amodei   D. , Ruderman   D.L. , Neumann   S. , Gatto   L. , Fischer   B. , Pratt   B. , Egertson   J.  et al. .   A cross-platform toolkit for mass spectrometry and proteomics . Nat. Biotechnol.   2012 ; 30 : 918 – 920 .

Adusumilli   R. , Mallick   P.   Data conversion with ProteoWizard msConvert . Methods Mol. Biol.   2017 ; 1550 : 339 – 368 .

Bene   J. , Hadzsiev   K. , Melegh   B.   Role of carnitine and its derivatives in the development and management of type 2 diabetes . Nutr Diabetes . 2018 ; 8 : 8 .

Lane   J.D. , Barkauskas   C.E. , Surwit   R.S. , Feinglos   M.N.   Caffeine impairs glucose metabolism in type 2 diabetes . Diabetes Care.   2004 ; 27 : 2047 – 2048 .

Unluturk   U. , Erbas   T. Engin   A. , Engin   A.B.   Tryptophan Metabolism: Implications for Biological Processes, Health and Disease . 2015 ; Cham Springer International Publishing 147 – 171 .

Google Preview

Jackowski   S. , Leonardi   R.   Deregulated coenzyme A, loss of metabolic flexibility and diabetes . Biochem. Soc. Trans.   2014 ; 42 : 1118 – 1122 .

Cruciani-Guglielmacci   C. , Meneyrol   K. , Denom   J. , Kassis   N. , Rachdi   L. , Makaci   F. , Migrenne-Li   S. , Daubigney   F. , Georgiadou   E. , Denis   R.G.  et al. .   Homocysteine metabolism pathway is involved in the control of glucose homeostasis: a cystathionine beta synthase deficiency study in mouse . Cells . 2022 ; 11 : 1737 .

Giacomoni   F. , Le Corguillé   G. , Monsoor   M. , Landi   M. , Pericard   P. , Pétéra   M. , Duperier   C. , Tremblay-Franco   M. , Martin   J.-F. , Jacob   D.  et al. .   Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics . Bioinformatics . 2014 ; 31 : 1493 – 1495 .

Cottret   L. , Frainay   C. , Chazalviel   M. , Cabanettes   F. , Gloaguen   Y. , Camenen   E. , Merlet   B. , Heux   S. , Portais   J.C. , Poupin   N.  et al. .   MetExplore: collaborative edition and exploration of metabolic networks . Nucleic Acids Res.   2018 ; 46 : W495 – W502 .

Zhou   G. , Pang   Z. , Lu   Y. , Ewald   J. , Xia   J.   OmicsNet 2.0: a web-based platform for multi-omics integration and network visual analytics . Nucleic Acids Res.   2022 ; 50 : W527 – W533 .

Lu   Y. , Zhou   G. , Ewald   J. , Pang   Z. , Shiri   T. , Xia   J.   MicrobiomeAnalyst 2.0: comprehensive statistical, functional and integrative analysis of microbiome data . Nucleic Acids Res.   2023 ; 51 : W310 – W318 .

Liu   P. , Ewald   J. , Pang   Z. , Legrand   E. , Jeon   Y.S. , Sangiovanni   J. , Hacariz   O. , Zhou   G. , Head   J.A. , Basu   N.  et al. .   ExpressAnalyst: a unified platform for RNA-sequencing analysis in non-model species . Nat. Commun.   2023 ; 14 : 2995 .

Zhou   G. , Ewald   J. , Xia   J.   OmicsAnalyst: a comprehensive web-based platform for visual analytics of multi-omics data . Nucleic Acids Res.   2021 ; 49 : W476 – W482 .

Moor   M. , Banerjee   O. , Abad   Z.S.H. , Krumholz   H.M. , Leskovec   J. , Topol   E.J. , Rajpurkar   P.   Foundation models for generalist medical artificial intelligence . Nature . 2023 ; 616 : 259 – 265 .

  • Online ISSN 1362-4962
  • Print ISSN 0305-1048
  • Copyright © 2024 Oxford University Press
REVIEW article

The effect of covid-19 vaccine to the omicron variant in children and adolescents: a systematic review and meta-analysis.

Wenting Lu,

  • 1 Institute of Respiratory Health and Multimorbidity, West China Hospital, Sichuan University, Chengdu, Sichuan, China
  • 2 Integrated Care Management Center, West China Hospital, Sichuan University, Chengdu, Sichuan, China
  • 3 Department of Obstetrics and Gynecology, National Clinical Research Center for Obstetrics and Gynecology (Peking University Third Hospital), National Center for Healthcare Quality Management in Obstetrics, Peking University Third Hospital, Peking University, Beijing, China
  • 4 General Practice Ward/International Medical Center Ward, General Practice Medical Center, West China Hospital, Sichuan University, Chengdu, China
  • 5 Department of Pediatrics, West China Second University Hospital, Sichuan University, Chengdu, China
  • 6 Key Laboratory of Obstetrics & Gynecologic and Pediatric Diseases and Birth Defects of the Ministry of Education, Sichuan University, Chengdu, China

Background: Omicron (B.1.1.529), a variant of SARS-CoV-2, has emerged as the dominant strain in the COVID-19 pandemic. This development has raised concerns about the effectiveness of vaccination against Omicron, particularly in children and adolescents. Our study evaluated the efficacy of different COVID-19 vaccination regimens in children and adolescents during the Omicron epidemic phase.

Methods: We searched the PubMed, Cochrane, Web of Science, and Embase electronic databases for studies published through March 2023 on the association between COVID-19 vaccination and vaccine effectiveness (VE) against SARS-CoV-2 infection in children and adolescents during the Omicron variant period. The effectiveness outcomes included mild COVID-19 and severe COVID-19. This study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines and was prospectively registered in PROSPERO (CRD42023390481).

Results: A total of 33 studies involving 16,532,536 children were included in the analysis. In children and adolescents aged 0–19 years, the overall VE of the COVID-19 vaccine was 45% (95% confidence interval [CI]: 40 to 50%). Subgroup analysis by dosage regimen during the Omicron epidemic phase showed that the VE was 50% (95% CI: 44 to 55%) for 2-dose vaccination and 61% (95% CI: 45 to 73%) for booster vaccination. Further analysis of outcomes after 2-dose vaccination showed that the VE was 41% (95% CI: 35 to 47%) against mild COVID-19 and 71% (95% CI: 60 to 79%) against severe COVID-19. In addition, VE decreased over time: VE against Omicron infection was 54% (95% CI: 48 to 59%) within 90 days of 2-dose vaccination and 34% (95% CI: 21 to 45%) after 90 days.

Conclusion: During the Omicron variant epidemic, the vaccine provided protection against SARS-CoV-2 infection in children and adolescents aged 0–19 years. Two doses of vaccine can provide effective protection against severe COVID-19, with booster vaccination additionally enhancing VE.

1 Introduction

Since the emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in 2019, its global impact has been profound ( 1 ), causing millions of infections and significantly affecting both human lives and socio-economic stability ( 2 ). As the pandemic evolved, the Omicron variant has been the predominant SARS-CoV-2 strain worldwide since November 2021 ( 3 , 4 ). With the emergence of the Omicron variant, the incidence of SARS-CoV-2 infection has grown among children ( 5 ), including mild COVID-19 (fever, fatigue, persistent dry cough, reduced or lost sense of taste or smell, and other symptoms) and severe COVID-19 (pneumonia, life-threatening complications affecting the gastrointestinal, neurological, or cardiovascular systems, or hospitalization) ( 6 ).

Vaccination is the most economically efficient means to guard against COVID-19 ( 7 , 8 ). The efficacy of vaccination is linked to the number of doses and the interval between them ( 9 – 13 ). The vaccination regimen in children currently comprises complete vaccination (two doses) and booster vaccination (three doses) ( 14 , 15 ). Understanding the efficacy of vaccines in children is crucial for informed decisions on vaccine policy, including the necessity, timing, and number of doses for children. Piechotta et al. found that in children aged 5–11 years, mRNA vaccines are moderately effective against infection with the Omicron variant and protect well against COVID-19 hospitalization ( 16 ). However, the protective efficacy of vaccines against Omicron infection in children across a wider age range remains poorly understood. In addition, the appropriate time interval after 2-dose vaccination, and the efficacy of vaccination against both mild and severe infections in individuals aged 0–19 years, remain unclear. Therefore, we conducted this meta-analysis to explore the efficacy of COVID-19 vaccines in children and adolescents aged 0–19 years during the Omicron epidemic phase.

The study explored the association between the vaccine effectiveness (VE) of COVID-19 vaccines and SARS-CoV-2 infection in children during the Omicron variant outbreak. Additionally, subgroup analyses were conducted to identify potential influencing factors, including the number of vaccine doses, the SARS-CoV-2 outcome, and the time interval after 2-dose vaccination. The findings provide a reference for the vaccination strategy of children against COVID-19 during the Omicron variant period and support efforts to safeguard the health and safety of the pediatric population.

2 Methods

2.1 Registration

The present investigation adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and was prospectively registered in PROSPERO under the registration number CRD42023390481. Initially, the PROSPERO protocol was designed to evaluate the effectiveness of various vaccine types. However, most articles meeting our inclusion criteria focused on the BNT162b2 vaccine, with limited data available for other vaccine types. Consequently, we modified the protocol to explore vaccine efficacy by dosage, infection severity, and time interval after vaccination.

2.2 Information sources and search strategies

We conducted comprehensive systematic literature searches of the PubMed, Cochrane, Web of Science, and Embase electronic databases/platforms through February 2023. The structured search strategy combined the Medical Subject Headings (MeSH) term “COVID-19 Vaccines” and text words such as “COVID19 Virus Vaccines” and “Coronavirus Disease 2019 Vaccine”; the Supplementary Concept “SARS-CoV-2 variants” or text words such as “Omicron,” “SARS-CoV-2 BA.5 variant,” “COVID-19 Virus variant B.1.1.529,” and “SARS-CoV-2 omicron variant”; and the MeSH terms “Child” and “Pediatrics” or text words such as “Child,” “Children,” and “pediatric.” We adapted the search strategy for each of the other electronic databases. The specific search strategy for each database/platform is shown in the Supplementary materials . Additionally, we examined the reference lists of the included studies to identify further relevant literature.

2.3 Eligibility criteria

We conducted a systematic review of studies that investigated the effectiveness of COVID-19 vaccines in preventing Omicron variant infections among children and adolescents. Our study population comprised individuals aged 0–19 years, with no restrictions on vaccine types or dosages administered. To be included, studies had to explicitly specify COVID-19 infection attributed to the Omicron variant (PCR-confirmed or antigen-test confirmed) as the outcome measure and provide accessible data on VE. We excluded reviews, case series, case reports, and studies involving non-human subjects.

2.4 Study selection process

A single investigator conducted the initial database search and diligently screened for any duplicate entries. Following the elimination of duplicates, two reviewers (TR and WL) meticulously evaluated the titles and abstracts of all records, subsequently scrutinizing the full texts of the eligible articles.

2.5 Data collection

Data pertaining to study design and methodology, author names, publication year, study location, sample size, age range, dosages of vaccination, different outcomes of SARS-CoV-2 infection during Omicron-dominant period, and potential confounding variables were meticulously extracted from the incorporated studies. The extraction process was carried out by two independent reviewers (TR and WL).

2.6 Study risk of bias assessment

The risk of bias for all chosen studies was independently evaluated by two reviewers (TR and WL) utilizing the Newcastle-Ottawa Scale (NOS) score. Subsequently, the quality of each study was categorized into three grades: low (0–3), moderate (4–6), and high (7–9).

2.7 Statistical analysis

Data from the included studies were extracted into Microsoft Excel and then imported into Stata 12 (Stata Corp) and Review Manager 5.3 for the meta-analyses. VE is defined as the reduction in disease incidence among vaccinated individuals compared with unvaccinated individuals. The VE and its 95% confidence interval (CI) were computed from either adjusted or unadjusted risk ratios (RR): VE = (1 - RR) × 100%. VE values above 0% indicate a potential protective effect of the vaccine. We used pooled RR and VE to evaluate the association between COVID-19 vaccination in children and adolescents and SARS-CoV-2 infection during the Omicron-dominant period. To quantify inconsistency across studies, that is, the percentage of variability in effect estimates attributable to heterogeneity rather than sampling error, we used the I² statistic and the Q test. If the heterogeneity was significant and I² > 50%, a random effects model was used; otherwise, a fixed effects model was used. p < 0.05 was considered statistically significant. Additionally, sensitivity analysis was performed to assess the robustness of the associations by excluding one study at a time. To assess publication bias, a funnel plot was constructed, and Egger’s and Begg’s tests were conducted.
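
The conversion from RR to VE and the model-selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Stata or Review Manager workflow: the function names are hypothetical, and applying the 0.05 threshold to the Q test is an assumption based on the stated significance level. Note that the CI bounds swap when converting, because a higher RR means a lower VE.

```python
def ve_from_rr(rr, ci_low, ci_high):
    """Convert a risk ratio and its 95% CI to vaccine effectiveness in percent.

    VE = (1 - RR) * 100. The upper RR bound gives the lower VE bound and vice versa.
    """
    ve = (1 - rr) * 100
    ve_low = (1 - ci_high) * 100
    ve_high = (1 - ci_low) * 100
    return ve, ve_low, ve_high


def choose_model(i_squared, q_p_value, alpha=0.05):
    """Pick the pooling model: random effects when heterogeneity is
    significant and I^2 exceeds 50%, fixed effects otherwise.

    alpha=0.05 for the Q test is an assumption, not stated per test in the text.
    """
    if q_p_value < alpha and i_squared > 50:
        return "random"
    return "fixed"


# Overall pooled estimate reported in the Results: RR 0.55 (95% CI 0.50 to 0.60).
ve, ve_low, ve_high = ve_from_rr(0.55, 0.50, 0.60)
print(round(ve), round(ve_low), round(ve_high))  # 45 40 50
```

With the reported overall RR, the sketch reproduces the VE of 45% (95% CI: 40 to 50%) given in Table 1.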

Furthermore, we performed subgroup analyses stratified by vaccination dosage, time interval after 2-dose vaccination, and outcome of SARS-CoV-2 infection. Based on the information provided in the original studies, vaccination dosages were categorized into three subgroups: one dose (incomplete vaccination), two doses (complete vaccination), and three doses (booster vaccination). Outcomes were divided into two subgroups based on the outcome indicators reported in the original studies: mild COVID-19 (fever, fatigue, persistent dry cough, reduced or lost sense of taste or smell, and other symptoms) and severe COVID-19 (pneumonia, life-threatening complications affecting the gastrointestinal, neurological, or cardiovascular systems, or hospitalization). The time intervals were divided into two subgroups, as reported in the original studies: ≤90 days and >90 days after 2-dose vaccination.

3 Results

The study selection process is shown in Figure 1 . A total of 1731 records were identified in the databases to explore the efficacy of the COVID-19 vaccine in children aged 0–19 years during the Omicron-dominant period. After initial screening, 214 records underwent full-text evaluation. Among these, 13 records were reviews or editorials, 89 lacked pertinent data, 55 did not involve children or adolescents, and 25 were unavailable in full text. Consequently, 32 records were eligible for inclusion in our study ( Supplementary Table 1 ). These comprised 15 cohort studies ( 10 , 17 – 30 ) and 18 case–control studies ( 11 , 12 , 31 – 45 ), all using unvaccinated individuals as the control group, with a total of 17,177,822 participants. Of the 33 studies included (one record contains two studies), 29 (87.88%) evaluated the effectiveness of the BNT162b2 vaccine, six (18.18%) the CoronaVac vaccine, two each the mRNA-1273 and BBIBP-CorV vaccines, and one the ChAdOx1nCoV-19 vaccine.
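
The selection counts in this paragraph can be checked with simple arithmetic. The short Python sketch below uses only the numbers reported above; the variable names are illustrative.

```python
# Records that reached full-text evaluation, and reasons for exclusion.
full_text_records = 214
excluded = {
    "reviews or editorials": 13,
    "no pertinent data": 89,
    "not children or adolescents": 55,
    "full text unavailable": 25,
}

included_records = full_text_records - sum(excluded.values())
print(included_records)  # 32

# One included record reports two studies, so studies exceed records by one.
cohort_studies, case_control_studies = 15, 18
included_studies = cohort_studies + case_control_studies
print(included_studies)  # 33

# Share of studies evaluating each of the two most common vaccines.
print(round(29 / included_studies * 100, 2))  # 87.88
print(round(6 / included_studies * 100, 2))   # 18.18
```

The vaccine-specific counts sum to more than 33 because some studies evaluated more than one vaccine.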

www.frontiersin.org

Figure 1 . PRISMA flowchart.

The NOS scores indicated that all the studies included in the analysis demonstrated moderate to high methodological quality. Among them, 17 studies ( 10 , 17 – 19 , 23 – 26 , 28 , 29 , 32 , 34 , 35 , 38 , 43 , 45 ) were rated as high quality, while 16 studies ( 11 , 12 , 20 – 22 , 27 , 30 , 31 , 33 , 36 , 37 , 39 – 42 , 44 ) were rated as moderate quality ( Supplementary Table 2 ).

We next conducted a meta-analysis of the 33 studies with eligible data to estimate the VE of COVID-19 vaccines among children in the Omicron-dominant period. The overall RR was 0.55 (95% CI: 0.50 to 0.60, I² = 89%, p < 0.01; Figure 2 ; VE: 45%, 95% CI: 40 to 50%; Table 1 ). We also evaluated the possibility of publication bias. The funnel plot was asymmetric ( Supplementary Figure 1 ), and Egger’s test ( p = 0.01; Supplementary Figure 2 ) and Begg’s test ( p = 0.04; Supplementary Figure 3 ) indicated publication bias. We therefore applied the trim-and-fill method, and the results remained statistically significant ( Supplementary Figure 4 ), which supports the stability and reliability of our results. A sensitivity analysis that removed one study at a time indicated that the results were robust ( Supplementary Figure 5 ).
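
A leave-one-out sensitivity analysis of the kind described here re-pools the estimate with each study omitted in turn. The Python sketch below is a simplified illustration only: it uses inverse-variance fixed-effect pooling for brevity, whereas the review pooled with a random effects model, and the study values are hypothetical rather than taken from the included studies.

```python
import math


def pooled_rr(log_rr, variances):
    """Inverse-variance (fixed-effect) pooled risk ratio from log risk ratios."""
    weights = [1 / v for v in variances]
    pooled_log = sum(w * y for w, y in zip(weights, log_rr)) / sum(weights)
    return math.exp(pooled_log)


def leave_one_out(log_rr, variances):
    """Return the pooled RR obtained after omitting each study in turn."""
    estimates = []
    for i in range(len(log_rr)):
        ys = log_rr[:i] + log_rr[i + 1:]
        vs = variances[:i] + variances[i + 1:]
        estimates.append(pooled_rr(ys, vs))
    return estimates


# Hypothetical log risk ratios and their variances (not data from the review).
log_rr = [math.log(0.45), math.log(0.60), math.log(0.72), math.log(0.55)]
variances = [0.017, 0.005, 0.008, 0.010]

full = pooled_rr(log_rr, variances)
loo = leave_one_out(log_rr, variances)

# One simple robustness criterion: no single omission moves the pooled RR to 1 or above.
robust = all(rr < 1 for rr in loo)
print(round(full, 2), [round(rr, 2) for rr in loo], robust)
```

If every leave-one-out estimate stays on the same side of RR = 1 and close to the full estimate, no single study is driving the pooled result.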

Figure 2 . Forest plot for risk ratios on preventing Omicron infections. The red square symbolizes the point estimate for each study, with its size proportional to the study’s weight relative to the summary estimate. The black diamond symbol represents the overall effect estimate derived from the meta-analysis. Meta-analysis based on Random Effects model, inverse variance method (IV). Effect size estimates expressed in Log Risk Ratio [95%CI].
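
The caption names a random effects model with inverse-variance weighting on the log risk ratio scale. The Python sketch below shows one common way to do such pooling. It assumes the DerSimonian-Laird estimator for the between-study variance, which the text does not specify, and it recovers each standard error from the reported 95% CI. The three input studies are hypothetical.

```python
import math

Z95 = 1.96  # normal quantile for a two-sided 95% interval


def se_from_ci(ci_low, ci_high):
    """Standard error of log(RR), recovered from a reported 95% CI."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)


def pool_random_effects(studies):
    """Pool (rr, ci_low, ci_high) tuples on the log scale.

    Assumes the DerSimonian-Laird estimator for the between-study variance.
    """
    if len(studies) < 2:
        raise ValueError("need at least two studies")
    y = [math.log(rr) for rr, _, _ in studies]
    v = [se_from_ci(low, high) ** 2 for _, low, high in studies]

    # Fixed-effect step: inverse-variance weights and Cochran's Q.
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Between-study variance (tau^2) and I^2, both truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects step: weights include tau^2.
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "rr": math.exp(pooled),
        "ci": (math.exp(pooled - Z95 * se), math.exp(pooled + Z95 * se)),
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical studies as (RR, lower 95% bound, upper 95% bound).
hypothetical = [(0.45, 0.35, 0.58), (0.60, 0.52, 0.69), (0.72, 0.60, 0.86)]
result = pool_random_effects(hypothetical)
print(result)
```

Pooling on the log scale keeps the confidence interval symmetric around the log estimate; exponentiating at the end returns the RR and its asymmetric interval.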

Table 1 . Overall effectiveness and vaccine effectiveness results of different vaccination regimens.

Additionally, subgroup analyses were conducted to identify factors that may influence the relationship between vaccination and VE against infection in children during the Omicron-dominant period. These factors included the number of vaccine doses, the classification of outcomes, and the time interval after 2-dose vaccination.

Regardless of vaccine type, 14 studies ( 20 , 26 , 27 , 29 , 34 , 37 – 41 , 43 – 45 ) investigated the VE of incomplete vaccination (1-dose) compared with unvaccinated individuals, yielding an overall RR of 0.77 (95% CI: 0.73 to 0.82, I² = 75%, p < 0.01; Figure 3 ; VE: 23%, 95% CI: 18 to 27%; Table 1 ). A total of 32 studies ( 10 – 12 , 17 – 41 , 43 – 45 ) explored complete vaccination (2-dose), yielding an overall RR of 0.50 (95% CI: 0.45 to 0.56, I² = 93%, p < 0.01; Figure 3 ; VE: 50%, 95% CI: 44 to 55%; Table 1 ). Additionally, 10 studies ( 12 , 17 , 19 , 25 , 28 – 30 , 36 , 39 , 42 ) focused on booster vaccination (3-dose), yielding an overall RR of 0.39 (95% CI: 0.27 to 0.55, I² = 97%, p < 0.01; Figure 3 ; VE: 61%, 95% CI: 45 to 73%; Table 1 ).

Figure 3 . Forest plot for risk ratios of different vaccine dosages on preventing Omicron infections. The red square symbolizes the point estimate for each study, with its size proportional to the study’s weight relative to the summary estimate. The black diamond symbol represents the overall effect estimate derived from the meta-analysis. Meta-analysis based on Random Effects model, inverse variance method (IV). Effect size estimates expressed in Log Risk Ratio [95%CI].

We also performed a subgroup analysis of VE by dosage within vaccine types (BNT162b2 and CoronaVac). Eleven studies analyzed the VE of 1-dose BNT162b2 vaccination compared with unvaccinated individuals, yielding an overall RR of 0.78 (95% CI: 0.73 to 0.83, I² = 72%, p < 0.01; Supplementary Figure 6 ; VE: 22%, 95% CI: 17 to 27%; Table 1 ). A total of 28 studies explored 2-dose BNT162b2 vaccination compared with unvaccinated individuals, yielding an overall RR of 0.50 (95% CI: 0.45 to 0.56, I² = 94%, p < 0.01; Supplementary Figure 6 ; VE: 50%, 95% CI: 44 to 55%; Table 1 ). Additionally, seven studies focused on 3-dose BNT162b2 vaccination, yielding an overall RR of 0.38 (95% CI: 0.27 to 0.55, I² = 95%, p < 0.01; Supplementary Figure 6 ; VE: 62%, 95% CI: 45 to 73%; Table 1 ).

The effectiveness of 1-dose CoronaVac vaccination was investigated in five studies, yielding an overall RR of 0.86 (95% CI: 0.75 to 0.99, I² = 14%, p = 0.32; Supplementary Figure 7 ; VE: 14%, 95% CI: 1 to 25%; Table 1 ). For the 2-dose regimen, six studies were evaluated, yielding an overall RR of 0.48 (95% CI: 0.34 to 0.70, I² = 90%, p < 0.01; Supplementary Figure 7 ; VE: 52%, 95% CI: 30 to 66%; Table 1 ). Only one study examined the efficacy of 3-dose CoronaVac vaccination, yielding an RR of 0.62 (95% CI: 0.51 to 0.76; Supplementary Figure 7 ; VE: 38%, 95% CI: 24 to 49%; Table 1 ).

We subsequently conducted subgroup analyses within the complete vaccination group, focusing on distinct outcome measures. Nine studies ( 12 , 20 , 21 , 26 , 29 , 31 , 34 , 39 , 45 ) used mild COVID-19 as the outcome, yielding an overall RR of 0.59 (95% CI: 0.53 to 0.65, I² = 90%, p < 0.01; Figure 4 ; VE: 41%, 95% CI: 35 to 47%; Table 1 ). Complete vaccination also decreased the risk of Omicron-associated severe COVID-19, with an RR of 0.29 (95% CI: 0.21 to 0.40, I² = 92%, p < 0.01; Figure 4 ; VE: 71%, 95% CI: 60 to 79%; Table 1 ).

Figure 4 . Forest plot for risk ratios on various outcomes following Omicron infections. The red square symbolizes the point estimate for each study, with its size proportional to the study’s weight relative to the summary estimate. The black diamond symbol represents the overall effect estimate derived from the meta-analysis. Meta-analysis based on Random Effects model, inverse variance method (IV). Effect size estimates expressed in Log Risk Ratio [95%CI].

Many studies have shown that the vaccine has its optimal protective effect within about 90 days after the second dose ( 13 , 18 , 46 , 47 ). Therefore, we conducted subgroup analyses in the completely vaccinated group using 90 days as the reference time point. Among the 33 studies included, 23 reported outcomes by time since vaccination, while the remaining 10 did not specify a time interval. Pooling all VE estimates within 90 days of complete vaccination gave an overall RR of 0.46 (95% CI: 0.41 to 0.52, I² = 85%, p < 0.01; Figure 5 ; VE: 54%, 95% CI: 48 to 59%; Table 1 ). More than 90 days after complete vaccination, the overall RR was 0.66 (95% CI: 0.55 to 0.79, I² = 91%, p < 0.01; Figure 5 ; VE: 34%, 95% CI: 21 to 45%; Table 1 ).

Figure 5 . Forest plot for risk ratios of different time intervals after 2-dose vaccination on preventing Omicron infections. The red square symbolizes the point estimate for each study, with its size proportional to the study’s weight relative to the summary estimate. The black diamond symbol represents the overall effect estimate derived from the meta-analysis. Meta-analysis based on Random Effects model, inverse variance method (IV). Effect size estimates expressed in Log Risk Ratio [95%CI].

We also explored the effects of vaccination at different time intervals for the two outcomes. For mild COVID-19 caused by Omicron, the RRs within and beyond 90 days were 0.49 (95% CI: 0.44 to 0.55, I² = 85%, p < 0.01; Figure 6 ; VE: 51%, 95% CI: 45 to 56%; Table 1 ) and 0.75 (95% CI: 0.65 to 0.85, I² = 93%, p < 0.01; Figure 6 ; VE: 25%, 95% CI: 15 to 35%; Table 1 ), respectively. VE therefore decreased with time after receipt of the second dose. For severe COVID-19, intervals of less than 90 days and more than 90 days were both associated with a decreased risk during Omicron, with RR 0.24 (95% CI: 0.16 to 0.35, I² = 79%, p < 0.01; Figure 7 ; VE: 76%, 95% CI: 65 to 84%; Table 1 ) and RR 0.44 (95% CI: 0.27 to 0.72, I² = 95%, p < 0.01; Figure 7 ; VE: 56%, 95% CI: 28 to 73%; Table 1 ), respectively.

Figure 6 . Forest plot for risk ratios of different time intervals after 2-dose vaccination on preventing mild COVID-19. The red square symbolizes the point estimate for each study, with its size proportional to the study’s weight relative to the summary estimate. The black diamond symbol represents the overall effect estimate derived from the meta-analysis. Meta-analysis based on Random Effects model, inverse variance method (IV). Effect size estimates expressed in Log Risk Ratio [95%CI].

Figure 7 . Forest plot for risk ratios of different time intervals after 2-dose vaccination on preventing severe COVID-19. The red square symbolizes the point estimate for each study, with its size proportional to the study’s weight relative to the summary estimate. The black diamond symbol represents the overall effect estimate derived from the meta-analysis. Meta-analysis based on Random Effects model, inverse variance method (IV). Effect size estimates expressed in Log Risk Ratio [95%CI].

4 Discussion

This study focused on the efficacy of COVID-19 vaccination among children and adolescents aged 0–19 years during the Omicron-dominant period. It shows that the vaccine provided protection against SARS-CoV-2 infection during this period. VE increased with booster vaccination, and after 2-dose vaccination it varied by outcome: the vaccine offered greater protection against severe COVID-19 than against mild COVID-19 during the Omicron epidemic phase. The efficacy of COVID-19 vaccination against both mild and severe COVID-19 declined gradually over time, with a notable decline after 90 days.

Our estimate of VE against SARS-CoV-2 infection during the Omicron-dominant phase was significantly lower than the VE reported during the Delta-dominant phase ( 14 , 48 ). The decline in VE during the Omicron-dominant period may be due to the increased incidence of breakthrough infections associated with the Omicron variant, along with its rapid transmission and high infectivity ( 49 , 50 ). Furthermore, the greater capacity of the Omicron variant for immune evasion compared with the Delta variant may contribute to this phenomenon ( 51 – 53 ).

According to our findings, VE was positively associated with the number of doses administered: VE increased as the number of doses increased. It is well established that vaccine-induced immunity decreases over time ( 32 ). However, additional doses can maintain and generate favorable antibody, B-cell, and T-cell responses, thus providing robust protection ( 54 ).

For vaccination intervals, some studies have suggested that a 3-month interval may be preferable to a program with shorter intervals, because it protects the largest number of individuals in the population as early as possible when vaccine supply is short. This schedule also improves protection after the second dose ( 13 ). Furthermore, regulatory authorities in countries such as the United Kingdom have approved 2-dose intervals of up to 3 months for viral vector and mRNA vaccines ( 55 , 56 ). The relative VE of booster vaccination given more than 3 months after the second vaccination was 84.4%, and the absolute VE against symptomatic SARS-CoV-2 infection was 94.0% in adults over 50 years of age compared with unvaccinated participants ( 57 ). However, some studies have found that the highest antibody response occurs in the first month after vaccination and that immunity declines rapidly over the next 3–4 months, with peak antibody titers decreasing by almost 4–5.5 times ( 58 , 59 ). These findings indicate that vaccine protection against Omicron variant infection wanes within 3 months after the second dose, suggesting that a shorter interval between the second vaccination and the booster may be beneficial ( 60 ). In contrast, our study found no significant decline in VE against severe COVID-19 over the 3-month interval after the second vaccination. The most pronounced decline in VE was observed against mild COVID-19, where efficacy decreased by approximately 50% over the 3-month interval after the second vaccination.
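
The "approximately 50%" figure can be reproduced from the pooled point estimates reported in the Results. The Python sketch below is a descriptive calculation only, using the VE point estimates within and beyond 90 days; it is not a significance test, and the function name is illustrative.

```python
def relative_decline(ve_early, ve_late):
    """Relative drop in VE (percent of the early value) between two periods."""
    return (ve_early - ve_late) / ve_early * 100


# Pooled VE point estimates (%) within and beyond 90 days after the second dose.
mild_drop = relative_decline(51, 25)    # mild COVID-19
severe_drop = relative_decline(76, 56)  # severe COVID-19

print(round(mild_drop), round(severe_drop))  # 51 26
```

The point estimate for severe COVID-19 also falls, but its confidence intervals within 90 days (65 to 84%) and beyond 90 days (28 to 73%) overlap, whereas those for mild COVID-19 (45 to 56% and 15 to 35%) do not.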

While COVID-19 vaccines provide steady protection against severe SARS-CoV-2 infections, their effectiveness in preventing mild SARS-CoV-2 infections has declined, particularly during the Omicron-dominant period ( 61 – 63 ). It is well established that antibodies and localized memory immune responses are the primary determinants of protection against infection. COVID-19 vaccines primarily generate a systemic immune response, in which immunoglobulin G (IgG) circulates in serum and body fluids as the primary functional component ( 64 ). After prophylactic vaccination, IgG antibodies remain in the serum for a limited period. To prevent viral infection effectively, vaccine-induced serum IgG antibodies must enter the respiratory tract and come into direct contact with lung endothelial cell surfaces to neutralize the virus ( 65 , 66 ). However, because only a limited number of specific antibodies reach the upper respiratory tract and antibody concentrations decrease gradually over time ( 67 ), the vaccine-induced immune response is ineffective at preventing virus replication there. The reduction of local antibodies in the upper respiratory tract weakens protection against viral infection, leading to decreased defense against mild COVID-19. In contrast, the antibody pool in blood circulating through the lung efficiently blocks the virus from attacking the alveolar epithelium and capillary endothelium ( 68 , 69 ), limiting severe pulmonary infections. In addition, Omicron cross-reactive T-cells and memory B-cells located throughout the body can be swiftly engaged upon encountering the virus ( 70 ), producing an effective B- and T-cell-specific immune response ( 71 ). The memory B-cells produce large amounts of targeted antibodies that limit the spread and replication of the virus, helping to prevent the onset of severe COVID-19 ( 72 , 73 ). Accordingly, children who received complete vaccination during the Omicron-dominant period experienced a reduced risk of severe SARS-CoV-2 infection.

This meta-analysis has several strengths. First, eligible studies were retrieved from the major literature databases to minimize the risk of omitting relevant studies. Second, all included studies were published after the emergence and spread of the Omicron variant, so the data are representative of the Omicron epidemic period. Third, the data in the included studies were obtained from national electronic medical databases, providing representative population samples and large sample sizes. Finally, all included studies were of high or moderate methodological quality, which supports the reliability of the meta-analysis.

Our study also has several limitations. First, the meta-analyses of VE showed a high degree of heterogeneity. Sensitivity and subgroup analyses were performed to identify possible sources, but the heterogeneity did not appear to be explained by severity of infection, number of doses, or vaccination interval. The type of vaccine, individual immune responses, and variations in population characteristics might be responsible; however, the included studies provided insufficient data to stratify by these factors, so the source of heterogeneity remained difficult to identify. Second, the findings regarding VE against SARS-CoV-2 infection in children and adolescents during the Omicron-dominant period may be subject to minor bias, and further studies with larger sample sizes are warranted. Third, some of the included studies did not report the exact time between the second vaccination and the evaluation of VE. The evaluation of VE at longer time points in this study is therefore limited, and VE at different time points remains uncertain. Although boosters may improve VE, the optimal timing of boosters remains to be investigated.
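
For readers unfamiliar with how heterogeneity is quantified, the sketch below computes Cochran's Q, the I² statistic, and the DerSimonian-Laird between-study variance from study-level log risk ratios. It is a generic illustration under inverse-variance weighting, not a reproduction of the analysis in this paper; the function name and the three study estimates are hypothetical.

```python
import math


def heterogeneity(log_effects, variances):
    """Return (fixed-effect pooled estimate, Cochran's Q, I^2 in percent, DL tau^2)."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    w_sum = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / w_sum
    # Q: weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    # I^2: share of total variation attributed to between-study differences.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # DerSimonian-Laird moment estimator of the between-study variance.
    c = w_sum - sum(w * w for w in weights) / w_sum
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return pooled, q, i2, tau2


# Hypothetical risk ratios from three studies, with variances of the log RR.
log_rr = [math.log(0.3), math.log(0.5), math.log(0.8)]
var_log_rr = [0.01, 0.02, 0.015]

pooled, q, i2, tau2 = heterogeneity(log_rr, var_log_rr)
pooled_ve = (1.0 - math.exp(pooled)) * 100.0  # back-transform to VE in percent

print(f"Q = {q:.2f}, I^2 = {i2:.1f}%, tau^2 = {tau2:.3f}, pooled VE = {pooled_ve:.1f}%")
```

An I² value above roughly 75% is conventionally read as high heterogeneity, the situation in which subgroup and sensitivity analyses of the kind described above are used to look for sources.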

5 Conclusion

During the period dominated by the Omicron variant, vaccination reduced the risk of SARS-CoV-2 infection in children and adolescents aged 0 to 19 years. The effectiveness of the vaccine became more pronounced as the number of doses increased. Two-dose vaccination significantly reduced the risk of severe COVID-19. Protection persisted but declined over the 90 days following the second dose.

Author contributions

WL: Conceptualization, Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing. SZ: Data curation, Investigation, Writing – original draft. YY: Data curation, Investigation, Writing – original draft. YL: Data curation, Formal analysis, Writing – original draft. TR: Investigation, Methodology, Supervision, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This project was funded by 1.3.5 project for disciplines of excellence, West China Hospital, Sichuan University (ZYGD22009).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2024.1338208/full#supplementary-material

References

1. McKee, M, and Stuckler, D. If the world fails to protect the economy, COVID-19 will damage health not just now but also in the future. Nat Med . (2020) 26:640–2. doi: 10.1038/s41591-020-0863-y

2. Johns Hopkins University Coronavirus Resource Center. COVID-19 dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU) (2023). Available from: https://coronavirus.jhu.edu/map.html .

3. Singh, J, Pandit, P, McArthur, AG, Banerjee, A, and Mossman, K. Evolutionary trajectory of SARS-CoV-2 and emerging variants. Virol J . (2021) 18:166. doi: 10.1186/s12985-021-01633-w

4. Karim, SSA, and Karim, QA. Omicron SARS-CoV-2 variant: a new chapter in the COVID-19 pandemic. Lancet . (2021) 398:2126–8. doi: 10.1016/S0140-6736(21)02758-6

5. Kozlov, M. Does omicron hit kids harder? Scientists are trying to find out. Nature . (2022). doi: 10.1038/d41586-022-00309-x

6. Shi, DS, Whitaker, M, Marks, KJ, Anglin, O, Milucky, J, Patel, K, et al. Hospitalizations of children aged 5-11 years with laboratory-confirmed COVID-19 - COVID-NET, 14 states, march 2020-February 2022. MMWR Morb Mortal Wkly Rep . (2022) 71:574–81. doi: 10.15585/mmwr.mm7116e1

7. Watson, OJ, Barnsley, G, Toor, J, Hogan, AB, Winskill, P, and Ghani, AC. Global impact of the first year of COVID-19 vaccination: a mathematical modelling study. Lancet Infect Dis . (2022) 22:1293–302. doi: 10.1016/S1473-3099(22)00320-6

8. Voysey, M, Clemens, SAC, Madhi, SA, Weckx, LY, Folegatti, PM, Aley, PK, et al. Safety and efficacy of the ChAdOx1 nCoV-19 vaccine (AZD1222) against SARS-CoV-2: an interim analysis of four randomised controlled trials in Brazil, South Africa, and the UK. Lancet . (2021) 397:99–111. doi: 10.1016/S0140-6736(20)32661-1

9. Yang, ZR, Jiang, YW, Li, FX, Liu, D, Lin, TF, Zhao, ZY, et al. Efficacy of SARS-CoV-2 vaccines and the dose-response relationship with three major antibodies: a systematic review and meta-analysis of randomised controlled trials. Lancet Microbe . (2023) 4:e236–46. doi: 10.1016/S2666-5247(22)00390-1

10. Jara, A, Undurraga, EA, Zubizarreta, JR, González, C, Acevedo, J, Pizarro, A, et al. Effectiveness of CoronaVac in children 3-5 years of age during the SARS-CoV-2 omicron outbreak in Chile. Nat Med . (2022) 28:1377–80. doi: 10.1038/s41591-022-01874-4

11. Price, AM, Olson, SM, Newhams, MM, Halasa, NB, Boom, JA, Sahni, LC, et al. BNT162b2 protection against the omicron variant in children and adolescents. N Engl J Med . (2022) 386:1899–909. doi: 10.1056/NEJMoa2202826

12. Fleming-Dutra, KE, Britton, A, Shang, N, Derado, G, Link-Gelles, R, Accorsi, EK, et al. Association of Prior BNT162b2 COVID-19 vaccination with symptomatic SARS-CoV-2 infection in children and adolescents during omicron predominance. JAMA . (2022) 327:2210–9. doi: 10.1001/jama.2022.7493

13. Voysey, M, Costa Clemens, SA, Madhi, SA, Weckx, LY, Folegatti, PM, Aley, PK, et al. Single-dose administration and the influence of the timing of the booster dose on immunogenicity and efficacy of ChAdOx1 nCoV-19 (AZD1222) vaccine: a pooled analysis of four randomised trials. Lancet . (2021) 397:881–91. doi: 10.1016/S0140-6736(21)00432-3

14. Polack, FP, Thomas, SJ, Kitchin, N, Absalon, J, Gurtman, A, Lockhart, S, et al. Safety and efficacy of the BNT162b2 mRNA Covid-19 vaccine. N Engl J Med . (2020) 383:2603–15. doi: 10.1056/NEJMoa2034577

15. Olson, SM, Newhams, MM, Halasa, NB, Price, AM, Boom, JA, Sahni, LC, et al. Effectiveness of BNT162b2 vaccine against critical Covid-19 in adolescents. N Engl J Med . (2022) 386:713–23. doi: 10.1056/NEJMoa2117995

16. Piechotta, V, Siemens, W, Thielemann, I, Toews, M, Koch, J, Vygen-Bonnet, S, et al. Safety and effectiveness of vaccines against COVID-19 in children aged 5-11 years: a systematic review and meta-analysis. Lancet Child Adolesc Health . (2023) 7:379–91. doi: 10.1016/S2352-4642(23)00078-0

17. Amir, O, Goldberg, Y, Mandel, M, Bar-On, YM, Bodenheimer, O, Freedman, L, et al. Initial protection against SARS-CoV-2 omicron lineage infection in children and adolescents by BNT162b2 in Israel: an observational study. Lancet Infect Dis . (2023) 23:67–73. doi: 10.1016/S1473-3099(22)00527-8

18. Chemaitelly, H, AlMukdad, S, Ayoub, HH, Altarawneh, HN, Coyle, P, Tang, P, et al. Covid-19 vaccine protection among children and adolescents in Qatar. N Engl J Med . (2022) 387:1865–76. doi: 10.1056/NEJMoa2210058

19. Chiew, CJ, Premikha, M, Chong, CY, Wei, WE, Ong, B, Lye, DC, et al. Effectiveness of primary series and booster vaccination against SARS-CoV-2 infection and hospitalisation among adolescents aged 12-17 years in Singapore: a national cohort study. Lancet Infect Dis . (2022) 23:177–82. doi: 10.1016/S1473-3099(22)00573-4

20. Cohen-Stavi, CJ, Magen, O, Barda, N, Yaron, S, Peretz, A, Netzer, D, et al. BNT162b2 vaccine effectiveness against omicron in children 5 to 11 years of age. N Engl J Med . (2022) 387:227–36. doi: 10.1056/NEJMoa2205011

21. Dorabawila, V, Hoefer, D, Bauer, UE, Bassett, MT, Lutterloh, E, and Rosenberg, ES. Risk of infection and hospitalization among vaccinated and unvaccinated children and adolescents in New York after the emergence of the omicron variant. JAMA . (2022) 327:2242–4. doi: 10.1001/jama.2022.7319

22. Fowlkes, AL, Yoon, SK, Lutrick, K, Gwynn, L, Burns, J, Grant, L, et al. Effectiveness of 2-dose BNT162b2 (Pfizer BioNTech) mRNA vaccine in preventing SARS-CoV-2 infection among children aged 5-11 years and adolescents aged 12-15 years - PROTECT cohort, July 2021-February 2022. MMWR Morb Mortal Wkly Rep . (2022) 71:422–8. doi: 10.15585/mmwr.mm7111e1

23. González, S, Olszevicki, S, Gaiano, A, Baino, ANV, Regairaz, L, Salazar, M, et al. Effectiveness of BBIBP-CorV, BNT162b2 and mRNA-1273 vaccines against hospitalisations among children and adolescents during the omicron outbreak in Argentina: a retrospective cohort study. Lancet Reg Health Am . (2022) 13:100316. doi: 10.2139/ssrn.4087375

24. Nordström, P, Ballin, M, and Nordström, A. Safety and effectiveness of monovalent COVID-19 mRNA vaccination and risk factors for hospitalisation caused by the omicron variant in 0.8 million adolescents: a nationwide cohort study in Sweden. PLoS Med . (2023) 20:e1004127. doi: 10.1371/journal.pmed.1004127

25. Risk, M, Miao, H, Freed, G, and Shen, C. Vaccine effectiveness, school reopening, and risk of omicron infection among adolescents aged 12–17 years. J Adolesc Health . (2022) 72:147–52. doi: 10.1016/j.jadohealth.2022.09.006

26. Rudan, I, Millington, T, Antal, K, Grange, Z, Fenton, L, Sullivan, C, et al. BNT162b2 COVID-19 vaccination uptake, safety, effectiveness and waning in children and young people aged 12-17 years in Scotland. Lancet Regional Health-Europe . (2022) 23:100513. doi: 10.1016/j.lanepe.2022.100513

27. Tan, SHX, Cook, AR, Heng, D, Ong, B, Lye, DC, and Tan, KB. Effectiveness of BNT162b2 vaccine against omicron in children 5 to 11 years of age. N Engl J Med . (2022) 387:525–32. doi: 10.1056/NEJMoa2203209

28. Tartof, SY, Frankland, TB, Slezak, JM, Puzniak, L, Hong, V, Xie, F, et al. Effectiveness associated with BNT162b2 vaccine against emergency department and urgent care encounters for Delta and omicron SARS-CoV-2 infection among adolescents aged 12 to 17 years. JAMA Netw Open . (2022) 5:E2225162. doi: 10.1001/jamanetworkopen.2022.25162

29. Tsang, NNY, So, HC, Cowling, BJ, Leung, GM, and Ip, DKM. Effectiveness of BNT162b2 and CoronaVac COVID-19 vaccination against asymptomatic and symptomatic infection of SARS-CoV-2 omicron BA.2 in Hong Kong: a prospective cohort study. Lancet Infect Dis . (2023) 23:421–434. doi: 10.2139/ssrn.4200539

30. Wanlapakorn, N, Kanokudom, S, Phowatthanasathian, H, Chansaenroj, J, Suntronwong, N, Assawakosri, S, et al. Comparison of the reactogenicity and immunogenicity between two-dose mRNA COVID-19 vaccine and inactivated COVID-19 vaccine followed by an mRNA vaccine in children aged 5-11 years. J Med Virol . (2023) 95:e28758. doi: 10.1002/jmv.28758

31. Buchan, SA, Nguyen, L, Wilson, SE, Kitchen, SA, and Kwong, JC. Vaccine effectiveness of BNT162b2 against Delta and Omicron Variants in Adolescents. Pediatrics . (2022) 150:e2022057634. doi: 10.1542/peds.2022-057634

32. Castelli, JM, Rearte, A, Olszevicki, S, Voto, C, Del Valle, JM, Pesce, M, et al. Effectiveness of mRNA-1273, BNT162b2, and BBIBP-CorV vaccines against infection and mortality in children in Argentina, during predominance of delta and omicron covid-19 variants: test negative, case-control study. BMJ . (2022) 379:e073070. doi: 10.1136/bmj-2022-073070

33. Cocchio, S, Zabeo, F, Tremolada, G, Facchin, G, Venturato, G, Marcon, T, et al. COVID-19 vaccine effectiveness against omicron variant among underage subjects: the Veneto Region’s experience. Vaccines . (2022) 10:1362. doi: 10.3390/vaccines10081362

34. Florentino, PTV, Millington, T, Cerqueira-Silva, T, Robertson, C, de Araújo, OV, Júnior, JBS, et al. Vaccine effectiveness of two-dose BNT162b2 against symptomatic and severe COVID-19 among adolescents in Brazil and Scotland over time: a test-negative case-control study. Lancet Infect Dis . (2022) 22:1577–86. doi: 10.1016/S1473-3099(22)00451-0

35. Jang, EJ, Choe, YJ, Kim, RK, and Park, YJ. BNT162b2 vaccine effectiveness against the SARS-CoV-2 omicron variant in children aged 5 to 11 years. JAMA Pediatr . (2023) 177:319–20. doi: 10.1001/jamapediatrics.2022.5221

36. Klein, NP, Stockwell, MS, Demarco, M, Gaglani, M, Kharbanda, AB, Irving, SA, et al. Effectiveness of COVID-19 Pfizer-BioNTech BNT162b2 mRNA vaccination in preventing COVID-19-associated emergency department and urgent care encounters and hospitalizations among nonimmunocompromised children and adolescents aged 5-17 years - VISION network, 10 states, April 2021-January 2022. MMWR Morb Mortal Wkly Rep . (2022) 71:352–8. doi: 10.15585/mmwr.mm7109e3

37. Leung, D, Rosa Duque, JS, Yip, KM, So, HK, Wong, WHS, and Lau, YL. Effectiveness of BNT162b2 and CoronaVac in children and adolescents against SARS-CoV-2 infection during omicron BA.2 wave in Hong Kong. Commun Med . (2023) 3:3. doi: 10.1038/s43856-022-00233-1

38. Oliveira, EA, Oliveira, MCL, Colosimo, EA, Simões, ESAC, Mak, RH, Vasconcelos, MA, et al. Vaccine effectiveness against SARS-CoV-2 variants in adolescents from 15 to 90 days after second dose: a population-based test-negative case-control study. J Pediatr . (2022) 253:189–196.e2. doi: 10.1016/j.jpeds.2022.09.039

39. Powell, AA, Kirsebom, F, Stowe, J, Ramsay, ME, Lopez-Bernal, J, Andrews, N, et al. Protection against symptomatic infection with delta (B.1.617.2) and omicron (B.1.1.529) BA.1 and BA.2 SARS-CoV-2 variants after previous infection and vaccination in adolescents in England, august, 2021-march, 2022: a national, observational, test-negative, case-control study. Lancet Infect Dis . (2022) 23:435–44. doi: 10.1016/S1473-3099(22)00729-0

40. Rosa Duque, JS, Leung, D, Yip, KM, Lee, DHL, So, HK, Wong, WHS, et al. COVID-19 vaccines versus pediatric hospitalization. Cell Rep Med . (2023) 4:100936. doi: 10.1016/j.xcrm.2023.100936

41. Sacco, C, Del Manso, M, Mateo-Urdiales, A, Rota, MC, Petrone, D, Riccardo, F, et al. Effectiveness of BNT162b2 vaccine against SARS-CoV-2 infection and severe COVID-19 in children aged 5-11 years in Italy: a retrospective analysis of January-April, 2022. Lancet . (2022) 400:97–103. doi: 10.1016/S0140-6736(22)01185-0

42. Saito, Y, Yamamoto, K, Takita, M, Kami, M, Tsubokura, M, and Shibuya, K. Effectiveness of the booster of SARS-CoV-2 vaccine among Japanese adolescents: a cohort study. Vaccines . (2022) 10:1914. doi: 10.3390/vaccines10111914

43. Simmons, AE, Amoako, A, Grima, AA, Murison, KR, Buchan, SA, Fisman, DN, et al. Vaccine effectiveness against hospitalization among adolescent and pediatric SARS-CoV-2 cases between May 2021 and January 2022 in Ontario, Canada: A retrospective cohort study. PLoS One . (2023) 18:e0283715. doi: 10.1371/journal.pone.0283715

44. Wang, X, Chang, H, Tian, H, Zhu, Y, Li, J, Wei, Z, et al. Epidemiological and clinical features of SARS-CoV-2 infection in children during the outbreak of omicron variant in Shanghai, march 7-31, 2022. Influenza Other Respir Viruses . (2022) 16:1059–65. doi: 10.1111/irv.13044

45. Florentino, PTV, Alves, FJO, Cerqueira-Silva, T, Oliveira, VA, Junior, JBS, Jantsch, AG, et al. Vaccine effectiveness of CoronaVac against COVID-19 among children in Brazil during the omicron period. Nat Commun . (2022) 13:4756. doi: 10.1038/s41467-022-32524-5

46. Prunas, O, Warren, JL, Crawford, FW, Gazit, S, Patalon, T, Weinberger, DM, et al. Vaccination with BNT162b2 reduces transmission of SARS-CoV-2 to household contacts in Israel. Science . (2022) 375:1151–4. doi: 10.1126/science.abl4292

47. Creech, CB, Anderson, E, Berthaud, V, Yildirim, I, Atz, AM, Melendez Baez, I, et al. Evaluation of mRNA-1273 Covid-19 vaccine in children 6 to 11 years of age. N Engl J Med . (2022) 386:2011–23. doi: 10.1056/NEJMoa2203315

48. Frenck, RW Jr, Klein, NP, Kitchin, N, Gurtman, A, Absalon, J, Lockhart, S, et al. Safety, immunogenicity, and efficacy of the BNT162b2 Covid-19 vaccine in adolescents. N Engl J Med . (2021) 385:239–50. doi: 10.1056/NEJMoa2107456

49. Viana, R, Moyo, S, Amoako, DG, Tegally, H, Scheepers, C, Althaus, CL, et al. Rapid epidemic expansion of the SARS-CoV-2 omicron variant in southern Africa. Nature . (2022) 603:679–86. doi: 10.1038/s41586-022-04411-y

50. Lyngse, FP, Mortensen, LH, Denwood, MJ, Christiansen, LE, Møller, CH, Skov, RL, et al. Household transmission of the SARS-CoV-2 omicron variant in Denmark. Nat Commun . (2022) 13:5573. doi: 10.1038/s41467-022-33328-3

51. Hoffmann, M, Krüger, N, Schulz, S, Cossmann, A, Rocha, C, Kempf, A, et al. The omicron variant is highly resistant against antibody-mediated neutralization: implications for control of the COVID-19 pandemic. Cell . (2022) 185:447–56.e11. doi: 10.1016/j.cell.2021.12.032

52. Planas, D, Saunders, N, Maes, P, Guivel-Benhassine, F, Planchais, C, Buchrieser, J, et al. Considerable escape of SARS-CoV-2 omicron to antibody neutralization. Nature . (2022) 602:671–5. doi: 10.1038/s41586-021-04389-z

53. Ai, J, Zhang, H, Zhang, Y, Lin, K, Zhang, Y, Wu, J, et al. Omicron variant showed lower neutralizing sensitivity than other SARS-CoV-2 variants to immune sera elicited by vaccines after boost. Emerg Microbes Infect . (2022) 11:337–43. doi: 10.1080/22221751.2021.2022440

54. Payne, RP, Longet, S, Austin, JA, Skelly, DT, Dejnirattisai, W, Adele, S, et al. Immunogenicity of standard and extended dosing intervals of BNT162b2 mRNA vaccine. Cell . (2021) 184:5699–714.e11. doi: 10.1016/j.cell.2021.10.011

55. Department of Health & Social Care GU. Statement from the UK chief medical officers on the prioritisation of first doses of COVID-19 vaccines (2020) Available from: https://www.gov.uk/government/news/statement-from-the-uk-chief-medical-officers-on-the-prioritisation-of-first-doses-of-covid-19-vaccines .

56. Department of Health & Social Care GU. Oxford university/AstraZeneca vaccine authorised by UK medicines regulator (2020). Available from: https://www.gov.uk/government/news/oxford-universityastrazeneca-vaccine-authorised-by-uk-medicines-regulator .

57. Andrews, N, Stowe, J, Kirsebom, F, Toffa, S, Sachdeva, R, Gower, C, et al. Effectiveness of COVID-19 booster vaccines against COVID-19-related symptoms, hospitalization and death in England. Nat Med . (2022) 28:831–7. doi: 10.1038/s41591-022-01699-1

58. Regev-Yochay, G, Gonen, T, Gilboa, M, Mandelboim, M, Indenbaum, V, Amit, S, et al. Efficacy of a fourth dose of Covid-19 mRNA vaccine against omicron. N Engl J Med . (2022) 386:1377–80. doi: 10.1056/NEJMc2202542

59. Agrawal, U, Bedston, S, McCowan, C, Oke, J, Patterson, L, Robertson, C, et al. Severe COVID-19 outcomes after full vaccination of primary schedule and initial boosters: pooled analysis of national prospective cohort studies of 30 million individuals in England, Northern Ireland, Scotland, and Wales. Lancet . (2022) 400:1305–20. doi: 10.1016/S0140-6736(22)01656-7

60. Tseng, HF, Ackerson, BK, Luo, Y, Sy, LS, Talarico, CA, Tian, Y, et al. Effectiveness of mRNA-1273 against SARS-CoV-2 omicron and Delta variants. Nat Med . (2022) 28:1063–71. doi: 10.1038/s41591-022-01753-y

61. Rosenberg, ES, Holtgrave, DR, Dorabawila, V, Conroy, M, Greene, D, Lutterloh, E, et al. New COVID-19 cases and hospitalizations among adults, by vaccination status - New York, may 3-July 25, 2021. MMWR Morb Mortal Wkly Rep . (2021) 70:1150–5. doi: 10.15585/mmwr.mm7034e1

62. Feikin, DR, Higdon, MM, Abu-Raddad, LJ, Andrews, N, Araos, R, Goldberg, Y, et al. Duration of effectiveness of vaccines against SARS-CoV-2 infection and COVID-19 disease: results of a systematic review and meta-regression. Lancet . (2022) 399:924–44. doi: 10.1016/S0140-6736(22)00152-0

63. Hodgson, SH, Mansatta, K, Mallett, G, Harris, V, Emary, KRW, and Pollard, AJ. What defines an efficacious COVID-19 vaccine? A review of the challenges assessing the clinical efficacy of vaccines against SARS-CoV-2. Lancet Infect Dis . (2021) 21:e26–35. doi: 10.1016/S1473-3099(20)30773-8

64. Lund, FE, and Randall, TD. Scent of a vaccine. Science . (2021) 373:397–9. doi: 10.1126/science.abg9857

65. Reynolds, HY. Immunoglobulin G and its function in the human respiratory tract. Mayo Clin Proc . (1988) 63:161–74. doi: 10.1016/S0025-6196(12)64949-0

66. Mades, A, Chellamathu, P, Kojima, N, Lopez, L, MacMullan, MA, Denny, N, et al. Detection of persistent SARS-CoV-2 IgG antibodies in oral mucosal fluid and upper respiratory tract specimens following COVID-19 mRNA vaccination. Sci Rep . (2021) 11:24448. doi: 10.1038/s41598-021-03931-3

67. Yang, H, Xie, Y, and Li, C. Understanding the mechanisms for COVID-19 vaccine's protection against infection and severe disease. Expert Rev Vaccines . (2023) 22:186–92. doi: 10.1080/14760584.2023.2174529

68. Varona, JF, Muñiz, J, Balboa-Barreiro, V, Peñalver, F, Abarca, E, Almirall, C, et al. Persistence and waning of natural SARS-CoV-2 antibodies over 18 months: long-term durability of IgG humoral response in healthcare workers. J Gen Intern Med . (2022) 37:2614–6. doi: 10.1007/s11606-022-07652-9

69. Hewitt, RJ, and Lloyd, CM. Regulation of immune responses by the airway epithelial cell landscape. Nat Rev Immunol . (2021) 21:347–62. doi: 10.1038/s41577-020-00477-9

70. Hawman, DW, Meade-White, K, Archer, J, Leventhal, SS, Wilson, D, Shaia, C, et al. SARS-CoV2 variant-specific replicating RNA vaccines protect from disease following challenge with heterologous variants of concern. eLife . (2022) 11:e75537. doi: 10.7554/eLife.75537

71. Cinicola, BL, Piano Mortari, E, Zicari, AM, Agrati, C, Bordoni, V, Albano, C, et al. The BNT162b2 vaccine induces humoral and cellular immune memory to SARS-CoV-2 Wuhan strain and the omicron variant in children 5 to 11 years of age. Front Immunol . (2022) 13:1094727. doi: 10.3389/fimmu.2022.1094727

72. Khoury, DS, Docken, SS, Subbarao, K, Kent, SJ, Davenport, MP, and Cromer, D. Predicting the efficacy of variant-modified COVID-19 vaccine boosters. Nat Med . (2023) 29:574–8. doi: 10.1038/s41591-023-02228-4

73. Yegiazaryan, A, Abnousian, A, Alexander, LJ, Badaoui, A, Flaig, B, Sheren, N, et al. Recent developments in the understanding of immunity, pathogenesis and management of COVID-19. Int J Mol Sci . (2022) 23:9297. doi: 10.3390/ijms23169297

Keywords: SARS-CoV-2 variants, Omicron, COVID-19 vaccines, child, adolescent

Citation: Lu W, Zeng S, Yao Y, Luo Y and Ruan T (2024) The effect of COVID-19 vaccine to the Omicron variant in children and adolescents: a systematic review and meta-analysis. Front. Public Health . 12:1338208. doi: 10.3389/fpubh.2024.1338208

Received: 14 November 2023; Accepted: 27 March 2024; Published: 10 April 2024.

Copyright © 2024 Lu, Zeng, Yao, Luo and Ruan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Tiechao Ruan, [email protected]

Changing partisan coalitions in a politically divided nation: party identification among registered voters, 1994-2023

Pew Research Center conducted this analysis to explore partisan identification among U.S. registered voters across major demographic groups and how voters’ partisan affiliation has shifted over time. It also explores the changing composition of voters overall and the partisan coalitions.

For this analysis, we used annual totals of data from Pew Research Center telephone surveys (1994-2018) and online surveys (2019-2023) among registered voters. All telephone survey data was adjusted to account for differences in how people respond to surveys on the telephone compared with online surveys (refer to Appendix A for details).

All online survey data is from the Center’s nationally representative American Trends Panel. The surveys were conducted in both English and Spanish. Each survey is weighted to be representative of the U.S. adult population by gender, age, education, race and ethnicity and other categories. Read more about the ATP’s methodology, as well as how Pew Research Center measures many of the demographic categories used in this report.
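
As a rough illustration of what weighting a survey to population targets involves, the sketch below applies one-variable post-stratification and compares unweighted and weighted shares. This is a simplified stand-in and an assumption on our part: it is not the Center's actual weighting procedure, which is documented in its methodology pages. The function names, respondent data and population shares are all hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight each respondent by (population share) / (sample share) of their group."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_share(flags, weights):
    """Return the weighted proportion of respondents whose flag is True."""
    return sum(w for f, w in zip(flags, weights) if f) / sum(weights)


# Hypothetical sample of 10 respondents in which college graduates are overrepresented.
groups = ["college"] * 6 + ["non_college"] * 4
leans_dem = [True, True, True, True, False, False,  # college graduates
             True, False, False, False]             # non-college respondents

# Hypothetical population targets for the single weighting variable.
targets = {"college": 0.4, "non_college": 0.6}

weights = poststratification_weights(groups, targets)
unweighted = weighted_share(leans_dem, [1.0] * len(leans_dem))
weighted = weighted_share(leans_dem, weights)

print(f"unweighted {unweighted:.1%}, weighted {weighted:.1%}")
```

In this toy sample the weighted share is lower than the unweighted one because the overrepresented group leans more Democratic; production survey weights typically adjust on several variables at once rather than one.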

The contours of the 2024 political landscape are the result of long-standing patterns of partisanship, combined with the profound demographic changes that have reshaped the United States over the past three decades.

Many of the factors long associated with voters’ partisanship remain firmly in place. For decades, gender, race and ethnicity, and religious affiliation have been important dividing lines in politics. This continues to be the case today.

Pie chart showing that in 2023, 49% of registered voters identify as Democrats or lean toward the Democratic Party, while 48% identify as Republicans or lean Republican.

Yet there also have been profound changes – in some cases as a result of demographic change, in others because of dramatic shifts in the partisan allegiances of key groups.

The combined effects of change and continuity have left the country’s two major parties at virtual parity: About half of registered voters (49%) identify as Democrats or lean toward the Democratic Party, while 48% identify as Republicans or lean Republican.

In recent decades, neither party has had a sizable advantage, but the Democratic Party has lost the edge it maintained from 2017 to 2021. (Explore this further in Chapter 1.)

Pew Research Center’s comprehensive analysis of party identification among registered voters – based on hundreds of thousands of interviews conducted over the past three decades – tracks the changes in the country and the parties since 1994. Among the major findings:

Bar chart showing that growing racial and ethnic diversity among voters has had a far greater impact on the composition of the Democratic Party than the Republican Party.

The partisan coalitions are increasingly different. Both parties are more racially and ethnically diverse than in the past. However, this has had a far greater impact on the composition of the Democratic Party than the Republican Party.

The share of voters who are Hispanic has roughly tripled since the mid-1990s; the share who are Asian has increased sixfold over the same period. Today, 44% of Democratic and Democratic-leaning voters are Hispanic, Black, Asian, another race or multiracial, compared with 20% of Republicans and Republican leaners. However, the Democratic Party’s advantages among Black and Hispanic voters, in particular, have narrowed somewhat in recent years. (Explore this further in Chapter 8.)

Trend chart comparing voters in 1996 and 2023, showing that since 1996, voters without a college degree have declined as a share of all voters, and they have shifted toward the Republican Party. It’s the opposite for college graduate voters.

Education and partisanship: The share of voters with a four-year bachelor’s degree keeps increasing, reaching 40% in 2023. And the gap in partisanship between voters with and without a college degree continues to grow, especially among White voters. More than six-in-ten White voters who do not have a four-year degree (63%) associate with the Republican Party, which is up substantially over the past 15 years. White college graduates are closely divided; this was not the case in the 1990s and early 2000s, when they mostly aligned with the GOP. (Explore this further in Chapter 2.)

Beyond the gender gap: By a modest margin, women voters continue to align with the Democratic Party (by 51% to 44%), while nearly the reverse is true among men (52% align with the Republican Party, 46% with the Democratic Party). The gender gap is about as wide among married men and women. The gap is wider among men and women who have never married; while both groups are majority Democratic, 37% of never-married men identify as Republicans or lean toward the GOP, compared with 24% of never-married women. (Explore this further in Chapter 3.)

A divide between old and young: Today, each younger age cohort is somewhat more Democratic-oriented than the one before it. The youngest voters (those ages 18 to 24) align with the Democrats by nearly two-to-one (66% to 34% Republican or lean GOP); majorities of older voters (those in their mid-60s and older) identify as Republicans or lean Republican. While there have been wide age divides in American politics over the last two decades, this wasn’t always the case; in the 1990s there were only very modest age differences in partisanship. (Explore this further in Chapter 4.)

Dot plot chart by income tier showing party affiliation among registered voters without a college degree.

Education and family income: Voters without a college degree differ substantially by income in their party affiliation. Those with middle, upper-middle and upper family incomes tend to align with the GOP. A majority with lower and lower-middle incomes identify as Democrats or lean Democratic. Among voters with at least a four-year bachelor’s degree, there are no meaningful differences in partisanship by income; across income categories, majorities of college graduate voters align with the Democratic Party. (Explore this further in Chapter 6.)

Rural voters move toward the GOP, while the suburbs remain divided: In 2008, when Barack Obama sought his first term as president, voters in rural counties were evenly split in their partisan loyalties. Today, Republicans hold a 25 percentage point advantage among rural residents (60% to 35%). There has been less change among voters in urban counties, who are mostly Democratic by a nearly identical margin (60% to 37%). The suburbs – perennially a political battleground – remain about evenly divided. (Explore this further in Chapter 7.)

Growing differences among religious groups: Mirroring movement in the population overall, the share of voters who are religiously unaffiliated has grown dramatically over the past 15 years. These voters, who have long aligned with the Democratic Party, have become even more Democratic over time: Today 70% identify as Democrats or lean Democratic. In contrast, Republicans have made gains among several groups of religiously affiliated voters, particularly White Catholics and White evangelical Protestants. White evangelical Protestants now align with the Republican Party by about a 70-point margin (85% to 14%). (Explore this further in Chapter 5.)

What this report tells us – and what it doesn’t

In most cases, the partisan allegiances of voters do not change a great deal from year to year. Yet as this study shows, the long-term shifts in party identification are substantial and say a great deal about how the country – and its political parties – have changed since the 1990s.

Bar chart showing that certain demographic groups are strengths and weaknesses for the Republican and Democratic coalitions of registered voters. For example, White evangelical Protestants, White non-college voters and veterans tend to associate with the GOP, while Black voters and religiously unaffiliated voters favor the Democrats.

The steadily growing alignment between demographics and partisanship reveals an important aspect of rising partisan polarization. Republicans and Democrats do not just hold different beliefs and opinions about major issues; they are also much more different racially, ethnically, geographically and in educational attainment than they used to be.

Yet over this period, there have been only modest shifts in overall partisan identification. Voters remain evenly divided, even as the two parties have grown further apart. The continuing close division in partisan identification among voters is consistent with the relatively narrow margins in the popular votes in most national elections over the past three decades.

Partisan identification provides a broad portrait of voters’ affinities and loyalties. But while it is indicative of voters’ preferences, it does not perfectly predict how people intend to vote in elections, or whether they will vote. In the coming months, Pew Research Center will release reports analyzing voters’ preferences in the presidential election, their engagement with the election and the factors behind candidate support.

Next year, we will release a detailed study of the 2024 election, based on validated voters from the Center’s American Trends Panel. It will examine the demographic composition and vote choices of the 2024 electorate and will provide comparisons to the 2020 and 2016 validated voter studies.

The partisan identification study is based on annual totals from surveys conducted on the Center’s American Trends Panel from 2019 to 2023 and telephone surveys conducted from 1994 to 2018. The survey data was adjusted to account for differences in how the surveys were conducted. For more information, refer to Appendix A.

Previous Pew Research Center analyses of voters’ party identification relied on telephone survey data. This report, for the first time, combines data collected in telephone surveys with data from online surveys conducted on the Center’s nationally representative American Trends Panel.

Directly comparing answers from online and telephone surveys is complex because there are differences in how questions are asked of respondents and in how respondents answer those questions. Together these differences are known as “mode effects.”

As a result of mode effects, it was necessary to adjust telephone trends for leaned party identification in order to allow for direct comparisons over time.

In this report, telephone survey data from 1994 to 2018 is adjusted to align it with online survey responses. In 2014, Pew Research Center randomly assigned respondents to answer a survey by telephone or online. The party identification data from this survey was used to calculate an adjustment for differences between survey modes, which is applied to all telephone survey data in this report.
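The report does not describe the adjustment itself in this section, so the following Python sketch only illustrates the general idea and is not Pew Research Center's procedure. It assumes, purely for illustration, that the adjustment is a single additive shift estimated from a split-mode experiment. The function names and all percentages are hypothetical.

```python
def mode_offset(phone_share, online_share):
    """Difference between modes in a split-sample experiment (percentage points)."""
    return online_share - phone_share


def adjust_phone_trend(phone_trend, offset):
    """Shift every telephone estimate by the same offset (an assumed, simplified rule)."""
    return {year: round(share + offset, 1) for year, share in phone_trend.items()}


# Hypothetical shares of respondents leaning toward one party, in percent.
offset = mode_offset(phone_share=48.0, online_share=46.5)

# Hypothetical telephone-era trend, aligned to the online mode.
adjusted = adjust_phone_trend({1994: 47.0, 2004: 49.0, 2018: 50.0}, offset)
print(offset, adjusted)
```

A real adjustment may differ by group or by question; Appendix A of the report is the authoritative description.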

Please refer to Appendix A for more details.


About Pew Research Center: Pew Research Center is a nonpartisan fact tank that informs the public about the issues, attitudes and trends shaping the world. It conducts public opinion polling, demographic research, media content analysis and other empirical social science research. Pew Research Center does not take policy positions. It is a subsidiary of The Pew Charitable Trusts.

How long until building complaints are dispositioned? A survival analysis case study

Learn how to use tidymodels for survival analysis.

Introduction

To use code in this article, you will need to install the following packages: aorsf, censored, glmnet, modeldatatoo, and tidymodels.

Survival analysis is a field of statistics and machine learning for analyzing the time to an event. While it has its roots in medical research, the event of interest can be anything from customer churn to machine failure. Methods from survival analysis take into account that some observations may not yet have experienced the event of interest and are thus censored.

Here we want to predict the time it takes for a complaint to be dispositioned [1] by the Department of Buildings in New York City. We are going to walk through a complete analysis from beginning to end, showing how to analyze time-to-event data.

Let’s start with loading the tidymodels and censored packages (the parsnip extension package for survival analysis models).

The buildings complaints data

The city of New York publishes data on the complaints received by the Department of Buildings. The data includes information on the type of complaint, the date it was entered in their records, the date it was dispositioned, and the location of the building the complaint was about. We are using a subset of the data, available in the modeldatatoo package.

Before we dive into survival analysis, let’s get an impression of how the complaints are distributed across the city. We have complaints in all five boroughs, albeit with a somewhat lower density of complaints in Staten Island.

Building complaints in New York City (closed complaints in purple, active complaints in pink).

In the dataset, we can see the days_to_disposition as well as the status of the complaint. For a complaint with the status "ACTIVE", the time to disposition is censored: we know that it has taken at least that long, but not how long it will take to be fully resolved.

The standard form for time-to-event data is a Surv object, which captures the time as well as the event status. As with all transformations of the response, it is advisable to do this before heading into the model fitting process with tidymodels.
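To make the idea of a time paired with an event status concrete, here is a small Python sketch. It is not the R Surv class, just a stand-in with the same two ingredients. The class and function names are hypothetical, and treating every status other than "ACTIVE" as an observed event is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurvivalObservation:
    time: float   # days observed so far
    event: bool   # True if dispositioned, False if censored (still active)


def to_survival(days_to_disposition, status):
    # Assumption: any status other than "ACTIVE" means the event was observed.
    return SurvivalObservation(float(days_to_disposition), status != "ACTIVE")


def format_obs(obs):
    # Common shorthand: a trailing "+" marks a censored time.
    return f"{obs.time:g}" + ("" if obs.event else "+")


# Hypothetical records: (days_to_disposition, status)
records = [(12, "CLOSED"), (87, "ACTIVE"), (3, "CLOSED")]
observations = [to_survival(days, status) for days, status in records]
labels = [format_obs(obs) for obs in observations]
print(labels)
```

The second record prints with a trailing plus sign: all we know is that the complaint has been open for at least 87 days.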

Data splitting and resampling

For our resampling strategy, let’s use a 3-way split into training, validation, and test sets.

First, let’s pull out the training data and have a brief look at the response using a Kaplan-Meier curve.

We can see that the majority of complaints are dispositioned relatively quickly, but some complaints are still active after 100 days.
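For readers who want to see what a Kaplan-Meier curve computes, here is a from-scratch Python sketch of the product-limit estimator. It is not the code used for the curve in this article, and the toy times and event flags are made up.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all observations sharing the same time.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored observations both leave the risk set.
        n_at_risk -= n_leaving
    return curve


# Toy data: days observed, and whether the complaint was dispositioned.
times = [5, 8, 8, 12, 20, 30]
events = [True, True, False, True, False, True]
km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(t, round(s, 3))
```

Censored observations never cause a drop in the curve, but they do shrink the number at risk, which is how the estimator uses the partial information they carry.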

A first model

The censored package includes parametric, semi-parametric, and tree-based models for this type of analysis. To start, we are fitting a parametric survival model with the default of assuming a Weibull distribution on the time to disposition. We’ll explore the more flexible models once we have a sense of how well this more restrictive model performs on this dataset.

We have several missing values in complaint_priority that we are turning into a separate category, "unknown". We are also combining the less common categories for community_board and unit into an "other" category to reduce the number of levels in the predictors. The complaint category often does not tell us much more than the unit, with several complaint categories being handled by a specific unit only. This can lead to the model being unable to estimate some of the coefficients. Since our goal here is only to get a rough idea of how well the model performs, we are removing the complaint category for now.
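The two preprocessing ideas, an explicit "unknown" level and lumping rare levels into "other", can be sketched in a few lines of Python. This is an illustration of the concept rather than the recipe steps themselves; the function names, the 10% threshold, and the toy data are all assumptions.

```python
from collections import Counter


def fill_unknown(values, label="unknown"):
    """Turn missing values into their own category."""
    return [label if v is None else v for v in values]


def lump_rare_levels(values, threshold=0.10, other="other"):
    """Replace levels whose share of the data is below `threshold`."""
    counts = Counter(values)
    n = len(values)
    keep = {level for level, count in counts.items() if count / n >= threshold}
    return [v if v in keep else other for v in values]


# Toy predictor with two frequent and two rare levels.
unit = ["A"] * 10 + ["B"] * 8 + ["C", "D"]
lumped = lump_rare_levels(unit)
print(Counter(lumped))

priority = fill_unknown([None, "high", "low", None])
print(priority)
```

The frequency table must come from the training data only, so that the same set of kept levels is applied to the validation and test sets.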

We combine the recipe and the model into a workflow. This allows us to easily resample the model because all preprocessing steps are applied to the training set and the validation set for us.

To fit and evaluate the model, we need the training and validation sets. While we can access them each on their own, validation_set() extracts them both, in a manner that emulates a single resample of the data. This enables us to use fit_resamples() and other tuning functions in the same way as if we had used some other resampling scheme (such as cross-validation).

We are calculating several performance metrics: the Brier score, its integrated version, the area under the ROC curve, and the concordance index. Note that all of these are used in a version tailored to survival analysis. The concordance index uses the predicted event time to measure the model’s ability to rank the observations correctly. The Brier score and the ROC curve use the predicted probability of survival at a given time. We evaluate these metrics every 30 days up to 300 days, as provided in the eval_time argument. The Brier score is a measure of the accuracy of the predicted probabilities, while the ROC curve is a measure of the model’s ability to discriminate between events and non-events at the given time point. Because these metrics are defined “at a given time,” they are also referred to as dynamic metrics.
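To illustrate what "ranking the observations correctly" means, here is a simplified Python sketch of a concordance index in the style of Harrell's C. It is not the implementation behind the metric used here, which may treat ties and censoring differently, and the toy numbers are made up.

```python
def concordance_index(times, events, predicted_times):
    """Share of comparable pairs whose predicted times are ordered correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had its event strictly before
            # j's observed time (j may be an event or censored).
            if events[i] and times[i] < times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [5, 10, 15, 20]
events = [True, False, True, True]
predicted = [6, 9, 30, 18]
c_index = concordance_index(times, events, predicted)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1 to a perfect ranking. In this toy example three of the four comparable pairs are ordered correctly.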

For more information see the Dynamic Performance Metrics for Event Time Data article.

The structure of survival model predictions is slightly different from classification and regression model predictions:

The predicted survival time is in the .pred_time column and the predicted survival probabilities are in the .pred list column.

For each observation, .pred contains a tibble with the evaluation time .eval_time and the corresponding survival probability .pred_survival. The column .weight_censored contains the weights used in the calculation of the dynamic performance metrics.

For details on the weights see the Accounting for Censoring in Performance Metrics for Event Time Data article.

Of the metrics we calculated with these predictions, let’s take a look at the AUC ROC first.

We can discriminate between events and non-events reasonably well, especially in the first 30 and 60 days. How about the probabilities that the categorization into event and non-event is based on?

The accuracy of the predicted probabilities is generally good, albeit lowest for evaluation times of 30 and 60 days. The integrated Brier score is a measure of the overall accuracy of the predicted probabilities.

Which metric to optimize for depends on whether separation or calibration is more important in the modeling problem at hand. We’ll go with calibration here. Since we don’t have a particular evaluation time that we want to predict well at, we are going to use the integrated Brier score as our main performance metric.
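As a rough picture of what "integrated" means, the Python sketch below averages Brier scores over the evaluation times with the trapezoidal rule. Dividing the area by the largest evaluation time is one common normalization and is assumed here. The scores are made-up numbers on a shortened grid, not results from this analysis.

```python
def integrated_brier(eval_times, brier_scores):
    """Trapezoidal area under the Brier curve, scaled by the largest eval time."""
    area = 0.0
    for k in range(1, len(eval_times)):
        width = eval_times[k] - eval_times[k - 1]
        area += width * (brier_scores[k] + brier_scores[k - 1]) / 2.0
    return area / max(eval_times)


# Hypothetical Brier scores at three evaluation times (in days).
eval_times = [30, 60, 90]
brier_scores = [0.10, 0.06, 0.02]
ibs = integrated_brier(eval_times, brier_scores)
print(round(ibs, 4))
```

Because the result is on the same scale as a single Brier score, lower is better and values from different models on the same grid can be compared directly.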

Try out more models

Lumping factor levels together based on frequencies can lead to a loss of information, so let’s also try some different approaches. We can let a random forest model group the factor levels via the tree splits. Alternatively, we can turn the factors into dummy variables and use a regularized model to select relevant factor levels.

First, let’s create the recipes for these two approaches:

Next, let’s create the model specifications and tag several hyperparameters for tuning. For the random forest, we are using the "aorsf" engine for accelerated oblique random survival forests. An oblique tree can split on linear combinations of the predictors, i.e., it provides more flexibility in the splits than a tree which splits on a single predictor. For the regularized model, we are using the "glmnet" engine for a semi-parametric Cox proportional hazards model.
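The difference between the two kinds of split is easy to see in a toy Python sketch. This only illustrates the split rules, not how the "aorsf" engine chooses them; the weights, thresholds, and points are arbitrary.

```python
def axis_split(x, feature, threshold):
    """Axis-aligned rule: compare a single predictor to a threshold."""
    return x[feature] <= threshold


def oblique_split(x, weights, threshold):
    """Oblique rule: compare a linear combination of predictors to a threshold."""
    return sum(w * v for w, v in zip(weights, x)) <= threshold


points = [(1, 4), (3, 3), (4, 1)]

# Send points left if the first predictor is at most 2.
axis_left = [axis_split(p, feature=0, threshold=2) for p in points]

# Send points left if x0 + x1 is at most 5.
oblique_left = [oblique_split(p, weights=(1, 1), threshold=5) for p in points]

print(axis_left)
print(oblique_left)
```

The oblique rule groups the first and third points together, which no single-predictor threshold can do for these three points.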

We can tune workflows with any of the tune_*() functions such as tune_grid() for grid search or tune_bayes() for Bayesian optimization. Here we are using grid search for simplicity.
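Grid search itself is simple: evaluate every combination of candidate values and keep the best one. The Python sketch below shows that loop in isolation and is not how tune_grid() is implemented. The parameter names, the grid, and the toy scoring function, which stands in for fitting a model and computing the integrated Brier score, are hypothetical.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Return the parameter combination with the lowest score."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Stand-in for "fit on training data, score on validation data".
    return (params["penalty"] - 0.1) ** 2 + (params["mixture"] - 0.5) ** 2


grid = {"penalty": [0.01, 0.1, 1.0], "mixture": [0.0, 0.5, 1.0]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The cost grows with the product of the grid sizes, which is why more targeted strategies such as Bayesian optimization can pay off for larger search spaces.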

So do any of these models perform better than the parametric survival model?

The best regularized Cox model performs a little better than the parametric survival model, with an integrated Brier score of 0.0496 compared to 0.0512 for the parametric model. The random forest performs yet a little better with an integrated Brier score of 0.0468.

The final model

We chose the random forest model as the final model. So let’s finalize the workflow by replacing the tune() placeholders with the best hyperparameters.

We can now fit the final model on the training data and evaluate it on the test data.

The Brier score across the different evaluation time points is also very similar between the validation set and the test set.

To finish, we can extract the fitted workflow to either predict directly on new data or deploy the model.

For more information on survival analysis with tidymodels see the survival analysis tag.


[1] In this context, the term disposition means that there has been a decision or resolution regarding the complaint that is the conclusion of the process.


What Researchers Discovered When They Sent 80,000 Fake Résumés to U.S. Jobs

Some companies discriminated against Black applicants much more than others, and H.R. practices made a big difference.


By Claire Cain Miller and Josh Katz

A group of economists recently performed an experiment on around 100 of the largest companies in the country, applying for jobs using made-up résumés with equivalent qualifications but different personal characteristics. They changed applicants’ names to suggest that they were white or Black, and male or female — Latisha or Amy, Lamar or Adam.

On Monday, they released the names of the companies. On average, they found, employers contacted the presumed white applicants 9.5 percent more often than the presumed Black applicants.

Yet this practice varied significantly by firm and industry. One-fifth of the companies — many of them retailers or car dealers — were responsible for nearly half of the gap in callbacks to white and Black applicants.

Two companies favored white applicants over Black applicants significantly more than others. They were AutoNation, a used car retailer, which contacted presumed white applicants 43 percent more often, and Genuine Parts Company, which sells auto parts including under the NAPA brand, and called presumed white candidates 33 percent more often.
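A figure like "43 percent more often" is a relative difference between two contact rates. The Python sketch below shows that arithmetic with made-up counts. The function names are hypothetical, and the study's own estimates may be computed differently, for instance with statistical adjustments.

```python
def contact_rate(contacts, applications):
    """Share of applications that received a contact from the employer."""
    return contacts / applications


def relative_gap_percent(rate_a, rate_b):
    """How much more often group A was contacted than group B, in percent."""
    return (rate_a / rate_b - 1.0) * 100.0


# Hypothetical counts, not figures from the study.
white_rate = contact_rate(contacts=250, applications=1000)
black_rate = contact_rate(contacts=200, applications=1000)
gap = relative_gap_percent(white_rate, black_rate)
print(round(gap, 1))
```

In this made-up case the gap is five percentage points in absolute terms but about 25 percent in relative terms, so the two framings should not be confused.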

In a statement, Heather Ross, a spokeswoman for Genuine Parts, said, “We are always evaluating our practices to ensure inclusivity and break down barriers, and we will continue to do so.” AutoNation did not respond to a request for comment.

Companies With the Largest and Smallest Racial Contact Gaps

Of the 97 companies in the experiment, two stood out as contacting presumed white job applicants significantly more often than presumed Black ones. At 14 companies, there was little or no difference in how often they called back the presumed white or Black applicants.

Source: Patrick Kline, Evan K. Rose and Christopher R. Walters

Known as an audit study, the experiment was the largest of its kind in the United States: The researchers sent 80,000 résumés to 10,000 jobs from 2019 to 2021. The results demonstrate how entrenched employment discrimination is in parts of the U.S. labor market, and the extent to which Black workers start behind in certain industries.

“I am not in the least bit surprised,” said Daiquiri Steele, an assistant professor at the University of Alabama School of Law who previously worked for the Department of Labor on employment discrimination. “If you’re having trouble breaking in, the biggest issue is the ripple effect it has. It affects your wages and the economy of your community going forward.”

Some companies showed no difference in how they treated applications from people assumed to be white or Black. Their human resources practices — and one policy in particular (more on that later) — offer guidance for how companies can avoid biased decisions in the hiring process.

A lack of racial bias was more common in certain industries: food stores, including Kroger; food products, including Mondelez; freight and transport, including FedEx and Ryder; and wholesale, including Sysco and McLane Company.

“We want to bring people’s attention not only to the fact that racism is real, sexism is real, some are discriminating, but also that it’s possible to do better, and there’s something to be learned from those that have been doing a good job,” said Patrick Kline, an economist at the University of California, Berkeley, who conducted the study with Evan K. Rose at the University of Chicago and Christopher R. Walters at Berkeley.

The researchers first published details of their experiment in 2021, but without naming the companies. The new paper, which is set to run in the American Economic Review, names the companies and explains the methodology developed to group them by their performance, while accounting for statistical noise.

Sample Résumés From the Experiment

Fictitious résumés sent to large U.S. companies revealed a preference, on average, for candidates whose names suggested that they were white.


To assign names, the researchers started with a prior list that had been assembled using Massachusetts birth certificates from 1974 to 1979. They then supplemented this list with names found in a database of speeding tickets issued in North Carolina between 2006 and 2018, classifying a name as “distinctive” if more than 90 percent of people with that name were of a particular race.
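The 90 percent rule described above can be written as a small Python function. This is a sketch of the rule as stated in the article, not the researchers' code, and the function name and counts are hypothetical.

```python
def distinctive_race(counts_by_race, cutoff=0.90):
    """Return the race a name is distinctive of, or None if no race exceeds the cutoff."""
    total = sum(counts_by_race.values())
    if total == 0:
        return None
    race, count = max(counts_by_race.items(), key=lambda item: item[1])
    # "More than 90 percent" is read as a strict inequality.
    return race if count / total > cutoff else None


# Hypothetical counts of people with a given name, by race.
print(distinctive_race({"Black": 95, "white": 5}))
print(distinctive_race({"Black": 60, "white": 40}))
print(distinctive_race({"Black": 90, "white": 10}))
```

Only the first name qualifies. The third sits exactly at 90 percent, which does not count as "more than" the cutoff.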

The study includes 97 firms. The jobs the researchers applied to were entry level, not requiring a college degree or substantial work experience. In addition to race and gender, the researchers tested other characteristics protected by law, like age and sexual orientation.

They sent up to 1,000 applications to each company, applying for as many as 125 jobs per company in locations nationwide, to try to uncover patterns in companies’ operations versus isolated instances. Then they tracked whether the employer contacted the applicant within 30 days.

A bias against Black names

Companies requiring lots of interaction with customers, like sales and retail, particularly in the auto sector, were most likely to show a preference for applicants presumed to be white. This was true even when applying for positions at those firms that didn’t involve customer interaction, suggesting that discriminatory practices were baked in to corporate culture or H.R. practices, the researchers said.

Still, there were exceptions — some of the companies exhibiting the least bias were retailers, like Lowe’s and Target.

The study may underestimate the rate of discrimination against Black applicants in the labor market as a whole because it tested large companies, which tend to discriminate less, said Lincoln Quillian, a sociologist at Northwestern who analyzes audit studies. It did not include names intended to represent Latino or Asian American applicants, but other research suggests that they are also contacted less than white applicants, though they face less discrimination than Black applicants.

The experiment ended in 2021, and some of the companies involved might have changed their practices since. Still, a review of all available audit studies found that discrimination against Black applicants had not changed in three decades. After the Black Lives Matter protests in 2020, such discrimination was found to have disappeared among certain employers, but the researchers behind that study said the effect was most likely short-lived.

Gender, age and L.G.B.T.Q. status

On average, companies did not treat male and female applicants differently. This aligns with other research showing that gender discrimination against women is rare in entry-level jobs, and starts later in careers.

However, when companies did favor men (especially in manufacturing) or women (mostly at apparel stores), the biases were much larger than for race. Builders FirstSource contacted presumed male applicants more than twice as often as female ones. Ascena, which owns brands like Ann Taylor, contacted women 66 percent more than men.

Neither company responded to requests for comment.

The consequences of being female differed by race. The differences were small, but being female was a slight benefit for white applicants, and a slight penalty for Black applicants.

The researchers also tested several other characteristics protected by law, with a smaller number of résumés. They found there was a small penalty for being over 40.

Overall, they found no penalty for using nonbinary pronouns. Being gay, as indicated by including membership in an L.G.B.T.Q. club on the résumé, resulted in a slight penalty for white applicants, but benefited Black applicants — although the effect was small, when this was on their résumés, the racial penalty disappeared.

Under the Civil Rights Act of 1964, discrimination is illegal even if it’s unintentional. Yet in the real world, it is difficult for job applicants to know why they did not hear back from a company.

“These practices are particularly challenging to address because applicants often do not know whether they are being discriminated against in the hiring process,” Brandalyn Bickner, a spokeswoman for the Equal Employment Opportunity Commission, said in a statement. (It has seen the data and spoken with the researchers, though it could not use an academic study as the basis for an investigation, she said.)

What companies can do to reduce discrimination

Several common measures — like employing a chief diversity officer, offering diversity training or having a diverse board — were not correlated with decreased discrimination in entry-level hiring, the researchers found.

But one thing strongly predicted less discrimination: a centralized H.R. operation.

The researchers recorded the voice mail messages that the fake applicants received. When a company’s calls came from fewer individual phone numbers, suggesting that they were originating from a central office, there tended to be less bias. When they came from individual hiring managers at local stores or warehouses, there was more. These messages often sounded frantic and informal, asking if an applicant could start the next day, for example.

“That’s when implicit biases kick in,” Professor Kline said. A more formalized hiring process helps overcome this, he said: “Just thinking about things, which steps to take, having to run something by someone for approval, can be quite important in mitigating bias.”

At Sysco, a wholesale restaurant food distributor, which showed no racial bias in the study, a centralized recruitment team reviews résumés and decides whom to call. “Consistency in how we review candidates, with a focus on the requirements of the position, is key,” said Ron Phillips, Sysco’s chief human resources officer. “It lessens the opportunity for personal viewpoints to rise in the process.”

Another important factor is diversity among the people hiring, said Paula Hubbard, the chief human resources officer at McLane Company. It procures, stores and delivers products for large chains like Walmart, and showed no racial bias in the study. Around 40 percent of the company’s recruiters are people of color, and 60 percent are women.

Diversifying the pool of people who apply also helps, H.R. officials said. McLane goes to events for women in trucking and puts up billboards in Spanish.

So does hiring based on skills, versus degrees. While McLane used to require a college degree for many roles, it changed that practice after determining that specific skills mattered more for warehousing or driving jobs. “We now do that for all our jobs: Is there truly a degree required?” Ms. Hubbard said. “Why? Does it make sense? Is experience enough?”

Hilton, another company that showed no racial bias in the study, also stopped requiring degrees for many jobs, in 2018.

Another factor associated with less bias in hiring, the new study found, was more regulatory scrutiny — like at federal contractors, or companies with more Labor Department citations.

Finally, more profitable companies were less biased, in line with a long-held economics theory by the Nobel Prize winner Gary Becker that discrimination is bad for business. Economists said that could be because the more profitable companies benefit from a more diverse set of employees. Or it could be an indication that they had more efficient business processes, in H.R. and elsewhere.

Claire Cain Miller writes about gender, families and the future of work for The Upshot. She joined The Times in 2008 and was part of a team that won a Pulitzer Prize in 2018 for public service for reporting on workplace sexual harassment issues.

Josh Katz is a graphics editor for The Upshot, where he covers a range of topics involving politics, policy and culture. He is the author of “Speaking American: How Y’all, Youse, and You Guys Talk,” a visual exploration of American regional dialects.



COMMENTS

  1. Qualitative case study data analysis: an example from practice

    Data sources: The research example used is a multiple case study that explored the role of the clinical skills laboratory in preparing students for the real world of practice. Data analysis was conducted using a framework guided by the four stages of analysis outlined by Morse ( 1994 ): comprehending, synthesising, theorising and recontextualising.

  2. Case Study Methodology of Qualitative Research: Key Attributes and

    A case study is one of the most commonly used methodologies of social research. This article attempts to look into the various dimensions of a case study research strategy, the different epistemological strands which determine the particular case study type and approach adopted in the field, discusses the factors which can enhance the effectiveness of a case study research, and the debate ...

  3. What Is a Case Study?

    Revised on November 20, 2023. A case study is a detailed study of a specific subject, such as a person, group, place, event, organization, or phenomenon. Case studies are commonly used in social, educational, clinical, and business research. A case study research design usually involves qualitative methods, but quantitative methods are ...

  4. Data Analysis Techniques for Case Studies

    Qualitative analysis involves analyzing non-numerical data from sources like interviews, observations, documents, and images in a case study. It helps explore context, meaning, and patterns to ...

  5. Case Study

    A single-case study is an in-depth analysis of a single case. This type of case study is useful when the researcher wants to understand a specific phenomenon in detail. ... Rich data: Case study research can generate rich and detailed data, including qualitative data such as interviews, observations, and documents. This can provide a nuanced ...

  6. Learning to Do Qualitative Data Analysis: A Starting Point

    On the basis of Rocco (2010), Storberg-Walker's (2012) amended list on qualitative data analysis in research papers included the following: (a) the article should provide enough details so that reviewers could follow the same analytical steps; (b) the analysis process selected should be logically connected to the purpose of the study; and (c ...

  7. Chapter 5: DATA ANALYSIS AND INTERPRETATION

    As case study research is a flexible research method, qualitative data analysis methods are commonly used [176]. The basic objective of the analysis is, as in any other analysis, to derive conclusions from the data, keeping a clear chain of evidence.

  8. Case Study

    Case studies tend to focus on qualitative data using methods such as interviews, observations, and analysis of primary and secondary sources (e.g., newspaper articles, photographs, official records). Sometimes a case study will also collect quantitative data. Example: Mixed methods case study. For a case study of a wind farm development in a ...

  9. Case Study Methods and Examples

    The purpose of case study research is twofold: (1) to provide descriptive information and (2) to suggest theoretical relevance. Rich description enables an in-depth or sharpened understanding of the case. It is unique given one characteristic: case studies draw from more than one data source. Case studies are inherently multimodal or mixed ...

  10. Google Data Analytics Capstone: Complete a Case Study

    There are 4 modules in this course. This course is the eighth and final course in the Google Data Analytics Certificate. You'll have the opportunity to complete a case study, which will help prepare you for your data analytics job hunt. Case studies are commonly used by employers to assess analytical skills. For your case study, you'll ...

  11. PDF Analyzing Case Study Evidence

    For case study analysis, one of the most desirable techniques is to use a pattern-matching logic. Such a logic (Trochim, 1989) compares an empirically based pattern with a predicted one (or with several alternative predictions). If the patterns coincide, the results can help a case study to strengthen its internal validity. If the case study ...

  12. Case Study Method: A Step-by-Step Guide for Business Researchers

    A case study protocol is a formal document capturing the entire set of procedures involved in the collection of empirical material. It gives researchers direction for gathering evidence, analyzing empirical material, and reporting the case study. This section includes a step-by-step guide that is used for the execution of the actual study.

  13. Writing a Case Study Analysis

    Identify the key problems and issues in the case study. Formulate and include a thesis statement, summarizing the outcome of your analysis in 1-2 sentences. Background. Set the scene: background information, relevant facts, and the most important issues. Demonstrate that you have researched the problems in this case study. Evaluation of the Case

  14. Data Analytics Case Study Guide (Updated for 2024)

    Step 1: With Data Analytics Case Studies, Start by Making Assumptions. Hint: Start by making assumptions and thinking out loud. With this question, focus on coming up with a metric to support the hypothesis. If the question is unclear or if you think you need more information, be sure to ask.

  15. Writing a Case Analysis Paper

    Case study is unbounded and relies on gathering external information; case analysis is a self-contained subject of analysis. The scope of a case study chosen as a method of research is bounded. However, the researcher is free to gather whatever information and data is necessary to investigate its relevance to understanding the research problem.

  16. 10 Real World Data Science Case Studies Projects with Example

    A case study in data science is an in-depth analysis of a real-world problem using data-driven approaches. It involves collecting, cleaning, and analyzing data to extract insights and solve challenges, offering practical insights into how data science techniques can address complex issues across various industries.

  17. Four Steps to Analyse Data from a Case Study Method

    propose an approach to the analysis of case study data by logically linking the data to a series of propositions and then interpreting the subsequent information. Like the Yin (1994) strategy, the Miles and Huberman (1994) process of analysis of case study data, although quite detailed, may still be insufficient to guide the novice researcher.

  18. Data Analysis Case Study: Learn From These Winning Data Projects

    Humana's Automated Data Analysis Case Study. The key thing to note here is that the approach to creating a successful data program varies from industry to industry. Let's start with one to demonstrate the kind of value you can glean from these kinds of success stories. Humana has provided health insurance to Americans for over 50 years.

  19. PDF Open Case Studies: Statistics and Data Science Education through Real

    question and to create an illustrative data analysis - and the domain expertise needed. As a result, case studies based on realistic challenges, not toy examples, are scarce. To address this, we developed the Open Case Studies (opencasestudies.org) project, which offers a new statistical and data science education case study model.

  20. Communications in Statistics: Case Studies, Data Analysis and

    The three journals in this series are: Communications in Statistics: Case Studies, Data Analysis and Applications. Communications in Statistics - Simulation and Computation. Communications in Statistics - Theory and Methods. The prestigious and experienced members of our international Editorial Board will guide you from submission to publication.

  21. Data analytics case study data files

    Inventory Analysis Case Study Data files: Purchases. Beginning Inventory. Purchase Prices. Vendor Invoices. Ending Inventory. Sales. Inventory Analysis Case Study Instructor files: Instructor guide. Phase 1 - Data Collection and Preparation. Phase 2 - Data Discovery and Visualization. Phase 3 - Introduction to Statistical Analysis.

  22. 10 Real-World Data Science Case Studies Worth Reading

    Data quality issues, including missing or inaccurate data, can hinder analysis. Domain expertise gaps may result in misinterpretation of results. Resource constraints might limit project scope or access to necessary tools and talent. ... Real-world data science case studies play a crucial role in helping companies make informed decisions. By ...

  23. Policing during a pandemic: A case study analysis of body-worn camera

    This case study uses BWC footage derived from a police agency in Washington state. Methods: Using a population of 136 interactions involving suspected violations of COVID-19 ordinances between March 2020 and November 2020, this study uses convergent holistic triangulation within a mixed-method research design to extract data for ...

  24. Qualitative case study data analysis: an example from practice

    Data sources: The research example used is a multiple case study that explored the role of the clinical skills laboratory in preparing students for the real world of practice. Data analysis was conducted using a framework guided by the four stages of analysis outlined by Morse (1994): comprehending, synthesising, theorising and recontextualising.

  25. MetaboAnalyst 6.0: towards a unified platform for metabolomics data

    Our case study highlights the streamlined analysis workflow from raw spectra processing to compound annotation, to functional interpretation, and finally to causal insights. In conclusion, MetaboAnalyst 6.0 is a user-friendly platform for comprehensive analysis of metabolomics data and helps address emerging needs from recent exposomics research.

  26. Frontiers

    For precise analysis, included studies must explicitly specify COVID-19 infection attributed to the Omicron variant (PCR-confirmed or antigen-test confirmed) as the outcome measure and provide accessible data on VE. Our study excluded reviews, case series, case reports, and studies involving non-human subjects. 2.4 Study selection process

  27. Changing Partisan Coalitions in a Politically Divided Nation

    The partisan identification study is based on annual totals from surveys conducted on the Center's American Trends Panel from 2019 to 2023 and telephone surveys conducted from 1994 to 2018. The survey data was adjusted to account for differences in how the surveys were conducted. For more information, refer to Appendix A.

  28. tidymodels

    Learn how to use tidymodels for survival analysis. In the dataset, we can see the days_to_disposition as well as the status of the complaint. For a complaint with the status "ACTIVE", the time to disposition is censored, meaning we do know that it has taken at least that long, but not how long for it to be completely resolved. The standard form for time-to-event data are Surv objects which ...

  29. What Researchers Discovered When They Sent 80,000 Fake Résumés to U.S

    Known as an audit study, the experiment was the largest of its kind in the United States: The researchers sent 80,000 résumés to 10,000 jobs from 2019 to 2021. The results demonstrate how ...

  30. [2404.06616] Binary Trees and Taxicab Correspondence Analysis of

    This is a case study, where Taxicab Correspondence Analysis reveals that the underlying structure of an extremely sparse binary textual data set can be represented by a binary tree, where the nodes representing clusters of words can be interpreted as topics. The textual data set represents Israel's Declaration of Independence text and 40 diverse Israeli interviewees. The analysis provides for ...