A Guide To Secondary Data Analysis

What is secondary data analysis? How do you carry it out? Find out in this post.  

Historically, the only way data analysts could obtain data was to collect it themselves. This type of data is often referred to as primary data and is still a vital resource for data analysts.   

However, technological advances over the last few decades mean that much past data is now readily available online for data analysts and researchers to access and utilize. This type of data—known as secondary data—is driving a revolution in data analytics and data science.

Primary and secondary data share many characteristics. However, there are some fundamental differences in how you prepare and analyze secondary data. This post explores the unique aspects of secondary data analysis. We’ll briefly review what secondary data is before outlining how to source, collect and validate them. We’ll cover:

  • What is secondary data analysis?
  • How to carry out secondary data analysis (5 steps)
  • Summary and further reading

Ready for a crash course in secondary data analysis? Let’s go!

1. What is secondary data analysis?

Secondary data analysis uses data collected by somebody else. This contrasts with primary data analysis, which involves a researcher collecting predefined data to answer a specific question. Secondary data analysis has numerous benefits, not least that it is a time- and cost-effective way of obtaining data without doing the research yourself.

It’s worth noting here that secondary data may be primary data for the original researcher. It only becomes secondary data when it’s repurposed for a new task. As a result, a dataset can simultaneously be a primary data source for one researcher and a secondary data source for another. So don’t panic if you get confused! We explain exactly what secondary data is in this guide.

In reality, the statistical techniques used to carry out secondary data analysis are no different from those used to analyze other kinds of data. The main differences lie in collection and preparation. Once the data have been reviewed and prepared, the analytics process continues more or less as it usually does. For a recap on what the data analysis process involves, read this post.

In the following sections, we’ll focus specifically on the preparation of secondary data for analysis. Where appropriate, we’ll refer to primary data analysis for comparison. 

2. How to carry out secondary data analysis

Step 1: Define a research topic

The first step in any data analytics project is defining your goal. This is true regardless of the data you’re working with, or the type of analysis you want to carry out. In data analytics lingo, this typically involves defining:

  • A statement of purpose
  • Research design

Defining a statement of purpose and a research approach are both fundamental building blocks for any project. However, for secondary data analysis, the process of defining these differs slightly. Let’s find out how.

Step 2: Establish your statement of purpose

Before beginning any data analytics project, you should always have a clearly defined intent. This is called a ‘statement of purpose.’ A healthcare analyst’s statement of purpose, for example, might be: ‘Reduce admissions for mental health issues relating to Covid-19’. The more specific the statement of purpose, the easier it is to determine which data to collect, analyze, and draw insights from.

A statement of purpose is helpful for both primary and secondary data analysis. It’s especially relevant for secondary data analysis, though. This is because there are vast amounts of secondary data available. Having a clear direction will keep you focused on the task at hand, saving you from becoming overwhelmed. Being selective with your data sources is key.

Step 3: Design your research process

After defining your statement of purpose, the next step is to design the research process. For primary data, this involves determining the types of data you want to collect (e.g. quantitative, qualitative, or both) and a methodology for gathering them.

For secondary data analysis, however, your research process will more likely be a step-by-step guide outlining the types of data you require and a list of potential sources for gathering them. It may also include (realistic) expectations of the output of the final analysis. This should be based on a preliminary review of the data sources and their quality.

Once you have both your statement of purpose and research design, you’re in a far better position to narrow down potential sources of secondary data. You can then start with the next step of the process: data collection.

Step 4: Locate and collect your secondary data

Collecting primary data involves devising and executing a complex strategy that can be very time-consuming to manage. The data you collect, though, will be highly relevant to your research problem.

Secondary data collection, meanwhile, avoids the complexity of defining a research methodology. However, it comes with additional challenges. One of these is identifying where to find the data. This is no small task because there are a great many repositories of secondary data available. Your job, then, is to narrow down potential sources. As already mentioned, it’s necessary to be selective, or else you risk becoming overloaded.  

Some popular sources of secondary data include:  

  • Government statistics, e.g. demographic data, censuses, or surveys, collected by government agencies/departments (like the US Bureau of Labor Statistics).
  • Technical reports summarizing completed or ongoing research from educational or public institutions (colleges or government).
  • Scientific journals that outline research methodologies and data analysis by experts in fields like the sciences, medicine, etc.
  • Literature reviews of research articles, books, and reports, for a given area of study (once again, carried out by experts in the field).
  • Trade/industry publications, e.g. articles and data shared in trade publications, covering topics relating to specific industry sectors, such as tech or manufacturing.
  • Online resources: Repositories, databases, and other reference libraries with public or paid access to secondary data sources.

Once you’ve identified appropriate sources, you can go about collecting the necessary data. This may involve contacting other researchers, paying a fee to an organization in exchange for a dataset, or simply downloading a dataset for free online.
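Whichever route you take, it helps to record where and when you obtained each dataset, because the evaluation in Step 5 asks exactly these questions. The Python sketch below is one minimal way to do that, assuming the data arrive as CSV text; the `SecondaryDataset` class, its fields, the example URL, and the file contents are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SecondaryDataset:
    """Rows of a reused dataset plus the provenance you will need when evaluating it."""
    source: str            # hypothetical: where the data came from (URL or citation)
    retrieved_on: date     # when you obtained your copy
    original_purpose: str  # why the data were first collected, if documented
    rows: list = field(default_factory=list)


def load_csv(text, source, retrieved_on, original_purpose):
    """Parse CSV text into a list of dicts (one per row) and attach provenance metadata."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return SecondaryDataset(source, retrieved_on, original_purpose, rows)


# Hypothetical file contents and source URL, for illustration only.
raw = "region,year,unemployment_rate\nNorth,2019,4.1\nSouth,2019,5.3\n"
ds = load_csv(raw, "https://example.org/labour.csv", date(2024, 6, 17), "Annual labour survey")
print(len(ds.rows), ds.rows[0]["region"], ds.source)
```

In practice, `raw` might come from a file on disk or from `urllib.request.urlopen(...).read().decode()`. Note that `csv.DictReader` returns every value as a string, so numeric columns still need converting before analysis.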

Step 5: Evaluate your secondary data

Secondary data is usually well-structured, so you might assume that once you have your hands on a dataset, you’re ready to dive in with a detailed analysis. Unfortunately, that’s not the case! 

First, you must carry out a careful review of the data. Why? To ensure that they’re appropriate for your needs. This involves two main tasks:

  • Evaluating the secondary dataset’s relevance
  • Assessing its broader credibility

Both these tasks require critical thinking skills. However, they aren’t heavily technical. This means anybody can learn to carry them out.

Let’s now take a look at each in a bit more detail.  

Evaluating secondary data’s relevance

The main point of evaluating a secondary dataset is to see if it is suitable for your needs. This involves asking some probing questions about the data, including:

What was the data’s original purpose?

Understanding why the data were originally collected will tell you a lot about their suitability for your current project. For instance, was the project carried out by a government agency or a private company for marketing purposes? The answer may provide useful information about the population sample, the data demographics, and even the wording of specific survey questions. All this can help you determine if the data are right for you, or if they are biased in any way.

When and where were the data collected?

Over time, populations and demographics change. Identifying when the data were first collected can provide invaluable insights. For instance, a dataset that initially seems suited to your needs may be out of date.

On the flip side, you might want past data so you can draw a comparison with a present dataset. In this case, you’ll need to ensure the data were collected during the appropriate time frame. It’s worth mentioning that secondary data are usually the only source of past data: you cannot go back and collect historical data using primary data collection techniques.

Similarly, you should ask where the data were collected. Do they represent the geographical region you require? Does geography even have an impact on the problem you are trying to solve?
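If the dataset records when and where each observation was collected, you can check the time frame and geography directly rather than relying on the report's summary. A minimal sketch, assuming each record carries a collection date and a region; the records, field names, and cut-off dates are hypothetical.

```python
from datetime import date

# Hypothetical survey records; "collected" is the fieldwork date reported by the original researchers.
records = [
    {"id": 1, "collected": date(2012, 5, 1), "region": "Ontario"},
    {"id": 2, "collected": date(2019, 3, 15), "region": "Ontario"},
    {"id": 3, "collected": date(2020, 9, 30), "region": "Quebec"},
    {"id": 4, "collected": date(2021, 1, 10), "region": "Ontario"},
]


def in_scope(record, start, end, regions):
    """True if the record was collected within [start, end] and in one of the target regions."""
    return start <= record["collected"] <= end and record["region"] in regions


def screen(records, start, end, regions):
    """Split records into those that fit the study's time frame and geography, and those that don't."""
    kept = [r for r in records if in_scope(r, start, end, regions)]
    dropped = [r for r in records if not in_scope(r, start, end, regions)]
    return kept, dropped


kept, dropped = screen(records, date(2018, 1, 1), date(2021, 12, 31), {"Ontario"})
print([r["id"] for r in kept], [r["id"] for r in dropped])  # [2, 4] [1, 3]
```

Keeping the dropped records, rather than discarding them silently, makes it easy to report how much of the original dataset fell outside your scope.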

What data were collected and how?

A final report from a past analytics project is useful for summarizing key characteristics or findings. However, if you’re planning to use those data for a new project, you’ll need the original documentation. At the very least, this should include access to the raw data and an outline of the methodology used to gather them. This can be helpful for many reasons. For instance, you may find raw data that weren’t relevant to the original analysis but might benefit your current task.

What questions were participants asked?

We’ve already touched on this, but the wording of survey questions—especially for qualitative datasets—is significant. Questions may deliberately be phrased to preclude certain answers. A question’s context may also impact the findings in a way that’s not immediately obvious. Understanding these issues will shape how you perceive the data.  

What is the form/shape/structure of the data?

Finally, to practical issues. Is the structure of the data suitable for your needs? Is it compatible with other sources or with your preferred analytics approach? This is purely a structural issue. For instance, if people’s ages are recorded in discrete age bands rather than as a continuous variable, this could limit the analyses you can run. In general, reviewing a dataset’s structure helps you understand how the data are categorized, allowing you to account for any discrepancies. You may also need to tidy the data to ensure they are consistent with any other sources you’re using.
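To make the age example concrete: if one source records exact ages and another reports only age bands, you can convert exact ages into bands but not the reverse, so the finer-grained data must be mapped onto the coarser scheme. A minimal sketch, assuming hypothetical band edges and sample values; anything that cannot be parsed or banded is set aside for inspection rather than guessed.

```python
from collections import Counter

# Hypothetical band scheme used by a second source: (label, lowest age, highest age), inclusive.
BANDS = [("18-24", 18, 24), ("25-44", 25, 44), ("45-64", 45, 64), ("65+", 65, 200)]


def to_band(age):
    """Map an exact age in years to a band label, or None if it falls outside every band."""
    for label, low, high in BANDS:
        if low <= age <= high:
            return label
    return None


def harmonize(ages):
    """Convert exact ages to bands and count them; unparseable or out-of-range values are returned separately."""
    counts, problems = Counter(), []
    for raw in ages:
        try:
            age = int(float(raw))  # accepts "34", "34.0" or 34; truncates to completed years
        except (TypeError, ValueError):
            problems.append(raw)
            continue
        band = to_band(age)
        if band is None:
            problems.append(raw)
        else:
            counts[band] += 1
    return counts, problems


counts, problems = harmonize(["19", "34.0", "71", "unknown", "12", 50])
print(dict(counts), problems)
```

Reviewing the `problems` list before analysis shows whether values such as "unknown" or under-age respondents reflect coding choices in the original study that you need to account for.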

This is just a sample of the types of questions you need to consider when reviewing a secondary data source. The answers will have a clear impact on whether the dataset—no matter how well presented or structured it seems—is suitable for your needs.

Assessing secondary data’s credibility

After identifying a potentially suitable dataset, you must double-check the credibility of the data. Namely, are the data accurate and unbiased? To figure this out, here are some key questions you might ask:

What are the credentials of those who carried out the original research?

Do you have access to the details of the original researchers? What are their credentials? Where did they study? Are they an expert in the field or a newcomer? Data collection by an undergraduate student, for example, may not be as rigorous as that of a seasoned professor.  

And did the original researcher work for a reputable organization? What other affiliations do they have? For instance, if a researcher who works for a tobacco company gathers data on the effects of vaping, this represents an obvious conflict of interest! Questions like this help determine how thorough or qualified the researchers are and if they have any potential biases.

Do you have access to the full methodology?

Does the dataset include a clear methodology, explaining in detail how the data were collected? This should be more than a simple overview; it must be a clear breakdown of the process, including justifications for the approach taken. This allows you to determine if the methodology was sound. If you find flaws (or no methodology at all), it throws the quality of the data into question.

How consistent are the data with other sources?

Do the secondary data match similar findings from other sources? If not, that doesn’t necessarily mean the data are wrong, but it does warrant closer inspection. Perhaps the collection methodology differed between sources, or maybe the data were analyzed using different statistical techniques. Or perhaps unaccounted-for outliers are skewing the analysis. Identifying all these potential problems is essential. A flawed or biased dataset can still be useful, but only if you know where its shortcomings lie.
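A simple first pass is to line up the same statistic as reported by several sources and flag any that sit far from the rest. The sketch below uses the median across sources as the reference point and a 10% relative tolerance; both choices, and the source names and values, are assumptions for illustration. A flag is a prompt to compare methodologies, not proof that a source is wrong.

```python
from statistics import median


def flag_inconsistent(estimates, tolerance=0.10):
    """Flag sources whose value differs from the cross-source median by more than `tolerance`.

    estimates: mapping of source name -> the same statistic as reported by that source.
    Returns {source: relative difference} for the flagged sources only.
    """
    centre = median(estimates.values())
    if centre == 0:
        raise ValueError("median is zero; relative differences are undefined")
    flagged = {}
    for source, value in estimates.items():
        rel_diff = abs(value - centre) / abs(centre)
        if rel_diff > tolerance:
            flagged[source] = round(rel_diff, 3)
    return flagged


# Hypothetical prevalence rates (%) for the same population and year, from four sources.
estimates = {"Dataset A": 12.0, "Survey B": 12.6, "Report C": 11.8, "Study D": 16.0}
print(flag_inconsistent(estimates))  # {'Study D': 0.301}
```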

Have the data been published in any credible research journals?

Finally, have the data been used in well-known studies or published in any journals? If so, how reputable are the journals? In general, where a dataset has been published is a good indicator of its quality. If in doubt, check whether the publication is listed in the Directory of Open Access Journals, which vets journals against its inclusion criteria before listing them (bear in mind that it only covers open access journals). Meanwhile, if you found the data via a blurry image on social media without cited sources, then you can justifiably question its quality!

Again, these are just a few of the questions you might ask when determining the quality of a secondary dataset. Consider them as scaffolding for cultivating a critical thinking mindset, a necessary trait for any data analyst!

Presuming your secondary data holds up to scrutiny, you should be ready to carry out your detailed statistical analysis. As we explained at the beginning of this post, the analytical techniques used for secondary data analysis are no different from those for any other kind of data. Rather than go into detail here, check out the different types of data analysis in this post.

3. Secondary data analysis: Key takeaways

In this post, we’ve looked at the nuances of secondary data analysis, including how to source, collect and review secondary data. As discussed, much of the process is the same as it is for primary data analysis. The main difference lies in how secondary data are prepared.

Carrying out a meaningful secondary data analysis involves spending time and effort exploring, collecting, and reviewing the original data. This will help you determine whether the data are suitable for your needs and if they are of good quality.

Why not get to know more about what data analytics involves with this free, five-day introductory data analytics short course? And, for more data insights, check out these posts:

  • Discrete vs continuous data variables: What’s the difference?
  • What are the four levels of measurement? Nominal, ordinal, interval, and ratio data explained
  • What are the best tools for data mining?

What is Secondary Research? | Definition, Types, & Examples

Published on January 20, 2023 by Tegan George. Revised on January 12, 2024.

Secondary research is a research method that uses data that was collected by someone else. In other words, whenever you conduct research using data that already exists, you are conducting secondary research. On the other hand, any type of research that you undertake yourself is called primary research.

Secondary research can be qualitative or quantitative in nature. It often uses data gathered from published peer-reviewed papers, meta-analyses, or government or private sector databases and datasets.

Table of contents

  • When to use secondary research
  • Types of secondary research
  • Examples of secondary research
  • Advantages and disadvantages of secondary research
  • Other interesting articles
  • Frequently asked questions

When to use secondary research

Secondary research is a very common research method, used in lieu of collecting your own primary data. It is often used in research designs or as a way to start your research process if you plan to conduct primary research later on.

Since it is often inexpensive or free to access, secondary research is a low-stakes way to determine whether further primary research is needed: gaps in the existing secondary research are a strong indication that primary research is necessary. Secondary research can in theory be exploratory or explanatory in nature, but it is usually explanatory, aiming to explain the causes and consequences of a well-defined problem.


Types of secondary research

Secondary research can take many forms, but the most common types are:

  • Statistical analysis
  • Literature reviews
  • Case studies
  • Content analysis

Statistical analysis

There is ample data available online from a variety of sources, often in the form of datasets. These datasets are often freely available or downloadable at a low cost, and are ideal for conducting statistical analyses such as hypothesis testing or regression analysis.

Credible sources for existing data include:

  • The government
  • Government agencies
  • Non-governmental organizations
  • Educational institutions
  • Businesses or consultancies
  • Libraries or archives
  • Newspapers, academic journals, or magazines

Literature reviews

A literature review is a survey of preexisting scholarly sources on your topic. It provides an overview of current knowledge, allowing you to identify relevant themes, debates, and gaps in the research you analyze. You can later apply these to your own work, or use them as a jumping-off point to conduct primary research of your own.

Structured much like a regular academic paper (with a clear introduction, body, and conclusion), a literature review is a great way to evaluate the current state of research and demonstrate your knowledge of the scholarly debates around your topic.

Case studies

A case study is a detailed study of a specific subject. It is usually qualitative in nature and can focus on a person, group, place, event, organization, or phenomenon. A case study is a great way to utilize existing research to gain concrete, contextual, and in-depth knowledge about your real-world subject.

You can choose to focus on just one complex case, exploring a single subject in great detail, or examine multiple cases if you’d prefer to compare different aspects of your topic. Preexisting interviews, observational studies, or other sources of primary data make for great case studies.

Content analysis

Content analysis is a research method that studies patterns in recorded communication by utilizing existing texts. It can be either quantitative or qualitative in nature, depending on whether you choose to analyze countable or measurable patterns, or more interpretive ones. Content analysis is popular in communication studies, but it is also widely used in historical analysis, anthropology, and psychology to draw qualitative inferences about meaning.
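In its quantitative form, content analysis often comes down to counting how often predefined categories appear across a set of texts. A minimal keyword-based sketch, assuming a hypothetical two-category coding frame and made-up headlines; real coding frames are developed and tested more carefully, and simple keyword matching misses context that a human coder would catch.

```python
import re
from collections import Counter

# Hypothetical coding frame: each category is defined by a small set of keywords.
CODING_FRAME = {
    "economy": {"jobs", "wages", "inflation"},
    "health": {"hospital", "vaccine", "doctors"},
}


def code_text(text):
    """Count how many words in a text match each category of the coding frame."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, keywords in CODING_FRAME.items():
            if word in keywords:
                counts[category] += 1
    return counts


# Hypothetical existing texts, e.g. newspaper headlines taken from an archive.
headlines = [
    "Inflation squeezes wages as jobs market cools",
    "Hospital waiting lists grow despite new doctors",
    "Vaccine rollout boosts jobs in health sector",
]

totals = Counter()
for headline in headlines:
    totals.update(code_text(headline))
print(dict(totals))  # {'economy': 4, 'health': 3}
```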

Examples of secondary research

Secondary research is a broad research approach that can be pursued in many ways, using any of the types described above to explore your research topic.

Advantages and disadvantages of secondary research

Secondary research is a very common research approach, but it has distinct advantages and disadvantages.

Advantages of secondary research

Advantages include:

  • Secondary data is very easy to source and readily available.
  • It is also often free or accessible through your educational institution’s library or network, making it much cheaper to conduct than primary research.
  • As you are relying on research that already exists, conducting secondary research is much less time-consuming than primary research. Since your timeline is so much shorter, your research can be ready to publish sooner.
  • Using data from others allows you to show reproducibility and replicability, bolstering prior research and situating your own work within your field.

Disadvantages of secondary research

Disadvantages include:

  • Ease of access does not signify credibility. It’s important to be aware that secondary research is not always reliable, and can often be out of date. It’s critical to analyze any data you’re thinking of using prior to getting started, using a method like the CRAAP test.
  • Secondary research often relies on primary research already conducted. If this original research is biased in any way, those research biases could creep into the secondary results.
  • Many researchers using the same secondary research to form similar conclusions can also take away from the uniqueness and reliability of your research. Many datasets become “kitchen-sink” models, where too many variables are added in an attempt to draw increasingly niche conclusions from overused data. Data cleansing may be necessary to ensure the quality of the data.
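Data cleansing on a reused dataset usually starts with duplicates and missing values, and it is worth logging what you remove so your decisions are transparent. A minimal sketch, assuming rows are dictionaries such as those produced by Python's `csv.DictReader`; the field names are hypothetical. Cleaning tidies the data you analyze, but it cannot remove biases built into how the original data were collected.

```python
def clean(rows, required):
    """Drop exact duplicate rows and rows missing any required field.

    Returns the kept rows and a log counting what was removed and why.
    """
    seen = set()
    kept = []
    log = {"duplicates": 0, "missing": 0}
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent fingerprint of the row
        if key in seen:
            log["duplicates"] += 1
            continue
        seen.add(key)
        if any(row.get(f) in (None, "") for f in required):
            log["missing"] += 1
            continue
        kept.append(row)
    return kept, log


# Hypothetical rows from a downloaded dataset.
rows = [
    {"id": "1", "age": "34", "region": "North"},
    {"id": "1", "age": "34", "region": "North"},  # exact duplicate
    {"id": "2", "age": "", "region": "South"},    # missing age
    {"id": "3", "age": "51", "region": None},     # missing region
    {"id": "4", "age": "27", "region": "East"},
]
kept, log = clean(rows, required=["age", "region"])
print([r["id"] for r in kept], log)
```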


Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

Statistics

  • Normal distribution
  • Degrees of freedom
  • Null hypothesis

Methodology

  • Discourse analysis
  • Control groups
  • Mixed methods research
  • Non-probability sampling
  • Quantitative research
  • Inclusion and exclusion criteria

Research bias

  • Rosenthal effect
  • Implicit bias
  • Cognitive bias
  • Selection bias
  • Negativity bias
  • Status quo bias

Frequently asked questions

Is a systematic review primary or secondary research?

A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.

How do I decide which research methods to use?

The research methods you use depend on the type of data you need to answer your research question.

  • If you want to measure something or test a hypothesis, use quantitative methods. If you want to explore ideas, thoughts and meanings, use qualitative methods.
  • If you want to analyze a large amount of readily available data, use secondary data. If you want data specific to your purposes with control over how it is generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables, use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

What’s the difference between quantitative and qualitative methods?

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses. Qualitative methods allow you to explore concepts and experiences in more detail.

Sources in this article

We strongly encourage students to use sources in their work. You can cite our article (APA Style) or take a deep dive into the articles below.

George, T. (2024, January 12). What is Secondary Research? | Definition, Types, & Examples. Scribbr. Retrieved June 17, 2024, from https://www.scribbr.com/methodology/secondary-research/
Largan, C., & Morris, T. M. (2019). Qualitative Secondary Research: A Step-By-Step Guide (1st ed.). SAGE Publications Ltd.
Peloquin, D., DiMaio, M., Bierer, B., & Barnes, M. (2020). Disruptive and avoidable: GDPR challenges to secondary research uses of data. European Journal of Human Genetics, 28(6), 697–705. https://doi.org/10.1038/s41431-020-0596-x


How to Analyse Secondary Data for a Dissertation

Secondary data refers to data that has already been collected by another researcher. For researchers (and students!) with limited time and resources, secondary data, whether qualitative or quantitative, can be a highly viable source of data. In addition, with advances in technology and the access to peer-reviewed journals and studies that the internet provides, it is an increasingly popular form of data collection. The question that frequently arises amongst students, however, is: how is secondary data best analysed?

The process of data analysis in secondary research

Secondary analysis (i.e., the use of existing data) is a systematic methodological approach with some clear steps that need to be followed for the process to be effective. In simple terms, there are three steps:

  • Step One: Development of research questions
  • Step Two: Identification of dataset
  • Step Three: Evaluation of the dataset

Let’s look at each of these in more detail:

Step One: Development of research questions

Using secondary data means you need to apply theoretical knowledge and conceptual skills to use the dataset to answer research questions. The first step, therefore, is to clearly define and develop your research questions so that you know which areas of interest you need to explore to locate the most appropriate secondary data.

Step Two: Identification of dataset

This stage should start with an investigation of what is currently known in the subject area, where the gaps are, and what data is available to address those gaps. Sources can include academic studies that have used quantitative or qualitative data, which can be gathered together and collated to produce a new secondary dataset. Other, more informal “grey” literature can also be incorporated, including consumer reports, commercial studies, or similar. One of the values of using secondary research is that original surveys often do not use all the data collected, which means this unused information can be applied to different settings or perspectives.

Key point: Effective use of secondary data means identifying how the data can be used to deliver meaningful and relevant answers to the research questions. In other words, the data used must be a good fit for the study and its research questions.

Step Three: Evaluation of the dataset for effectiveness/fit

A good tip is to use a reflective approach to data evaluation. In other words, for each piece of secondary data to be utilised, it is sensible to identify the purpose of the work, the credentials of the authors (i.e., their credibility), what data is provided in the original work, and how long ago it was collected. In addition, consider the methods used and how consistent the work is with other studies. This is important because the primary method of data collection affects the overall evaluation and analysis when the data is used as a secondary source. For example, if you do not understand the coding used to identify key themes in a qualitative data analysis, your interpretations will not match the original ones when the data is reused for secondary purposes. Furthermore, having multiple sources that draw similar conclusions provides a higher level of validity than relying on only one or two secondary sources.

A useful framework provides a flow chart of decision making, as shown in the figure below.

[Figure: decision-making flow chart for analysing secondary data]

Following this process not only ensures that only the sources most appropriate to your research questions are included in the final dataset, but also demonstrates to your readers that you have been thorough in identifying the right works to use.
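A decision flow of this kind can be sketched as a series of yes/no checks, each of which either excludes a source with a recorded reason or passes it on to the next check. The criteria, their order, and the example sources below are hypothetical and are not taken from the figure; recording a reason for every exclusion is what allows you to show readers how the final dataset was selected.

```python
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    addresses_question: bool   # does it speak to your research questions?
    methods_documented: bool   # is the original method explained in enough detail?
    year: int                  # when the data were collected


def screen_source(source, earliest_year):
    """Apply hypothetical inclusion criteria in order; return (included, reason)."""
    if not source.addresses_question:
        return False, "not relevant to research questions"
    if not source.methods_documented:
        return False, "methodology not documented"
    if source.year < earliest_year:
        return False, f"collected before {earliest_year}"
    return True, "included"


# Hypothetical candidate sources.
sources = [
    Source("National survey 2021", True, True, 2021),
    Source("Trade magazine poll", True, False, 2022),
    Source("Census extract 2001", True, True, 2001),
    Source("Unrelated market report", False, True, 2020),
]

for s in sources:
    included, reason = screen_source(s, earliest_year=2015)
    print(f"{s.title}: {reason}")
```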

Writing up the Analysis

Once you have your dataset, how you write up the analysis will depend on the process used. If the data is qualitative in nature, you should follow the process below.

Pre-Planning

  • Read and re-read all sources, identifying initial observations, correlations, and relationships between themes and how they apply to your research questions.
  • Once initial themes are identified, it is sensible to explore further and identify sub-themes which lead on from the core themes and correlations in the dataset, which encourages identification of new insights and contributes to the originality of your own work.

Structure of the Analysis Presentation

Introduction

The introduction should commence with an overview of all your sources. It is good practice to present these in a table, listed chronologically so that your work has an orderly and consistent flow. The introduction should also incorporate a brief (2-3 sentences) overview of the key outcomes and results identified.
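If you keep a basic record of each source, the chronological overview table can be generated rather than typed by hand. A minimal sketch; the authors, years, data types, and titles are hypothetical placeholders.

```python
# Hypothetical sources: (author, year, data type, title).
sources = [
    ("Author B", 2020, "qualitative", "Cost barriers in rural clinics"),
    ("Author A", 2018, "quantitative", "Access to primary care"),
    ("Author C", 2021, "qualitative", "Stigma and help-seeking"),
]


def source_table(sources):
    """Return a plain-text overview table of sources, ordered by year and then author."""
    header = f"{'Year':<6}{'Author':<10}{'Data':<14}Title"
    lines = [header, "-" * len(header)]
    for author, year, kind, title in sorted(sources, key=lambda s: (s[1], s[0])):
        lines.append(f"{year:<6}{author:<10}{kind:<14}{title}")
    return "\n".join(lines)


print(source_table(sources))
```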

Body

The body text for secondary data, irrespective of whether quantitative or qualitative data is used, should be broken up into sub-sections for each argument or theme presented. In the case of qualitative data, depending on whether content, narrative or discourse analysis is used, this means presenting the key papers in the area, their conclusions, and whether and how these answer your research questions. Each source should be clearly cited and referenced at the end of the work. In the case of quantitative data, any figures or tables should be reproduced with the correct citations to their original source. In both cases, it is good practice to give a main heading for each key theme, with sub-headings for each of the sub-themes identified in the analysis.

Do not use direct quotes from secondary data unless they are:

  • properly referenced, and
  • key to underlining a point or conclusion that you have drawn from the data.

All results sections, regardless of whether primary or secondary data has been used, should refer back to the research questions and prior works. This is because, regardless of whether the results support or contradict previous research, including previous works shows wider reading and understanding of the topic being researched and gives greater depth to your own work.

Summary of results

The summary of results section of a secondary data dissertation should sum up the key findings and, if appropriate, present a conceptual framework that clearly illustrates the findings of the work. This shows that you have understood your secondary data, how it has answered your research questions, and furthermore that your interpretation has led to some firm outcomes.


The Essential Guide to Doing Your Research Project

Steps in secondary data analysis

Stepping your way through effective secondary data analysis:

Determine your research question – Knowing exactly what you are looking for.

Locating data – Knowing what is out there and whether you can gain access to it. A quick Internet search, possibly with the help of a librarian, will reveal a wealth of options.

Evaluating relevance of the data – Considering things like the data’s original purpose, when it was collected, population, sampling strategy/sample, data collection protocols, operationalization of concepts, questions asked, and form/shape of the data.

Assessing credibility of the data – Establishing the credentials of the original researchers, searching for full explication of methods including any problems encountered, determining how consistent the data is with data from other sources, and discovering whether the data has been used in any credible published research.

Analysis – This will generally involve a range of statistical processes as discussed in Chapter 13.

Use of secondary data analyses in research: Pros and Cons

Journal of Addiction Medicine and Therapeutic Science (CC BY-NC 4.0)

Linda Pederson, Evelyn Vingilis, and John Koval (The University of Western Ontario; Centre for Addiction and Mental Health)


U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings
  • My Bibliography
  • Collections
  • Citation manager

Save citation to file

Email citation, add to collections.

  • Create a new collection
  • Add to an existing collection

Add to My Bibliography

Your saved search, create a file for external citation management software, your rss feed.

  • Search in PubMed
  • Search in NLM Catalog
  • Add to Search

Secondary analysis: theoretical, methodological, and practical considerations

Affiliation: Center for Health Outcomes and Policy Research, University of Pennsylvania School of Nursing, Philadelphia, USA.
PMID: 11928128

Secondary analysis, which involves the use of existing data sets to answer new research questions, is an increasingly popular methodological choice among researchers who wish to investigate particular research questions but lack the resources to undertake primary data collection. However, researchers who begin secondary analyses without an awareness of the distinctive methodological and practical challenges involved may lose much time and face considerable frustration. This article highlights difficulties that may arise when researchers use data from previous clinical research projects, including theoretical issues and problems involving sampling, measurement, and external and ecological validity. It also offers practical suggestions for undertaking a secondary analysis and criteria for evaluating secondary analyses.

Grants and funding: T32 NR07104/NR/NINR NIH HHS/United States

Secondary data analysis and combining with primary data

Explore the power of secondary data analysis to gain deep insights and make informed decisions with existing data. Learn its methods and applications.

Secondary Data Analysis for Comprehensive Insight

In the ever-evolving landscape of research and data-driven decision making, secondary data analysis has emerged as a powerful tool for unlocking valuable insights. Unlike primary data collection, where researchers gather information directly from the source, secondary data analysis involves the systematic examination of data that has been previously collected for a different purpose. This approach offers a unique opportunity to leverage existing resources, uncover hidden patterns, and generate new perspectives on complex issues.

Understanding Secondary Data Analysis

Secondary data analysis is the process of analyzing data that was originally collected by another researcher, organization, or entity. This data can come from a variety of sources, such as government databases, academic studies, market research reports, or even internal company records. By repurposing this existing information, researchers can gain a deeper understanding of the topic at hand, identify new research questions, and potentially uncover findings that were not initially apparent.

The Benefits of Secondary Data Analysis

  • Cost-Effectiveness: Conducting primary data collection can be time-consuming and resource-intensive. By utilizing secondary data, researchers can save on the costs associated with data collection, allowing them to allocate more resources to analysis and interpretation.
  • Faster Turnaround: Secondary data is often readily available, which means researchers can start the analysis process immediately, without waiting for data collection to be completed. This can be particularly beneficial in time-sensitive situations or when quick insights are required.
  • Broader Perspective: Secondary data often encompasses a larger sample size or a more diverse population than a single primary study can provide. By analyzing this data, researchers can gain a more comprehensive understanding of the topic and identify trends or patterns that may not be evident in smaller-scale studies.
  • Longitudinal Insights: Many secondary data sources, such as government statistics or industry reports, provide historical data that can be used to analyze trends and changes over time. This longitudinal perspective can be invaluable for understanding the dynamics of a particular phenomenon or making informed predictions about future developments.
  • Validation and Replication: Secondary data analysis can be used to validate the findings of previous studies or to replicate research in different contexts. This can help strengthen the reliability and generalizability of the original conclusions.

Challenges and Considerations in Secondary Data Analysis

While secondary data analysis offers numerous benefits, it also comes with its own set of challenges and considerations:

  • Data Quality and Relevance: Researchers must carefully evaluate the quality, accuracy, and relevance of the secondary data to ensure it aligns with the research objectives. Issues such as missing data, measurement errors, or outdated information can compromise the validity of the analysis.
  • Ethical Considerations: When using secondary data, researchers must be mindful of ethical concerns, such as protecting the privacy and confidentiality of the individuals or organizations represented in the data.
  • Contextual Understanding: Secondary data may lack the contextual information or background knowledge that was available to the original researchers. Interpreting the data without this context can lead to misunderstandings or incorrect conclusions.
  • Compatibility and Integration: Researchers may need to invest time and effort in cleaning, transforming, and integrating data from multiple sources to ensure compatibility and consistency for the analysis.
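
To make the quality and integration points concrete, here is a minimal Python sketch. It assumes two hypothetical sources that report the same fields under different names and units (dollars versus thousands of dollars); the records, field names and helper functions are invented for illustration. The sketch maps both sources onto one schema and then reports the share of missing values per field.

```python
# Hypothetical source A: income reported in dollars.
SOURCE_A = [
    {"region": "North", "median_income": 52000, "pop": 120000},
    {"region": "South", "median_income": None, "pop": 98000},
]
# Hypothetical source B: income reported in thousands of dollars.
SOURCE_B = [
    {"area": "East", "income_k": 47.5, "population": 87000},
    {"area": "West", "income_k": 61.0, "population": None},
]


def harmonize_a(row):
    """Rename source A fields to the common schema."""
    return {"region": row["region"], "median_income": row["median_income"],
            "population": row["pop"], "source": "A"}


def harmonize_b(row):
    """Rename source B fields and convert income from thousands to dollars."""
    income = row["income_k"] * 1000 if row["income_k"] is not None else None
    return {"region": row["area"], "median_income": income,
            "population": row["population"], "source": "B"}


def missing_share(rows, field):
    """Fraction of rows where `field` is missing (None)."""
    return sum(r[field] is None for r in rows) / len(rows)


combined = [harmonize_a(r) for r in SOURCE_A] + [harmonize_b(r) for r in SOURCE_B]
for field in ("median_income", "population"):
    print(field, f"{missing_share(combined, field):.0%} missing")
```

Keeping a `source` column after merging makes it easy to check later whether missingness or odd values cluster in one source.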

Conducting Effective Secondary Data Analysis

To maximize the benefits of secondary data analysis, researchers should follow a structured approach:

  • Define the Research Objectives: Clearly articulate the research questions or hypotheses that will guide the secondary data analysis.
  • Identify Relevant Data Sources: Conduct a thorough search to identify reputable and reliable data sources that can provide the necessary information to address the research objectives.
  • Evaluate Data Quality and Relevance: Assess the data's accuracy, completeness, and relevance to the research questions. Consider factors such as the data collection methods, sample size, and potential biases.
  • Prepare and Organize the Data: Clean, transform, and integrate the data as needed to ensure compatibility and consistency for the analysis.
  • Analyze the Data: Apply appropriate statistical techniques or qualitative methods to uncover patterns, trends, and insights within the secondary data.
  • Interpret the Findings: Contextualize the results by considering the original research objectives, the limitations of the secondary data, and any external factors that may have influenced the findings.
  • Communicate the Insights: Present the findings in a clear and compelling manner, highlighting the implications and potential applications of the secondary data analysis.

Secondary Data Analysis Examples

One example of secondary data analysis in the domain of upskilling is a study that examines data from national workforce development programs. Researchers could analyze data on program participation, completion rates, and employment outcomes to identify trends and patterns in upskilling efforts across different demographic groups or geographic regions [1, 2]. This type of analysis could provide valuable insights to policymakers and program administrators on the effectiveness of upskilling initiatives and inform decisions about resource allocation and program design.
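
A minimal Python sketch of the tabulation this example describes, assuming made-up participant records with hypothetical fields `group`, `completed` and `employed_after`. It computes completion and post-program employment rates per demographic group. A real analysis would also need the source's own definitions of completion and employment, and any weights it provides.

```python
from collections import defaultdict

# Invented participant records; field names are hypothetical.
records = [
    {"group": "18-24", "completed": True, "employed_after": True},
    {"group": "18-24", "completed": False, "employed_after": False},
    {"group": "25-54", "completed": True, "employed_after": True},
    {"group": "25-54", "completed": True, "employed_after": False},
    {"group": "55+", "completed": False, "employed_after": False},
]


def rates_by_group(rows):
    """Completion rate and post-program employment rate for each group."""
    counts = defaultdict(lambda: {"n": 0, "completed": 0, "employed": 0})
    for r in rows:
        c = counts[r["group"]]
        c["n"] += 1
        c["completed"] += r["completed"]      # True counts as 1
        c["employed"] += r["employed_after"]
    return {g: {"completion_rate": c["completed"] / c["n"],
                "employment_rate": c["employed"] / c["n"]}
            for g, c in counts.items()}


for group, stats in sorted(rates_by_group(records).items()):
    print(group, stats)
```

Here the employment rate is computed over all participants in a group; restricting it to completers would answer a different question, so the choice should follow the research objective.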

STEM Education

Secondary data analysis can also be useful in the field of STEM (Science, Technology, Engineering, and Mathematics) education. Researchers could analyze data from national or international assessments, such as the Programme for International Student Assessment (PISA) or the Trends in International Mathematics and Science Study (TIMSS), to investigate factors that influence student performance in STEM subjects [1].

This could include examining the relationship between teaching practices, school resources, and student outcomes, or exploring differences in STEM achievement across different socioeconomic or demographic groups.
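
As a sketch of the question about school resources and student outcomes, the Python below computes a Pearson correlation between a hypothetical school-resource index and school mean mathematics scores. The data are invented. The sketch also ignores the sampling weights and plausible values that large-scale assessments such as PISA typically supply, and it measures association only, not the effect of resources on scores.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    ss_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (ss_x * ss_y)


# Invented school-level data: resource index (1-5) and mean math score.
resource_index = [1, 2, 3, 4, 5]
mean_math_score = [480, 495, 500, 520, 530]
print(f"r = {pearson_r(resource_index, mean_math_score):.3f}")
```

Because the observations are school averages, a strong correlation here would not imply that the same relationship holds for individual students (the ecological fallacy).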

Youth and Senior Care

In the domain of youth and senior care, secondary data analysis could be used to examine trends and patterns in the utilization of healthcare services, social services, and community-based programs. Researchers could analyze data from government agencies, healthcare providers, or nonprofit organizations to identify gaps in service delivery, evaluate the effectiveness of interventions, and inform the development of policies and programs that better meet the needs of these populations [1, 4].

Primary and Secondary Data

Primary data is collected firsthand by researchers through surveys, interviews, experiments, or observations, specifically tailored to their research objectives. This direct data collection allows researchers to control the process and ensure relevance to their goals, providing detailed insights into specific challenges, such as workforce upskilling needs.

Conversely, secondary data is pre-existing data collected by others, such as industry reports or government databases. Analyzing it helps identify broader trends and benchmarks, offering a wider context for strategic decision-making.

By integrating both primary and secondary data, organizations can develop a holistic understanding of issues like upskilling, tailor their initiatives to meet specific employee needs, and align their efforts with industry standards. This comprehensive approach enhances the effectiveness of training programs and boosts organizational competitiveness, ensuring resources are optimally allocated and interventions are precisely targeted.

Primary and Secondary Data Integration for Enhanced Insights

In today’s data-driven world, the strategic use of primary and secondary data is crucial for organizations aiming to enhance their operational efficiency and adaptability. By combining these two types of data, organizations can gain a comprehensive view of both internal dynamics and external trends, facilitating more informed decision-making.

Primary Data Collection:

Primary data is collected firsthand by researchers and is tailored to specific research objectives. For instance, an organization looking to enhance its workforce skills might conduct detailed employee surveys or focus groups to pinpoint specific training needs. These methods provide rich insights into employee perspectives, challenges, and preferences, offering a nuanced understanding of the internal landscape.

Secondary Data Utilization:

Alongside primary data, organizations should also harness the power of secondary data. This includes reviewing industry reports, academic studies, or government labor statistics to understand broader trends in workforce development across the sector or region. Secondary data aids in benchmarking the organization’s efforts against industry norms and uncovering prevalent strategies in workforce training.

Combining Data for Strategic Decision Making:

Integrating the specific, actionable insights derived from primary data with the broader context provided by secondary data enables organizations to tailor their training programs more effectively. This combined approach ensures that internal initiatives are aligned with external realities, optimizing resource allocation and program design.

Through this integrated approach to data utilization, organizations can address not only their immediate internal needs but also align their initiatives with broader industry movements. This leads to more effective and impactful outcomes, enabling organizations to thrive in a competitive environment.
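
The integration idea can be sketched as a simple benchmark comparison in Python. Everything here is hypothetical: the skill areas, the internal survey shares (standing in for primary data) and the benchmark shares (standing in for an industry report). The sketch ranks skills by how far the organization falls below the benchmark, which is one possible way to prioritize training, not a prescribed method.

```python
# Primary data (hypothetical): share of surveyed employees confident in each skill.
internal = {"data literacy": 0.42, "cloud tools": 0.55, "project management": 0.70}
# Secondary data (hypothetical): benchmark shares from an industry report.
benchmark = {"data literacy": 0.60, "cloud tools": 0.58, "project management": 0.65}


def skill_gaps(internal, benchmark):
    """Benchmark minus internal share for skills in both sources, largest gap first."""
    shared = internal.keys() & benchmark.keys()
    gaps = {skill: benchmark[skill] - internal[skill] for skill in shared}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


for skill, gap in skill_gaps(internal, benchmark):
    status = "below benchmark" if gap > 0 else "at or above benchmark"
    print(f"{skill}: {gap:+.2f} ({status})")
```

A comparison like this is only meaningful if the internal survey question and the report's measure define the skill in the same way.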

Secondary data analysis is a powerful tool that allows researchers to leverage existing data to generate new insights and address complex research questions. By carefully selecting and analyzing secondary data sources, researchers can uncover valuable information, validate previous findings, and inform decision-making processes. As the volume and availability of data continue to grow, the importance of secondary data analysis will only increase, making it an essential skill for researchers, policymakers, and data-driven professionals across various industries.

Why You Should Consider Secondary Data Analysis for Your Next Study

What is Secondary Data Analysis?

Secondary data analysis involves a researcher using information that someone else gathered for their own purposes. Researchers use secondary data analysis to answer a new research question or to examine an alternative perspective on the original question of a previous study.

In order to fully understand secondary data analysis, it’s essential to familiarize yourself with the difference between primary and secondary data.

Primary Data vs. Secondary Data

Primary data is original data that researchers collect for a specific purpose.

Secondary data, on the other hand, is collected for a purpose other than the one for which it is now being used.

To add context to the definition of secondary data, let’s consider an example.

If an entrepreneur is considering opening a new business, he or she could leverage census data that has been collected by the government. 

Although the entrepreneur would not be collecting the data himself or herself, census data includes information that could greatly benefit the entrepreneur, such as the average age, household income, and education level in a particular geographical region.

By digging into this census data to inform the decision of whether or not the entrepreneur should open the new business, the entrepreneur is performing secondary data analysis.

Factors to Consider Before Conducting Secondary Data Analysis

There are certain factors that a researcher must consider before deciding to move forward with secondary data analysis. 

Because the researcher did not collect the data that he or she will be working with, it’s imperative for him or her to become familiar with the data set. This familiarization process entails:

  • Learning about how the data was collected
  • Learning who the population of the study was
  • Learning what the objective of the original study was
  • Determining what the response categories were for each question displayed to survey respondents
  • Evaluating whether or not weights need to be applied during the analysis of the data
  • Deciding whether or not clusters or stratification need to be accounted for during the analysis of the data
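
The weighting point is easy to underestimate, so here is a minimal Python sketch with invented numbers. It assumes a hypothetical survey that oversampled rural households, so each urban respondent carries a larger design weight than each rural respondent. Ignoring the weights pulls the estimate toward the oversampled group. The sketch covers point estimates only; correct standard errors under clustering or stratification require the design variables and survey-aware software.

```python
def weighted_mean(values, weights):
    """Mean of `values` where each value counts in proportion to its weight."""
    if len(values) != len(weights) or sum(weights) <= 0:
        raise ValueError("values and weights must align and weights must sum > 0")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Invented household incomes (thousands) and design weights.
# Urban households were undersampled, so each carries a weight of 3.0;
# rural households were oversampled, so each carries a weight of 0.5.
incomes = [60, 70, 30, 40, 35, 45]
weights = [3.0, 3.0, 0.5, 0.5, 0.5, 0.5]

unweighted = sum(incomes) / len(incomes)
weighted = weighted_mean(incomes, weights)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

The gap between the two estimates is the bias an analyst would introduce by treating a weighted survey as a simple random sample.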

The Advantages of Secondary Data Analysis

One of the most noticeable advantages of using secondary data analysis is its cost effectiveness.

Because someone else has already collected the data, the researcher does not need to invest any money, time, or effort into the data collection stages of his or her study.  

While sometimes secondary data must be purchased by a researcher looking to use it to inform a study they’re working on, these costs are almost always lower than what the expenses would be if the researcher were to create the same data set from scratch. 

Also, the data in a secondary data set is often already cleaned and stored in an electronic format, so the researcher can spend more time analyzing the data and less time preparing it for analysis.

Another benefit of analyzing secondary data instead of collecting and analyzing primary data is the sheer volume and breadth of data that is publicly available today. 

For instance, using data from studies that the government has conducted gives researchers access to a volume of data that they could not have amassed themselves.

Longitudinal data at this scale is extremely powerful. The government may have been collecting data on a single population over many years.

Instead of investing that time, by using the government’s publicly available data to perform secondary data analysis, the researcher avoids years of intensive labor.
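
Long-running public series are usually summarized with period-on-period changes. The Python sketch below, using an invented annual series, computes year-over-year growth and the compound annual growth rate (CAGR). The series and function names are illustrative assumptions, and real government series may need adjusting for definitional breaks before such comparisons are valid.

```python
def year_over_year(series):
    """Growth rate for each year relative to the previous year in `series`."""
    years = sorted(series)
    return {year: (series[year] - series[prev]) / series[prev]
            for prev, year in zip(years, years[1:])}


def cagr(first_value, last_value, periods):
    """Compound annual growth rate between two values `periods` years apart."""
    return (last_value / first_value) ** (1 / periods) - 1


# Invented annual index values for a single population.
series = {2019: 100.0, 2020: 104.0, 2021: 109.2, 2022: 112.476}
for year, growth in year_over_year(series).items():
    print(year, f"{growth:.1%}")

years = sorted(series)
print("CAGR", f"{cagr(series[years[0]], series[years[-1]], len(years) - 1):.2%}")
```

CAGR smooths over year-to-year swings, so it is best reported alongside the individual yearly changes rather than instead of them.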

The Disadvantages of Secondary Data Analysis

The biggest disadvantage of performing secondary data analysis is that the secondary data set might not answer the researcher’s specific research question to the degree that the researcher would have hoped.

If a researcher sets out to perform a study with a very particular question in mind, a secondary data set might not contain the specific information that would allow the researcher to answer his or her question.

Similarly, when a researcher has a specific question or goal in mind, it can sometimes be difficult to identify secondary data that is valid for use, as the data might not have been collected during the timeframe the researcher was hoping for, or in the correct geographical region, and so on.

Another disadvantage is that no matter what a researcher does to vet a secondary data set, they will never be able to know exactly how the data was collected, and how well that process was executed. 

Without being the one who actually develops the surveys and distributes them to the appropriate populations, it’s impossible to know how far the original researchers went to ensure validity or quality, or whether they experienced issues such as low response rates or respondents misunderstanding what a question was asking.

Simply put, since the researcher conducting the study did not collect the data he or she will be using, he or she ultimately has no control over what their secondary data set contains. 

Secondary data analysis is a convenient and powerful tool for researchers looking to ask broad questions at a large scale. 

While it has its benefits, such as its cost effectiveness and the breadth and depth of data that it provides access to, secondary data analysis can also force researchers to alter their original question, or work with a data set that otherwise is not ideal for their goals.

The next time you’re looking to perform a large-scale research study, consider secondary data analysis.

A meta-analysis of the effects of design thinking on student learning

Qing Yu (ORCID: orcid.org/0000-0003-1889-1481), Kun Yu and Rongri Lin

Humanities and Social Sciences Communications, volume 11, Article number: 742 (2024). Published: 10 June 2024.

Design thinking (DT) is becoming an innovative and popular teaching method. Recently, DT has been used as an unconventional method to develop problem-solving, creativity, and innovation skills. However, its effects on student learning are unclear. This research aimed to examine DT’s effects on student learning. The meta-analytic result based on 25 articles showed that DT positively affected student learning (r = 0.436, p < 0.001). Moreover, learning outcome, treatment duration, grade level, DT model, and region had moderating effects. Additionally, moderator analysis suggested that DT instruction was more effective: (1) when class size is ≤30; (2) in multidisciplinary subjects; (3) with long-term duration (≥3 months); (4) for secondary school and university students; (5) on student learning engagement, motivation, problem-solving skills, and academic achievement; (6) with the Observe, Synthesize, Ideate, and Prototype model and the Empathize, Define, Ideate, Prototype, Test model; (7) when team size is ≤7; (8) for African and Asian students.

Introduction

Design thinking (DT) is attracting increasing attention and interest worldwide (Aris et al., 2022). DT was introduced by Rowe (1987) and was first applied in education in 2005 (Çeviker-Çınar et al., 2017). Today, DT has been widely applied in nearly all stages of education (Pande and Bharathi, 2020), from formal to informal educational contexts (Aris et al., 2022). DT has been described as a process, a method (Rowe, 1987), or a “philosophy” (Çeviker-Çınar et al., 2017). In education, DT is a teaching method and a learning orientation that enables learners to generate creative ideas and impactful change and to actively explore solutions to problems (Beckman and Barry, 2007; Lor, 2017; Retna, 2016). DT can help solve many fundamental educational issues (Koh et al., 2015). However, previous studies have not reached a consensus about DT’s effects on student learning. Moreover, eliciting DT is not always easy because of its complexity and open-endedness (Becker and Mentzer, 2015). Therefore, this study carried out a meta-analysis to examine the relationship between DT and student learning.

Conceptual framework

Design thinking

DT has various definitions. The most widely used definition in education was proposed by Razzouk and Shute (2012): “an analytic and creative process that engages a person in opportunities to experiment, create and prototype models, gather feedback, and redesign.” DT is a promising, practical method that can be applied to education (Brown, 2008; Rusmann and Ejsing-Duun, 2022). It is often integrated into the teaching process as an instructional method. DT consists of a set of logically organized stages or processes, each aimed at cultivating students’ key competencies. When students are engaged in DT instruction, they follow DT’s steps to move their projects forward, thereby improving their ability to perform well. DT also focuses on addressing problems in real situations (Xu et al., 2024), which could increase students’ interest, motivation, and engagement (Grau and Rockett, 2022; Lin et al., 2020a). In sum, DT has become a dynamic, nonlinear, and spiraling process that can facilitate deep learning (Liu and Li, 2023) and eventually result in better student performance (Howard et al., 2021).

DT emphasizes learner-centeredness (Glen et al., 2014), which can help teachers and students cope with 21st-century challenges and complex real-world problems (Gleason and Jaramillo Cherrez, 2021; Xu et al., 2024; Yande, 2023). For teachers, DT provides a framework for solving complex and emerging problems (Henriksen et al., 2020a); it also provides solution strategies and guidance for designing innovative instruction and improving teaching. For students, DT can improve class participation and learning intention, create a favorable and enjoyable atmosphere, enhance peer interaction and creative confidence, deepen discussion of projects, and eventually improve teachers’ instruction (Balakrishnan, 2022; Tu et al., 2018). DT can also nurture competencies that students need, such as communication, collaboration, teamwork, problem-solving skills, creativity, empathy, critical thinking, and metacognition (Abolhasani et al., 2021; Balakrishnan, 2022; Guaman-Quintanilla et al., 2023; Retna, 2016; Rusmann and Ejsing-Duun, 2022). In general, the value of DT in education lies in helping students grow, empowering teachers’ development, and promoting change in teaching.

DT has gradually become the new normal, with students readily embracing the DT process and appreciating its merits (Retna, 2016). Meanwhile, a variety of DT models have been proposed for use in different domains. Simon (1969) proposed the first DT model, which entails a one-way linear process of three steps: analysis, synthesis, and evaluation. The most widely applied model in education, especially in school and university settings, is the Stanford model (Liu et al., 2024a), which has five stages: empathize, define, ideate, prototype, and test (EDIPT) (Plattner, 2009). IDEO (2013) defined five stages of DT for educators: discovery, interpretation, ideation, experimentation, and evolution. To apply DT in K-12 (Liu and Li, 2023), Carroll et al. (2010) extended the EDIPT model to six stages: understand, observe, point of view, ideate, prototype, and test. Brown’s DT model, with three stages (inspiration, ideation, and implementation), has also been widely used (Brown, 2008). The Design Council’s DT model assists designers and non-designers in solving some of the most complex social, economic, and environmental problems. It has four stages: discover, define, develop, and implement (Design Council, 2015). The DT model selected should aim to meet both students’ needs and instructional goals (Brannon, 2022). It should be noted that the processes contained in different DT models may vary and therefore produce different results.

DT’s effects and research gaps

Recently, a growing number of studies have investigated the impact of DT on students’ learning performance. However, there is no consensus on the effectiveness of DT. The results fall into three types: (a) DT can significantly promote students’ learning (Albay and Eisma, 2021; Bawaneh and Alnamshan, 2023; Chang and Tsai, 2021; Dawbin et al., 2021; Hsiao et al., 2017; Kuo et al., 2022; Ladachart et al., 2022; Lin et al., 2020a; Liu and Ko, 2021; Nazim and Mohammad, 2022; Padagas, 2021; Pratomo and Wardani, 2021; Simeon et al., 2022; Tsai, 2015; Ziadat and Sakarneh, 2021); (b) DT does not significantly enhance student learning (Khongprakob and Petsangsri, 2022; Kuo et al., 2022; Lin et al., 2020b; Yalçın and Erden, 2021); (c) DT is negatively correlated with learning outcomes (Chou and Shih, 2022; Lake et al., 2021).

DT's effectiveness therefore remains open to question. DT is an emerging topic that needs in-depth investigation (Baker III and Moukhliss, 2020), and several research gaps need to be addressed urgently. First, specific guidance and references for DT instruction are lacking. In-service teachers are unfamiliar with DT (Bressler and Annetta, 2022; Liu et al., 2024a), which may reduce its effects, and students may experience confusion and frustration when participating in DT courses (Glen et al., 2015; Razali et al., 2022). It is therefore crucial to explore the conditions under which the DT approach is most appropriate for the classroom (Lor, 2017): for instance, what are the most effective class size, team size, duration, and DT model? Second, DT's effects have been questioned (Rao et al., 2022), and systematic assessments of DT's effectiveness are limited (Liedtka, 2015). No meta-analysis has yet provided robust evidence on the effectiveness of DT in education. In summary, given DT's widespread introduction into education, a meta-analysis revealing DT's overall effect on student performance and its possible moderators is both necessary and valuable.

Research purpose

Because there is no quantitative, comprehensive evidence on DT's effects in education, we addressed the following research questions:

RQ1. What are the research characteristics of the included empirical studies of DT on student learning (e.g., publication year, research design, class size, grade level, duration, subject, team size, DT model, and region/country)?

RQ2. What is the overall effect of DT on student learning?

RQ3. How do DT's effects on student learning vary across potential moderators (e.g., learning outcome, class size, grade level, duration, subject, team size, DT model, and region)?

Compared with a narrative literature review, a meta-analysis can provide precise quantitative estimates of effects (Grant and Booth, 2009) by integrating the results of various empirical studies to calculate an overall effect size (Lipsey and Wilson, 2001). This research followed the process proposed by Field and Gillett (2010).

Literature searching

We retrieved documents mainly from the Web of Science (Core Collection), Scopus, and Google Scholar, combining the topic words (“Design Thinking”) AND (“Learning Performance” OR “Learning Outcomes” OR “Academic Achievement” OR “Academic Performance”). The search was restricted to the period from January 2005 to June 2023. It initially retrieved 1,204 articles, of which 1,059 remained after duplicates were removed.

Selecting criteria and process

We selected literature based on the following criteria:

(1) It must report the relationship between DT and student learning performance;

(2) It must be an empirical study (experimental, quasi-experimental, or correlational research);

(3) The research participants should receive a DT teaching intervention;

(4) It should provide the data necessary for calculating effect sizes (e.g., sample size, mean, standard deviation, or the value of t or p);

(5) It should be peer-reviewed and published in English.
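Criterion (4) matters because the included studies report their results in different forms (means and standard deviations, t values, or correlations), and all of them must be expressed as r before pooling. The Python sketch below shows standard textbook conversions of the kind described by Borenstein et al. (2009). The text does not state which conversion routines CMA applied, so treat the formulas as an assumption and the helper names and sample numbers as hypothetical.

```python
import math


def r_from_t(t: float, df: int) -> float:
    """Convert an independent-samples t statistic to r; the sign follows t."""
    return math.copysign(math.sqrt(t * t / (t * t + df)), t)


def d_from_means(m1: float, m2: float, sd1: float, sd2: float,
                 n1: int, n2: int) -> float:
    """Cohen's d for two groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


def r_from_d(d: float, n1: int, n2: int) -> float:
    """Convert d to r with the correction factor a = (n1 + n2)^2 / (n1 * n2)."""
    a = (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d * d + a)


# Hypothetical DT group (m1) vs. regular-instruction group (m2).
d = d_from_means(m1=10.0, m2=8.0, sd1=4.0, sd2=4.0, n1=50, n2=50)
print(round(r_from_d(d, 50, 50), 4))  # r from means and SDs
print(round(r_from_t(2.0, 96), 4))    # r from a reported t value
```

Correlational studies already report r, so they need no conversion; studies reporting only p values would first need p turned into a z or t value, which is not shown here.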

After the initial screening of titles and abstracts and the removal of duplicates, 296 articles were selected. Their full texts were then assessed for eligibility, and 84 articles that met the inclusion criteria remained. Finally, after these articles were read in full, 25 peer-reviewed studies were retained. The literature search and selection strictly followed standard procedures (Moher et al., 2009; Fig. 1).

Figure 1. Flow diagram.

Literature Quality and Bias Assessment

No single database includes all published literature, so searching multiple authoritative databases helps control literature search bias (Stang, 2010). Higgins et al. (2019) recommend searching at least two databases; we therefore searched three to reduce literature search bias (Kelley and Kelley, 2019).

Inaccurate inclusion criteria can result in literature selection bias (Sterne et al., 2016). To reduce this bias, we defined strict selection criteria covering, for example, study purpose and design, DT intervention, and publication language (Liu et al., 2024b).

We assessed literature quality using the criteria of Downs and Black (1998), which comprise 27 questions in five categories. All selected studies obtained the majority of points in more than four of these categories (scores ranging from 18 to 21) and were therefore rated as high quality (Carter et al., 2017).

Coding potential moderators

Moderators are factors that may influence DT's effects. The eight moderators were divided into two groups: background moderators and method moderators.

Background moderators

Learning outcome: DT's effects on learning outcomes have been examined relatively little, yet examining DT's effectiveness on different learning outcomes is necessary (Razzouk and Shute, 2012). Learning outcomes were coded as academic achievement, self-efficacy, learning motivation, problem-solving ability, creative thinking, and learning engagement.

Treatment duration: The DT process can take a long time to explore (Carroll et al., 2010), so duration may moderate DT's effect on learning. It was divided into <1, 1–3, and >3 months.

Class size: Class size is an important index of teaching effects (Retna, 2016), so it may moderate DT's effect on student learning. It was divided into 1–30, 31–50, 51–100, and >100.

Grade level: How DT is applied should be clearly distinguished across learning stages (Lor, 2017). It was divided into kindergarten, primary, junior high, high school, and university.

Subject: DT is not always useful across all subjects (Retna, 2016), and van de Grift and Kroeze (2016) found that it could enhance interdisciplinary education. Thus, subject may moderate DT's effects. It was divided into STEM, No-STEM, and multidiscipline.

Region: This refers to the area where the study was performed. The cultural context of the education system must also be considered when applying DT (Retna, 2016), so region is also treated as a potential moderator. It was divided into Asia, America, Australia, Europe, and Africa.

Method moderators

DT model: This refers to DT's specific processes or stages. The implementation of DT relies on specific models, and different models involve different operations, so the role of the DT model should be considered. We coded DT models into nine types:

3IE = Inspiration, Ideation, Implementation, and Evaluation;

UOPIPT = Understand, Observe, Perspective, Imagination, Prototype, and Test;

EDIPT = Empathize, Define, Ideate, Prototype, and Test;

EDEIPT = Empathize, Define, Elaborate, Ideate, Prototype, and Test;

OSIP = Observation, Synthesis, Ideation, and Prototype;

PAS = Preparation, Assimilation, and Strategic control;

2UPPI = User focus (User as an information source and User as a codeveloper), Problem framing, Prototype, and Iteration;

CTC = Copy, Tinker, and Create;

LAUNCH = Look, listen and learn, Ask, Understand, Navigate ideas, Create, and Highlight and fix.

Team size: This variable refers to the number of team members. DT pedagogy emphasizes the use of student teams (Beckman and Barry, 2007), and team size is one of the causes of conflicts around teamwork (Aflatoony et al., 2018), so team size may moderate DT's effect. It was divided into 1–4, 5–7, and >=8.

Data analysis

CMA 3.0 was used to analyze the effect sizes and moderator effects. To make results from different studies comparable, the Pearson correlation coefficient r was selected as the effect size (Borenstein et al., 2005). Because sample sizes varied widely across studies, we applied the Fisher Z-transformation, weighted by study sample size, to calculate the pooled r and its 95% confidence interval (Lei et al., 2020).
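As a minimal sketch of the Fisher Z step, assuming inverse-variance weights of n - 3 (the variance of Fisher's z is 1/(n - 3)), the function below pools correlations on the z scale and back-transforms the mean and its 95% confidence interval with tanh. This is fixed-effect weighting for illustration only; the estimate reported in this paper comes from CMA under a random-effects model, which additionally adds the between-study variance τ² to each study's variance. The function name and the two studies are hypothetical.

```python
import math


def pool_fisher_z(rs, ns, z_crit=1.96):
    """Pool correlations on the Fisher z scale with weights n - 3.

    Fixed-effect sketch (assumption): returns (pooled_r, ci_low, ci_high)
    after back-transforming the z-scale mean and interval with tanh.
    """
    zs = [math.atanh(r) for r in rs]   # Fisher z-transformation of each r
    ws = [n - 3 for n in ns]           # inverse of var(z) = 1 / (n - 3)
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    se = 1 / math.sqrt(sum(ws))        # standard error of the pooled z
    return (math.tanh(z_bar),
            math.tanh(z_bar - z_crit * se),
            math.tanh(z_bar + z_crit * se))


# Hypothetical studies: two correlations from samples of 53 students each.
r, lo, hi = pool_fisher_z([0.3, 0.5], [53, 53])
print(f"r = {r:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Working on the z scale keeps the sampling distribution close to normal, which is why the interval is built there and only then converted back to r.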

Publication Bias

We used the funnel plot, the classic fail-safe N, and the trim-and-fill method to examine publication bias. In the absence of publication bias, studies scatter symmetrically in the funnel plot. First, the funnel plot showed that the samples in this study were not evenly distributed (Fig. 2). Second, the fail-safe N (Nfs) quantifies the number of unpublished null studies at which publication bias would become an issue; CMA calculates this threshold. Here, Nfs = 9179, far larger than the benchmark of 220 (5K + 10, with K = 42). Last, the trim-and-fill method imputes potentially missing studies to restore symmetry in the funnel plot (Duval and Tweedie, 2000). This method identified only five missing studies, on the right side of the funnel plot (Fig. 3). Taken together, these results indicate that publication bias is not a serious concern for the included data.
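The classic fail-safe N can be reproduced by hand. The sketch below assumes Rosenthal's original formulation, in which the combined (Stouffer) z of the k observed studies plus N unpublished null studies is set equal to the one-tailed critical value 1.645; CMA may use a different alpha or tail by default, so the critical value is a parameter. The function names and the per-study z scores in the usage lines are hypothetical; for the reported data, the 5K + 10 benchmark with K = 42 gives the 220 quoted above.

```python
def rosenthal_fail_safe_n(z_values, z_alpha=1.645):
    """Rosenthal's classic fail-safe N.

    Solves sum(z) / sqrt(k + N) = z_alpha for N, i.e. the number of extra
    null-result studies (mean z = 0) that would drag the combined z down to
    the critical value. z_alpha = 1.645 assumes a one-tailed alpha of 0.05.
    """
    k = len(z_values)
    return sum(z_values) ** 2 / z_alpha ** 2 - k


def tolerance_level(k):
    """Rosenthal's rule of thumb: a fail-safe N above 5k + 10 is considered robust."""
    return 5 * k + 10


# Hypothetical per-study z scores for four effect sizes.
n_fs = rosenthal_fail_safe_n([3.0, 3.0, 3.0, 3.0])
print(round(n_fs, 1), n_fs > tolerance_level(4))
```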

Figure 2. Funnel plot.

Figure 3. Funnel plot after trim-and-fill.

Literature selection may also introduce publication bias. To minimize this bias, we developed strict selection criteria, e.g., study purpose and design, DT intervention, necessary data, and peer review. In particular, we limited publications to those in English. This may have excluded relevant studies published in other languages, which is a limitation of the current research that future work could address.

Homogeneity test and sensitivity analysis

The values of Q and I² can be used to determine whether heterogeneity exists. Q = 554.908 (p < 0.001) (Table 1) was significant, and I² = 92.611% exceeded 75%, which indicates high heterogeneity according to Higgins et al. (2003). The random-effects model was therefore selected (Borenstein et al., 2009; Wilson et al., 2020), and moderator analyses were also warranted.
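The I² value reported above follows directly from Q and the number of effect sizes: I² = (Q - df)/Q with df = K - 1. The sketch below computes Cochran's Q, I², and the between-study variance τ² that a random-effects model adds to each study's variance. It assumes the DerSimonian-Laird estimator for τ² (a common default, but the text does not name one), and the function names are hypothetical. With the reported Q = 554.908 and K = 42, the printed value is 92.611, matching the text.

```python
import math


def heterogeneity(effects, variances):
    """Cochran's Q, I^2 (in %), and DerSimonian-Laird tau^2 (estimator assumed)."""
    w = [1 / v for v in variances]            # fixed-effect (inverse-variance) weights
    sw = sum(w)
    mean_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - mean_fe) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sw - sum(wi * wi for wi in w) / sw    # scaling constant in the DL formula
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, i2, tau2


def random_effects_mean(effects, variances):
    """Random-effects pooled estimate and its SE, with weights 1 / (v + tau^2)."""
    _, _, tau2 = heterogeneity(effects, variances)
    w = [1 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return mean, 1 / math.sqrt(sum(w))


def i_squared_from_q(q, k):
    """I^2 (%) recovered from a reported Q and the number of effect sizes k."""
    return max(0.0, (q - (k - 1)) / q) * 100


# Reported values: Q = 554.908 with K = 42 effect sizes.
print(round(i_squared_from_q(554.908, 42), 3))
```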

To confirm the robustness of the results, we examined sensitivity with the one-study-removal method, recomputing the overall effect size with each study removed in turn. Every recomputed overall effect size remained within a narrow range (0.418 to 0.467), indicating that the results are robust.
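The one-study-removal procedure amounts to re-running the pooled analysis K times, each time without one study, and reporting the range of the resulting estimates. Below is a self-contained sketch, assuming DerSimonian-Laird random-effects pooling on the Fisher z scale and treating each list entry as one study; the function names and the five studies in the usage lines are hypothetical.

```python
import math


def random_effects_r(rs, ns):
    """DerSimonian-Laird random-effects pooled r on the Fisher z scale (assumed)."""
    zs = [math.atanh(r) for r in rs]
    vs = [1 / (n - 3) for n in ns]             # sampling variance of each z
    w = [1 / v for v in vs]
    sw = sum(w)
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sw
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(zs) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in vs]        # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    return math.tanh(z_re)


def leave_one_out(rs, ns):
    """Pooled r recomputed with each study removed in turn."""
    return [random_effects_r(rs[:i] + rs[i + 1:], ns[:i] + ns[i + 1:])
            for i in range(len(rs))]


# Hypothetical correlations and sample sizes for five studies.
rs = [0.30, 0.45, 0.50, 0.38, 0.62]
ns = [40, 60, 55, 120, 35]
estimates = leave_one_out(rs, ns)
print(f"range: {min(estimates):.3f} to {max(estimates):.3f}")
```

A narrow range, as reported above, means no single study drives the overall estimate.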

General characteristics of the included 25 studies

To answer RQ1, reveal the current state of empirical research on DT, and provide complementary evidence for the subsequent meta-analysis, we conducted a descriptive analysis of the included literature. The included studies were published between 2015 and 2023: 1 in 2015 (4.00%), 1 in 2017 (4.00%), 3 in 2020 (12.00%), 8 in 2021 (32.00%), 6 in 2022 (24.00%), and 6 in 2023 (24.00%). This distribution indicates growing interest in empirical research on the use of DT for teaching and learning. In terms of study design, only 2 were correlational studies (Lin et al., 2020a; Roth et al., 2020), while the other 23 were experimental studies (including pre-experiments, quasi-experiments, and true experiments). The descriptive results are as follows:

(1) Grade level: kindergarten (N = 1, 4.00%), primary school (N = 3, 12.00%), junior high school (N = 2, 8.00%), high school (N = 9, 36.00%), and university (N = 10, 40.00%).

(2) Class size: 0–30 (N = 9, 36.00%), 31–50 (N = 10, 40.00%), and >=51 (N = 6, 24.00%).

(3) Duration: 0–1 month (N = 8, 32.00%), 1–3 months (N = 7, 28.00%), and >=3 months (N = 10, 40.00%).

(4) Subject: STEM (N = 16, 64.00%), No-STEM (N = 6, 24.00%), and multidiscipline (N = 3, 12.00%).

(5) DT model: EDIPT (N = 14, 56.00%), 3IE (N = 1, 4.00%), UOPIPT (N = 1, 4.00%), LAUNCH (N = 1, 4.00%), OSIP (N = 1, 4.00%), PAS (N = 1, 4.00%), 2UPPI (N = 1, 4.00%), EDEIPT (N = 1, 4.00%), CTC (N = 1, 4.00%), and Unknown (N = 3, 12.00%) (Fig. 4).

Figure 4

(6) Team size (reported in 13 studies): 0–4 (N = 7, 53.85%) and 5–7 (N = 6, 46.15%).

(7) Region: Asia (N = 21, 84.00%), America (N = 1, 4.00%), Australia (N = 1, 4.00%), Europe (N = 1, 4.00%), and Africa (N = 1, 4.00%) (Fig. 5).

Figure 5

(8) Countries: China (N = 12, 48.00%), Thailand (N = 2, 8.00%), Australia (N = 1, 4.00%), Austria (N = 1, 4.00%), Philippines (N = 2, 8.00%), Saudi Arabia (N = 2, 8.00%), Nigeria (N = 1, 4.00%), America (N = 1, 4.00%), Indonesia (N = 1, 4.00%), Jordan (N = 1, 4.00%), and Turkey (N = 1, 4.00%).

In general, most studies used the EDIPT model (N = 14) and focused primarily on STEM learning (N = 16, 64.00%) among high school (N = 9, 36.00%) and university students (N = 10, 40.00%) in Asia (N = 21, 84.00%).
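The percentages in the descriptive list are simple shares of the studies reporting each characteristic. The short check below (the helper name is hypothetical) reproduces the grade-level shares from the reported counts and makes explicit that the team-size shares are based on the 13 studies that reported team size (7 + 6), not on all 25.

```python
def percent_shares(counts):
    """Percentage share of each category, rounded to two decimals as in the text."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Counts reported in the descriptive results for the 25 included studies.
grade_level = {"kindergarten": 1, "primary school": 3, "junior high school": 2,
               "high school": 9, "university": 10}
# Team size was reported by only 7 + 6 = 13 studies, so shares use that base.
team_size = {"0-4": 7, "5-7": 6}

print(sum(grade_level.values()), percent_shares(grade_level))
print(sum(team_size.values()), percent_shares(team_size))
```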

Overall effect size

According to Cohen (2013), r = 0.1 indicates a small effect size, r = 0.3 a medium effect size, and r = 0.5 a large effect size. The overall effect size of DT was upper-medium (r = 0.436, 95% CI [0.342, 0.525], p < 0.001) (Table 1). The effect size of each study is shown in the forest plot (Fig. 6), where the red diamond represents the overall effect size and its CI. “Favours A” indicates results in favor of regular instruction, and “Favours B” indicates results in favor of DT instruction.

Figure 6. Forest plot.

Moderator analysis

Learning outcome.

The effect sizes, from largest to smallest, were learning engagement (r = 0.740), learning motivation (r = 0.608), academic achievement (r = 0.450), problem-solving ability (r = 0.447), creative thinking (r = 0.329), and self-efficacy (r = 0.230) (Table 2). The significant between-groups effect (p < 0.01) indicates that learning outcome had a moderating effect.
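The between-groups effect reported for each moderator is a Q statistic compared against a chi-square distribution with (number of levels - 1) degrees of freedom. The sketch below computes it on the Fisher z scale. As a simplification, it pools effects within each level with fixed-effect weights n - 3, whereas CMA's random-effects subgroup analysis also incorporates τ², so its numbers would differ. The chi-square tail uses the closed-form series for integer degrees of freedom; the function names and the example data are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable, for integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd double factorials.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def q_between(groups):
    """Between-groups Q on the Fisher z scale.

    groups maps a moderator level to a list of (r, n) pairs. Within each level,
    effects are pooled with fixed-effect weights n - 3 (a simplification).
    """
    means, weights = [], []
    for studies in groups.values():
        w = [n - 3 for _, n in studies]
        z = [math.atanh(r) for r, _ in studies]
        weights.append(sum(w))
        means.append(sum(wi * zi for wi, zi in zip(w, z)) / sum(w))
    grand = sum(W * m for W, m in zip(weights, means)) / sum(weights)
    q = sum(W * (m - grand) ** 2 for W, m in zip(weights, means))
    df = len(groups) - 1
    return q, df, chi2_sf(q, df)


# Hypothetical moderator with two levels.
q, df, p = q_between({"level A": [(0.20, 53), (0.25, 60)],
                      "level B": [(0.55, 48), (0.60, 70)]})
print(f"Q_between = {q:.2f}, df = {df}, p = {p:.4f}")
```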

For class size, the effect sizes from largest to smallest were <=30 (r = 0.609), 31–50 (r = 0.422), and >=51 (r = 0.389) (Table 2). The between-groups effect was not significant (Q = 0.856, p > 0.05), indicating that class size had no moderating effect.

Treatment duration

The effect size for >=3 months (r = 0.535) was the largest, followed by <=1 month (r = 0.456), with 1–3 months (r = 0.245) the smallest (Table 2). The significant between-groups effect (p < 0.001) indicates that treatment duration had a moderating effect.

Grade level

The effect sizes from largest to smallest were high school (r = 0.538), university (r = 0.463), junior high school (r = 0.443, p > 0.05), primary school (r = 0.222), and kindergarten (r = 0.174) (Table 2). The significant between-groups effect (p < 0.01) indicates that grade level had a moderating effect.

For subject, the effect sizes from largest to smallest were multidiscipline (r = 0.604), No-STEM (r = 0.470), and STEM (r = 0.393) (Table 3). The between-groups effect was not significant, indicating that subject had no moderating effect.

For DT model, the effect sizes from largest to smallest were OSIP (r = 0.766), EDIPT (r = 0.522), 2UPPI (r = 0.346), PAS (r = 0.301), UOPIPT (r = 0.297), 3IE (r = 0.222), CTC (r = 0.191, p > 0.05), EDEIPT (r = 0.174), and LAUNCH (r = 0.066, p > 0.05) (Table 3). The Q test of the between-groups effect was significant (p < 0.001), indicating that DT model had a moderating effect.

For team size, the effect sizes were 0–4 (r = 0.477) and 5–7 (r = 0.441) (Table 3). The between-groups effect was not significant (p > 0.05), indicating that team size had no moderating effect.

For region, the effect sizes from largest to smallest were Africa (r = 0.690), Asia (r = 0.435), Australia (r = 0.355), Europe (r = 0.346), and America (r = 0.066, p > 0.05) (Table 3). The significant between-groups effect (Q = 50.576, p < 0.001) indicates that region had a moderating effect.

Discussions and implications

This meta-analysis investigated DT's effect on student learning using 42 validated effect sizes from 25 independent empirical articles. It reveals that DT has an upper-medium effect on student learning. DT bridges the gap between the theoretical discoveries of social transformation pedagogy and the practical application of the skills needed for the future (Scheer et al., 2012). The DT process entails a set of logical stages that target students' key competencies. DT instruction can increase students' involvement, establish a positive learning climate, and promote interaction and communication between teachers and students (Tu et al., 2018). Moreover, DT relies on teamwork and hands-on activities, which benefit student learning (Holstermann et al., 2010; Oje, 2021; Sung et al., 2017; Swanson et al., 2019). Admittedly, connecting DT with course content can be challenging (Hennessey and Mueller, 2020). Overall, if educators organize DT instruction appropriately, it can effectively improve student learning.

Learning outcome has a moderating effect. Specifically, DT can promote learners' creative thinking, learning engagement, motivation, problem-solving ability, self-efficacy, and academic achievement. Notably, the effects on learning engagement and motivation are large, and the effect on academic achievement is upper-medium. DT is a dynamic, nonlinear, and spiraling process that can facilitate deep learning (Liu and Li, 2023), interest, motivation, creativity, and engagement, and eventually improve student learning (Howard et al., 2021; Rao et al., 2022). However, DT's impacts differ significantly across learning outcomes. DT models consist of a set of stages, and some models are complex and challenging, which may explain why DT's effect on self-efficacy is smaller than its effects on other learning outcomes. In sum, DT still has great potential to enhance various learning outcomes.

Class size has no moderating effect. Specifically, <=30 (r = 0.609) has a large effect, while 31–50 (r = 0.422) and >=51 (r = 0.389) have upper-medium effects. Although the differences are not significant, the pattern suggests that smaller classes may benefit more from DT. DT is a guided, student-oriented process in which learners need close supervision, guidance, and feedback (Retna, 2016). When the class is large (>=51), it is hard for teachers to provide prompt guidance and feedback, and large classes also challenge effective classroom management and interaction (Blatchford et al., 2009). The >=51 category is, however, broad, so DT's effects in larger classes (e.g., 51–80) need more exploration. Based on this result, we recommend that educators keep the class size below 51 students and, if conditions permit, involve more than one teacher in a class (e.g., two teachers) (Retna, 2016).

Treatment duration has a moderating effect. Specifically, the effect of >=3 months (r = 0.535) is large, <=1 month (r = 0.456) has an upper-medium effect, and 1–3 months (r = 0.245) has an upper-small effect. Previous work generally found 1–3 months to be the most effective duration (Yu et al., 2023), but in our results it has the smallest effect. A novelty effect may explain why <=1 month yields a larger effect than 1–3 months: as students gradually become familiar with DT and face learning challenges, the novelty wears off and the effect decreases. Guaman-Quintanilla et al. (2023) noted that it is challenging to experience the entire DT process within a limited time, and time constraints are a challenge for both students and educators (McLaughlin et al., 2023; Retna, 2016; Razali et al., 2022). Longer durations are needed for educators to engage students in DT (Razali et al., 2022). DT is a long-term journey to develop students' abilities and skills, so enough time should be allocated. In short, although DT is effective across these durations, <=1 month and >=3 months are more effective. Future research could further examine DT's effect over 1–3 months.

Grade level has a moderating effect. Specifically, high school (r = 0.538) has the largest effect; university (r = 0.463) has an upper-medium effect; primary school (r = 0.222) and kindergarten (r = 0.174) have small effects; and junior high school (r = 0.443, p > 0.05) has an insignificant effect. DT has been used at all stages of education and has generally been effective. In this research, DT shows greater potential for high school and university students than for primary school and kindergarten students. Because DT is a task- and activity-oriented learning process that relies on team communication and collaboration, DT studies at different stages might yield different results owing to cognitive-developmental differences (Mentzer et al., 2015). Given the complexity of DT, it could be applied more widely to university and secondary school students. Moreover, researchers should carry out more studies at diverse grade levels, especially kindergarten (k = 2) and junior high school (k = 4).

Subject has no moderating effect, but the effect for multidiscipline is larger than those for STEM and No-STEM. This suggests that DT can foster multidisciplinary learning, consistent with previous studies (Chang and Tsai, 2021; de Figueiredo, 2021; van de Grift and Kroeze, 2016). DT has typical interdisciplinary features (Lugmayr et al., 2014) and can promote new solutions, innovation, and collaboration opportunities for complex problems in multidisciplinary areas (Cook and Bush, 2018; Gleason and Jaramillo Cherrez, 2021). At the same time, DT can be integrated into STEM or No-STEM subjects to promote learning and teaching (Hsiao et al., 2023). DT is taught as a concept rather than being affiliated with a specific discipline (Lor, 2017). We recommend integrating DT into existing courses rather than adding separate add-on activities (Sandars and Goh, 2020), especially for multidisciplinary learning (Hsiao et al., 2023). Because different disciplines and subjects have their own suitable design processes (Sung and Kelley, 2019), our result provides only a broad subject division for reference, and future research could explore DT's effects on more specific subjects. In addition, most DT applications involved STEM subjects (k = 32), with fewer in No-STEM and multidisciplinary settings, so DT's effects in the latter two should be viewed cautiously and deserve more research attention.

DT model has a moderating effect, indicating that different DT models may generate heterogeneity. Specifically, OSIP (r = 0.766) and EDIPT (r = 0.522) have large effects; 2UPPI (r = 0.346) and PAS (r = 0.301) have lower-medium effects; UOPIPT (r = 0.297), 3IE (r = 0.222), and EDEIPT (r = 0.174) have small effects; and CTC (r = 0.191, p > 0.05) and LAUNCH (r = 0.066, p > 0.05) have no significant effects. Before DT can be effectively implemented to solve complicated problems, a solid grasp of the different stages of the DT process is essential (Dam and Teo, 2019). Different DT models involve different steps or stages, which may affect the processes of cognition and learning. For instance, EDIPT is easier for middle school students (Sarooghi et al., 2019). Based on the results of this meta-analysis, we recommend that educators adopt the EDIPT and OSIP models in DT instruction. Importantly, educators should not rely too heavily on predetermined procedural DT processes, which may hinder DT's creative potential (Wells, 2013); they can tailor the DT model to their actual situations (Li and Zhan, 2022). Note also that, except for EDIPT, few effect sizes were available for each DT model, so those results should be treated cautiously and need further exploration.

Team size has no moderating effect. Team sizes of 0–4 (r = 0.477) and 5–7 (r = 0.441) both have upper-medium effects. Teamwork and team collaboration are great challenges for many students, and DT could enhance students' teamwork (Guaman-Quintanilla et al., 2022). Success in DT requires teamwork, and larger teams can enrich the diversity of perspectives and increase the likelihood of finding solutions (Sung et al., 2017). The composition of teams is also important (Apedoe et al., 2012). Generally speaking, heterogeneous ability groups may be appropriate in DT (Lou et al., 1996), i.e., groups mixing low-ability and high-ability students, and male and female students (Yu and Yu, 2023). The results of this research suggest that groups of 2–7 members are beneficial. However, a larger number of teams may limit teachers' ability to guide and facilitate each team's and each student's learning (Apedoe et al., 2012). We therefore recommend groups of <=7 members: when the class size is large, 5–7 is better; otherwise, 2–4 is better. These team-size categories are broad and serve only as a reference, so future research could explore which specific team compositions (from 2 to 7 or more members) work best in DT instruction.

Region has a moderating effect. Specifically, Africa (r = 0.690) has a large effect; Asia (r = 0.435), Australia (r = 0.355), and Europe (r = 0.346) have upper-medium effects; and America has an insignificant effect. This may be due to differences in cultural and educational systems across regions. Unlike individualistic cultures (e.g., America, Australia, Austria), most Asian countries (e.g., China, Thailand, Indonesia) are collectivist, and students in these countries tend to value team goals more than individual goals (De Mooij and Hofstede, 2010), which may help explain DT's upper-medium effect on Asian students. Because the studies were distributed very unevenly across regions, this result should be interpreted with caution: outside Asia, each region contributed only one study (Australia, N = 1; Europe, N = 1; Africa, N = 1; America, N = 1), so these regions need more attention. In general, DT positively impacts student learning in diverse regions and is recommended for enhancing Asian students' learning.

Implications for future practice and work

This meta-analysis provides an evidence-based analysis of DT's effects on student learning and offers suggestions for future practice and research, which are also its major contributions to the existing literature.

First, although DT's effects differ significantly across types of learning outcomes, DT is still an effective teaching method for improving student learning. Educators can apply DT to enhance students' academic performance, creative thinking, learning engagement, motivation, and problem-solving ability. Because few effect sizes were available for learning engagement and self-efficacy, their effects should be interpreted cautiously.

Second, smaller classes showed larger DT effects, although class size was not a significant moderator. Educators should keep the class size below 51. Future research could explore DT's effects in larger classes (e.g., 51–80).

Third, treatment duration is a critical factor; durations of <=1 month or >=3 months are recommended. In particular, DT's effect was smallest at 1–3 months, which needs more research.

Fourth, grade level is a key factor. DT could be applied to university and high school students, whereas its effect in junior high school was insignificant. Researchers could carry out more studies in kindergarten (k = 2) and junior high school (k = 4).

Fifth, DT can be used in STEM, No-STEM, and multidisciplinary subjects. Meanwhile, future research could further explore No-STEM, multidisciplinary, and more specific subjects.

Sixth, the DT model is also a critical factor to consider. Based on the results of this study, we recommend that educators adopt the EDIPT model. Importantly, the effects of models other than EDIPT need more exploration.

Seventh, in terms of team size, groups of <=7 members are suggested: when the class size is large, 5–7 is better; otherwise, 2–4 is preferred. However, these categories are broad, so future research could explore which specific team compositions (from 2 to 7 or more members) are better for DT instruction.

Eighth, the regional analysis shows that DT is used most often in Asia and is particularly recommended for supporting Asian students' learning. However, the number of effect sizes from other regions is very small, so their results should be viewed with caution, and future researchers could do more to test DT's effects in America, Africa, Australia, and Europe.

Conclusions, limitations and future research

Conclusions.

This meta-analysis reveals DT's effects in education based on 25 empirical studies. We find that DT has an upper-medium positive effect on students' learning. Specifically, DT can enhance learners' creative thinking, learning engagement, motivation, problem-solving ability, self-efficacy, and academic achievement, with comparatively larger effects on learning engagement, motivation, and academic achievement. Furthermore, learning outcome, grade level, treatment duration, DT model, and region moderate DT's effects on student learning; that is, DT's effectiveness depends on these factors.

DT is trending worldwide (Aris et al., 2022) and has profoundly changed many educators' thinking about how to teach in ways that support learning (Hubbard and Datnow, 2020). Teachers are vital in DT instruction; they should be facilitators and navigators, not lecturers (Henriksen et al., 2020b; Retna, 2016; Rusmann and Ejsing-Duun, 2022). In sum, DT can potentially promote learning at different grade levels, but its effectiveness in education depends on the goals (Panke, 2019). It is critical to help teachers see the value of DT in classrooms (Carroll et al., 2010) and to conduct DT instruction with guidance and rules. This paper provides evidence-based findings for educators and researchers.

Limitations, research gaps, and future directions

Several limitations should be addressed in future work. First, the literature is distributed unevenly by region, grade level, and DT model, so more studies could be conducted in kindergarten (k = 2), junior high school (k = 4), America (k = 1), Australia (k = 1), Africa (k = 2), and Europe (k = 2), on learning engagement (k = 1) and self-efficacy (k = 3), and with DT models other than EDIPT. Second, all literature included in this meta-analysis was published in English; future work could include studies in other languages. Third, heterogeneity is considerable, and some potential moderators may have been overlooked; future work could explore more factors that influence DT's effectiveness, e.g., learning environments. Fourth, the number of included studies is small; future research could use experimental designs to explore DT's effects on student learning further. Last, a meta-analysis cannot capture the full status and findings of DT in education; future researchers could conduct a systematic literature review to cover aspects neglected by the current research.

Data availability

All data are provided in the forest plot and references. Further details are available at https://doi.org/10.7910/DVN/EHGCGS.

References (*included in this meta-analysis)

Abolhasani Z, Dehghani M, Javadipour M, Salehi K, Mohammadhasani N (2021) An analysis of the role of design thinking in promoting the 21st-century skills: a systematic review. Technol Educ J 16(1):81–98. https://doi.org/10.22061/tej.2021.7206.2508

Aflatoony L, Wakkary R, Neustaedter C (2018) Becoming a design thinker: assessing the learning process of students in a secondary level design thinking course. Int J Art Design Educ 37(3):438–453. https://doi.org/10.1111/jade.12139

*Albay EM, Eisma DV (2021) Performance task assessment supported by the design thinking process: Results from a true experimental research. Soc Sci Human Open 3(1):100116. https://doi.org/10.1016/j.ssaho.2021.100116

Apedoe XS, Ellefson MR, Schunn CD (2012) Learning together while designing: does group size make a difference? J Sci Edu Technol 21(1):83–94. https://doi.org/10.1007/s10956-011-9284-5

Aris NM, Ibrahim NH, Abd Halim ND, Ali S, Rusli NH, Suratin MNM, Hassan FC (2022) Evaluating the academic trends on design thinking research: A bibliometric analysis from 2000 to 2021. J Positive School Psychol 6(4):1022–1038


Baker III FW, Moukhliss S (2020) Concretising design thinking: a content analysis of systematic and extended literature reviews on design thinking and human‐centred design. Rev Educ 8(1):305–333. https://doi.org/10.1002/rev3.3186

Balakrishnan B (2022) Exploring the impact of design thinking tool among design undergraduates: a study on creative skills and motivation to think creatively. Int J Technol Design Educ 32(3):1799–1812. https://doi.org/10.1007/s10798-021-09652-y

*Bawaneh AK, Alnamshan MM (2023) Design Thinking in Science Education: Enhancing Undergraduate Students’ Motivation and Achievement in Learning Biology. Int J Inf Educ Technol 13(4):621–633. https://doi.org/10.18178/ijiet.2023.13.4.1846

Beckman SL, Barry M (2007) Innovation as a learning process: embedding design thinking. California Manag Rev 50(1):25–56. https://doi.org/10.2307/41166415

Becker K, Mentzer N (2015) Engineering design thinking: high school students’ performance and knowledge. In 2015 International Conference on Interactive Collaborative Learning, IEEE, (pp 5–12), Firenze, Italy

Blatchford P, Russell A, Brown P (2009) Teaching in large and small classes. In: LJ Saha, & AG Dworkin (eds.), International Handbook of Research on Teachers and Teaching. Springer, Boston, MA

Borenstein M, Hedges L, Higgins J, Rothstein H (2005) Comprehensive meta-analysis (version 3.3) (p. 104). Englewood, NJ: Biostat

Borenstein M, Hedges LV, Higgins JP, Rothstein HR (2009) Introduction to meta-analysis. John Wiley & Sons

Brown T (2008) Design thinking. Harvard Business Review 86(6):84–92

*Brannon ME (2022) Exploring the impact of design thinking on creativity in preservice teachers. Doctoral dissertation, Kent State University

Bressler DM, Annetta LA (2022) Using game design to increase teachers’ familiarity with design thinking. Int J Technol Design Educ 32(2):1023–1035. https://doi.org/10.1007/s10798-020-09628-4

Carroll M, Goldman S, Britos L, Koh J, Royalty A, Hornstein M (2010) Destination, imagination and the fires within: design thinking in a middle school classroom. Int J Art Design Educ 29(1):37–53. https://doi.org/10.1111/j.1476-8070.2010.01632.x

Carter G, Milner A, McGill K, Pirkis J, Kapur N, Spittal MJ (2017) Predicting suicidal behaviours using clinical instruments: systematic review and meta-analysis of positive predictive values for risk scales. Br J Psychiatry 210(6):387–395. https://doi.org/10.1192/bjp.bp.116.182717

Çeviker-Çınar G, Mura G, Demirbağ-Kaplan M (2017) Design thinking: a new road map in business education. Design J 20(S1):977–987. https://doi.org/10.1080/14606925.2017.1353042

*Chang YS, Tsai MC (2021) Effects of design thinking on artificial intelligence learning and creativity. Educ Stud 1–18. https://doi.org/10.1080/03055698.2021.1999213

*Chou PN, Shih RC (2022) Engineering design thinking in LEGO robot projects: an experimental study. In International Conference on Innovative Technologies and Learning. Springer, (pp 324–333)

Cohen J (2013) Statistical power analysis for the behavioral sciences. Routledge

Cook KL, Bush SB (2018) Design thinking in integrated STEAM learning: surveying the landscape and exploring exemplars in elementary grades. School Sci Mathe 118(3–4):93–103. https://doi.org/10.1111/ssm.12268

Dam RF, Teo YS (2019) 5 stages in the design thinking process. https://www.interaction-design.org/literature/article/5-stages-in-the-design-thinking-process

*Dawbin B, Sherwen M, Dean S, Donnelly S, Cant R (2021) Building empathy through a design thinking project: A case study with middle secondary schoolboys. Issues Educ Res 31(2):440–457

De Mooij M, Hofstede G (2010) The Hofstede model: applications to global branding and advertising strategy and research. Int J Advertis 29(1):85–110. https://doi.org/10.2501/S026504870920104X

de Figueiredo MD (2021) Design is cool, but… A critical appraisal of design thinking in management education. Int J Manag Educ 19(1):100429. https://doi.org/10.1016/j.ijme.2020.100429

Design Council (2015) What is the framework for innovation? Design Council’s evolved Double Diamond. https://www.designcouncil.org.uk/news-opinion/what-framework-innovation-design-councils-evolved-double-diamond

Downs SH, Black N (1998) The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Commun Health 52(6):377–384. https://doi.org/10.1136/jech.52.6.377

Duval S, Tweedie R (2000) Trim and fill: a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics 56(2):455–463. https://doi.org/10.1111/j.0006-341x.2000.00455.x

Field AP, Gillett R (2010) How to do a meta‐analysis. Br J Math Stat Psychol 63(3):665–694. https://doi.org/10.1348/000711010X502733

Gleason B, Jaramillo Cherrez N (2024) Design thinking approach to global collaboration and empowered learning: virtual exchange as innovation in a teacher education course. TechTrends 65(3):348–358. https://doi.org/10.1007/s11528-020-00573-6

Glen R, Suciu C, Baughn C (2014) The need for design thinking in business schools. Acad Manag Learn Educ 13(4):653–667. https://doi.org/10.5465/amle.2012.0308

Glen R, Suciu C, Baughn CC, Anson R (2015) Teaching design thinking in business schools. Int J Manag Educ 13(2):182–192. https://doi.org/10.1016/j.ijme.2015.05.001

Grant MJ, Booth A (2009) A typology of reviews: an analysis of 14 review types and associated methodologies. Health Inform Lib J 26(2):91–108. https://doi.org/10.1111/j.1471-1842.2009.00848.x

Grau SL, Rockett T (2022) Creating student-centred experiences: using design thinking to create student engagement. J Entrep 31(2_suppl):S135–S159. https://doi.org/10.1177/09713557221107443

Guaman-Quintanilla S, Everaert P, Chiluiza K, Valcke M (2022) Fostering teamwork through design thinking: evidence from a multi-actor perspective. Educ Sci 12(4):279. https://doi.org/10.3390/educsci12040279

Guaman-Quintanilla S, Everaert P, Chiluiza K, Valcke M (2023) Impact of design thinking in higher education: a multi-actor perspective on problem solving and creativity. Int J Technol Design Educ 33(1):217–240. https://doi.org/10.1007/s10798-021-09724-z

Henriksen D, Jordan M, Foulger TS, Zuiker S, Mishra P (2020a) Essential tensions in facilitating design thinking: collective reflections. J Formative Design Learn 4(1):5–16. https://doi.org/10.1007/s41686-020-00045-3

Henriksen D, Gretter S, Richardson C (2020b) Design thinking and the practicing teacher: addressing problems of practice in teacher education. Teach Educ 31(2):209–229. https://doi.org/10.1080/10476210.2018.1531841

Hennessey E, Mueller J (2020) Teaching and learning design thinking (DT). Can J Educ/Revue Canadienne de l'éducation 43(2):498–521

Higgins JP, Thompson SG, Deeks JJ, Altman DG (2003) Measuring inconsistency in meta-analyses. BMJ 327(7414):557–560. https://doi.org/10.1136/bmj.327.7414.557

Higgins JP, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (2019) Cochrane handbook for systematic reviews of interventions, 2nd edn. John Wiley & Sons

Holstermann N, Grube D, Bögeholz S (2010) Hands-on activities and their influence on students’ interest. Res Sci Educ 40(5):743–757. https://doi.org/10.1007/s11165-009-9142-0

Howard JL, Bureau JS, Guay F, Chong JX, Ryan RM (2021) Student motivation and associated outcomes: a meta-analysis from self-determination theory. Perspect Psychol Sci 16(6):1300–1323. https://doi.org/10.1177/1745691620966789

*Hsiao HS, Yu KC, Chang YS, Chien YH, Lin KY, Lin CY, … Lin YW (2017) The study on integrating the design thinking model and STEM activity unit for senior high school living technology course. In 2017 7th World Engineering Education Forum. IEEE, (pp 383–390), Kuala Lumpur, Malaysia

*Hsiao HS, Chang YC, Lin KY, Chen JC, Lin CY, Chung GH, Chen JH (2023) Applying the design thinking model to hands-on mechatronics STEM activities for senior high school students to improve the learning performance and learning behavior. Int J Technol Design Educ 33(a):1389–1408

Hubbard L, Datnow A (2020) Design thinking, leadership, and the grammar of schooling: Implications for educational change. Am J Educ 126(4):499–518. https://doi.org/10.1086/709510

IDEO (2013) Design thinking for educators. https://www.ideo.com/post/design-thinking-for-educators

Kelley GA, Kelley KS (2019) Systematic reviews and meta-analysis in rheumatology: a gentle introduction for clinicians. Clin Rheumatol 38(8):2029–2038. https://doi.org/10.1007/s10067-019-04590-6

*Khongprakob N, Petsangsri S (2022) Promoting Undergraduate Creativity and Positive Learning Outcomes through a Design Thinking and Visual Thinking Teaching Model. J Positive Psychol Wellbeing 6(1):3809–3821

Koh JHL, Chai CS, Wong B, Hong HY (2015) Design thinking for education: conceptions and applications in teaching and learning. Springer, Singapore

*Kuo HC, Yang YTC, Chen JS, Hou TW, Ho MT (2022) The impact of design thinking PBL robot course on college students’ learning motivation and creative thinking. IEEE Transac Educ 65(22):1–8. https://doi.org/10.1109/TE.2021.3098295

*Ladachart L, Cholsin J, Kwanpet S, Teerapanpong R, Dessi A, Phuangsuwan L, Phothong W (2022) Ninth-grade students’ perceptions on the design-thinking mindset in the context of reverse engineering. Int J Technol Design Educ 32(5):2445–2465. https://doi.org/10.1007/s10798-021-09701-6

Lake D, Flannery K, Kearns M (2021) A Cross-Disciplines and Cross-Sector Mixed-Methods Examination of Design Thinking Practices and Outcome. Innov Higher Edu 46(3):337–356. https://doi.org/10.1007/s10755-020-09539-1

Lei H, Chiu MM, Li F, Wang X, Geng YJ (2020) Computational thinking and academic achievement: A meta-analysis among students. Children and Youth Services Review 118:105439. https://doi.org/10.1016/j.childyouth.2020.105439

Liedtka J (2015) Perspective: linking design thinking with innovation outcomes through cognitive bias reduction. J Prod Innov Manag 32(6):925–938. https://doi.org/10.1111/jpim.12163

*Lin PY, Hong HY, Chai CS (2020a) Fostering college students’ design thinking in a knowledge-building environment. Educ Technol Res Dev 68(3):949–974. https://doi.org/10.1007/s11423-019-09712-0

*Lin L, Shadiev R, Hwang WY, Shen S (2020b) From knowledge and skills to digital works: An application of design thinking in the information technology course. Thinking Skills Creativ 36:100646. https://doi.org/10.1016/j.tsc.2020.100646

Lipsey MW, Wilson DB (2001) Practical meta-analysis. Sage, Thousand Oaks, CA, USA

*Liu GC, Ko CH (2021) Effects of social media and design thinking on corporate identity design course in Taiwan. E-Learn Digital Media 18(3):251–268. https://doi.org/10.1177/2042753020950879

Li T, Zhan Z (2022) A systematic review on design thinking Integrated Learning in K-12 education. Appl Sci 12(16):8077. https://doi.org/10.3390/app12168077

*Liu S, Li C (2023) Promoting design thinking and creativity by making: a quasi-experiment in the information technology course. Thinking Skills Creativ 49:101335. https://doi.org/10.1016/j.tsc.2023.101335

*Liu X, Gu J, Xu J (2024a) The impact of the design thinking model on pre-service teachers’ creativity self-efficacy, inventive problem-solving skills, and technology-related motivation. Int J Technol Design Educ 34(1):167–190. https://doi.org/10.1007/s10798-023-09809-x

Liu S, Zhao X, Meng X, Ji W, Liu L, Li W, Tao Y, Peng Y, Yang Q (2024b) Research on the application of extended reality in the construction and management of landscape engineering. Electronics 13(5):897. https://doi.org/10.3390/electronics13050897

Lor R (2017) Design thinking in education: a critical review of literature. In International Academic Conference on Social Sciences and Management / Asian Conference on Education and Psychology. Bangkok, Thailand, (pp 37–68)

Lou Y, Abrami PC, Spence JC, Poulsen C, Chambers B, d’Apollonia S (1996) Within-class grouping: a meta-analysis. Rev Educ Res 66(4):423–458. https://doi.org/10.3102/00346543066004423

Lugmayr A, Stockleben B, Zou Y, Anzenhofer S, Jalonen M (2014) Applying “design thinking” in the context of media management education. Multimedia Tools Appl 71:119–157. https://doi.org/10.1007/s11042-013-1361-8

McLaughlin JE, Lake D, Chen E, Guo W, Knock M, Knotek S (2023) Faculty experiences and motivations in design thinking teaching and learning. Front Educ 8:1172814. https://doi.org/10.3389/feduc.2023.1172814

Mentzer N, Becker K, Sutton M (2015) Engineering design thinking: high school students’ performance and knowledge. J Eng Educ 104(4):417–432. https://doi.org/10.1002/jee.20105

Moher D, Liberati A, Tetzlaff J, Altman DG, Group PRISMA (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Internal Med 151(4):264–269. https://doi.org/10.7326/0003-4819-151-4-200908180-00135

*Nazim M, Mohammad T (2022) Implications of design thinking in an EFL classroom: writing in context. Theory Pract Language Stud 12(12):2723–2730. https://doi.org/10.17507/tpls.1212.31

Oje O (2021) The effects of hands-on learning on STEM student motivation: a meta-analysis. Master’s thesis, Washington State University. https://doi.org/10.7273/000000061

*Padagas RC (2021) Design Thinking in a Professional Nursing Course–Its Effectiveness and Unearthed Lessons. Revista Românească pentru Educaţie Multidimensională 13(2):132–146

Panke S (2019) Design thinking in education: perspectives, opportunities and challenges. Open Educ Stud 1(1):281–306. https://doi.org/10.1515/edu-2019-0022

Pande M, Bharathi SV (2020) Theoretical foundations of design thinking–A constructivism learning approach to design thinking. Thinking Skills Creativ 36:100637. https://doi.org/10.1016/j.tsc.2020.100637

Plattner H (2009) An introduction to design thinking: Process guide. Stanford Institute of Design

*Pratomo LC, Wardani DK (2021) The effectiveness of design thinking in improving student creativity skills and entrepreneurial alertness. Int J Instruct 14(4):695–712. https://doi.org/10.29333/iji.2021.14440a

Rao H, Puranam P, Singh J (2022) Does design thinking training increase creativity? Results from a field experiment with middle-school students. Innovation 24(2):315–332. https://doi.org/10.1080/14479338.2021.1897468

Razali NH, Ali NNN, Safiyuddin SK, Khalid F (2022) Design thinking approaches in education and their challenges: a systematic literature review. Creative Educ 13(7):2289–2299. https://doi.org/10.4236/ce.2022.137145

Razzouk R, Shute V (2012) What is design thinking and why is it important? Rev Educ Res 82(3):330–348. https://doi.org/10.3102/0034654312457429

Retna KS (2016) Thinking about “design thinking”: a study of teacher experiences. Asia Pac J Educ 36(S1):5–19. https://doi.org/10.1080/02188791.2015.1005049

*Roth K, Globocnik D, Rau C, Neyer AK (2020) Living up to the expectations: the effect of design thinking on project success. Creativ Innov Manag 29(4):667–684. https://doi.org/10.1111/caim.12408

Rowe P (1987) Design thinking. The MIT Press, Cambridge, MA, USA

Rusmann A, Ejsing-Duun S (2022) When design thinking goes to school: a literature review of design competences for the K-12 level. Int J Technol Design Educ 32(4):2063–2091. https://doi.org/10.1007/s10798-021-09692-4

Sarooghi H, Sunny S, Hornsby J, Fernhaber S (2019) Design thinking and entrepreneurship education: Where are we, and what are the possibilities? J Small Bus Manag 57(S1):78–93. https://doi.org/10.1111/jsbm.12541

Sandars J, Goh PS (2020) Design thinking in medical education: the key features and practical application. J Med Educ Curricular Dev 7:1–5. https://doi.org/10.1177/2382120520926518

Scheer A, Noweski C, Meinel C (2012) Transforming constructivist learning into action: Design thinking in education. Design Technol Edu: Int J 17(3):8–19

Simon HA (1969) The sciences of the artificial. The MIT Press, Cambridge, MA, USA

*Simeon MI, Samsudin MA, Yakob N (2022) Effect of design thinking approach on students’ achievement in some selected physics concepts in the context of STEM learning. Int J Technol Design Educ 32(1):185–212. https://doi.org/10.1007/s10798-020-09601-1

Stang A (2010) Critical evaluation of the Newcastle-Ottawa scale for the assessment of the quality of nonrandomized studies in meta-analyses. Eur J Epidemiol 25(9):603–605. https://doi.org/10.1007/s10654-010-9491-z

Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, Higgins JP (2016) ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ 355:i4919. https://doi.org/10.1136/bmj.i4919

Sung YT, Yang JM, Lee HY (2017) The effects of mobile-computer-supported collaborative learning: meta-analysis and critical synthesis. Rev Educ Res 87(4):768–805. https://doi.org/10.3102/0034654317704307

Sung E, Kelley TR (2019) Identifying design process patterns: a sequential analysis study of design thinking. Int J Technol Design Educ 29(2):283–302. https://doi.org/10.1007/s10798-018-9448-1

Swanson E, McCulley LV, Osman DJ, Scammacca Lewis N, Solis M (2019) The effect of team-based learning on content knowledge: a meta-analysis. Active Learn Higher Educ 20(1):39–50. https://doi.org/10.1177/1469787417731201

*Tsai CW (2015) Investigating the effects of web-mediated design thinking and co-regulated learning on developing students’ computing skills in a blended course. Univ Access Inform Soc 14(2):295–305. https://doi.org/10.1007/s10209-015-0401-8

Tu JC, Liu LX, Wu KY (2018) Study on the learning effectiveness of Stanford design thinking in integrated design education. Sustainability 10(8):2649. https://doi.org/10.3390/su10082649

van de Grift TC, Kroeze R (2016) Design thinking as a tool for interdisciplinary education in health care. Acad Med 91(9):1234–1238. https://doi.org/10.1097/ACM.0000000000001195

Wells A (2013) The importance of design thinking for technological literacy: a phenomenological perspective. Int J Technol Design Educ 23(3):623–636. https://doi.org/10.1007/s10798-012-9207-7

Wilson ML, Ritzhaupt AD, Cheng L (2020) The impact of teacher education courses for technology integration on pre-service teacher knowledge: A meta-analysis study. Comput Educ 156:103941. https://doi.org/10.1016/j.compedu.2020.103941

*Xu W, Chen JC, Lou YF, Chen H (2024) Impacts of maker education-design thinking integration on knowledge, creative tendencies, and perceptions of the engineering profession. Int J Technol Design Educ 34(1):75–107. https://doi.org/10.1007/s10798-023-09810-4

Yande A (2023) Enhancing Student Learning Outcomes using Design Thinking Strategies. Honors thesis, University of Texas at Austin

*Yalçın V, Erden Ş (2021) The effect of STEM activities prepared according to the design thinking model on preschool children’s creativity and problem-solving skills. Thinking Skills Creativ 41:100864. https://doi.org/10.1016/j.tsc.2021.100864

Yu Q, Yu K (2023) Knowledge Sharing Behavior of Team Members in Blended Team-Based Learning: Moderating of Team Learning Ability. Asia-Pac Educ Res 1–13. https://doi.org/10.1007/s40299-023-00795-1

Yu Q, Yu K, Li B, Wang Q (2023) Effectiveness of blended learning on students’ learning performance: a meta-analysis. J Res Technol Educ 1–22. https://doi.org/10.1080/15391523.2023.2264984

*Ziadat AH, Sakarneh MA (2021) Online design thinking problems for enhancing motivation of gifted students. Int J Learn Teach Educ Res 20(8):91–107. https://doi.org/10.26803/ijlter.20.8.6

Acknowledgements

We are very grateful to the editor and reviewers for their constructive comments and hard work. We would also like to express our gratitude to Springer Nature.

Author information

Authors and affiliations

Fudan University, Shanghai, China

Qing Yu, Kun Yu & Rongri Lin

Contributions

Qing Yu and Kun Yu: conceptualization, data curation and analysis, investigation, methodology, validation, and writing-review & editing. Qing Yu: writing-original draft, project administration, and resources. Rongri Lin: investigation, validation, resources, and writing-review & editing.

Corresponding authors

Correspondence to Qing Yu, Kun Yu or Rongri Lin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yu, Q., Yu, K. & Lin, R. A meta-analysis of the effects of design thinking on student learning. Humanit Soc Sci Commun 11, 742 (2024). https://doi.org/10.1057/s41599-024-03237-5

Received: 26 January 2024

Accepted: 23 May 2024

Published: 10 June 2024

DOI: https://doi.org/10.1057/s41599-024-03237-5

How to Conduct a Market Analysis for Your Business

A market analysis can help you identify how to better position your business to be competitive and serve your customers.

A market analysis is a thorough assessment of a market within a specific industry. These analyses have many benefits, such as reducing risk for your business and better informing your business decisions. A market analysis can be time-intensive, but it is a straightforward process you can carry out on your own in seven steps.

To perform a market analysis for your business, follow the steps outlined in this guide.

What does a market analysis include?

In a market analysis, you will study the dynamics of your market, such as volume and value, potential customer segments, buying patterns, competition, and other important factors. A thorough market analysis should answer the following questions:

  • Who are my potential customers?
  • What are my customers’ buying habits?
  • How large is my target market?
  • How much are customers willing to pay for my product?
  • Who are my main competitors?
  • What are my competitors’ strengths and weaknesses?

What are the benefits of running a market analysis?

A market analysis can reduce risk, identify emerging trends, and help project revenue. You can use a market analysis at several stages of your business, and it can even be beneficial to conduct one every year to keep up with any major changes in the market.

A detailed market analysis will usually be part of your business plan, since it gives you a greater understanding of your audience and competition. This will help you build a more targeted marketing strategy.

These are some other major benefits of conducting a market analysis:

  • Risk reduction: Knowing your market can reduce risks in your business, since you’ll understand the major market trends, the main players in your industry, and what it takes to be successful, all of which will inform your business decisions. To protect your business further, you can also conduct a SWOT analysis, which identifies your business’s strengths, weaknesses, opportunities and threats.
  • Targeted products or services: You are in a much better position to serve your customers when you have a firm grasp on what they are looking for from you. When you know who your customers are, you can use that information to tailor your business’s offerings to your customers’ needs.
  • Emerging trends: Staying ahead in business is often about being the first to spot a new opportunity or trend, and using a market analysis to stay on top of industry trends is a great way to position yourself to take advantage of this information.
  • Revenue projections: A market forecast is a key component of most market analyses, as it projects the future size, characteristics and trends of your target market. This gives you an idea of the revenue you can expect, allowing you to adjust your business plan and budget accordingly.
  • Evaluation benchmarks: It can be difficult to gauge your business’s success outside of pure numbers. A market analysis provides benchmarks or key performance indicators (KPIs) against which you can judge your company and how well you are doing compared to others in your industry.
  • Context for past mistakes: Marketing analytics can explain your business’s past mistakes or industry anomalies. For example, in-depth analytics can explain what impacted the sale of a specific product, or why a certain metric performed the way it did. This can help you avoid making those mistakes again or experiencing similar anomalies, because you’ll be able to analyze and describe what went wrong and why.
  • Marketing optimization: This is where an annual marketing analysis comes in handy – regular analysis can inform your ongoing marketing efforts and show you which aspects of your marketing need work, and which are performing well in comparison to the other companies in your industry.

What are the drawbacks of running a market analysis?

The drawbacks below pertain less to the method itself than to the resources it requires.

  • Market analysis can be expensive. If you’re not as familiar with marketing concepts such as market volume and customer segmentation, you might want to outsource your market analysis. Doing so can be great for your analysis’s quality, but it can also leave a big dent in your budget. Narrow your market analysis to a certain group – perhaps current customers – to lower your costs.
  • Market analysis can be time-consuming. Market analysis can take precious time away from your core business tasks. You can analyze one area at a time – say, buying patterns or competition – to free up your day-to-day schedule.
  • Market analysis can require extra staff. Some larger companies retain in-house market analysis staff, and you can follow their lead. Doing so, though, comes with all the usual costs of hiring a new employee. The question then becomes: Do you conduct your market analysis yourself, outsource it, or hire in-house? The more expensive options can often yield more meaningful insights.
  • Market analysis can be narrow. The most successful market analyses use actual customer feedback, which analysts often get through customer surveys. These surveys may reach only a portion of your customer base, and the customers who respond may not be representative of the rest. As a result, a market analysis may not fully capture who your customers are and what you should know about them.

Market analysis vs. conjoint analysis vs. sentiment analysis

Where market analysis is broad and comprehensive, conjoint analysis focuses on how customers value what you offer. Surveys are often the backbone of conjoint analysis – they’re a great way for customers to share what drives their purchases. Product testing is an especially common application of conjoint analysis. This method can yield insights into pricing and product features and configurations.
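The sketch below illustrates what conjoint analysis estimates by computing main-effect part-worths from a small ratings-based design. It assumes a balanced full-factorial design, in which every combination of attribute levels is rated once. In that case a level’s part-worth reduces to the mean rating of the profiles showing that level minus the overall mean rating. The attributes, levels and ratings are hypothetical. Many real studies instead use choice-based designs estimated with regression or logit models, which this sketch does not cover.

```python
from collections import defaultdict
from statistics import mean


def part_worths(profiles, ratings):
    """Main-effect part-worths for a balanced full-factorial ratings design.

    profiles: list of dicts mapping attribute name -> level shown
    ratings:  one numeric rating per profile, in the same order
    """
    if len(profiles) != len(ratings) or not ratings:
        raise ValueError("need one rating per profile")
    grand_mean = mean(ratings)
    by_level = defaultdict(list)
    for profile, rating in zip(profiles, ratings):
        for attribute, level in profile.items():
            by_level[(attribute, level)].append(rating)
    # Part-worth = how far a level pulls ratings above or below average.
    return {key: mean(vals) - grand_mean for key, vals in by_level.items()}


def relative_importance(worths):
    """Each attribute's part-worth range as a share of the summed ranges."""
    levels = defaultdict(list)
    for (attribute, _level), worth in worths.items():
        levels[attribute].append(worth)
    spans = {a: max(ws) - min(ws) for a, ws in levels.items()}
    total = sum(spans.values())
    if total == 0:
        return {a: 0.0 for a in spans}
    return {a: span / total for a, span in spans.items()}


if __name__ == "__main__":
    profiles = [
        {"price": "$10", "support": "email"},
        {"price": "$10", "support": "phone"},
        {"price": "$20", "support": "email"},
        {"price": "$20", "support": "phone"},
    ]
    ratings = [7, 9, 3, 5]  # hypothetical 1-10 purchase-intent ratings
    worths = part_worths(profiles, ratings)
    print(worths)
    print(relative_importance(worths))
```

With these made-up ratings, price moves ratings by ±2 points and support by ±1 point. Price therefore accounts for about two-thirds of the relative importance.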

Sentiment analysis goes beyond number-driven market and conjoint analysis to identify how customers feel about your offerings in qualitative terms. It can show you what customers like and dislike about your offerings or buying process. You can also wade into deeper emotional territory, such as anger, urgency and intention, or dig up descriptive feedback. It’s a great tool to use alongside market analysis, whereas conjoint analysis is usually part of a market analysis already.

How to conduct a market analysis

While conducting a market analysis is not complicated, it takes a lot of dedicated research, so be prepared to devote significant time to the process.

These are the seven steps of conducting a market analysis:

1. Determine your purpose.

There are many reasons you may be conducting a market analysis, such as to gauge your competition or to understand a new market. Whatever your reason, it’s important to define it right away to keep you on track throughout the process. Start by deciding whether your purpose is internal – like improving your cash flow or business operations – or external, like seeking a business loan. Your purpose will dictate the type and amount of research you will do.

2. Research the state of the industry.

Map a detailed outline of the current state of your industry. Include where the industry seems to be heading, using metrics such as size, trends and projected growth, with plenty of data to support your findings. You can also conduct a comparative market analysis to help you find your competitive advantage within your specific market.
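Projected growth is often summarized as a compound annual growth rate (CAGR). The sketch below is minimal and assumes you already have two industry-size figures from a source such as a trade journal or government statistics. The numbers in the example are hypothetical. Extending a past CAGR forward assumes the industry keeps growing at a steady rate, which real markets rarely do.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two positive values."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(current_value: float, rate: float, years: int) -> list[float]:
    """Naively extend a growth rate forward, one value per future year."""
    return [current_value * (1 + rate) ** t for t in range(1, years + 1)]


if __name__ == "__main__":
    # Hypothetical industry size in $ millions: 100 in 2021, 121 in 2023.
    rate = cagr(100.0, 121.0, 2)
    print(f"CAGR: {rate:.1%}")
    print([round(v, 1) for v in project(121.0, rate, 3)])
```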

3. Identify your target customer.

Not everyone in the world will be your customer, and it would be a waste of your time to try to get everyone interested in your product. Instead, use a target market analysis to decide who is most likely to want your product and focus your efforts there. You want to understand your market size, who your customers are, where they come from, and what might influence their buying decisions. To do so, look at demographic factors.

During your research, you might consider creating a customer profile or persona that reflects your ideal customer to serve as a model for your marketing efforts.

4. Understand your competition.

To be successful, you need a good understanding of your competitors, including their market saturation, what they do differently than you, and their strengths, weaknesses and advantages in the market. Start by listing all your main competitors, then go through that list and conduct a SWOT analysis of each competitor. What does that business have that you don’t? What would lead a customer to choose that business over yours? Put yourself in the customer’s shoes.

Then, rank your list of competitors from most to least threatening, and decide on a timeline to conduct regular SWOT analyses on your most threatening competitors.

5. Gather additional data.

When conducting a market analysis, information is your friend – you can never have too much data. It is important that the data you use is credible and factual, so be careful about where you get your numbers. These are some reputable business data resources:

  • U.S. Bureau of Labor Statistics
  • U.S. Census Bureau
  • State and local commerce sites
  • Trade journals
  • Your own SWOT analyses
  • Market surveys or questionnaires

6. Analyze your data.

After you collect all the information you can and verify that it is accurate, you need to analyze the data to make it useful to you. Organize your research into sections that make sense to you, but try to include ones for your purpose, target market and competition.

These are the main elements your research should include:

  • An overview of your industry’s size and growth rate
  • Your business’s projected market share percentage
  • An industry outlook
  • Customer buying trends
  • Your forecasted growth
  • How much customers are willing to pay for your product or service
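Several of the elements above, including industry growth rate, projected market share and forecasted growth, come down to simple arithmetic. The sketch below uses compound annual growth rate (CAGR) as the growth measure. CAGR is a common convention but it is an assumption here. All revenue figures are made up, so substitute the numbers from your own sources.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two market-size observations."""
    return (end_value / start_value) ** (1 / years) - 1

def project(value: float, rate: float, years: int) -> float:
    """Project a value forward assuming a constant annual growth rate."""
    return value * (1 + rate) ** years

def market_share(your_revenue: float, market_size: float) -> float:
    """Market share as a percentage of total market size."""
    return 100 * your_revenue / market_size

# Hypothetical figures: the industry grew from $80M to $100M over 2 years.
industry_growth = cagr(80e6, 100e6, 2)
market_next_year = project(100e6, industry_growth, 1)
# Hypothetical forecast revenue of $2.5M for your business next year.
share = market_share(2.5e6, market_next_year)
print(f"Industry CAGR: {industry_growth:.1%}")
print(f"Projected market share: {share:.2f}%")
```

Projecting the market forward before computing share keeps the numerator and denominator on the same year. Mixing this year's market size with next year's revenue would overstate your share.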

7. Put your analysis to work.

Once you’ve created a market analysis, it’s time to actually make it work for you. Internally, look for where you can use your research and findings to improve your business. Have you seen other businesses doing things that you’d like to implement in your own organization? Are there ways to make your marketing strategies more effective?

If you conducted your analysis for external purposes, organize your research and data into an easily readable and digestible document to make it easier to share with lenders.

Retain all of your information and research for your next analysis, and consider making a calendar reminder each year so that you stay on top of your market.

Making market analysis easy

If you have the time to conduct a market analysis yourself, go for it – this guide will help. If you don’t have the time, hiring an in-house expert or outsourcing your analysis is often worth the cost. Your analysis will help you figure out who to target and how – and that’s a huge part of business success.



Shanghai Arch Psychiatry, v.26(6); 2014 Dec

Secondary analysis of existing data: opportunities and implementation

Hui G. CHENG

1 Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China

Michael R. PHILLIPS

2 Departments of Psychiatry and Global Health, Emory University, Georgia, United States

The secondary analysis of existing data has become an increasingly popular method of enhancing the overall efficiency of the health research enterprise. But this effort depends on governments, funding agencies, and researchers making the data collected in primary research studies and in health-related registry systems available to qualified researchers who were not involved in the original research or in the creation and maintenance of the registry systems. The benefits of doing this are clear, but the barriers are many, so progress in increasing access to such material has been slow, particularly in low- and middle-income countries. This article introduces the rationale and concept of the secondary analysis of existing data, describes several sources of publicly available datasets, provides general guidelines for conducting secondary analyses of existing data, and discusses the advantages and disadvantages of analyzing existing data.


1. Background

A typical mental health research project begins with the development of a comprehensive research proposal and is (hopefully) followed by the successful acquisition of funding; the researcher then collects data, analyzes the results, and writes up one or more research reports. Another less common, but no less important, research method is the analysis of existing data. The analysis of existing data is a cost-efficient way to make full use of data that have already been collected, either to address potentially important new research questions or to provide a more nuanced assessment of the primary results of the original study. In this article we discuss the distinction between primary and secondary data, provide information about existing mental health-related data that are publicly available for further analysis, list the steps in conducting analyses of existing data, and discuss the pros and cons of analyzing existing data.

2. Data sources

2.1 ‘Primary data’, ‘secondary data’, or ‘existing data’

There is frequently confusion about the use of the terms ‘primary data’, ‘primary data analysis’, ‘secondary data’, and ‘secondary data analysis’. This confusion arises because it is never completely clear whether data employed in an analysis should be considered ‘primary data’ or ‘secondary data’. Based on the usage of the National Institutes of Health (NIH) in the United States, ‘primary data analysis’ is limited to analyses conducted by members of the research team that collected the data to answer the original hypotheses proposed in the study. All other analyses of data collected for specific research studies, or analyses of data collected for other purposes (including registry data), are considered ‘secondary analyses of existing data’, whether or not the persons conducting the analyses participated in the collection of the data. Replacing the traditional term ‘secondary data analysis’ with ‘secondary analysis of existing data’ gives a much clearer categorization because it avoids the confusion of trying to decide whether the data employed in an analysis are ‘primary data’ or ‘secondary data’.

Of course, there are cases where the distinction is less clear. One example would be the analysis of data by a researcher who has no connection with the data collection team to address a research question that overlaps with the hypotheses considered in the original study. Another example would be when a member of the original research team subsequently revisits the original hypothesis in an analysis that uses different statistical methods. These situations commonly occur in the analyses of large-scale population surveys where the research questions are generally broad (e.g., sociodemographic correlates of depression) and when the participating researchers share the cleaned data with the broader research community. In both of these situations, based on a strict application of the NIH usage, the analyses would be considered ‘secondary analysis of existing data’ NOT ‘primary data analysis’ and NOT ‘secondary data analysis’. In fact, we recommend avoiding the ambiguous term ‘secondary data analysis’ entirely.

2.2 Sources of existing data

Existing data can be private or public. To maximize the output of data collection efforts, researchers often assess many more variables than those strictly needed to answer their original hypotheses. Often, these data are not fully used or explored by the original research team due to restrictions in time, resources, or interest. Unfortunately, the vast majority of these completed datasets are not made available, and in many countries (including China), there isn’t even a registry or other means of determining what data have been previously collected about a specific research topic (so there are many unnecessarily duplicated studies). However, if the research team is willing to share their data with other researchers who have the interest, skills, and resources to conduct additional analyses, this can greatly increase the productivity of the research team that conducted the original study. This type of exchange usually involves an agreement between the data collection team and the data analysis team to clarify details about data sharing protocols and how the data should be used.

There are several publicly available health-related electronic databases that can be used to address a variety of research topics. A few examples follow. (a) The World Health Organization (WHO) Global Health Observatory Data Repository ( http://apps.who.int/gho/data/?theme=main ) provides statistics on an array of health-related topics for countries around the world. However, these statistics are generally at the country level, so regional or population subgroup-specific data are not usually available. Another similar source is the data available on the website of the Institute for Health Metrics and Evaluation at the University of Washington in the United States ( http://www.healthdata.org/ ). This website includes the Global Burden of Disease (GBD) estimates, which quantify country-level health-related burden (i.e., cause-specific mortality and disability) from 1990 to 2010, and data visualization tools that make it possible to compare the relative importance of different health conditions (including mental disorders) between countries and between different population groups within countries ( http://www.healthdata.org/gbd/data-visualizations ).

(b) Established in 1962, the Inter-university Consortium for Political and Social Research (ICPSR, http://www.icpsr.umich.edu/icpsrweb/landing.jsps ) is a major data source for scholars in the social sciences. Located at the University of Michigan in the United States, ICPSR is a membership-based network that includes 65,000 datasets from over 8,000 discrete studies or surveys, including a number of large-scale population surveys conducted in the United States and other countries. The website provides online analysis tools to generate simple descriptive statistics, including frequencies and cross-tabulations. In addition to ASCII and .txt formats, the website also provides options for downloading data in formats that are compatible with popular statistical software packages such as SAS, Stata, SPSS, and R. The website also provides technical support in data analysis and in the identification of potential data sources. Users must register with the system to download data.

(c) A variety of government agencies in the United States regularly collect data on different health-related topics and post them online for free download once data cleaning is completed. For example, the United States Census Bureau ( http://www.census.gov/data.html ) provides basic demographic data, and the Centers for Disease Control and Prevention ( http://www.cdc.gov ) provides access to data on cause-specific disability, mortality, and an array of health conditions, including injuries and violence, alcohol use, and tobacco smoking. The Substance Abuse and Mental Health Services Administration has a range of datasets posted on its website ( http://www.samhsa.gov/data/ ) about various mental and substance use disorders. Users interested in more information about publicly available health-related data can refer to Secondary data sources for public health: A practical guide by Boslaugh. [1]

3. Conducting a secondary analysis of existing data

There are two general approaches for analyzing existing data: the ‘research question-driven’ approach and the ‘data-driven’ approach. In the research question-driven approach, researchers have an a priori hypothesis or question in mind and then look for suitable datasets to address it. In the data-driven approach, researchers look through the variables in a particular dataset and decide what kinds of questions the available data can answer. In practice, the two approaches are often used jointly and iteratively. Researchers typically start with a general idea about the question or hypothesis and then look for available datasets that contain the variables needed to address the research questions of interest. If they do not find datasets that contain all the variables needed, they usually modify the research question(s) or the analysis plan based on the best available data.
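In practice, the iterative search described above often starts with a simple comparison. You list the variables a question needs and check what each candidate dataset contains. The catalog, dataset names and variable names below are hypothetical, and this is only a sketch of that bookkeeping.

```python
# Hypothetical catalog: dataset name -> set of available variable names.
CATALOG = {
    "survey_a": {"age", "sex", "depression", "income"},
    "survey_b": {"age", "sex", "depression", "alcohol_use", "region"},
    "registry_c": {"age", "diagnosis", "region"},
}

def rank_datasets(needed, catalog):
    """Rank datasets by how many of the needed variables they contain.

    Returns (name, covered, missing) tuples, best coverage first, so the
    researcher can see whether any dataset covers everything or whether the
    question or analysis plan has to be modified.
    """
    needed = set(needed)
    results = [
        (name, needed & variables, needed - variables)
        for name, variables in catalog.items()
    ]
    # Most variables covered first; ties broken alphabetically by name.
    return sorted(results, key=lambda r: (-len(r[1]), r[0]))

needed = {"age", "sex", "depression", "alcohol_use", "income"}
for name, covered, missing in rank_datasets(needed, CATALOG):
    print(name, len(covered), "of", len(needed), "missing:", sorted(missing))
```

Here no dataset covers every variable. That is exactly the point at which the text suggests revising the question or the analysis plan.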

When conducting either research question-driven or data-driven approaches to the analysis of existing data, researchers need to follow the same basic steps.

(a) There needs to be an analytic plan that includes the specific variables to be considered and the types of analyses that will be conducted. (In the research question-driven approach this is determined before the researchers look at the actual data available in the dataset; in the data-driven approach this is determined after the researchers look through the dataset.)

(b) Researchers must have a comprehensive understanding of the strengths and weaknesses of the dataset. This involves obtaining detailed descriptions of the population under study, sampling scheme and strategy, time frame of data collection, assessment tools, response levels, and quality control measures. To the extent possible, researchers need to obtain and study in detail all survey instruments, codebooks, guidebooks and any other documentation provided for users of the databases. These documents should provide sufficient information to assess the internal and external validity of the data and allow researchers to determine whether or not there are enough cases in the dataset to generate meaningful estimates about the topic(s) of interest.
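Step (b) asks whether a dataset has enough cases to generate meaningful estimates. You can check this roughly before analysis. The sketch below uses the normal-approximation confidence interval for a proportion. The design-effect parameter `deff` is an assumption: it stands in for the extra variance of complex samples. The subgroup size and prevalence are made up.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile

def ci_half_width(p: float, n: int, deff: float = 1.0) -> float:
    """Approximate 95% CI half-width for a proportion estimated from n cases.

    deff is an assumed design effect (1.0 = simple random sampling); complex
    multi-stage samples usually have deff > 1, which widens the interval.
    """
    return Z95 * math.sqrt(deff * p * (1 - p) / n)

def cases_needed(p: float, half_width: float, deff: float = 1.0) -> int:
    """Smallest n whose 95% CI half-width is at most `half_width`."""
    return math.ceil(deff * Z95 ** 2 * p * (1 - p) / half_width ** 2)

# Hypothetical check: a subgroup of 400 respondents with an expected
# prevalence of 10%. Is a precision of +/- 3 percentage points achievable?
print(round(ci_half_width(0.10, 400), 4))
print(cases_needed(0.10, 0.03))
```

If the documentation reports a design effect, passing it as `deff` gives a more honest answer than assuming simple random sampling.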

(c) Before conducting the analysis, researchers need to generate operational definitions of the exposure variable(s), outcome variable(s), covariates, and confounding variables that will be considered in the analysis.
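One way to make step (c) explicit is to write each operational definition as a small function before any results are examined. The variable names (`days_drank_30`, `phq9`, `age`) and cut-offs below are hypothetical illustrations. They are not definitions taken from any particular survey.

```python
def current_drinker(r):
    """Exposure: any alcohol use in the past 30 days (hypothetical item)."""
    days = r.get("days_drank_30")
    return None if days is None else days > 0

def depressed(r):
    """Outcome: PHQ-9 total of 10 or more (illustrative cut-off)."""
    score = r.get("phq9")
    return None if score is None else score >= 10

def age_group(r):
    """Covariate: three broad age bands (assumes an adult, 18+ sample)."""
    age = r.get("age")
    if age is None:
        return None
    return "18-34" if age < 35 else "35-64" if age < 65 else "65+"

# All definitions in one place, so they can be documented alongside the plan.
DEFINITIONS = {
    "current_drinker": current_drinker,
    "depressed": depressed,
    "age_group": age_group,
}

def derive(record):
    """Apply every operational definition to one respondent record."""
    return {name: fn(record) for name, fn in DEFINITIONS.items()}

print(derive({"days_drank_30": 3, "phq9": 12, "age": 40}))
```

Returning `None` rather than guessing keeps unknown values distinguishable from real negatives in later steps.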

(d) The first step in the analysis is to run frequency tables and cross-tabulations of all variables that will be included in the main analysis. This reveals the coding pattern used for each variable and the profile of missing data for each variable. Due attention should be paid to skip patterns, which can result in large numbers of missing values for certain variables. In comprehensive surveys that take a long time to complete, skipping a group of questions that are not relevant for a particular respondent (i.e., ‘skips’) is a common method used to reduce interviewee burden and to avoid interviewee burn-out. For example, in a survey about alcohol-related problems, the survey module typically starts with questions about whether the interviewee has ever drunk alcohol. If the answer is negative, all questions about drinking behaviors and related problems are skipped, because it is safe to assume that this interviewee does not have any such problems. Prior to conducting the full analysis, these types of missing values (which indicate that a particular condition is not relevant for the respondent) need to be distinguished from values that are genuinely missing (which indicate that the status of the individual related to the variable is unknown). Researchers should be aware of these skips in order to make a strategic judgment about the coding of these variables.
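Step (d) can be sketched in plain Python. The records and codes are hypothetical, and so is the rule that `ever_drank == 2` ("no") triggers a skip. The point is that an empty drinking item means "not applicable" for never-drinkers but "unknown" for everyone else.

```python
from collections import Counter

# Hypothetical raw records: respondents answering ever_drank == 2 ("no")
# skip the drinking-frequency item.
raw = [
    {"id": 1, "ever_drank": 1, "drinks_per_week": 3},
    {"id": 2, "ever_drank": 2, "drinks_per_week": None},     # legitimately skipped
    {"id": 3, "ever_drank": 1, "drinks_per_week": None},     # truly missing
    {"id": 4, "ever_drank": 2, "drinks_per_week": None},     # legitimately skipped
    {"id": 5, "ever_drank": None, "drinks_per_week": None},  # status unknown
]

def frequency(records, var):
    """One-way frequency table; None (blank) is counted as its own category."""
    return Counter(r[var] for r in records)

def crosstab(records, row, col):
    """Two-way table as a Counter keyed by (row value, column value)."""
    return Counter((r[row], r[col]) for r in records)

def missing_type(r):
    """Classify why drinks_per_week is empty, using the skip rule."""
    if r["drinks_per_week"] is not None:
        return "observed"
    if r["ever_drank"] == 2:
        return "not applicable (skip)"
    return "missing (unknown)"

print(frequency(raw, "ever_drank"))
print(crosstab(raw, "ever_drank", "drinks_per_week"))
print(Counter(missing_type(r) for r in raw))
```

The cross-tabulation is what exposes the skip. Every blank frequency value sits in the `ever_drank == 2` row, except for the records that are genuinely missing.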

(e) Finally, the researcher should recode the original variables in order to properly handle missing values and, if necessary, to transform the distribution of the variables so that they meet the assumptions of the statistical model to be used in the intended analysis. The recoded variables should be stored in a new dataset and all syntax for the recoding of variables (and for the analysis itself) should be documented. The original dataset should NEVER be altered in any way.
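A minimal sketch of step (e), reusing the same hypothetical skip pattern. Recoding happens on a deep copy so the source records stay untouched, and every rule is written to a log. Two choices here are illustrative, not prescriptions from the article: recoding skipped never-drinkers to 0 drinks, and adding a `log1p` transform to reduce right skew.

```python
import copy
import math

def recode(original, log):
    """Return a recoded copy of the dataset; `original` is never modified.

    Each rule applied is appended to `log` so the recoding can be
    documented and reproduced.
    """
    new = copy.deepcopy(original)  # work on a copy, never the source data
    log.extend([
        "drinks_per_week: empty because ever_drank == 2 -> 0 (never drinker)",
        "drinks_per_week: any other empty value stays None (status unknown)",
        "log_drinks = log(1 + drinks_per_week) to reduce right skew",
    ])
    for r in new:
        if r["drinks_per_week"] is None and r.get("ever_drank") == 2:
            r["drinks_per_week"] = 0
        d = r["drinks_per_week"]
        r["log_drinks"] = None if d is None else math.log1p(d)
    return new

raw = [
    {"id": 1, "ever_drank": 1, "drinks_per_week": 3},
    {"id": 2, "ever_drank": 2, "drinks_per_week": None},
    {"id": 3, "ever_drank": 1, "drinks_per_week": None},
]
log = []
clean = recode(raw, log)
print(clean)
print("\n".join(log))
```

In a real project the log would be saved next to the recoded dataset, together with the analysis syntax.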

(f) When using data from longitudinal surveys or when using data stored in different datasets, it is critical to check the accuracy of the identifier variable(s) to ensure that the data from different time periods or from different datasets is matched correctly when merging the datasets.
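The identifier check in step (f) can be automated before any merge. The field name `id`, the `_w2` suffix and the two-wave example are assumptions for illustration.

```python
from collections import Counter

def check_ids(wave1, wave2, key="id"):
    """Report identifier problems between two datasets before merging."""
    ids1 = [r[key] for r in wave1]
    ids2 = [r[key] for r in wave2]
    return {
        "duplicates_wave1": sorted(k for k, n in Counter(ids1).items() if n > 1),
        "duplicates_wave2": sorted(k for k, n in Counter(ids2).items() if n > 1),
        "only_in_wave1": sorted(set(ids1) - set(ids2)),
        "only_in_wave2": sorted(set(ids2) - set(ids1)),
    }

def merge(wave1, wave2, key="id"):
    """One-to-one inner merge on `key`; refuses to run on duplicated IDs.

    Wave-2 fields get a hypothetical `_w2` suffix so that variables with the
    same name in both waves are not silently overwritten.
    """
    problems = check_ids(wave1, wave2, key)
    if problems["duplicates_wave1"] or problems["duplicates_wave2"]:
        raise ValueError(f"duplicate identifiers: {problems}")
    by_id = {r[key]: r for r in wave2}
    return [
        {**r, **{f"{k}_w2": v for k, v in by_id[r[key]].items() if k != key}}
        for r in wave1
        if r[key] in by_id
    ]

w1 = [{"id": 1, "score": 5}, {"id": 2, "score": 7}, {"id": 3, "score": 6}]
w2 = [{"id": 1, "score": 4}, {"id": 3, "score": 8}, {"id": 4, "score": 2}]
print(check_ids(w1, w2))
print(merge(w1, w2))
```

Unmatched IDs are reported rather than fixed. Deciding whether they reflect attrition or coding errors requires the study documentation.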

(g) For longitudinal studies, the assessment methods and the coding methods for key variables can change over time. Thus, close examination of the survey questionnaires and codebooks is essential to ensure that each variable in the combined dataset has a uniform interpretation throughout the study. This may require the creation of separate uniform variables that are constructed in different ways at different points in time throughout the study, such as the crosswalks to convert diagnostic categories between DSM-III, DSM-IV, and DSM-5.
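For step (g), you can express a crosswalk as one lookup table per wave. Each table maps that wave's raw codes onto a single uniform variable. The codes are hypothetical, and so is the decision to collapse a wave-2 "subthreshold" level. None of it reproduces an actual DSM crosswalk.

```python
# Hypothetical crosswalk: wave 1 used numeric codes, wave 2 used letters and
# added a "subthreshold" level. Each entry maps a raw code to a uniform value.
CROSSWALK = {
    "wave1": {1: "case", 2: "non_case"},
    # Collapsing wave 2's "B" (subthreshold) into non_case is a hypothetical
    # decision made so the variable means the same thing in both waves.
    "wave2": {"A": "case", "B": "non_case", "C": "non_case"},
}

def harmonize(wave, code):
    """Map a wave-specific raw code onto the uniform variable, or fail loudly."""
    try:
        return CROSSWALK[wave][code]
    except KeyError:
        raise ValueError(f"no crosswalk entry for {wave!r}, code {code!r}") from None

# Sanity check: every wave maps onto the same set of uniform categories.
assert len({frozenset(m.values()) for m in CROSSWALK.values()}) == 1

records = [("wave1", 1), ("wave1", 2), ("wave2", "A"), ("wave2", "B")]
print([harmonize(w, c) for w, c in records])
```

Raising an error on unmapped codes is deliberate. It forces a decision about every code found in the codebooks instead of silently producing missing values.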

(h) Many population-based surveys, particularly those focused on assessing the prevalence of relatively uncommon conditions such as schizophrenia, employ multi-stage sampling strategies to enrich the sample. In this case, the dataset usually includes design variables for each case (including sampling weight, strata, and primary sampling unit) that are needed to adjust the analysis of interest (such as the prevalence of a condition, odds ratios, mean differences, etc.). Researchers who conduct secondary analysis of existing data should consider the design variables used in the original study and apply these variables appropriately in their own analyses in order to generate less biased estimates. [2], [3]
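Step (h) matters even for a simple point estimate. In the hypothetical enriched sample below, cases were oversampled, so the unweighted proportion badly overstates prevalence. The sketch uses only the sampling weights. Valid standard errors would also need the strata and primary sampling unit variables, which are typically handled by dedicated survey-analysis software.

```python
def weighted_prevalence(cases, weights):
    """Design-weighted prevalence: sum(w * y) / sum(w).

    Only the sampling weights enter this point estimate; correct variance
    estimation also needs strata and PSU variables, not handled here.
    """
    if len(cases) != len(weights):
        raise ValueError("cases and weights must have the same length")
    return sum(w * y for y, w in zip(cases, weights)) / sum(weights)

# Hypothetical enriched sample: cases (y = 1) were oversampled, so each case
# represents fewer people in the population (smaller weight).
y = [1, 1, 1, 1, 0, 0, 0, 0]
w = [10, 10, 10, 10, 240, 240, 240, 240]
print(sum(y) / len(y))            # unweighted: 0.5
print(weighted_prevalence(y, w))  # weighted: 0.04
```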

4. Pros and cons of the secondary analysis of existing data

4.1 Advantages

The most obvious advantage of the secondary analysis of existing data is the low cost. There is sometimes a fee required to obtain access to such datasets, but this is almost always a tiny proportion of what it would cost to conduct an original study. Also, the data posted online are usually cleaned by professional staff members who often provide detailed documentation about the data collection and data cleaning process. Moreover, teams conducting large-scale population-based surveys that are made available to others usually employ statisticians to generate ready-to-use survey weights and design variables - something that most users of the data are unable to do - so this helps users make necessary adjustments to their estimates. This is a great boon to graduate students and others who have lots of good ideas but no money to conduct the studies that could test their ideas.

Researchers who would rather spend their time testing hypotheses and thinking about different research approaches than collecting primary data can find a large amount of data online. The increasing availability of such data online encourages the creative use and cross-linking of information from different data sources. For example, experts in hierarchical models can combine data from individual surveys with aggregate data from different administrative levels of a community (e.g., village, township, county, province, etc.) to examine the factors associated with health-related outcomes at each level. The availability of such databases also provides statisticians with real-life data to test new statistical models. Such analyses could identify potential new interventions to existing problems that can subsequently be tested in prospective studies.

4.2 Disadvantages

Inherent to the nature of the secondary analysis of existing data, the available data are not collected to address the particular research question or to test the particular hypothesis. It is not uncommon for some important third variables to be unavailable for the analysis. Similarly, the data may not have been collected for all population subgroups or geographic regions of interest. Another problem is that, to protect the confidentiality of respondents, publicly available datasets usually delete identifying variables about respondents. These may be variables that are important in the intended analysis, such as zip codes, the names of the primary sampling units, and the race, ethnicity, and specific age of respondents. This can create residual confounding when the omitted variables are crucial covariates to control for in the secondary analysis.

Another major limitation of the analysis of existing data is that the researchers who are analyzing the data are not usually the same individuals as those involved in the data collection process. Therefore, they are probably unaware of study-specific nuances or glitches in the data collection process that may be important to the interpretation of specific variables in the dataset. Sometimes, the amount of documentation is daunting (particularly for complex, large-scale surveys conducted by government agencies), so users may miss important details unless they are prominently presented in the documents. Succinct documentation of important information about the validity of the data (by the provider) and careful examination of all relevant documents (by the user) can mitigate this problem.

5. Government support for secondary analysis of existing data

This paper discusses several issues related to the secondary analysis of existing data. There are definitely limitations to such analyses, but the great advantage is that secondary analyses can dramatically increase the overall efficiency of the research effort and, as a secondary advantage, give young researchers with good ideas but little access to research funds the opportunity to test their ideas. Recognizing the importance of making the most of high-quality research data and of rapidly translating research findings into actionable knowledge, the United States National Institutes of Health, the largest funding agency for biomedical research in the world, began in 2003 to require all projects with annual direct costs of 500,000 US dollars or more to include data-sharing plans in their proposals. Moreover, NIH has released several program announcements specifically designed to promote secondary analysis of existing datasets. Other countries and some large health care providers also make registry data available to qualified researchers. These practices ensure that other researchers not involved in the studies or in the creation and maintenance of the registries will be able to use the data generated by these big projects or by the registries to test a wide range of hypotheses. Other governments (including the Chinese government), health-related non-government organizations, and other funders of biomedical research need to follow these examples. Failure to provide qualified researchers access to government-generated registry data or to government-supported research data results in a huge but unnecessary wastage of economic and intellectual resources that could be better employed to improve the health of the nation.

Dr. Hui Cheng is an epidemiologist by training. She is currently a post-doctoral research associate at Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine. She has published findings from studies on mental health related topics using public data. Her main interest is substance use and related problems, and public mental health.

Funding Statement

This work was supported by a grant from the China Medical Board (13-165) to HGC.

Conflict of interest: The authors declare no conflict of interest related to this article.


Epidural analgesia during labour and severe maternal morbidity: population based study

Linked editorial: Unlocking maternal health: labour epidurals and severe morbidity

  • Rachel J Kearns, consultant anaesthetist 1 2
  • Aizhan Kyzayeva, research associate 2
  • Lucy O E Halliday, doctoral student 2
  • Deborah A Lawlor, professor of epidemiology 3 4
  • Martin Shaw, principal clinical physicist 2 5
  • Scott M Nelson, Muirhead chair of obstetrics and gynaecology 2
  • 1 Department of Anaesthesia, Glasgow Royal Infirmary, Glasgow, UK
  • 2 School of Medicine, University of Glasgow, Glasgow Royal Infirmary, Glasgow, G31 2ER, UK
  • 3 MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, UK
  • 4 Population Health Science, University of Bristol, Bristol, UK
  • 5 Department of Medical Physics and Bioengineering, NHS Greater Glasgow and Clyde, Glasgow, UK
  • Correspondence to: R J Kearns rachel.kearns{at}glasgow.ac.uk (or @rjharrison79 on X)
  • Accepted 10 April 2024

Objectives To determine the effect of labour epidural on severe maternal morbidity (SMM) and to explore whether this effect might be greater in women with a medical indication for epidural analgesia during labour, or with preterm labour.

Design Population based study.

Setting All NHS hospitals in Scotland.

Participants 567 216 women in labour at 24+0 to 42+6 weeks’ gestation between 1 January 2007 and 31 December 2019, delivering vaginally or through unplanned caesarean section.

Main outcome measures The primary outcome was SMM, defined as the presence of ≥1 of 21 conditions used by the US Centers for Disease Control and Prevention (CDC) as criteria for SMM, or a critical care admission, with either occurring at any point from date of delivery to 42 days post partum (described as SMM). Secondary outcomes included a composite of ≥1 of the 21 CDC conditions and critical care admission (SMM plus critical care admission), and respiratory morbidity.
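The time window in the primary outcome definition can be written as a small predicate. The condition codes are hypothetical placeholders for the 21 CDC conditions. The "critical_care" event marker is a hypothetical stand-in for the critical care admission records. Counting day 42 as inside the window is also an assumption.

```python
from datetime import date, timedelta

# Hypothetical placeholder codes standing in for the 21 CDC SMM conditions.
SMM_CONDITIONS = {"cond_01", "cond_02"}
WINDOW = timedelta(days=42)

def has_smm(delivery_date, events):
    """True if a qualifying condition or a critical care admission falls
    between the delivery date and 42 days post partum (both ends inclusive).

    `events` is a list of (event_date, kind) pairs, where kind is a condition
    code or the hypothetical marker "critical_care".
    """
    for when, kind in events:
        in_window = delivery_date <= when <= delivery_date + WINDOW
        if in_window and (kind in SMM_CONDITIONS or kind == "critical_care"):
            return True
    return False

d = date(2019, 1, 1)
print(has_smm(d, [(date(2019, 2, 12), "critical_care")]))  # day 42
print(has_smm(d, [(date(2019, 2, 13), "cond_01")]))        # day 43
```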

Results Of the 567 216 women, 125 024 (22.0%) had epidural analgesia during labour. SMM occurred in 2412 women (4.3 per 1000 births, 95% confidence interval (CI) 4.1 to 4.4). Epidural analgesia was associated with a reduction in SMM (adjusted relative risk 0.65, 95% CI 0.50 to 0.85), SMM plus critical care admission (0.46, 0.29 to 0.73), and respiratory morbidity (0.42, 0.16 to 1.15), although the last of these was underpowered and had wide confidence intervals. Greater risk reductions in SMM were detected among women with a medical indication for epidural analgesia (0.50, 0.34 to 0.72) compared with those with no such indication (0.67, 0.43 to 1.03; P<0.001 for difference). More marked reductions in SMM were seen in women delivering preterm (0.53, 0.37 to 0.76) compared with those delivering at term or post term (1.09, 0.98 to 1.21; P<0.001 for difference). The observed reduced risk of SMM with epidural analgesia was increasingly noticeable as gestational age at birth decreased in the whole cohort, and in women with a medical indication for epidural analgesia.

Conclusion Epidural analgesia during labour was associated with a 35% reduction in SMM, and showed a more pronounced effect in women with medical indications for epidural analgesia and with preterm births. Expanding access to epidural analgesia for all women during labour, and particularly for those at greatest risk, could improve maternal health.
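You can check the headline figures in the Results with a few lines of arithmetic. This sketch assumes a Poisson normal approximation for the interval. The authors' exact method is not described in this excerpt, but the approximation reproduces the rounded figures reported above. The "35% reduction" in the Conclusion is simply 1 minus the adjusted relative risk of 0.65.

```python
import math

births, smm = 567_216, 2412
epidural = 125_024

rate = smm / births * 1000              # SMM per 1000 births
se = math.sqrt(smm) / births * 1000     # Poisson approximation (an assumption)
lo, hi = rate - 1.96 * se, rate + 1.96 * se

adjusted_rr = 0.65                      # adjusted relative risk from the Results

print(f"Epidural use: {epidural / births:.1%}")
print(f"SMM rate: {rate:.1f} per 1000 (95% CI {lo:.1f} to {hi:.1f})")
print(f"Relative reduction: {1 - adjusted_rr:.0%}")
```

This only checks the crude rate. The adjusted relative risks come from the authors' models and cannot be reproduced from the counts in the abstract.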

Introduction

The rising incidence of severe maternal morbidity (SMM) constitutes a pressing global issue, compromising the wellbeing of mothers and their children, and resulting in potentially devastating short term and long term consequences. 1 2 SMM is defined by the US Centers for Disease Control and Prevention (CDC) as encompassing 21 indicative conditions or procedures, such as myocardial infarction, eclampsia, and hysterectomy occurring during admission to hospital for delivery. 3 In the UK, the incidence of SMM almost doubled between 2009 and 2018, from 0.9% to 1.7% of deliveries, likely reflecting the trend of mothers being older, more obese, and with increasing comorbidities, along with a rising incidence of previous caesarean delivery. 4 SMM can be conceptualised as an indicator of increased risk for maternal mortality, providing crucial opportunities to identify and implement interventions to improve the health of mothers and their offspring. 5

Epidural analgesia is commonly advised for safety reasons in pregnant women considered at higher risk of SMM, such as those with multiple births, morbid obesity (body mass index (BMI) ≥40), or certain comorbidities, owing to its advantageous physiological effects and capacity to provide expedient anaesthesia if required in an emergency. 6 Women with these factors can be considered as having a medical indication for epidural analgesia during labour. Women giving birth preterm also carry a higher risk of SMM, although epidural analgesia is seldom recommended for preterm labour alone. 7 Despite the assumed benefits of epidural analgesia during labour to prevent SMM, the evidence base for this is limited. We identified just two observational studies that attempted to delineate the association between epidural analgesia during labour and SMM. 8 9 One, a US study (n=574 525), indicated a 14% risk reduction in SMM in women who received epidural analgesia, but it only included vaginal births and excluded the six week postnatal period, during which about 15% of SMM events occur. 8 10 The other study, from France (n=4550), reported a 47% decreased risk of severe postpartum haemorrhage in women with epidural analgesia who gave birth vaginally, but it did not assess other constituents of SMM. 9 Neither of these studies explored whether the association differed between women with a medical indication and those without, or between women who delivered preterm and those who did not. In these two studies from countries with private healthcare systems, the use of epidural analgesia was 47% 8 and 78%, 9 respectively, whereas in the UK, the use of epidural analgesia during labour is around 22-30%, despite healthcare being free at the point of access. 11 12

Notwithstanding that clinicians may advise mothers with medical indications about epidural analgesia during labour, the final decision is up to the woman. The lack of robust evidence on whether epidural analgesia offers benefits beyond pain relief might affect the discussions clinicians have with women and their decisions. Women from minority ethnic groups and areas of socioeconomic deprivation are at higher risk of maternal morbidity and mortality, and they are more likely to have medical indications for epidural analgesia but less likely to receive one. 13 14 15 Stronger evidence on the effects of epidural analgesia might contribute to reducing these inequalities. The importance of improving this evidence base is highlighted by the priority setting exercises undertaken by the James Lind Alliance, which identified the effect of epidural analgesia on obstetric outcomes as a research priority. 16 The James Lind Alliance brings patients, carers, and clinicians together to identify research priorities.

In this population based cohort analysis of all births in Scotland over a 13 year period, we estimated the causal effect of the use of epidural analgesia during labour on SMM in all mothers, except those undergoing planned caesarean section delivery. Additionally, we explored whether this effect was more pronounced among pregnant women who according to clinical guidelines are at increased risk of SMM (ie, women with a medical indication for epidural analgesia during labour), and in those with preterm labour.

Our methods are reported in accordance with Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidance. 17

Data sources and study population

We linked six Scotland-wide administrative databases: the Scottish Morbidity Record-2 (SMR02), the Scottish Morbidity Record-1 (SMR01), the Scottish Birth Record, the National Records of Scotland, the Scottish Stillbirth Infant Death Survey, and the Scottish Intensive Care Society Audit Group. The SMR02 documents all obstetric inpatient and day case admissions during pregnancy and the postnatal period and includes maternal and infant characteristics. The SMR02 is subject to regular quality assurance checks, with data more than 99% complete since the late 1970s. 18 19 The SMR01 records all non-obstetric inpatient and day case admissions according to ICD-9 and ICD-10 (international classification of diseases, ninth revision and 10th revision, respectively) codes and UK NHS OPCS-4 (Office of Population Censuses and Surveys classification of interventions and procedures). 20 21 All neonatal care is recorded in the Scottish Birth Record. The National Records of Scotland registers all births, stillbirths, and infant deaths, and the Scottish Stillbirth Infant Death Survey collects additional information from the relevant coordinator of the survey (obstetrician, paediatrician, or midwife) at each hospital. The database of the Scottish Intensive Care Society Audit Group records admission data for all Scottish intensive care and high dependency units, with regular data validation. 22

Inclusion and exclusion criteria

We analysed all women in labour in Scotland between 1 January 2007 and 31 December 2019 with gestation between 24+0 and 42+6 weeks. Births after this period were excluded to remove any potential confounding influence of the covid-19 pandemic. We also excluded births when mode of delivery, child identity, or data for analgesia during labour were not recorded (n=38 705, 5.5% of all 697 981 pregnancies considered; see supplementary eFigure 1), as well as births by elective caesarean section (n=92 060, 13.2%; see supplementary eFigure 1), because these women knew their mode of delivery in advance, did not experience labour, and therefore by definition could not have chosen to have epidural analgesia during labour.
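The exclusion counts can be checked with simple arithmetic. The figures below are taken from the text, and the sketch assumes both percentages use all 697 981 pregnancies as the denominator; the remainder matches the 567 216 women analysed.

```python
TOTAL_PREGNANCIES = 697_981
EXCLUDED_INCOMPLETE = 38_705    # delivery mode, child identity, or analgesia data missing
EXCLUDED_ELECTIVE_CS = 92_060   # elective caesarean section (no labour)

def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)

eligible = TOTAL_PREGNANCIES - EXCLUDED_INCOMPLETE - EXCLUDED_ELECTIVE_CS
print(eligible)                                      # 567216
print(pct(EXCLUDED_INCOMPLETE, TOTAL_PREGNANCIES))   # 5.5
print(pct(EXCLUDED_ELECTIVE_CS, TOTAL_PREGNANCIES))  # 13.2
```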

Epidural analgesia

We defined epidural analgesia during labour as conventional lumbar epidural sited at any time during labour. This definition is consistent with standard medical practices in the UK, where epidural drugs are generally administered only after labour has commenced. We were unable to identify use of combined spinal epidural (spinal injection plus insertion of an epidural catheter), as SMR02 classifies the procedure as spinal anaesthesia. Combined spinal epidural is used infrequently in Scotland, representing only 1% of epidural use during labour. 23 Women recorded as having no epidural could have delivered without additional analgesia or anaesthesia or have required spinal or general anaesthesia for operative delivery, reflecting the unpredictability of labour outcomes and the resultant different potential pathways care may take. Since recording of anaesthetic intervention is hierarchical, we could not identify if women who had a spinal or general anaesthetic also had epidural analgesia at an earlier point. Conversion of epidural analgesia to spinal or general anaesthesia occurs in around 5% of women. 24

The primary outcome was SMM, defined as a composite outcome of ≥1 of 21 conditions according to the US CDC criteria for SMM or a critical care admission, with either occurring at any point from the date of delivery to 42 days post partum (described as SMM). In keeping with other published data, we incorporated critical care admission as an SMM indicator because the CDC’s definition does not cover all SMM events (eg, asthma attack, status epilepticus). 4 We identified conditions using ICD-9, ICD-10, and OPCS codes from SMR01, SMR02, and Scottish Intensive Care Society Audit Group datasets (see supplementary eTable 1 for table of codes). 3 The CDC’s definition of SMM has a sensitivity of 77% and specificity of 99% in identifying SMM compared with medical records. 25

Secondary outcomes aimed to capture more severe morbidity and included ≥1 of the 21 CDC conditions when that condition resulted in admission for critical care (described as SMM plus critical care), and respiratory morbidity (ventilation, tracheostomy, acute respiratory distress syndrome, or respiratory complications of anaesthesia), as diagnosed from the date of delivery to 42 days post partum (see supplementary eTable 1).

Minor modifications were made to the CDC SMM criteria to accommodate data recording practices in Scotland (see supplementary eTable 1). In line with other UK studies, 4 26 we found that the UK definition for postpartum haemorrhage (≥500 mL blood loss) resulted in over-reporting of major obstetric haemorrhage (ICD-10 code O72), and therefore we included postpartum haemorrhage only if it occurred in association with a critical care admission, indicating a clinically significant haemorrhage event. Alternative metrics such as volume of blood loss and blood transfusion are not reliably recorded in SMR02. Similar to a previous Scottish study, we found the incidence of sepsis had increased exponentially from 2012 (see supplementary eFigure 2). 4 This might reflect different coding practices and changes in guidance with the publication of the 2012 Surviving Sepsis recommendations resulting in increased awareness of the condition. 27 28 Because sepsis is defined as the presence of an infection and evidence of acute organ dysfunction, we included it only if associated with admission to a critical care unit.
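The composite outcome rules described above can be expressed as a small predicate. This is a hedged sketch: the condition labels are hypothetical placeholders for the ICD-9, ICD-10, and OPCS-4 code lists in supplementary eTable 1, and only the two modifications described here (postpartum haemorrhage and sepsis counted only with critical care admission) are encoded.

```python
from dataclasses import dataclass
from typing import Optional

WINDOW_DAYS = 42  # from the day of delivery (day 0) to 42 days post partum
# Hypothetical labels for conditions counted only with critical care admission
REQUIRES_CRITICAL_CARE = {"postpartum_haemorrhage", "sepsis"}

@dataclass
class Event:
    day: int                  # days after delivery; 0 is the day of delivery
    condition: Optional[str]  # hypothetical CDC condition label, or None
    critical_care: bool       # whether this event led to critical care admission

def counts_towards_smm(event: Event) -> bool:
    """True if the event meets the modified composite SMM definition."""
    if not 0 <= event.day <= WINDOW_DAYS:
        return False
    if event.condition is None or event.condition in REQUIRES_CRITICAL_CARE:
        # A critical care admission is itself an SMM indicator; haemorrhage and
        # sepsis qualify only when they come with one.
        return event.critical_care
    return True  # any other CDC condition qualifies on its own

def has_smm(events) -> bool:
    """A woman has SMM if at least one of her events qualifies."""
    return any(counts_towards_smm(e) for e in events)
```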

Because in our analyses, as in any risk analysis, we censored at the first SMM condition, the difference between the primary outcome and the first secondary outcome can be illustrated by a mother with eclampsia diagnosed on the day of delivery and acute heart failure diagnosed on postnatal day 22. In the primary analysis, that woman would be censored on the day of delivery. In contrast, if her heart failure on postnatal day 22 resulted in critical care admission, she would be censored at postnatal day 22 in the analysis of the secondary outcome of SMM plus critical care admission. Conversely, a woman with the same conditions at the same time points who was not admitted to critical care for either would not count as having the secondary outcome of SMM plus critical care admission (and would contribute to the comparator group, no SMM plus critical care).
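The worked example above can be checked with a small function that finds the day of the first qualifying event under each outcome definition. This is a simplified sketch: events are hypothetical (day, condition, critical care) tuples, and the haemorrhage and sepsis restrictions and critical care admissions without a CDC condition are left out for brevity.

```python
from typing import List, Optional, Tuple

# Each event: (days after delivery, CDC condition label, admitted to critical care?)
Event = Tuple[int, str, bool]

def first_event_day(events: List[Event], need_critical_care: bool) -> Optional[int]:
    """Day of the first qualifying event within 0-42 days, or None if there is none.

    need_critical_care=False gives the primary outcome (any CDC condition);
    need_critical_care=True gives the secondary outcome (SMM plus critical care).
    """
    days = [day for day, _condition, cc in events
            if 0 <= day <= 42 and (cc or not need_critical_care)]
    return min(days) if days else None

woman_a = [(0, "eclampsia", False), (22, "acute_heart_failure", True)]
woman_b = [(0, "eclampsia", False), (22, "acute_heart_failure", False)]

print(first_event_day(woman_a, need_critical_care=False))  # 0
print(first_event_day(woman_a, need_critical_care=True))   # 22
print(first_event_day(woman_b, need_critical_care=True))   # None
```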

Confounders and other variables used in analyses

To determine confounding variables before analyses, we used the established definition of a confounder, ie, something that is a known or plausible reason for having both epidural analgesia during labour and SMM, and we considered all potential plausible pathways between these variables. 29 We included these confounders (irrespective of whether they were available in our data) in directed acyclic graphs drawn using the R package “dagitty,” 30 to highlight sources of unmeasured confounding and how these might be captured by other measured confounders on the same confounding path (see supplementary eFigures 3a and 3b). We included socioeconomic status and ethnicity as these factors are increasingly recognised as influencing poor maternal outcomes and epidural analgesia use during labour. 13 14 15 Ethnicity was defined using NHS Scotland 2011 census categories. 31 As we did not have information on individual socioeconomic status, we used residential area deprivation according to the Scottish Index of Multiple Deprivation as a proxy, with the first decile denoting the most deprived areas and the tenth decile the least deprived. 32 Pre-existing comorbidities that plausibly influence the use of epidural analgesia and SMM were defined for each mother by calculating a Bateman index score, an extensively validated, weighted risk prediction tool, specific to obstetric patients, that includes 20 conditions plus maternal age and predicts SMM more accurately than other generic comorbidity indices (see supplementary eTable 2). 33 To avoid conflating comorbid conditions with the outcome of SMM, we applied strict criteria, restricting these diagnoses to the period between 180 days before the estimated date of conception (as described in the original paper by Bateman et al) 33 and the day before delivery. This approach ensured that the comorbidity measures reflected pre-existing risk of SMM rather than conditions arising at or after delivery.
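The comorbidity lookback window can be sketched with date arithmetic. The text does not say how the estimated date of conception was derived; the sketch assumes it lies 14 days after the start of gestational dating, and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple

LOOKBACK_DAYS = 180     # before estimated conception, following Bateman et al
CONCEPTION_OFFSET = 14  # ASSUMPTION: conception ~14 days after gestational dating starts

def comorbidity_window(delivery: date, gestation_days: int) -> Tuple[date, date]:
    """Return the (first, last) dates on which a comorbidity diagnosis is counted."""
    estimated_conception = delivery - timedelta(days=gestation_days - CONCEPTION_OFFSET)
    first = estimated_conception - timedelta(days=LOOKBACK_DAYS)
    last = delivery - timedelta(days=1)  # the day before delivery
    return first, last

def counts_as_comorbidity(diagnosis: date, window: Tuple[date, date]) -> bool:
    first, last = window
    return first <= diagnosis <= last

window = comorbidity_window(date(2015, 6, 30), gestation_days=280)  # 40+0 weeks
print(window)  # (datetime.date(2014, 4, 10), datetime.date(2015, 6, 29))
print(counts_as_comorbidity(date(2015, 6, 30), window))  # False: diagnosed on delivery day
```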
Using ICD-9 and ICD-10 codes from SMR02, we obtained information on maternal height, weight, and smoking status plus obstetric indices of previous caesarean section, parity, and induction of labour. Gestational age at birth was based on ultrasound assessment in the first half of pregnancy. Smoking status at booking was defined as current, former, or never. Birth location was categorised into obstetric unit, freestanding midwifery unit, or home birth. Obstetric units were defined as hospitals with on-site obstetric and anaesthetic services, inclusive of epidural analgesia provision, or midwifery led units co-located with an obstetric unit. Freestanding midwifery units were defined as midwifery led units without direct access to obstetric or anaesthetic services. 34

In exploratory analyses we assessed whether associations differed by the presence of a medical indication for epidural analgesia and by gestational age. We classified births as preterm if they occurred before 37 weeks’ gestation and as term or post term if they occurred at ≥37+0 weeks. Births were further classified using World Health Organization (WHO) criteria as extremely preterm (<28 weeks), very preterm (28 to <32 weeks), and moderate to late preterm (≥32 to 36+6 weeks), and by whether labour occurred spontaneously or was commenced iatrogenically. 35
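The gestational age categories above can be encoded as follows, assuming gestation is held as completed weeks plus days (the weeks+days notation used in the text). The function name is illustrative; preterm overall is any of the first three categories.

```python
def classify_gestation(weeks: int, days: int = 0) -> str:
    """Classify gestational age given as completed weeks plus days (eg, 36+6)."""
    if not 0 <= days <= 6:
        raise ValueError("days must be between 0 and 6")
    total_days = 7 * weeks + days
    if total_days < 7 * 28:
        return "extremely preterm"         # <28 weeks
    if total_days < 7 * 32:
        return "very preterm"              # 28 to <32 weeks
    if total_days < 7 * 37:
        return "moderate to late preterm"  # 32 to 36+6 weeks
    return "term or post term"             # >=37+0 weeks

print(classify_gestation(36, 6))  # moderate to late preterm
print(classify_gestation(37, 0))  # term or post term
```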

We defined medical indications for epidural analgesia as any of serious cardiovascular or respiratory disease (congestive heart failure, congenital heart disease, pulmonary hypertension, ischaemic heart disease, asthma); pre-eclampsia; previous caesarean section; breech presentation; multiple pregnancy; and morbid obesity (BMI ≥40), diagnosed before the date of delivery and with no contraindication to epidural insertion (see supplementary eTable 3). 6 36 37 38 39 40 41 These indications are easily identified by obstetric, anaesthesia, and midwifery staff, reflect criteria that drive common decision making processes, and are in widespread use in clinical practice. These conditions were included if recorded up to the day before delivery, to ensure they occurred before the decision to have an epidural and before any episodes of SMM.
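A minimal sketch of the medical indication flag, assuming each listed condition has already been derived from pre-delivery records as a boolean (the field names are hypothetical) and that BMI is computed from recorded height and weight.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PreDeliveryRecord:
    """Hypothetical flags, each diagnosed no later than the day before delivery."""
    cardiorespiratory_disease: bool = False  # CHF, CHD, pulmonary hypertension, IHD, asthma
    pre_eclampsia: bool = False
    previous_caesarean: bool = False
    breech_presentation: bool = False
    multiple_pregnancy: bool = False
    height_m: Optional[float] = None
    weight_kg: Optional[float] = None
    epidural_contraindicated: bool = False

def bmi(height_m: float, weight_kg: float) -> float:
    return weight_kg / height_m ** 2

def has_medical_indication(r: PreDeliveryRecord) -> bool:
    """Any listed indication, and no contraindication to epidural insertion."""
    if r.epidural_contraindicated:
        return False
    morbid_obesity = (r.height_m is not None and r.weight_kg is not None
                      and bmi(r.height_m, r.weight_kg) >= 40)
    return any([r.cardiorespiratory_disease, r.pre_eclampsia, r.previous_caesarean,
                r.breech_presentation, r.multiple_pregnancy, morbid_obesity])

print(has_medical_indication(PreDeliveryRecord(height_m=1.60, weight_kg=105.0)))  # True (BMI ~41)
```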

Statistical analysis

As this was a whole population study, we did not perform sample size calculations. We report baseline characteristics by epidural status. Continuous variables are expressed as medians with interquartile range (IQR), and categorical variables as counts and percentages. For group comparison, we used standardised differences.
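Standardised differences compare groups on a scale that does not depend on sample size. The text does not give the formulas used; the sketch below assumes the conventional ones: the difference in means divided by the pooled standard deviation for continuous variables, and the difference in proportions divided by the pooled binomial standard deviation for binary variables.

```python
import math
from statistics import mean, variance
from typing import Sequence

def std_diff_continuous(group1: Sequence[float], group0: Sequence[float]) -> float:
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(group1) + variance(group0)) / 2)
    return (mean(group1) - mean(group0)) / pooled_sd

def std_diff_binary(p1: float, p0: float) -> float:
    """Difference in proportions divided by the pooled binomial standard deviation."""
    pooled_sd = math.sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
    return (p1 - p0) / pooled_sd

print(round(std_diff_binary(0.6, 0.4), 3))  # 0.408
```

Absolute values above about 0.1 are often read as a meaningful imbalance, although the text does not state a threshold.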

To adjust for confounders, we used multivariable Poisson regression models with cluster robust sandwich estimators under the generalised estimating equation framework (see supplementary eFigures 3a and 3b). These models were chosen in place of log-binomial models to avoid problems with convergence. The robust estimator was used to correct the overestimated variance of the standard Poisson model when applied to a binary outcome, and to account for more than one birth in some women. 42 We also assessed a zero inflated Poisson model using a single zero inflation parameter applied to all observations to account for any excess of zeros in the model. This indicated no excess of zeros (P>0.9), further supporting the use of a multivariable Poisson regression model with cluster robust errors. In the modelling of risk analyses, we censored at the first SMM condition (ie, a mother with two SMM conditions was only counted once in the analysis). These models were used to determine adjusted relative risks and absolute risks. As we a priori assumed that outcomes might differ depending on gestational age, we included gestational age as an interaction term and adjusted for all of the other previously defined confounders. To explore potential residual confounding from confounders that we did not consider because evidence was lacking to suggest they would affect epidural use and SMM, we calculated an E-value. 43 The E-value was defined as the minimum strength of association, on the risk ratio scale, that one or several unmeasured confounders would need to have with both epidural analgesia and SMM, conditional on the confounders we adjusted for, to fully explain away a specific exposure-outcome association. E-values were calculated using the EValue package (version 4.1.3).
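The modelling approach above (often called modified Poisson regression) can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes an independence working correlation (the text does not state which working correlation was used), omits small-sample corrections, and uses a hand-rolled Newton-Raphson fit rather than a GEE library. With a single binary exposure it reproduces the crude risk ratio, and when each row is its own cluster its robust standard error matches the usual large-sample formula for the log risk ratio.

```python
import math
from collections import defaultdict

def _inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def _mu_and_info(X, beta):
    """Fitted risks (log link) and the Poisson information matrix."""
    mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    p = len(beta)
    info = [[sum(mi * row[j] * row[k] for row, mi in zip(X, mu)) for k in range(p)]
            for j in range(p)]
    return mu, info

def modified_poisson(X, y, clusters, max_iter=50, tol=1e-10):
    """Log-link Poisson fit to a binary outcome with a cluster-robust sandwich variance.

    X: covariate rows whose first column is 1 (intercept); y: 0/1 outcomes;
    clusters: one id per row (eg, mother), so repeat births share a cluster.
    Returns (coefficients, robust SEs); exp(coefficient) is a risk ratio.
    """
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # start at the overall risk
    for _ in range(max_iter):
        mu, info = _mu_and_info(X, beta)
        score = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu)) for j in range(p)]
        step = [s[0] for s in _matmul(_inverse(info), [[u] for u in score])]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    mu, info = _mu_and_info(X, beta)
    # "Meat": score contributions summed within each cluster, then outer products.
    totals = defaultdict(lambda: [0.0] * p)
    for row, yi, mi, c in zip(X, y, mu, clusters):
        for j in range(p):
            totals[c][j] += row[j] * (yi - mi)
    meat = [[sum(u[j] * u[k] for u in totals.values()) for k in range(p)] for j in range(p)]
    bread = _inverse(info)
    cov = _matmul(_matmul(bread, meat), bread)
    return beta, [math.sqrt(cov[j][j]) for j in range(p)]

# Invented toy data: 10/100 exposed and 40/200 unexposed women have the outcome.
X = [[1.0, 1.0]] * 100 + [[1.0, 0.0]] * 200
y = [1] * 10 + [0] * 90 + [1] * 40 + [0] * 160
beta, se = modified_poisson(X, y, clusters=list(range(300)))
print(round(math.exp(beta[1]), 3))  # 0.5
print(round(se[1], 4))              # 0.3317
```

In a real analysis, confounders would be added as further columns of `X` and a dedicated GEE implementation would normally be used.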

Exploratory subgroup analyses

We repeated the same adjusted Poisson regression models with cluster robust sandwich estimators as described for the main analyses in three sets of subgroup analyses: women with and without a medical indication; women delivering preterm (<37 completed weeks of gestation) and those delivering at term or post term (≥37 completed weeks); and women with a medical indication and delivering preterm and those with no medical indication and delivering at term or post term.

In each of these analyses we tested for statistical evidence of a difference between the two related subgroups by comparing a model with an interaction term (eg, between epidural analgesia during labour and medical indication, yes v no) with a model without this term, using a likelihood ratio test. As analyses between subgroups are often underpowered, we considered a P value <0.01 to provide statistical evidence of a difference.
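For a single added interaction term (one extra parameter when both the exposure and the subgroup indicator are already in the model), the likelihood ratio statistic is compared against a chi-square distribution with 1 degree of freedom. A minimal sketch, using the identity that the chi-square(1) upper tail equals `erfc(sqrt(x / 2))` so that only the standard library is needed; the log-likelihood values are hypothetical inputs.

```python
import math
from typing import Tuple

ALPHA = 0.01  # threshold used for subgroup differences in the text

def lr_test_one_df(loglik_without: float, loglik_with: float) -> Tuple[float, float]:
    """Likelihood ratio test for a single added parameter (eg, one interaction term)."""
    statistic = max(2.0 * (loglik_with - loglik_without), 0.0)
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

def subgroups_differ(loglik_without: float, loglik_with: float) -> bool:
    """True when the interaction improves fit at the P<0.01 level."""
    return lr_test_one_df(loglik_without, loglik_with)[1] < ALPHA
```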

As our definition of medical indication for epidural analgesia included some components of the Bateman index score and BMI, we removed Bateman index score and maternal height and weight as confounding variables in the models of subgroup analyses that included medical indication (see supplementary eFigure 3b). Finally, to further model the effect of epidural analgesia on women with different underlying risk profiles for SMM, we analysed the association between epidural analgesia and SMM in women with and without an indication for epidural throughout the continuum of gestational ages using robust Poisson regression with non-linear splines.
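The text does not specify the spline type or knot positions. One common choice for a smooth gestational age effect is a restricted (natural) cubic spline, which is linear beyond the outer knots; the sketch below builds that basis for one value, assuming Harrell's parameterisation and hypothetical knots. The resulting columns, and their products with the epidural indicator, would enter the Poisson model as covariates.

```python
from typing import List, Sequence

def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Restricted cubic spline terms for one value of x.

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k knots: the curve is cubic between
    knots and constrained to be linear below the first and above the last knot.
    Non-linear terms are scaled by (t_k - t_1)^2 to keep them on a similar scale to x.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("at least three knots are needed")

    def cube_pos(u: float) -> float:
        return max(u, 0.0) ** 3

    last, penultimate = t[k - 1], t[k - 2]
    scale = (last - t[0]) ** 2
    terms = [x]
    for j in range(k - 2):
        s = (cube_pos(x - t[j])
             - cube_pos(x - penultimate) * (last - t[j]) / (last - penultimate)
             + cube_pos(x - last) * (penultimate - t[j]) / (last - penultimate))
        terms.append(s / scale)
    return terms

# Hypothetical knots (in weeks) within the cohort's 24+0 to 42+6 week range
KNOTS = [28.0, 32.0, 37.0, 40.0]
print(rcs_basis(26.0, KNOTS))  # [26.0, 0.0, 0.0]
```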

Additional analyses

Given that epidural analgesia is only available to women delivering in an obstetric unit, we repeated the analyses restricted to births occurring within an obstetric unit (n=541 389, 95.4% of eligible women) and compared the results to our main analyses. We also provided additional subgroup analyses using WHO criteria of preterm births, and by iatrogenic or spontaneous preterm birth. 35

Dealing with missing confounder data

All eligible women (see supplementary eFigure 1) had complete data on epidural analgesia and outcome. Missing data on confounders varied, with the least for maternal age (0 missing) and most for maternal ethnicity (n=222 213, 39.2%) and illicit drug use (n=179 284, 31.6%) ( table 1 ). In total, 257 713 (45.4%) of eligible participants had missing data on ≥1 confounder. We imputed missing confounder data using multiple imputation by chained equations with predictive mean matching to form 10 imputed datasets. 44 Ten iterations were used to achieve stable imputed values, and 10 imputations were considered sufficient for accurate pooled effect size estimates.
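After fitting the model in each of the 10 imputed datasets, estimates are usually combined with Rubin's rules. The text does not describe the pooling step, so this is an assumption: the sketch pools one coefficient (eg, a log relative risk) and its variance across imputations.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple

def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Combine one parameter across m imputed datasets with Rubin's rules.

    Returns the pooled estimate and its total standard error.
    """
    m = len(estimates)
    if m != len(variances) or m < 2:
        raise ValueError("need matching estimates and variances from at least 2 imputations")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)
```

A pooled log relative risk would then be exponentiated, with its confidence interval formed on the log scale.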

Maternal and neonatal characteristics of pregnant women after exclusion of data missing for epidural analgesia during labour. Values are number (percentage) unless stated otherwise


We also presented results from non-imputed, complete case analyses (n=309 503) and compared these with our main imputed analyses. In accordance with data regulation guidelines, we redacted any outcome or variable with five or fewer values, or any data that could be used to derive these redacted values.
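The redaction rule above (suppress any value of five or fewer) can be sketched as a primary suppression step. This is illustrative only: the study's disclosure control process is not described beyond this sentence, the category labels are hypothetical, and the further step of hiding values from which a redacted count could be derived is not implemented.

```python
from typing import Dict, Optional

def suppress_small_counts(counts: Dict[str, int], threshold: int = 5) -> Dict[str, Optional[int]]:
    """Primary suppression: blank any cell with `threshold` or fewer observations.

    Secondary suppression (hiding other cells from which a blanked value could be
    back-calculated, eg, via row totals) is not implemented here.
    """
    return {key: (None if n <= threshold else n) for key, n in counts.items()}

print(suppress_small_counts({"outcome A": 3, "outcome B": 5, "outcome C": 6}))
# {'outcome A': None, 'outcome B': None, 'outcome C': 6}
```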

Patient and public involvement

This study used anonymised data from national registries, focusing on the analysis of existing information without necessitating new direct contact with participants. Despite the inherent limitations of our approach, including the lack of allocated funding for direct patient involvement, we recognised the importance of incorporating public perspectives into our research. While direct involvement in designing the research question, the outcome measures, and study implementation was not feasible, our motivation was strongly influenced by discussions with members of the public and specific concerns highlighted by patients about maternal morbidity rates. These conversations, along with a priority setting exercise by the James Lind Alliance on the impact of epidural analgesia during labour, shaped our research focus. 16 Although formal patient and public involvement was not integrated into the study’s design, we engaged with the public by inviting a patient to review our manuscript, whose insights contributed to refining our presentation and interpretation of findings.

Study population and baseline characteristics

After exclusions, 567 216 women presented in labour in Scotland between 1 January 2007 and 31 December 2019 ( table 1 , see supplementary eFigure 1), of whom 39 601 (7.0%) delivered prematurely. Epidural analgesia was administered to 125 024 (22.0%) women. Of the 77 439 women with a medical indication for epidural analgesia, 19 061 (24.6%) received it (see supplementary eFigure 1). Mothers who received epidural analgesia during labour were more likely to be primiparous, be from a less deprived socioeconomic group, be a former or non-smoker, be undergoing labour induction, give birth in an obstetric unit, and have a multiple birth, ≥1 comorbidity, a higher birthweight baby, and operative delivery ( table 1 ). SMM occurred in 2412 women (0.43%) and was more commonly observed in those with a medical indication for epidural analgesia (819/77 439, 1.06%) and in women delivering preterm (581/39 601, 1.47%) ( table 2 and supplementary eTable 4).

Observed events and adjusted relative risks for all outcomes for whole cohort

Temporal trends in SMM

The overall incidence of SMM (irrespective of epidural analgesia status) did not change over the study period (relative risk per year 1.00, 95% confidence interval (CI) 0.99 to 1.02; P=0.7; see supplementary eTables 5 and 6).

Association between epidural analgesia and SMM and related outcomes

Epidural analgesia during labour was associated with a reduction in SMM (adjusted relative risk 0.65, 95% CI 0.50 to 0.85), SMM plus critical care admission (0.46, 0.29 to 0.73), and respiratory morbidity (0.42, 0.16 to 1.15), although the last of these had limited power with wide confidence intervals ( table 2 ).

In subgroup analyses, epidural analgesia was associated with a greater risk reduction in SMM in women with a medical indication for epidural analgesia (0.50, 0.34 to 0.72) versus those without a medical indication (0.67, 0.43 to 1.03); likelihood ratio of difference between subgroups, P<0.001 ( table 3 ). Similarly, we found a greater risk reduction in SMM in women receiving epidural analgesia and delivering prematurely (0.53, 0.37 to 0.76) compared with women delivering at term or post term (1.09, 0.98 to 1.21); likelihood ratio of difference between subgroups, P<0.001, and in women with a medical indication and delivering prematurely (0.36, 0.24 to 0.53) compared with women with no medical indication and delivering at term or post term (1.14, 0.99 to 1.31); likelihood ratio of difference between subgroups, P<0.001 ( table 3 ). The reduced risk of SMM with epidural analgesia seen in the whole cohort and in women with a medical indication for epidural analgesia was more pronounced as gestational age at birth decreased ( fig 1 ).

Comparison of outcomes between women with and without a medical indication for epidural analgesia during labour and those delivering preterm compared with at term or post term

Fig 1

Time varying adjusted absolute risks for severe maternal morbidity (%) in relation to gestational age (in weeks) for whole cohort, women with a medical indication for epidural analgesia, and women with no medical indication for epidural analgesia. Shading represents 95% confidence intervals


Robustness of results and sensitivity analysis

E-values suggest our findings are unlikely to be solely due to residual confounding (see supplementary eTable 7). Consistent results were observed in analyses limited to births in obstetric units with 24 hour access to obstetric and anaesthetic services (see supplementary eTables 8 and 9). Epidural analgesia was associated with reduced risk of SMM across all categories of preterm birth: extremely preterm (<28 weeks) gestations (0.36, 0.21 to 0.62), very preterm (28 to <32 weeks) gestations (0.48, 0.32 to 0.72), and moderate to late preterm (32 to 36+6 weeks) gestations (0.71, 0.56 to 0.88) (see supplementary eTable 10). This effect was seen irrespective of whether the preterm birth was spontaneous or iatrogenic (see supplementary eTable 10). Similar results were seen in both the imputed and the complete case (unimputed) datasets (table 2, table 3, and supplementary eTable 11).
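The E-value defined in the methods has a closed form for risk ratios (VanderWeele and Ding): for a risk ratio RR above 1 it is RR + sqrt(RR × (RR − 1)), and a protective risk ratio is first inverted. As an illustration only, the sketch below applies it to the adjusted relative risk for SMM reported for the whole cohort (0.65, 95% CI 0.50 to 0.85); these computed values are not taken from supplementary eTable 7.

```python
import math

def e_value(rr: float) -> float:
    """E-value for a risk ratio; protective risk ratios are inverted first."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    rr = 1 / rr if rr < 1 else rr
    return rr + math.sqrt(rr * (rr - 1))

def e_value_ci(lower: float, upper: float) -> float:
    """E-value for the confidence limit closest to the null (1 if the CI includes 1)."""
    if lower <= 1 <= upper:
        return 1.0
    return e_value(upper if upper < 1 else lower)

print(round(e_value(0.65), 2))           # 2.45
print(round(e_value_ci(0.50, 0.85), 2))  # 1.63
```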

In this population based cohort study encompassing 567 216 births in Scotland, epidural analgesia during labour was associated with a 35% risk reduction in SMM and 54% risk reduction in SMM plus critical care admission across all births. These benefits were more pronounced in women with a medical indication for epidural analgesia compared with those without an indication, and in those who delivered preterm compared with those who did not deliver preterm. Women with a higher pre-existing morbidity risk, stemming from either medical or obstetric conditions, spontaneous preterm delivery, or conditions necessitating iatrogenic preterm delivery, face increased risks of adverse events related to their chronic comorbidities, diseases related to preterm birth, haemorrhage, and surgical complications. 4 45 46 47 Our results suggest that these risks might be effectively mitigated by use of epidural analgesia.

Comparison with other studies

Our findings enhance the limited existing literature, 8 9 and respond to a research priority identified by patients and clinical providers. 16 Given that mode of birth is unknown when the decision to use labour epidural analgesia is made, and that around 15% of SMM events will occur in the postnatal period, 10 our study provides a more accurate portrayal of the clinical situation than the previous US study, which did not include postnatal SMM. 8 As few known modifiable risk factors for SMM exist, and as the incidence of SMM continues to rise, with this increase contributing to the global plateauing of maternal mortality, our findings point to a potential means to reduce SMM and maternal mortality. 1 4 45 That a large proportion of women in whom epidural analgesia would generally be considered medically indicated did not receive one (58 378 of 77 439, 75.4%) highlights a potential area for intervention.

The latest UK Mothers and Babies: Reducing Risk through Audits and Confidential Enquiries report underlines the uneven distribution of maternal morbidity and mortality, with deaths in women from black ethnic groups four times higher than in women from white ethnic groups, and the mortality risk twofold higher in women from the most deprived areas compared with least deprived areas. 13 Recent UK based studies have shown that women from ethnic minority groups and socioeconomically deprived areas are less likely to receive epidural analgesia, although the underlying reasons remain unclear. 14 15

Policy implications

Misinformation and misconceptions about epidural analgesia, particularly the effect on delivery mode and neonatal wellbeing, might contribute to inequities in epidural use during labour. 48 Existing research, including a Cochrane review of 40 randomised controlled trials and two Scottish population based studies, found that epidural analgesia was not causally linked to an increased risk of operative births and did not adversely affect neonatal or long term childhood outcomes, but these studies did not examine SMM or mortality. 11 49 50 Although a randomised controlled trial would be ideal for confirming our results, the global prevalence of epidural analgesia during labour, its established safety, and the urgency of this research make a strong case for applying our results in clinical practice. Our study offers valuable insights that can potentially reduce inequalities in maternal healthcare by providing robust evidence for individualised, person centred, and informed decision making. To maximise this effect, it is crucial to develop strategies that ensure women from diverse backgrounds, including those in preterm labour, have access to comprehensive information and support about the use of epidural analgesia.

The mechanism by which epidural analgesia could diminish SMM is likely multifaceted, involving closer medical oversight and haemodynamic monitoring, established intravenous access, fluid administration, blunting of physiological stress responses to labour, avoidance of the need for spinal or general anaesthesia for caesarean section, and faster escalation to definitive obstetric interventions. In essence, using epidural analgesia during labour alters the care pathway to one that enhances the capacity to manage adverse events. From these data it is not possible to separate the direct influence of epidural analgesia from the accompanying comprehensive care package. In the UK, implementing epidural analgesia inherently includes this bundle of enhanced care, which could be particularly advantageous for women at heightened risk of SMM.

Strengths and limitations of this study

Our study was undertaken in a large, unselected population cohort of linked mother-infant data over a 13 year period reflecting contemporary obstetric and anaesthetic practices. We adjusted for confounding variables that were defined before analyses started, used imputation for missing confounder data, and showed consistency between the confounder imputed and complete case analyses. The E-value suggested that bias due to unknown confounders was unlikely to have made a major contribution to our results, and additional sensitivity analyses support the robustness of our findings. We had too few cases of respiratory morbidity to provide precise estimates, highlighting the need for larger studies to explore this outcome. As other forms of anaesthesia may be used in more urgent clinical scenarios, such as major haemorrhage, this could have resulted in more favourable results in the epidural analgesia group. Nevertheless, our analysis aimed to reflect the divergent management pathways and outcomes depending on women’s choice about epidural analgesia during labour. For instance, a woman with a functioning epidural is potentially more likely to undergo an assisted vaginal delivery than a caesarean section. In line with other UK based studies, we only accounted for postpartum haemorrhage when it necessitated critical care admission, potentially underestimating this morbidity. As a result, our estimates might have been attenuated towards the null, which would strengthen our confidence in the association seen between epidural analgesia and SMM. Our study excluded elective caesarean births, because women undergo this procedure before labour starts and therefore by definition will not receive epidural analgesia during labour. Although this analysis was not within our study’s scope, we recognise the importance of investigating anaesthetic choices in elective caesarean deliveries in future research, given the different risk profiles.
We used widely validated area deprivation indices to indicate socioeconomic status. 32 However, we acknowledge that this may not always reflect individual socioeconomic positions (eg, well educated or wealthy women living in an area with a high deprivation score). As the population of Scotland is predominantly white, our results might not be generalisable to more diverse populations; however, the similarity of our results to those of a US study with an ethnically diverse population increases confidence in our findings. 8 We lacked data on systemic opioid use and maternal haemodynamics, both of which would have been valuable in elucidating the mechanisms by which epidural analgesia during labour could reduce the risk of SMM. Additionally, we did not have information on individual care providers and factors influencing maternal decision making about epidural analgesia. These aspects are crucial for understanding and dealing with potential barriers to the adoption of epidural analgesia during labour.

Conclusions

Our analysis of 567 216 births in Scotland indicates that epidural analgesia during labour is associated with a 35% risk reduction in SMM in all women. This effect was more pronounced in specific groups, showing a 50% risk reduction in women with predefined risk factors, and a 47% reduction in those delivering prematurely. These findings substantiate the current practice of recommending epidural analgesia during labour to women with known risk factors, underscore the importance of ensuring equitable access to such treatment, and highlight the importance of supporting women from diverse backgrounds to be able to make informed decisions relating to epidural analgesia during labour.

What is already known on this topic

Severe maternal morbidity (SMM) is a potentially life threatening outcome of pregnancy

Epidural analgesia during labour may reduce SMM, although evidence is limited

Assessing the effect of epidural analgesia during labour on obstetric outcomes is a research priority for women and healthcare providers

What this study adds

This study showed a reduced risk of SMM in women who received epidural analgesia during labour, with the greatest effects seen in those with a medical indication for epidural analgesia or delivering preterm

Encouraging the adoption of, and enhancing accessibility to, epidural analgesia for women in these higher risk categories could be instrumental in improving maternal health outcomes

Ethics statements

Ethical approval.

The Public Benefit and Privacy Panel for Health and Social Care (HSC-PBPP) of NHS Scotland provided ethical approval for the linkage (ref 1920-0097) and the NHS Greater Glasgow and Clyde Research and Development department approved the study (ref GN20PH059). The NHS Scotland electronic Data Research and Innovation Service linked and deidentified data before analysis.

Data availability statement

Depersonalised study data may be made available on request to accredited researchers who submit a proposal that is approved by NHS Scotland’s electronic Data Research and Innovation Service.

Acknowledgments

We would like to acknowledge the support of the eDRIS team (Public Health Scotland) for obtaining approvals, providing and linking data, and use of the secure analytical platform within the National Safe Haven.

Contributors: MS and SMN are joint senior authors. RJK, MS, and SMN conceived and designed the study. All authors acquired, analysed, or interpreted the data. RJK, MS, and SMN drafted the initial manuscript. All authors critically revised the manuscript for important intellectual content. RJK, MS, AK, and LOEH did the statistical analyses. RJK, MS, DAL, and SMN obtained funding. AK and LOEH provided administrative, technical, or material support. MS, DAL, and SMN supervised the study, obtained regulatory approval, and provided advice on analyses. RJK, AK, MS, and SMN had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. They are the guarantors. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding: This work was supported by an NHS Research Scotland senior researcher fellowship (RJK). DAL’s contribution is supported by the UK Medical Research Council (MC_UU_00032/05) and British Heart Foundation (CH/F/20/90003 and AA/18/1/34219). None of the funders had any role in the design, data analyses, or interpretation of results. The views expressed in this publication are those of the author(s) and not necessarily those of the UK NHS, or any funders or institutions acknowledged.

Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: support from NHS Research Scotland, the UK Medical Research Council, and British Heart Foundation; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work. Outside of the submitted work, RJK is a board member and research lead for Regional Anaesthesia UK and sits on the National Institute of Academic Anaesthesia Research Council. RJK has declared funding from NHS National Research Scotland (administered by NHS Greater Glasgow and Clyde), Wellbeing of Women, and the Chief Scientist Office (for research unrelated to this work in the past three years). SMN has participated in advisory boards and received speaker or consultancy fees from Access Fertility, Beckman Coulter, Ferring, Finox, Merck, MSD, Roche Diagnostics, and The Fertility Partnership. SMN has declared funding from the Chief Scientist Office, Wellbeing of Women, and the National Institute for Health and Care Research (NIHR) for research unrelated to this work in the past three years. All funds for these grants go to and are managed and audited by the University of Glasgow. DAL has declared funding from the NIHR, Diabetes UK, and US National Institute of Research, for research unrelated to this work in the past three years. All funds for these grants go to and are managed and audited by the University of Bristol. DAL is a member of the UK Biobank strategic oversight committee, chair of the scientific advisory board for the Bradford Health Research Institute public health ActEarly programme, and chair of the NIHR-British Heart Foundation partnership working group on maternal cardiovascular health. She does not receive any payment for these activities.
The authors declare no other relationships or activities that could appear to have influenced the submitted work.

Transparency: The lead author (RJK) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported, that no important aspects of the study have been omitted, and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

Dissemination to participants and related patient and public communities: To maximise the impact of our findings, we will employ a multifaceted dissemination strategy, targeting both academic and public audiences. Our plan includes leveraging social media platforms such as X (formerly Twitter) and Facebook to engage with the public, healthcare professionals, and policy makers. We will collaborate with patient advocacy groups (eg, the Maternal Mental Health Alliance, https://maternalmentalhealthalliance.org/) and professional societies such as the Royal College of Obstetricians and Gynaecologists and the Obstetric Anaesthetists’ Association to ensure our research reaches a wide audience and is presented in an accessible format, including lay summaries and infographics. Press releases will be distributed to both national and international media outlets, and findings will be presented at national and international conferences to foster academic and clinical discussion. We aim to facilitate a feedback loop by encouraging commentary and discussion through our social media channels, allowing us to gauge public and professional responses to our findings. This feedback will be invaluable for guiding future research directions and policy recommendations, ensuring our work remains aligned with patient needs and priorities.

Provenance and peer review: Not commissioned; externally peer reviewed.

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/ .

research based on secondary data analysis

  • Open access
  • Published: 15 June 2024

Identification of key genes for triacylglycerol biosynthesis and storage in herbaceous peony (Paeonia lactiflora Pall.) seeds based on full-length transcriptome

  • Huajie Xu 1 ,
  • Miao Li 1 ,
  • Jiajun Gao 1 ,
  • Jun Tao 1 , 2 &
  • Jiasong Meng 1 , 2  

BMC Genomics volume 25, Article number: 601 (2024)

The herbaceous peony (Paeonia lactiflora Pall.) is extensively cultivated in China because its root is used as a traditional Chinese medicine known as ‘Radix Paeoniae Alba’. In recent years, its seeds have been found to contain abundant unsaturated fatty acids, making it a potential new oilseed plant. Surprisingly, little is known about the full-length transcriptome of Paeonia lactiflora, which limits research into its gene function and molecular mechanisms.

A total of 484,931 reads of insert (ROI) sequences and 1,455,771 full-length non-chimeric (FLNC) reads were obtained and used for CDS prediction, transcription factor (TF) analysis, simple sequence repeat (SSR) analysis and lncRNA identification. Gene function annotation and gene structure analysis were also performed. A total of 4,905 transcripts were related to lipid metabolism biosynthesis pathways, corresponding to 28 enzymes. Using these data, we identified 10 oleosin (OLE) and 5 diacylglycerol acyltransferase (DGAT) gene members after de-redundancy. Analysis of physicochemical properties and secondary structure showed that members within each gene family were similar. Phylogenetic analysis showed that the distribution of OLE and DGAT family members was roughly the same as that in Arabidopsis. Quantitative real-time polymerase chain reaction (qRT–PCR) analyses revealed that expression changed across seed development stages, first increasing and then decreasing.

In summary, these results provide new insights into the molecular mechanism of triacylglycerol (TAG) biosynthesis and storage during seed development in Paeonia lactiflora. They provide a theoretical reference for selecting and breeding oil varieties and for understanding the functions of oil storage and lipid synthesis-related genes in Paeonia lactiflora.

Introduction

In China, herbaceous peony (Paeonia lactiflora Pall.) is a famous flower with excellent ornamental value; it belongs to the genus Paeonia of the family Paeoniaceae. Paeoniaceae contains only one genus, and within it the herbaceous peony is widely loved for its large, beautiful flowers, which symbolize wealth, prosperity and happiness. There are eight species of herbaceous peony in China (Supplementary table: Table S1), among which Paeonia lactiflora is the most widespread [1]. ‘Hangshao’, a herbaceous peony cultivated mainly in Zhejiang, Sichuan and Anhui for its medicinal value, is characterized by white or pink single petals. In recent years, with the recognition of tree peony as a new oil resource [2], research on the oil potential of herbaceous peony, which belongs to the same family and genus, has received increasing attention. The oil yield of ‘Hangshao’ seeds tends to increase with seed development [3], and the seed yield of ‘Hangshao’ at maturity has been shown to be higher than that of oil peony [4]. Because its seeds have a high fruiting rate, oil content and unsaturated fatty acid content, ‘Hangshao’ is expected to be developed as a new oil plant [5]. Consequently, ‘Hangshao’ was used as the material for transcriptome sequencing to lay the foundation for exploring the molecular mechanism of lipid synthesis in this cultivar. Unfortunately, no high-quality reference genome is available for herbaceous peony, so transcriptome sequencing offers a valuable alternative for gene mining and functional characterization [6, 7].

In oil crops, oil is mainly distributed in the seeds, and its formation and accumulation involve fatty acid synthesis, triacylglycerol (TAG) assembly and oil body formation, a series of physiological and biochemical processes [8, 9, 10, 11]. Lipids are mainly stored as TAGs in seed oil bodies, which generally consist of a liquid TAG matrix surrounded by a phospholipid monolayer in which several proteins are embedded. Among these proteins, oleosin (OLE), the earliest identified and most abundant oil body protein, plays important roles in oil body formation and stability, while diacylglycerol acyltransferase (DGAT) is directly involved in TAG synthesis [12]. The function of DGAT in TAG synthesis has been validated in peanut and oleaginous yeast [13]. Heterologous expression of AhDGAT1-1 and AhDGAT1-2 restored lipid synthesis in mutant yeast, and heterologous expression of AhDGAT2a and AhDGAT2b in Escherichia coli significantly increased the fatty acid content of E. coli [14]. OLE maintains oil body size by preventing oil body fusion: in Arabidopsis AtOLE1 mutants, the lack of oleosin leads to oil body fusion at late seed stages, and the resulting larger oil bodies make developing seeds more sensitive to low temperatures [15], suggesting that OLE is a key protein for seed frost resistance [16]. In addition, the BnOLE gene promotes seed development and increases oil content in transgenic Arabidopsis [17], and oil body proteins can serve as binding sites for lipases, mobilizing stored TAG to provide energy for seed germination [18]. OLE and DGAT genes therefore play important roles in promoting seed development, regulating oil body morphology and increasing seed oil content. However, studies of OLE and DGAT in herbaceous peony seeds have rarely been reported.

Third-generation transcriptome sequencing now produces reads thousands of bases long [19] and can capture full-length RNA molecules [20]; it has been applied to investigate the full-length transcriptomes of species such as wheat [21], salvia [22], sorghum [23], maize [24], sugarcane [25], perennial ryegrass [26] and Chinese cabbage [27]. Combining RNA-Seq, Iso-Seq and proteomic identification, Zhu investigated the mechanism of alternative splicing (AS) in the model plant Arabidopsis after abscisic acid (ABA) treatment [28]. Other studies have used Iso-Seq to compare transcriptional differences among different parts of bamboo, revealing the growth and development mechanisms of the underground rhizomes of Phyllostachys heterocycla. In short, third-generation transcriptome sequencing has been widely applied and has particularly advanced plant research. This study applied PacBio full-length sequencing to provide a basis for an in-depth understanding of the OLE and DGAT gene families in P. lactiflora. We collected young leaves, roots, stems, seeds, flowers and stamens for full-length transcriptome sequencing and analyzed the P. lactiflora ‘Hangshao’ transcriptome, which will provide valuable genetic resources for further study of the evolution and biological functions of Paeonia lactiflora.

Full-length transcriptome sequencing with SMRT analysis

Using the PacBio Sequel platform, we sequenced a pooled sample and constructed a PacBio Iso-Seq library, which yielded 554,117 polymerase reads (41.35 GB); in total, 170,904 genes were detected. ROI sequences were extracted from the raw reads using the criteria full passes ≥ 0 and sequence accuracy ≥ 0.75. We then calculated the raw output data, the number of ROIs in the library, the number of ROI bases and the mean read length of insert. In total, 484,931 ROI sequences were generated by SMRT cell sequencing, and the mean read quality of insert was above 97% (Supplementary table: Table S2).
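
The ROI extraction rule above can be sketched as a simple filter. This is a minimal illustration, not PacBio's pipeline: the `InsertRead` record, its fields and the helper names are hypothetical, and in practice these values come from the PacBio software's per-read output. Only the two thresholds (full passes ≥ 0, accuracy ≥ 0.75) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class InsertRead:
    """Hypothetical per-read record; field names are illustrative only."""
    read_id: str
    full_passes: int   # number of complete passes over the insert
    accuracy: float    # predicted read accuracy, 0.0 to 1.0
    length: int        # insert length in bp


def filter_roi(reads, min_passes=0, min_accuracy=0.75):
    """Keep reads that meet the ROI criteria stated in the text."""
    return [r for r in reads
            if r.full_passes >= min_passes and r.accuracy >= min_accuracy]


def roi_summary(rois):
    """Count, total bases and mean insert length of the retained ROIs."""
    n = len(rois)
    total = sum(r.length for r in rois)
    return {
        "n_roi": n,
        "total_bases": total,
        "mean_length": total / n if n else 0.0,
    }


reads = [
    InsertRead("r1", 3, 0.99, 1500),
    InsertRead("r2", 0, 0.80, 900),
    InsertRead("r3", 5, 0.60, 2000),  # fails the accuracy cut-off
]
rois = filter_roi(reads)
print(roi_summary(rois))  # {'n_roi': 2, 'total_bases': 2400, 'mean_length': 1200.0}
```

Because a minimum of 0 full passes admits every read, the accuracy cut-off is the only effective filter under these settings.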

After short fragments (< 300 bp) were filtered out, sequences containing both the 5′ and 3′ primers, with a poly(A) tail preceding the 3′ primer, were defined as full-length sequences. After further screening and analysis, 1,455,771 full-length non-chimeric (FLNC) reads were obtained; the peaks of the two distributions were consistent and in line with expectations (Fig. 1). CD-HIT v4.6.7 was then used to remove redundancy for subsequent analysis. In total, 1,335,148 transcripts were obtained, and the number of common genes was 282,635. The total length was 319,979,564 bp; the 282,635 genes ranged from 200 bp to 42,047 bp in length, and the GC content was 41.33% (Table 1). The de-redundant transcripts were sorted by length, giving N50 and N90 values of 1,514 bp and 584 bp, respectively. Quality control of raw reads was conducted with FASTP to filter out low-quality data and obtain clean reads. All data met the requirements and could be used in subsequent analyses.
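
N50 and N90 summarize the length distribution of the de-redundant transcripts: sort lengths from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of all bases. A minimal Python sketch of that usual definition follows; the function name `nx` and the toy lengths are illustrative, not the study's data or tooling.

```python
def nx(lengths, x):
    """Return the Nx statistic (x in 1..100) for a list of sequence lengths.

    Lengths are sorted from longest to shortest and summed; Nx is the
    length at which the running total first reaches x% of all bases.
    """
    if not lengths:
        raise ValueError("no sequences")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= x * total:
            return length


lengths = [100, 200, 300, 400, 500]  # toy transcript lengths in bp
print(nx(lengths, 50), nx(lengths, 90))  # 400 200
```

N90 is always less than or equal to N50, so the reported pair (1,514 and 584 bp) is ordered as expected.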

figure 1

Quality and length distribution of Reads of insert (ROI), full-length non-chimeric (FLNC) and Isoforms. A , B Quality and length distribution of ROI. C , D Quality and length distribution of FL. E , F Quality and length distribution of Isoforms

Functional annotation of genes

The GO annotation system consists of three main branches: biological process, molecular function and cellular component. GO annotation of the isoforms yielded 51 functional terms across these three categories. In the biological process category (20 terms), cellular process, metabolic process and single-organism process accounted for the highest proportions. In the cellular component category (16 terms), cell, cell part, membrane, membrane part and organelle accounted for the highest proportions. In the molecular function category (15 terms), catalytic activity and binding accounted for the highest proportions (Fig. 2A).

figure 2

Function annotation of transcripts. A Distribution of GO terms for all annotated transcripts in biological process, cellular component and molecular function. B The COG function classification of consensus sequence. C The Nr Homologous species distribution

We also annotated the full-length transcriptome against the COG database: 166,100 annotated genes were assigned to 25 functional categories, such as RNA processing and modification. General function prediction only (39,395), Signal transduction mechanisms (24,646) and Posttranslational modification, protein turnover, chaperones (20,078) were the most abundant categories, while Cell motility (314) and Nucleotide transport and metabolism (1,379) were less abundant (Fig. 2B). Lipid transport and metabolism accounted for 8,451 annotations; these lipid-related transcripts may be involved in unsaturated fatty acid biosynthesis and lipid metabolism pathways in herbaceous peony.

We have submitted the final polished consensus mRNA sequences to NCBI. Non-redundant transcripts were compared with the Nr, Nt, SwissProt, GO, COG, Pfam and KEGG databases using BLAST, with annotation performed on a total of 282,635 transcripts. Among these isoforms, 210,927 were annotated in Nr (74.64%), 174,649 in Nt (61.79%), 161,615 in SwissProt (57.18%), 166,100 in COG (58.77%), 131,865 in Pfam (46.66%), 165,253 in GO (58.47%) and 164,473 in KEGG (58.19%) (Table 2). Homologous species were identified by sequence alignment: among the 210,972 Nr-annotated isoforms, the largest proportion of transcripts matched Vitis vinifera (14.42%), followed by Nyssa sinensis (9.16%) and Actinidia chinensis (2.60%) (Fig. 2C).
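
Each percentage above is a database's hit count divided by the 282,635 transcripts. Recomputing the shares from the reported counts is a quick consistency check when reusing a published annotation table; the sketch below uses only numbers from the text, and the helper name is illustrative.

```python
TOTAL_TRANSCRIPTS = 282_635

# Annotated isoforms per database, copied from the text.
ANNOTATED = {
    "Nr": 210_927,
    "Nt": 174_649,
    "SwissProt": 161_615,
    "COG": 166_100,
    "Pfam": 131_865,
    "GO": 165_253,
    "KEGG": 164_473,
}


def annotation_rates(counts, total):
    """Return each database's share of annotated transcripts, in percent."""
    return {db: round(100 * n / total, 2) for db, n in counts.items()}


for db, pct in annotation_rates(ANNOTATED, TOTAL_TRANSCRIPTS).items():
    print(f"{db:<10}{pct:6.2f}%")
```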

Gene structure analysis

First, we identified transcription factor (TF) families across the Paeonia lactiflora full-length transcriptome using animalTFDB2.0 [29]. In total, 4,735 transcripts encoding 59 types of TFs were identified through BLAST searches against the PlnTFDB database. The most abundant TF families were MYB (557), MYB-related (449), AP2-EREBP (349), C3H (283), GRAS (262) and bHLH (253) (Fig. 3A). Analysis of the TF families of Paeonia lactiflora ‘Hangshao’ allows a deeper understanding of their interactions with target genes and gene regulatory networks, laying a solid foundation for later studies.

figure 3

Gene structure analysis of transcripts. A Transcription factor (TF) analysis. B The simple sequence repeats (SSR) analysis. C Venn diagram of lncRNAs prediction. D The coding sequence (CDS) length distribution

Full-length transcriptomes are also useful for discovering simple sequence repeat (SSR) markers. SSRs were identified with MISA (http://pgrc.ipk-gatersleben.de/misa/misa.html). Mono-nucleotide repeats were the most common SSR type (> 64,000 SSRs), followed by di-nucleotide repeats (~ 10,000 SSRs) (Fig. 3B). Mono-, di- and tri-nucleotide repeats together accounted for 77.42% of SSR loci, and mono- and di-nucleotide repeats alone for 68.86% of all SSR motifs, which may indirectly reflect the complexity and diversity of ‘Hangshao’.
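
MISA reports perfect microsatellites: short motifs repeated at least a minimum number of times, with the minimum depending on motif length. The regex-based sketch below illustrates that idea; it is not MISA itself. The thresholds mirror commonly used MISA defaults, which is an assumption since the study does not report its settings, and the sketch skips MISA's merging of compound SSRs.

```python
import re

# Minimum repeat count per motif length (assumed MISA-style defaults).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_degenerate(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d)
               for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (motif, repeat_count, start) for perfect SSRs, sorted by start."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-mer followed by at least n - 1 copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. 'AA' repeats; the mono-nucleotide pass reports them.
            if not is_degenerate(motif):
                hits.append((motif, len(m.group(0)) // k, m.start()))
    return sorted(hits, key=lambda h: h[2])


seq = "G" + "A" * 12 + "G" + "CT" * 7 + "G"
print(find_ssrs(seq))  # [('A', 12, 1), ('CT', 7, 14)]
```

Tallying the motif lengths of all hits gives the mono-/di-/tri-nucleotide proportions reported above.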

We also used four methods to predict long non-coding RNAs (lncRNAs) in the full-length transcriptome: CNCI [30], txCdsPredict [31], CPC [31] and Pfam [32]. In total, 217,304 lncRNAs were found. CPC, txCdsPredict, CNCI and Pfam identified 133,107, 174,808, 161,960 and 182,099 lncRNAs, respectively. An UpSet plot of the lncRNAs predicted by the four tools showed that 105,832 lncRNAs were predicted by all four (Fig. 3C).
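
Combining the four predictors comes down to set operations on transcript IDs. An UpSet plot counts transcripts by the exact combination of tools that called them, and the consensus set (105,832 here) is the intersection of all four. The text does not say how the 217,304 total was derived; the union below is one plausible reading, not a confirmed one. Transcript IDs are hypothetical toy values.

```python
from collections import Counter


def combine_predictions(predictions):
    """Return (union, shared-by-all) transcript IDs across tools."""
    sets = [set(ids) for ids in predictions.values()]
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return union, shared


def upset_counts(predictions):
    """Count transcripts by the exact combination of tools that called them."""
    membership = {}
    for tool, ids in predictions.items():
        for tid in ids:
            membership.setdefault(tid, set()).add(tool)
    return Counter(frozenset(tools) for tools in membership.values())


# Toy predictions; transcript IDs are hypothetical.
predictions = {
    "CPC": {"t1", "t2", "t3"},
    "CNCI": {"t1", "t2"},
    "txCdsPredict": {"t1", "t3"},
    "Pfam": {"t1", "t4"},
}
union, shared = combine_predictions(predictions)
print(len(union), sorted(shared))  # 4 ['t1']
print(upset_counts(predictions))
```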

The gene structure analysis comprised CDS prediction, SSR analysis, lncRNA prediction and transcription factor analysis. The coding sequence (CDS) is the part of a transcript that encodes a protein product. Predicting CDSs aids preliminary genetic analysis and is the basis for subsequent analysis of protein structure. CDS prediction was conducted with ANGEL software [33]. A total of 152,639 CDSs were predicted, mostly between 400 and 3,000 bp in length, and more than 90% were shorter than 3,000 bp (Fig. 3D).
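
ANGEL uses a trained model to predict CDSs, which is beyond a short example. As a simpler, swapped-in illustration of what CDS prediction looks for, the sketch below finds the longest ATG-to-stop open reading frame across all six reading frames. It is not ANGEL's algorithm, and the function name and return format are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf(seq):
    """Return (strand, start, end) of the longest ATG...stop ORF, or None.

    Coordinates are 0-based on the reported strand (the reverse complement
    for '-'); end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG in this frame opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if best is None or end - start > best[2] - best[1]:
                        best = (strand, start, end)
                    start = None
    return best


print(longest_orf("CCATGAAAAAAAAATAAGG"))  # ('+', 2, 17)
```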

Identification of enzyme genes in lipid metabolism biosynthesis

Based on the functional annotations of the genes, we identified 10,151 transcripts associated with lipid metabolism (Fig. 4). These transcripts were associated with 13 metabolic pathways: fatty acid biosynthesis (742 transcripts); fatty acid elongation (366 transcripts); fatty acid degradation (1,249 transcripts); cutin, suberine and wax biosynthesis (443 transcripts); steroid biosynthesis (385 transcripts); glycerolipid metabolism (1,369 transcripts); glycerophospholipid metabolism (1,749 transcripts); ether lipid metabolism (525 transcripts); sphingolipid metabolism (1,157 transcripts); arachidonic acid metabolism (476 transcripts); linoleic acid metabolism (212 transcripts); alpha-linolenic acid metabolism (869 transcripts); and biosynthesis of unsaturated fatty acids (609 transcripts). Of these 10,151 transcripts, 4,905 were associated with the biosynthesis of unsaturated fatty acids and oil accumulation, including fatty acid biosynthesis (474 transcripts), fatty acid elongation (362 transcripts), biosynthesis of unsaturated fatty acids (3,091 transcripts), triacylglycerol (TAG) biosynthesis (616 transcripts) and lipid storage (362 transcripts) (Supplementary table: Table S3).
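
When reusing published summary counts, it is worth confirming that the per-category numbers add up to the stated totals. The sketch below does this with the figures from the paragraph above; the dictionary names are illustrative.

```python
# Transcripts per lipid-metabolism pathway, copied from the text.
LIPID_PATHWAYS = {
    "fatty acid biosynthesis": 742,
    "fatty acid elongation": 366,
    "fatty acid degradation": 1249,
    "cutin, suberine and wax biosynthesis": 443,
    "steroid biosynthesis": 385,
    "glycerolipid metabolism": 1369,
    "glycerophospholipid metabolism": 1749,
    "ether lipid metabolism": 525,
    "sphingolipid metabolism": 1157,
    "arachidonic acid metabolism": 476,
    "linoleic acid metabolism": 212,
    "alpha-linolenic acid metabolism": 869,
    "biosynthesis of unsaturated fatty acids": 609,
}

# Subset linked to unsaturated fatty acids and oil accumulation.
OIL_RELATED = {
    "fatty acid biosynthesis": 474,
    "fatty acid elongation": 362,
    "biosynthesis of unsaturated fatty acids": 3091,
    "TAG biosynthesis": 616,
    "lipid storage": 362,
}

print(len(LIPID_PATHWAYS), sum(LIPID_PATHWAYS.values()))  # 13 10151
print(sum(OIL_RELATED.values()))                          # 4905
```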

figure 4

Lipid metabolism pathway related genes

Following a previously published paper [5], which speculated that MCAT, KASIII, FATA, SAD, FAD, DGAT and OLE are the key genes for unsaturated fatty acid biosynthesis and oil accumulation in herbaceous peony seeds, we focused on these seven genes. Malonyl-CoA:ACP transacylase (MCAT) converts malonyl-CoA to malonyl-ACP, the main substrate of the subsequent condensation reaction cycle; only 2 transcripts were identified as MCAT. 3-Ketoacyl-ACP synthase III (KASIII) then catalyses the condensation of acetyl-CoA with malonyl-ACP to form β-ketobutyryl-ACP, and 7 transcripts were identified as KASIII. In the first desaturation step, stearoyl-ACP desaturase (SAD) converts C18:0-ACP into C18:1-ACP within the plastid; 85 transcripts were identified as SAD. Fatty acyl-ACP thioesterase A (FATA) then hydrolyses C18:1-ACP to release C18:1 as a free fatty acid (FFA); only 14 transcripts were identified as FATA. Lysophosphatidylcholine acyltransferase (LPCAT) and fatty acid desaturase (FAD) participate in unsaturated fatty acid biosynthesis by facilitating the exchange of unsaturated fatty acids between the PC pool and the acyl-CoA pool; we identified 27 and 2,819 transcripts as LPCAT and FAD, respectively. The synthesis of TAG from glycerol-3-phosphate and acyl-CoA is known as the Kennedy pathway. DGAT catalyses the final step of TAG synthesis, while oleosin (OLE) and caleosin (CLO) are mainly involved in TAG storage; we identified 91, 285 and 77 transcripts as DGAT, OLE and CLO, respectively. In most cases, more than one transcript was annotated as the same enzyme; FAD had the most transcripts (2,819), followed by OLE (285). The critical steps and key enzymes are shown in Fig. 5, and the full names of the genes in the figure are given in the supplementary files (Supplementary table: Table S4).
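
The transcript counts per enzyme can be ranked directly, which confirms which enzymes are most redundantly represented in the transcriptome. The sketch uses only the counts stated above; the names are illustrative.

```python
# Annotated transcripts per key enzyme, copied from the text.
TRANSCRIPTS_PER_ENZYME = {
    "MCAT": 2, "KASIII": 7, "SAD": 85, "FATA": 14, "LPCAT": 27,
    "FAD": 2819, "DGAT": 91, "OLE": 285, "CLO": 77,
}


def rank_by_transcripts(counts):
    """Sort (enzyme, count) pairs from most to fewest transcripts."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_by_transcripts(TRANSCRIPTS_PER_ENZYME)
print(ranking[:2])  # [('FAD', 2819), ('OLE', 285)]
```

A high count here reflects many isoforms or paralogues annotated to the same enzyme, not necessarily higher expression.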

figure 5

The proposed pathways and genes involved in lipid metabolism in the Paeonia lactiflora ‘Hangshao’. This model was developed based on the transcriptome data obtained in this study and information from Meng et al. [ 5 ], Zhang et al. [ 34 ] and Zhong et al. [ 35 ]

Selection and identification of OLE and DGAT genes utilizing full-length transcriptome

After de-redundancy of the full-length transcriptome database, 10 OLE and 5 DGAT family genes were identified. OLEs were first found in mustard greens, although the protein was originally isolated from peanut seeds. Subsequently, a number of plant OLE genes were cloned and identified, including those from mustard, sunflower, cotton, sesame and the woody oil plant oil tea [36]. OLE gene family studies have been conducted in Arabidopsis, peanut and some legumes [37, 38, 39]. The final step of TAG synthesis is catalysed by DGAT. Modulating the expression of DGAT, an acyltransferase acting at the sn-3 position, has been shown to affect α-linolenic acid (ALA) content; for example, decreasing CsDGAT expression in Camelina sativa can increase the ALA content of its oil [40]. To clarify the protein characteristics of the OLE family in Paeonia lactiflora, the physicochemical properties and secondary structural elements of OLE family members were analyzed with ProtParam and SOPMA. The proteins contained 89–220 amino acids, with molecular weights of 9.21–23.60 kDa and isoelectric points of 5.40–10.45. Apart from PlOLE1 and PlOLE10, the secondary structure of the PlOLEs is dominated by α-helix, followed by random coil and extended strand, with β-turn accounting for the smallest proportion. These secondary structures resemble those of OLEs in Arachis hypogaea, although the proportions differ [41] (Table 3). We then analyzed the basic characteristics of the five identified PlDGAT genes, including physicochemical properties and secondary structural elements. PlDGAT2 was the smallest, encoding 326 amino acids, while the other genes encoded 391 to 517 amino acids.
Molecular weight and isoelectric point analysis showed that the encoded proteins ranged from 36.68 to 58.79 kDa, with isoelectric points from 7.18 to 9.28. The aliphatic index ranged from 78.24 to 103.81 and the grand average of hydropathicity (GRAVY) from -0.431 to -0.261; the negative GRAVY values indicate that all five PlDGATs are hydrophilic proteins. According to the instability index, PlDGAT1, PlDGAT2 and PlWSD2 are predicted to be unstable proteins, whereas PlDGAT3 and PlWSD1 are predicted to be stable. Their secondary structure is dominated by α-helix and random coil, followed by extended strand, with β-turn the least abundant (Table 3).
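
GRAVY and the aliphatic index are simple averages over the protein sequence. GRAVY is the mean Kyte-Doolittle hydropathy value per residue, so a negative value indicates a hydrophilic protein; the aliphatic index is the mole percentage of Ala plus weighted mole percentages of Val (×2.9) and Ile + Leu (×3.9). The sketch below reimplements both for illustration; it is not ProtParam's code, and the toy peptide is arbitrary. ProtParam's instability index, which classifies proteins scoring above 40 as unstable, uses a large dipeptide weight table and is not reproduced here.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index: A + 2.9 * V + 3.9 * (I + L), in mole percent."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa):
        return 100 * seq.count(aa) / n

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


peptide = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary toy sequence
print(round(gravy(peptide), 3), round(aliphatic_index(peptide), 2))
```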

Conserved Domains and Phylogenetic Analysis of OLEs and DGATs

Protein domain analysis with Pfam and SMART showed that all of these OLE proteins contain the conserved oleosin domain (Pfam: PF01277), while the DGATs were divided into four subfamilies. The domain distribution of OLE and DGAT family members was roughly the same as in Arabidopsis, indicating that the positions of the conserved domains are conserved across species; however, whether these genes are functionally similar remains unclear. The evolutionary relationship between Paeonia lactiflora and Arabidopsis thaliana was analyzed with MEGA7.0 [42], which showed that PlOLE2 was highly similar to an Arabidopsis protein (Fig. 6). Every OLE protein contains motif 1, and PlOLE2, PlOLE4 and PlOLE6 contain the most motifs. The evolutionary relationships of DGATs among Paeonia lactiflora, Arabidopsis thaliana, Oryza sativa, Glycine max and Paeonia rockii were analysed with MEGA 7.0 (Fig. 7A). To better characterize the PlDGAT family, motifs in the PlDGAT protein sequences were predicted with the MEME online tool (Fig. 7B). Based on the number of DGAT domains and zinc-finger motifs, the putative DGAT proteins could be classified into four main groups: one gene was classified into the DGAT1 subfamily, one into DGAT2, one into DGAT3 and two into the WSD/DGAT subfamily. The conserved domains within each subfamily are clearly similar and even contain the same motifs. Ten distinct motifs were identified, and the number of motifs per DGAT ranged from 4 to 10. Most PlDGATs in the same subgroup had similar motif compositions. For example, motifs 2–5, 7 and 10 appeared only in the DGAT2 subfamily, motifs 1–10 appeared only in the DGAT2 subfamily, and motifs 3, 6 and 9 occurred in the WSD/DGAT subfamily. Interestingly, the DGAT1 subfamily was very similar to the DGAT2 subfamily, consistent with their degree of homology (Fig. 7).

figure 6

Bioinformatics analysis of PlOLE members. A Phylogenetic tree of plant OLE homologous proteins. The phylogenetic tree was constructed with the neighbor-joining method using MEGA7.0. The statistical reliability of the tree topology was assessed by bootstrap analysis with 1000 replicates. B Schematic diagram of amino acid motifs of OLE proteins

figure 7

Bioinformatics analysis of PlDGAT members. A Phylogenetic tree of plant DGAT homologous proteins among Paeonia lactiflora, Arabidopsis, Oryza sativa, Glycine max and Paeonia rockii. The phylogenetic tree was constructed with the neighbor-joining method in MEGA7.0, and the statistical reliability of the tree topology was assessed by bootstrap analysis with 1000 replicates. B Schematic diagram of amino acid motifs of DGAT proteins

Gene expression analysis

We analyzed the expression levels of 10 OLE and 5 DGAT family members in roots, stems, leaves, flowers, stamens, and seeds at 30, 45, 60, 75 and 90 days after flowering (DAF) to explore whether OLE and DGAT expression follows particular patterns across tissues and developmental stages, and whether these genes are expressed tissue-specifically. The results were visualized as heatmaps with TBtools, in which darker colors indicate higher expression levels (Fig. 8). The OLE gene family was expressed at higher levels in roots, leaves and flowers than in stems and stamens, whereas the DGAT gene family was expressed at higher levels in roots than in stems, leaves, flowers and stamens; both families showed their highest expression in seeds. Most genes were first upregulated and then downregulated as Paeonia lactiflora seeds developed. These patterns indirectly suggest that OLE and DGAT contribute to the synthesis and accumulation of unsaturated fatty acids during herbaceous peony seed development, laying a foundation for more in-depth functional studies.
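The text does not state how expression values were scaled before TBtools drew the heatmaps. A common choice, assumed here for illustration only, is a log2(x + 1) transform followed by row-wise z-scores, so that each gene's colour reflects its pattern across samples rather than its absolute level. The sketch below uses hypothetical values for one gene; `log2_row_zscores` is an illustrative helper, not a TBtools function.

```python
import math


def log2_row_zscores(matrix):
    """Return log2(x + 1)-transformed, row-wise z-scored copies of each row.

    Each row is one gene; each column is one tissue or developmental stage.
    A row with (near) zero spread is returned as all zeros.
    """
    scaled = []
    for row in matrix:
        logged = [math.log2(v + 1.0) for v in row]
        mean = sum(logged) / len(logged)
        # Population standard deviation of this gene across samples
        sd = math.sqrt(sum((x - mean) ** 2 for x in logged) / len(logged))
        # Treat near-zero spread as a flat profile to avoid dividing by rounding noise
        scaled.append([0.0 if sd < 1e-12 else (x - mean) / sd for x in logged])
    return scaled


# Hypothetical relative expression of one gene in root, stem, leaf and
# seeds at 30, 45 and 60 DAF (illustrative numbers, not study data)
example = [[1.0, 0.5, 1.2, 4.0, 9.0, 3.0]]
z_scores = log2_row_zscores(example)
print([round(z, 2) for z in z_scores[0]])
```

Row scaling makes genes with very different absolute levels comparable on one colour scale, at the cost of hiding those absolute differences.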

figure 8

Verification of genes by qRT-PCR. A Heatmap of the expression levels of 10 nonredundant PlOLEs in different tissues of ‘Hangshao’ and in its seeds at five developmental stages. B Expression heatmap of 5 nonredundant PlDGATs in different tissues of ‘Hangshao’ and in its seeds at five developmental stages. Relative expression values are shown in red; the darker the color, the higher the expression level

Discussion

Paeonia lactiflora, a traditional Chinese flower, is widely loved for its large, showy blooms. In recent years, research on herbaceous peony has focused mainly on specific tissues, and little work has been done on its full-length transcriptome. With the rapid development of molecular technologies, molecular genetic modification has become a powerful approach to flower breeding.

To date, full-length transcriptomes of many species have been obtained with SMRT technology. For example, full-length transcriptome sequencing yielded 21.53 Gb of clean data for alfalfa [ 43 ] and 55 Gb for maize [ 44 ]. The technology has also been widely applied to horticultural crops. In lily, which has a giant genome of about 36 Gb, full-length transcriptome data deepened understanding of bulbil outgrowth [ 45 ]; in tree peony, a total of 21.27 Gb of clean reads helped unveil potential mechanisms of brassinosteroid-induced delayed flowering [ 46 ]; and in Camellia oleifera, cv. Min 43 (M43) yielded 41.49 Gb and cv. Hongguo (HG) 38.99 Gb of clean reads, helping to unveil potential mechanisms of triacylglycerol degradation during seed desiccation [ 47 ]. In this study, a total of 10,187,282 subreads were obtained from 41.35 Gb of data using SMRT sequencing. We clustered the corrected transcript sequences at 95% sequence similarity, removed redundancy, and finally obtained 1,335,148 non-redundant transcripts. In addition, 484,931 ROI sequences and 1,455,771 FLNC transcripts were obtained for functional annotation, CDS and transcription factor prediction, SSR analysis, and lncRNA identification. COG annotation identified genes related to lipid transport and metabolism, and metabolic process terms accounted for a relatively high proportion of the GO annotations. The structural analysis and functional annotation of these transcripts provide an important database for further molecular studies of herbaceous peony. Because a whole-genome sequence of Paeonia lactiflora is not yet available, a full-length transcriptome is particularly important for studying molecular mechanisms in this species.
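The 95% similarity clustering described above was done with CD-HIT in this study. The sketch below illustrates the greedy incremental idea behind such tools under simplifying assumptions: sequences are processed from longest to shortest, each one joins the first representative it matches at or above the threshold, and identity is computed ungapped against the best window of the longer sequence. Real CD-HIT uses short-word filtering and gapped alignment, so the hypothetical `greedy_cluster` helper only approximates its behaviour.

```python
def window_identity(short, long):
    """Best ungapped identity of `short` against any same-length window of `long`.

    Identity = identical positions / len(short). This is a simplification:
    CD-HIT aligns with gaps and uses word filtering to skip most comparisons.
    """
    if not short or len(short) > len(long):
        return 0.0
    best = 0
    for offset in range(len(long) - len(short) + 1):
        window = long[offset:offset + len(short)]
        best = max(best, sum(a == b for a, b in zip(short, window)))
    return best / len(short)


def greedy_cluster(seqs, threshold=0.95):
    """Greedy incremental clustering; each cluster's first (longest) member is its representative.

    seqs: dict of sequence ID -> nucleotide string.
    Returns dict of representative ID -> member IDs (representative first).
    """
    clusters = {}
    # Longest first, so every later sequence is no longer than any representative
    for name in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for rep in clusters:
            if window_identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]  # no match: start a new cluster
    return clusters


# Hypothetical toy transcripts (not study data)
transcripts = {
    "t1": "ACGT" * 10,          # 40 nt
    "t2": "ACGT" * 9 + "ACGA",  # 1 mismatch vs t1 -> 39/40 = 97.5%
    "t3": "T" * 20,             # unrelated
    "t4": "ACGT" * 5,           # identical to the first half of t1
}
print(greedy_cluster(transcripts))
```

Processing the longest sequences first means each cluster is represented by its longest member, which is why shorter fragments such as `t4` are absorbed rather than kept as separate transcripts.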
These full-length transcripts provide more information for future studies of the molecular mechanisms underlying herbaceous peony growth and development, and lay an important foundation for molecular breeding.

Many recent studies have found that the seed-setting rate, oil content and unsaturated fatty acid content of ‘Hangshao’ are high and close to those of the Paeonia suffruticosa variety ‘Fengdan’, so ‘Hangshao’ is expected to be developed into a new oil crop [ 5 ]. Because Paeonia lactiflora seeds are rich in unsaturated fatty acids, and to avoid wasting this resource, we carried out extensive studies of the fatty acid biosynthesis pathway. Fats and oils are the main source of energy metabolism in living organisms and in plants are synthesized mainly as TAG [ 48 ]. Oleosin has been found to regulate lipid metabolism during seed germination [ 49 ], and diacylglycerol acyltransferase (DGAT) is considered the key enzyme for the last step of triacylglycerol synthesis and the only rate-limiting one; both play key roles in TAG biosynthesis and storage. In our transcriptome data, 4,905 genes in pathways related to lipid metabolism were annotated. A recent comparative transcriptome analysis of Paeonia lactiflora at different developmental stages provided an effective way to study differential gene expression patterns and to dissect candidate genes for oil synthesis [ 5 ]. In our study, we identified and analyzed 10 PlOLEs and 5 PlDGATs from the de-redundant full-length transcriptome of Paeonia lactiflora, which is significant for studying lipid metabolism in this species.

Oleosin is a structural protein first isolated and identified from seed oil bodies [ 50 ]. It consists of three parts: an N-terminal hydrophilic domain, a hydrophobic central domain containing the highly conserved hydrophobic hairpin (about 72 residues), and a C-terminal α-helical domain [ 51 ]. Amphiphilic oleosins stabilize intracellular hydrophobic triglycerides (TAG) by inserting their hairpin regions into the oil body while exposing their hydrophilic N- and C-terminal regions [ 52 ]. To date, oleosins have been reported in various oil crops, such as soybean [ 53 ], tung tree (Vernicia fordii) [ 54 ] and peanut [ 55 ]. In Cyperus esculentus, 9 OLE and 21 CLO genes were identified, providing a reference for developing strategies to improve the oil content of C. esculentus tubers [ 56 ]. In Carthamus tinctorius, 8 putative OLE genes were identified from the genome database, offering a way to elucidate the intricate mechanisms of oil body synthesis [ 57 ]. Using the full-length transcriptome, we identified 10 PlOLEs; this number is not markedly different from that in other species, which supports the reliability of the results. Protein physicochemical properties and phylogenetic analysis also revealed certain similarities with other species. The PlOLE proteins ranged from 89 to 220 amino acids in length, with molecular weights of 9.21–23.60 kDa and isoelectric points of 5.09–10.45. Phylogenetic and motif analyses showed that the ten oleosins are homologous to Arabidopsis oleosins and that each contains motif 1, indicating that this motif is highly conserved. The OLEs involved in TAG assembly were highly expressed at 45 DAF, coinciding with active oil biosynthesis in this period.
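Molecular weight and isoelectric point ranges like those above are usually computed with tools such as ExPASy ProtParam, which the text does not name. The sketch below shows the underlying calculation: molecular weight as the sum of average residue masses plus one water, and pI as the pH at which the Henderson-Hasselbalch net charge is zero, found by bisection. The pKa set is an assumption (EMBOSS values) and differs from ProtParam's, so results will not match published values exactly; the peptide is hypothetical.

```python
# Average residue masses in Da (free amino-acid mass minus one water)
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini

# Assumed pKa set (EMBOSS values); other tools use different sets
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def molecular_weight(seq):
    """Average molecular weight in Da; divide by 1000 for kDa."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of a peptide at the given pH."""
    positive = 1 / (1 + 10 ** (ph - PKA_N_TERM))
    positive += sum(seq.count(aa) / (1 + 10 ** (ph - pka)) for aa, pka in PKA_POSITIVE.items())
    negative = 1 / (1 + 10 ** (PKA_C_TERM - ph))
    negative += sum(seq.count(aa) / (1 + 10 ** (pka - ph)) for aa, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq, tol=1e-4):
    """Bisection for the pH where net charge is zero (charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


peptide = "MADTHRPLQSRQ"  # hypothetical sequence, not a PlOLE
print(f"{molecular_weight(peptide) / 1000:.2f} kDa, pI {isoelectric_point(peptide):.2f}")
```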
Overall, qRT-PCR verified that the 10 PlOLEs were expressed at higher levels in seeds than in other tissues and that their expression first increased and then decreased during seed development (Fig. 8A).

DGAT transfers the acyl group of acyl-CoA to DAG and plays a key role in controlling lipid synthesis [ 58 ]. Many studies have sought to increase TAG production and fatty acid content by manipulating DGAT genes. Four subfamilies of DGAT enzymes have been identified in plants: DGAT1, DGAT2, DGAT3 and WSD/DGAT. In Arabidopsis and most oilseed crops, DGAT2 generally specializes in acylating DAG with unusual fatty acids and hence determines the content of TAG containing unusual fatty acids, whereas DGAT1 is regarded as the key determinant of seed oil content. In peanut, however, all three DGATs (DGAT1, DGAT2 and DGAT3) are involved in TAG synthesis [ 59 ]. In Paeonia rockii, PrDGAT3 is essential for TAG synthesis and prefers polyunsaturated fatty acids, especially LA and ALA, as substrates. In Zea mays, overexpression of DGAT1 not only increased seed oil content but also altered seed lipid composition [ 60 ]. In this study, DGAT3 transcripts were more abundant than DGAT2 transcripts in herbaceous peony (Fig. 8B), in agreement with previous studies [ 61 ]. In Physaria fendleri, four PfDGATs were identified [ 61 ], and genome-wide analyses identified 7, 7, 9 and 10 DGAT family members in maize, rice, sorghum and foxtail millet, respectively [ 62 ]. We identified 5 PlDGATs from the full-length transcriptome. The predicted proteins ranged from 326 to 517 amino acids, with molecular weights of 36.68–58.79 kDa and isoelectric points of 7.18–9.28. Three of them were predicted to be unstable proteins, and secondary structure analysis showed that four were dominated by random coils.
Phylogenetic and motif analyses showed that the 5 PlDGATs are homologous to DGATs from soybean, rice and Arabidopsis and fall into four subfamilies, with essentially the same motif composition within a subfamily. Gene structures differed markedly between DGAT subfamilies, whereas motif distribution was largely the same among members of the same subfamily. This suggests that the DGAT subfamilies are highly conserved while evolving in parallel, and that such differences in gene structure may reflect a conserved mode of evolution in the DGAT gene family. PlDGAT1, PlDGAT2 and PlDGAT3 were highly expressed at 45 DAF, consistent with the rate of fatty acid accumulation in herbaceous peony seeds, whereas the expression of PlWSD1 and PlWSD2 generally increased and peaked late in seed development. Like the PlOLEs, the PlDGATs were expressed at higher levels in seeds than in other tissues, indicating that both gene families play important roles during seed development.

In conclusion, PlOLEs and PlDGATs responded strongly in the early stage of seed development and were expressed at higher levels in seeds than in other tissues. These findings improve our knowledge of the biosynthesis pathways of lipid metabolism and provide a basis for further research on the molecular functions and regulatory mechanisms of PlOLEs and PlDGATs.

Conclusions

In this study, we used full-length transcriptome sequencing to investigate molecular mechanisms in herbaceous peony, providing a basis for subsequent research on this species. We identified and analyzed genes associated with the biosynthesis pathway of lipid metabolism and found that lipid metabolism takes place in the plastid and endoplasmic reticulum, with OLE and DGAT involved in the Kennedy pathway. In addition, we identified 10 PlOLE and 5 PlDGAT family members and analyzed their physicochemical properties, conserved protein motifs, and phylogenetic relationships. Finally, we analyzed the expression patterns of PlOLEs and PlDGATs to better understand their possible roles in lipid metabolism pathways.

Materials and methods

Plant materials

The plant material used in this experiment was the ‘Hangshao’ variety of Paeonia lactiflora from the germplasm repository of the College of Horticulture and Landscape Architecture, Yangzhou University, Jiangsu Province (32°23′31″N, 119°24′50″E). As in our previous experiment [ 5 ], young leaves, stems, roots, flowers, stamens, and seeds collected 30, 45, 60, 75 and 90 days after flowering were sampled from ‘Hangshao’ (Fig. 9). The seeds, leaves, flowers, stamens, roots and stems used for qRT-PCR all came from the same herbaceous peony plant.

figure 9

The tissues of Paeonia lactiflora Pall. used in this study

RNA sample preparation

Each sample had three biological replicates, which were stored in liquid nitrogen until RNA extraction. RNA was extracted from plant tissues using the CTAB method. To ensure the accuracy of the sequencing data, the quality of all RNA samples was measured with a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). RNA integrity was checked on an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) by examining the RIN and the 28S, 18S and 5S peaks. Electrophoresis was used to detect gDNA contamination in the RNA samples and to assess RNA quality from the ribosomal bands.
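NanoDrop measurements report RNA concentration from the absorbance at 260 nm and purity as A260/A280 and A260/A230 ratios; pure RNA typically gives an A260/A280 near 2.0. The sketch below shows that arithmetic using the standard conversion of about 40 µg/mL RNA per A260 unit. The pass thresholds are illustrative assumptions, since the study does not report the cut-offs it used, and the readings are hypothetical.

```python
RNA_UG_PER_ML_PER_A260 = 40.0  # standard conversion for RNA (10 mm path length)


def rna_qc(a260, a280, a230, dilution=1.0, min_260_280=1.8, min_260_230=1.8):
    """Return (concentration ng/uL, A260/A280, A260/A230, passes_thresholds).

    40 ug/mL equals 40 ng/uL. The two minimum ratios are illustrative
    assumptions, not thresholds reported in this study.
    """
    concentration = a260 * RNA_UG_PER_ML_PER_A260 * dilution
    ratio_280 = a260 / a280  # protein contamination lowers this ratio
    ratio_230 = a260 / a230  # salts, phenol or carbohydrate lower this ratio
    passes = ratio_280 >= min_260_280 and ratio_230 >= min_260_230
    return concentration, ratio_280, ratio_230, passes


# Hypothetical absorbance readings for one sample
print(rna_qc(a260=0.5, a280=0.25, a230=0.24))
```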

Library construction and SMRT sequencing

After RNA quality testing, equal amounts of high-quality RNA from the different tissues of ‘Hangshao’ were pooled into a single sample [ 63 ]. The digested RNA was heat-denatured to open its secondary structure, and mRNA was enriched with oligo(dT) magnetic beads. The mRNA was then fragmented with divalent cations at elevated temperature. First-strand cDNA was synthesized using the UMI base PCR cDNA Synthesis Kit (BGI) and amplified by PCR to produce double-stranded cDNA. The ends of the double-stranded cDNA were repaired and an A base was added to the 3′ end; adapters were then ligated to the cDNA, and the ligation product was amplified. The PCR product was denatured into single strands and circularized, and linear DNA molecules that had not circularized were digested to yield the final library. Libraries were quantified with a Qubit 2.0 DNA kit (Life Technologies, China), and library size was checked on an Agilent 2100 Bioanalyzer.

PacBio Iso-Seq data processing and bioinformatics analysis

After sequencing on the PacBio Sequel platform, a large number of circular consensus sequence (CCS) reads were obtained. Reads of insert (ROIs) were identified and classified into full-length non-chimeric (FLNC) and non-full-length (nFLNC) reads. The resulting full-length and non-full-length FASTA files were fed into the cluster step, which performs isoform-level clustering by Iterative Clustering and Error Correction (ICE): similar sequences are grouped into clusters, each of which yields a consensus isoform, followed by final polishing with Arrow. Redundancy in the polished isoforms was removed with CD-HIT [ 64 ] to obtain the final isoform sequences, which were used directly for downstream analyses of gene families, CDS, TFs, SSRs and lncRNAs. TransDecoder (https://transdecoder.github.io) was used to identify the longest open reading frames (ORFs), and coding regions were then predicted from homology searches against SwissProt (http://ftp.ebi.ac.uk/pub/databases/swissprot) with BLAST and against Pfam with hmmscan (http://hmmer.org). Transcription factors (TFs) were identified using the Plant Transcription Factor Database (PlantTFDB, http://planttfdb.gao-lab.org/index.php?sp=Zma) [ 65 ] and GRASSIUS (https://grassius.org/tfomecollection.php) [ 66 ]. A gene found in either database was considered a TF, and the transcripts encoding it were retrieved. Full-length transcriptomes are also helpful for discovering simple sequence repeat (SSR) markers; we used MISA (http://pgrc.ipkgatersleben.de/misa/misa.html) to identify SSRs. Finally, transcripts were screened for coding potential to predict lncRNAs, using four widely used methods: CPC [ 31 ], CNCI, Pfam protein domain analysis, and txCdsPredict.
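TransDecoder's first step scans transcripts for long open reading frames before homology searches refine the coding-region calls. The simplified sketch below, which is not TransDecoder's code, finds the longest complete ORF (ATG to stop codon) across the three forward and three reverse-complement frames. The 100-amino-acid minimum is assumed to mirror TransDecoder's usual default minimum protein length, and partial ORFs lacking a start or stop codon, which TransDecoder also reports, are ignored here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq, min_aa=100):
    """Longest complete ORF (ATG ... stop) over six frames, or None.

    Returns (strand, start, end, orf) with 0-based, end-exclusive coordinates
    on that strand's sequence; min_aa counts residues, excluding the stop codon.
    """
    seq = seq.upper()
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop in this frame
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if best is None or end - start > best[2] - best[1]:
                        best = (strand, start, end, s[start:end])
                    start = None
    if best is None or (best[2] - best[1]) // 3 - 1 < min_aa:
        return None
    return best


# Hypothetical toy transcript: ATG + 5 GCT codons + TAA, starting in frame 2
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(longest_orf(toy, min_aa=1))
```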

Functional annotation and enrichment analysis

We used BLAST to align the non-redundant transcript sequences against the NR (NCBI non-redundant protein sequences), Nt (http://www.ncbi.nlm.nih.gov), SwissProt (http://www.ebi.ac.uk/swissprot), GO (http://www.geneontology.org), KOG (https://mycocosm.jgi.doe.gov/help/kogbrowser.jsf), Pfam (http://pfam.xfam.org/) and KEGG (http://www.genome.jp/kegg) databases to obtain annotation information for the transcripts. The results of enrichment analysis were visualized with the enrichplot and ggplot2 packages.
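The text names only the visualization packages, not the enrichment test. A common choice, and the default in the clusterProfiler package that enrichplot is designed to work with, is a one-sided hypergeometric test per GO term or KEGG pathway with Benjamini-Hochberg correction; treat that as an assumption about this study. The sketch below implements both steps with the standard library; the counts are hypothetical.

```python
from math import comb


def hypergeom_sf(k, n, big_k, big_n):
    """One-sided P(X >= k) for X ~ Hypergeometric(N=big_n, K=big_k, n).

    big_n: annotated background genes; big_k: background genes in the term;
    n: genes in the query set; k: query genes annotated to the term.
    """
    top = min(n, big_k)
    hits = sum(comb(big_k, i) * comb(big_n - big_k, n - i) for i in range(k, top + 1))
    return hits / comb(big_n, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: all 3 query genes fall in a term covering 3 of 10 genes
print(hypergeom_sf(3, 3, 3, 10))
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```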

Analysis of the OLE and DGAT gene families in Paeonia lactiflora

To classify the PlOLE and PlDGAT genes in Paeonia lactiflora, multiple sequence alignment with Arabidopsis protein sequences was performed in ClustalX 2.0.12 (http://www.cluster-x.org/). SMART (http://smart.embl-heidelberg.de/) and CDD (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) were used to manually confirm whether the candidates were PlOLE and PlDGAT genes; their sequence information is given in Supplementary Table S5. Phylogenetic trees were constructed with the neighbor-joining (NJ) method in MEGA 7.0 with 1000 bootstrap replicates [ 42 ]. Conserved motifs in the PlOLE and PlDGAT sequences were identified with MEME (https://meme-suite.org/meme/), allowing a maximum of 10 motifs with widths of 6–200 amino acid residues [ 67 ]. The conserved domains were visualized using TBtools.
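MEGA 7.0 built the trees here; the text does not state the distance model. As a hedged illustration of what the neighbor-joining step does, the sketch below uses the uncorrected p-distance (an assumption) and the standard Saitou-Nei algorithm: repeatedly join the pair minimizing the Q criterion, compute branch lengths, and replace the pair with a new node. Bootstrapping, which resamples alignment columns with replacement and rebuilds the tree 1000 times, is not shown. The five-taxon matrix is a textbook additive example, not study data.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Saitou-Nei neighbor joining; returns an unrooted tree as a Newick string.

    names: taxon labels; dist: symmetric distance matrix (list of lists).
    """
    nodes = list(names)
    d = [row[:] for row in dist]
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Pick the pair (i, j) minimising Q = (n - 2) * d_ij - r_i - r_j
        best_q, bi, bj = None, 0, 1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best_q is None or q < best_q:
                    best_q, bi, bj = q, i, j
        # Branch lengths from the new internal node to the joined pair
        li = 0.5 * d[bi][bj] + (r[bi] - r[bj]) / (2 * (n - 2))
        lj = d[bi][bj] - li
        joined = f"({nodes[bi]}:{li:.4f},{nodes[bj]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (bi, bj)]
        # Distance from the new node to each remaining node
        new_row = [0.5 * (d[bi][k] + d[bj][k] - d[bi][bj]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Connect the last two nodes with the remaining distance
    return f"({nodes[0]}:{d[0][1]:.4f},{nodes[1]}:0.0000);"


# Classic additive five-taxon example distance matrix
taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(taxa, matrix))
```

For protein alignments, `p_distance` would be applied to every pair of aligned sequences to fill `dist` before calling `neighbor_joining`; MEGA additionally offers model-corrected distances such as Poisson or JTT.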

Validation of gene expression by Quantitative Real-Time PCR (qRT-PCR)

Each tissue was represented by three biological replicates and three technical replicates. RNA was extracted from roots, stems, leaves, flowers, stamens, and seeds collected 30, 45, 60, 75 and 90 days after flowering using the TaKaRa MiniBEST Plant RNA Extraction Kit (TaKaRa, Japan). Total RNA was then reverse-transcribed into cDNA with the PrimeScript® RT reagent Kit with gDNA Eraser (Perfect Real Time) (TaKaRa, Japan) [ 68 ]. qRT-PCR was performed with the NovoStart® SYBR qPCR SuperMix Plus kit (Novoprotein, China), and data were collected with Bio-Rad CFX Manager V1.6.541.1028 software. PlActin (JN105299), whose expression was stable in all organs of Paeonia lactiflora, served as the internal reference gene. Primers were designed using Primer Premier 5 and are listed in Supplementary Table S6. Relative expression levels of the target genes were calculated using the 2^−ΔΔCt method, and the data were analyzed with TBtools.
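In the 2^−ΔΔCt (Livak) method, ΔCt = Ct(target) − Ct(reference), ΔΔCt = ΔCt(sample) − ΔCt(calibrator), and relative expression = 2^−ΔΔCt, which assumes close to 100% amplification efficiency for both genes. The sketch below applies this with PlActin as the reference gene; the text does not say which tissue or stage served as the calibrator, so the Ct values and the choice of calibrator are hypothetical.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct, calib_target_ct, calib_reference_ct):
    """Livak 2^-ddCt fold change of a sample relative to a calibrator sample.

    Each argument is a list of replicate Ct values (averaged here).
    Assumes near-100% amplification efficiency for target and reference.
    """
    delta_sample = mean(target_ct) - mean(reference_ct)  # normalize to the reference gene
    delta_calibrator = mean(calib_target_ct) - mean(calib_reference_ct)
    delta_delta = delta_sample - delta_calibrator
    return 2 ** (-delta_delta)


# Hypothetical Ct values: a seed sample vs a calibrator tissue, PlActin as reference
fold = relative_expression([22.1, 22.0, 21.9], [18.0, 18.1, 17.9],
                           [25.0, 25.1, 24.9], [18.0, 17.9, 18.1])
print(round(fold, 2))
```

A ΔΔCt of −3 corresponds to an eightfold increase relative to the calibrator, since each cycle represents one doubling under the efficiency assumption.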

Availability of data and materials

The datasets generated or analysed during the current study are available in the main paper and supplementary information files. The raw reads are available in the Sequence Read Archive (SRA) database of the National Center for Biotechnology Information (NCBI) under accession number PRJNA1064234.

Abbreviations

Fatty acid desaturase

Phosphatide phosphatase

Delta(12) fatty acid desaturase

Phospholipase A2

Glycerol-3-phosphate acyltransferase

Long-chain acyl-CoA synthetase

Fatty acyl-ACP thioesterase B

3-Ketoacyl-CoA synthase

3-Ketoacyl-ACP reductase

Diacylglycerol acyltransferase

Stearoyl-ACP desaturase

Phospholipid:diacylglycerol acyltransferase

3-Ketoacyl-ACP synthase

Enoyl-CoA reductase

Carboxyltransferase subunit alpha

Lysophosphatidic acid acyltransferase

Biotin carboxyl carrier protein

Biotin carboxylase

Ketoacyl-CoA reductase

Lysophosphatidylcholine acyltransferase

Enoyl-ACP reductases

Carboxyltransferase subunit beta

Hydroxyacyl-CoA dehydratase

Fatty acyl-ACP thioesterase A

Phosphatidylcholine:diacylglycerol cholinephosphotransferase

Hydroxyacyl-ACP dehydratase

Malonyl-CoA:ACP transacylase

Ren XX, Xue J, Wang SL, Xue YQ, Zhang P, Jiang HD, Zhang XX. Proteomic analysis of tree peony ( Paeonia ostii ’Feng Dan’) seed germination affected by low temperature. Plant Physiol. 2018;224:56–67.

Wang X, Liang H, Guo D, Guo L, Duan X, Jia Q, Hou X. Integrated analysis of transcriptomic and proteomic data from tree peony (P. ostii) seeds reveals key developmental stages and candidate genes related to oil biosynthesis and fatty acid metabolism. Hort Res. 2019;6:111.

Ning CL, Jiang Y, Meng JS, Zhou CH, Tao J. Herbaceous peony seed oil: a rich source of unsaturated fatty acids and γ-tocopherol. Eur J Lipid Sci Technol. 2014;117(4):532–42.

Meng JS, Jiang Y, Zhang KL, Tao J. Phenotypic traits in the development of capsule and seed of paeonia lactiflora hangshao. J Henan Agri Sci. 2018;47(08):109–17.

Meng JS, Tang YH, Sun J, Zhao DQ, Zhang KL, Tao J. Identification of genes associated with the biosynthesis of unsaturated fatty acid and oil accumulation in herbaceous peony “Hangshao” ( Paeonia lactiflora ’Hangshao’) seeds based on transcriptome analysis. BMC Genomics. 2021;22(1):94.

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5(7):621–8.

Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63.

Fan SQ, Liang TY, Yu HY, Bi Q, Li GT, Wang LB. Kernel characteristics, oil contents, fatty acid compositions and biodiesel properties in developing Siberian apricot (Prunus sibirica L.) seeds. Ind Crops Prod. 2016;89:195–199.

Niu J, Wang J, An JY, Liu LL, Lin ZX, Wang R, Wang LB, Ma C, Shi LL, Lin SZ. Integrated mRNA and miRNA transcriptome reveal a cross-talk between developing response and hormone signaling for the seed kernels of Siberian apricot. Sci Rep. 2016;6:35675.

Wang J, Lin WJ, Yin ZD, Wang LB, Dong SB, An JY, Lin ZX, Yu HY, Shi LL, Lin SZ, Chen SL. Comprehensive evaluation of fuel properties and complex regulation of intracellular transporters for high oil production in developing seeds of Prunus sibirica for woody biodiesel. Biotechnol Biofuels. 2019;12:6.

Lin ZX, An JY, Wang J, Niu J, Ma C, Wang LB, Yuan GS, Shi LL, Liu LL, Zhang JS, Zhang ZX, Qi J, Lin SZ. Integrated analysis of 454 and Illumina transcriptomic sequencing characterizes carbon flux and energy source for fatty acid synthesis in developing Lindera glauca fruits for woody biodiesel. Biotechnol Biofuels. 2017;10:134.

Abell BM, Hahn M, Holbrook LA, Moloney MM. Membrane topology and sequence requirements for oil body targeting of oleosin. Plant J. 2004;37(4):461–70.

Rani SH, Saha S, Rajasekharan R. A soluble diacylglycerol acyltransferase is involved in triacylglycerol biosynthesis in the oleaginous yeast Rhodotorula glutinis . Microbiology. 2013;159:155–6.

Trenz T, Turchetto-Zolet A, Margis M, Margis R, Maraschin F. Functional characterization of castor bean ( Ricinus communis ) DGAT3 and DAcT enzymes in Arabidopsis thaliana . BMC Proc. 2014;8:P117.

Huang MD, Huang AHC. Bioinformatics Reveal Five Lineages of Oleosins and the Mechanism of Lineage Evolution Related to Structure/Function from Green Algae to Seed Plants. Plant Physiol. 2015;169(1):453–70.

Beisson F, Ferté N, Bruley S, Voultoury R, Verger R, Arondel V. Oil-bodies as substrates for lipolytic enzymes. Biochim Biophys Acta. 2001;1531(1–2):47–58.

Miquel M, Trigui G, d’Andréa S, Kelemen Z, Baud S, Berger A, Deruyffelaere C, Trubuil A, Lepiniec L, Dubreucq B. Specialization of Oleosins in Oil Body Dynamics during Seed Development in Arabidopsis Seeds. Plant Physiol. 2014;164(4):1866–78.

Chen K, Yin YT, Liu S, Guo ZY, Zhang K, Liang Y, Zhang LN, Zhao WG, Chao HB, Li MT. Genome-wide identification and functional analysis of oleosin genes in Brassica napus L. BMC Plant Biol. 2019;19(1):294.

Gordon SP, Tseng E, Salamov A, Zhang JW, Meng XD, Zhao ZY, Kang DW, Underwood J, Grigoriev IV, Figueroa M, Schilling JS, Chen F, Wang Z. Widespread Polycistronic Transcripts in Fungi Revealed by Single-Molecule mRNA Sequencing. PLoS ONE. 2015;10(7): e0132628.

Sharon D, Tilgner H, Grubert F, Snyder M. A single-molecule long-read survey of the human transcriptome. Nat Biotechnol. 2013;31(11):1009–14.

Dong LL, Liu HF, Zhang JC, Yang SJ, Kong GY, Chu JSC, Chen NS, Wang DW. Single-molecule real-time transcript sequencing facilitates common wheat genome annotation and grain transcriptome research. BMC Genomics. 2015;16:1039.

Xu ZC, Peters RJ, Weirather J, Luo HM, Liao BS, Zhang X, Zhu YJ, Ji AJ, Zhang B, Hu SN, Au KF, Song JY, Chen SL. Full-length transcriptome sequences and splice variants obtained by a combination of sequencing platforms applied to different root tissues of Salvia miltiorrhiza and tanshinone biosynthesis. Plant J. 2015;82(6):951–61.

Abdel-Ghany SE, Hamilton M, Jacobi JL, Ngam P, Devitt N, Schilkey F, Ben-Hur A, Reddy ASN. A survey of the sorghum transcriptome using single-molecule long reads. Nat Commun. 2016;7:11706.

Wang B, Tseng E, Regulski M, Clark TA, Hon T, Jiao Y, Lu ZY, Olson A, Stein JC, Ware D. Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing. Nat Commun. 2016;7:11708.

Hoang NV, Furtado A, Mason PJ, Marquardt A, Kasirajan L, Thirugnanasambandam PP, Botha FC, Henry RJ. A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing. BMC Genomics. 2017;18(1):395.

Xie LJ, Teng K, Tan PH, Chao YH, Li YRZ, Guo WE, Han LB. PacBio single-molecule long-read sequencing shed new light on the transcripts and splice isoforms of the perennial ryegrass. Mol Genet Genomics. 2020;295(2):475–89.

Tan C, Liu HX, Ren J, Ye XL, Feng H, Liu ZY. Single-molecule real-time sequencing facilitates the analysis of transcripts and splice isoforms of anthers in Chinese cabbage (Brassica rapa L. ssp. pekinensis). BMC Plant Biol. 2019;19:517.

Zhu FY, Chen MX, Ye NH, Shi L, Ma KL, Yang JF, Cao YY, Zhang YJ, Yoahida T, Fernie A, Fan GY, Wen B, Zhou R, Liu TY, Fan T, Gao B, Zhang D, Hao GF, Xiao S, Liu YG, Zhang JH. Proteogenomic analysis reveals alternative splicing and translation as part of the abscisic acid response in Arabidopsis seedlings. Plant J. 2017;91(3):518–33.

Zhang HM, Liu T, Liu CJ, Song SY, Zhang XT, Liu W, Jia HB, Xue Y, Guo AY. AnimalTFDB 2.0: a resource for expression, prediction and functional study of animal transcription factors. Nucleic Acids Res. 2015;43(D1):D76–D81.

Sun L, Luo HT, Bu DC, Zhao GG, Yu KT, Zhang CH, Liu YN, Chen RS, Zhao Y. Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts. Nucleic Acids Res. 2013;41(17): e166.

Kong L, Zhang Y, Ye ZQ, Liu XQ, Zhao SQ, Wei L, Gao G. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 2007;35(Web Server issue):W345–9.

Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A, Salazar GA, Tate J, Bateman A. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016;44(D1):D279–85.

Shimizu K, Adachi J, Muraoka Y. ANGLE: A sequencing errors resistant program for predicting protein coding regions in unfinished cDNA. J Bioinform Comput Biol. 2006;4(3):649–64.

Zhang XY, Mu XP, Cui HL, Sun Y, Xue JN, Jia XY, Li RZ. Comprehensive mining of storage oil related genes in developing seed of Abelmoschus esculentus . Sci Horticulturae. 2022;291:110612.

Zhong Y, Zhao Y, Wang Y, Niu J, Sun Z, Chen J, Luan M. Transcriptome analysis and GC-MS profiling of key fatty acid biosynthesis genes in akebia trifoliata (Thunb.) koidz seeds. Biology. 2022;11(6):855.

Liu Q, Sun YP, Su WJ, Yang J, Liu XM, Wang YF, Wang FW, Li HY, Li XK. Species-specific size expansion and molecular evolution of the oleosins in angiosperms. Gene. 2012;509(2):247–57.

Schein M, Yang ZH, Mitchell-Olds T, Schmid KJ. Rapid evolution of a pollen-specific oleosin-like gene family from Arabidopsis Thaliana and closely related species. Mol Biol Evol. 2004;21(4):659–69.

Li A, Zhao C, Wang X, Xia H, Su L. Cloning and expression analysis of oleosin family genes in Arachis hypogaea L. Journal of Agricultural Biotechnology. 2011;19(6):1003–10.

Hyun TK, Kumar D, Cho YY, Hyun HN, Kim JS. Computational identification and phylogenetic analysis of the oil-body structural proteins, oleosin and caleosin, in castor bean and flax. Gene. 2013;515(2):454–60.

Marmon S, Sturtevant D, Herrfurth C, Chapman K, Stymne S, Feussner I. Two acyltransferases contribute differently to linolenic acid levels in seed oil. Plant Physiol. 2017;173(4):2081–95.

Jiang HH, Wen SH, Lu YT, Chen G, Wang T. Genome-wide analysis and stress-responsive expression profiling of the Oleosin gene family in diploid wild species Arachis duranensis and Arachis ipaensis. Chin J Oil Crop Sci. 2024;1–11.

Tamura K, Dudley J, Nei M, Kumar S. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) Software Version 4.0. Mol Biol Evol. 2007;24(8):1596–1599.

Fang ZH, Liu JN, Wu XM, Zhang Y, Jia HL, Shi YH. Full-length transcriptome of in medicago sativa L. roots in response to drought stress. Front Genet. 2023;13:1086356.

Li XH, Chen WW, Lu SQ, Fang JT, Zhu H, Zhang XB, Qi YW. Full-length transcriptome analysis of maize root tips reveals the molecular mechanism of cold stress during the seedling stage. BMC Plant Biol. 2022;22(1):398.

Li J, Sun MY, Li H, Ling ZY, Wang D, Zhang JZ, Shi L. Full-length transcriptome-referenced analysis reveals crucial roles of hormone and wounding during induction of aerial bulbils in lily. BMC Plant Biol. 2022;22(1):415.

Zhang L, Song C, Guo L, Guo D, Xue X, Wang H, Hou X. Full-Length Transcriptome and Transcriptome Sequencing Unveil Potential Mechanisms of Brassinosteroid-Induced Flowering Delay in Tree Peony. Horticulturae. 2022;8(12):1136.

Chen M, Zhang Y, Du Z, Kong X, Zhu X. Integrative metabolic and transcriptomic profiling in camellia oleifera and camellia meiocarpa uncover potential mechanisms that govern triacylglycerol degradation during seed desiccation. Plants. 2023;12(14):2591.

Chen Z, Li XL, Chen FZ. Research progress on biological synthesis and biological function in plant oil body. World Sci-Tech R&D. 2021;43(2):182–91.

Shao Q, Liu X, Su T, Ma CL, Wang P. New insights into the role of seed oil body proteins in metabolism and plant development. Front Plant Sci. 2019;10:1568.

Huang AH. Oleosins and oil bodies in seeds and other organs. Plant Physiol. 1996;110(4):1055–61.

Tzen JT, Huang AH. Surface structure and properties of plant seed oil bodies. J Cell Biol. 1992;117(2):327–35.

Zhao HQ, Wang XF, Gao SP. Progress on the functional role of oleosin gene family in plants. Hereditas. 2022;44(12):1128–40.


Zhang D, Zhang HY, Hu ZB, Chu SS, Yu KY, Lv LL, Yang YM, Zhang XQ, Chen X, Kan GZ, Tang Y, An YQCRL, Yu DY. Artificial selection on GmOLEO1 contributes to the increase in seed oil during soybean domestication. PLoS Genet. 2019;15(7):e1008267.

Wu QK, Yang SS, Wang YD, Gao M, Chen YC. Isolation and expression analysis on Vernicia fordii oleosin gene of five VfOLE Isoforms. For Res. 2014;27(02):233–9.

Xu HE, Pan LJ, Chen MN, Chen N, Wang T, Wang M, Yu SL, Liang CW, Chi XY. Cloning and expression analysis of oleosin genes in peanut. J Peanut Sci. 2019;48(03):9–14.

Zhu YC, Wang Y, Wei ZM, Zhang XK, Jiao BY, Yian Y, Yan F, Li JW, Liu YJ, Zhang JH, Wang XY, Mu ZS, Wang QY. Analysis of oil synthesis pathway in Cyperus esculentus tubers and identification of oleosin and caleosin genes. J Plant Physiol. 2023;284:153961.

Lu YB, Chi MH, Li LX, Li HY, Noman M, Yang Y, Ji K, Lan XX, Qiang WD, Du LN, Li HY, Yang J. Genome-wide identification, expression profiling, and functional validation of oleosin gene family in Carthamus tinctorius L. Plant Sci. 2018;18:1393.

Liao P, Lechon T, Romsdahl T, Woodfield H, Fenyk S, Fawcett T, Wallington E, Bates R, Chye M, Chapman KD, Harwood JL, Scofield S. Transgenic manipulation of triacylglycerol biosynthetic enzymes in B. napus alters lipid-associated gene expression and lipid metabolism. Sci Rep. 2022;12(1):3352.

Saha S, Enugutti B, Rajakumari S, Rajasekharan R. Cytosolic triacylglycerol biosynthetic pathway in oilseeds. Molecular cloning and expression of peanut cytosolic diacylglycerol acyltransferase. Plant Physiol. 2006;141(4):1533–43.

Zheng PZ, Allen WB, Roesler K, Williams ME, Zhang SR, Li JM, Glassman K, Ranch J, Nubel D, Solawetz W, Bhattramakki D, Llaca V, Deschamps S, Zhong GY, Tarczynski MC, Shen B. A phenylalanine in DGAT is a key determinant of oil content and composition in maize. Nat Genet. 2008;40(3):367–72.

Song JK, Pei WF, Wang NH, Ma JJ, Xin Y, Yang SX, Wang W, Chen QJ, Zhang JF, Yu JW, Wu M, Qu YY. Transcriptome analysis and identification of genes associated with oil accumulation in upland cotton. Physiol Plant. 2022;174(3):e13701.

Meng YX, Yao XH, Sun YQ, Zhao XY, Wang FX, Weng QY, Liu YH. Identification and Bioinformatics Analysis of DGAT Gene Family in Cereal Crops. Crops. 2023;01:20–9.

Sun J, Chen T, Tao J. Single molecule, full-length transcript sequencing provides insight into the TPS gene family in Paeonia ostii. PeerJ. 2021;9:e11808.

Li WZ, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–9.

Jin JP, Tian F, Yang DC, Meng YQ, Kong L, Luo JC, Gao G. PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic Acids Res. 2017;45(D1):D1040–D1045.

Yilmaz A, Nishiyama MY Jr, Fuentes BG, Souza GM, Janies D, Gray J, Grotewold E. GRASSIUS: A Platform for Comparative Regulatory Genomics across the Grasses. Plant Physiol. 2009;149(1):171–80.

Zhao P, Wang DD, Wang RQ, Kong NN, Zhang C, Yang CH, Wu WT, Ma HL, Chen Q. Genome-wide analysis of the potato Hsp20 gene family: identification, genomic organization and expression profiles in response to heat stress. BMC Genomics. 2018;19(1):61.

Zhao XC, Yang GY, Liu XQ, Yu ZD, Peng SB. Integrated Analysis of Seed microRNA and mRNA Transcriptome Reveals Important Functional Genes and microRNA-Targets in the Process of Walnut (Juglans regia) Seed Oil Accumulation. Int J Mol Sci. 2020;21(23):9093.


Acknowledgements

We thank BGI (Beijing Genomics Institute, China) for help with the transcriptome sequencing and for technical assistance.

This work was supported by funding from the National Natural Science Foundation of China (32071813) and Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX23_3583).

Author information

Authors and affiliations

College of Horticulture and Landscape Architecture, Yangzhou University, Yangzhou, 225009, China

Huajie Xu, Miao Li, Di Ma, Jiajun Gao, Jun Tao & Jiasong Meng

Joint International Research Laboratory of Agriculture and Agri-Product Safety, the Ministry of Education of China, Yangzhou University, Yangzhou, 225009, China

Jun Tao & Jiasong Meng


Contributions

JSM and JT conceived and planned the experiments. HJX and ML conducted the sequence data analysis and drafted the manuscript. HJX and ML performed the experiments. HJX, DM, JJG contributed to the manuscript revision. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jiasong Meng.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Xu, H., Li, M., Ma, D. et al. Identification of key genes for triacylglycerol biosynthesis and storage in herbaceous peony (Paeonia lactifolra Pall.) seeds based on full-length transcriptome. BMC Genomics 25, 601 (2024). https://doi.org/10.1186/s12864-024-10513-w


Received: 13 January 2024

Accepted: 10 June 2024

Published: 15 June 2024

DOI: https://doi.org/10.1186/s12864-024-10513-w


  • Paeonia lactiflora ‘Hangshao’
  • Full-length transcriptome
  • PacBio Iso-Seq
  • Triacylglycerol

BMC Genomics

ISSN: 1471-2164


COMMENTS

  1. Secondary Data Analysis: Using existing data to answer new questions

    All research begins with a research question, but with secondary data analysis, clinical research questions may need to be refined based on the availability of the data (Dunn et al., 2015; Polit & Beck, 2021). See Figure 1. Identifying a potential data source, then vetting the quality and utility of the data to answer the research team's ...

  2. Secondary Data Analysis: Your Complete How-To Guide

    Step 3: Design your research process. After defining your statement of purpose, the next step is to design the research process. For primary data, this involves determining the types of data you want to collect (e.g. quantitative, qualitative, or both) and a methodology for gathering them. For secondary data analysis, however, your research ...

  3. Secondary Analysis Research

    Secondary analysis of data collected by another researcher for a different purpose, or SDA, is increasing in the medical and social sciences. This is not surprising, given the immense body of health care-related research performed worldwide and the potential beneficial clinical implications of the timely expansion of primary research (Johnston, 2014; Tripathy, 2013).

  4. What is Secondary Research?

    Secondary research is a research method that uses data that was collected by someone else. In other words, whenever you conduct research using data that already exists, you are conducting secondary research. On the other hand, any type of research that you undertake yourself is called primary research. Example: Secondary research.

  5. Secondary Data

    Secondary data analysis involves the use of pre-existing data for research purposes. Here are some common methods of secondary data analysis: ... This method involves making inferences and drawing conclusions about a population based on a sample of data. Inferential analysis can be used to test hypotheses and determine the statistical ...

  6. How to Analyse Secondary Data for a Dissertation

    The process of data analysis in secondary research. Secondary analysis (i.e., the use of existing data) is a systematic methodological approach that has some clear steps that need to be followed for the process to be effective. In simple terms there are three steps: Step One: Development of Research Questions. Step Two: Identification of dataset.

  7. Conducting secondary analysis of qualitative data: Should we, can we

    SDA involves investigations where data collected for a previous study is analyzed, either by the same researcher(s) or different researcher(s), to explore new questions or use different analysis strategies that were not a part of the primary analysis (Szabo and Strang, 1997). For research involving quantitative data, SDA, and the process of sharing data for the purpose of SDA, has become ...

  8. Conducting High-Value Secondary Dataset Analysis: An Introductory Guide

    Secondary analyses of large datasets provide a mechanism for researchers to address high impact questions that would otherwise be prohibitively expensive and time-consuming to study. This paper presents a guide to assist investigators interested in conducting secondary data analysis, including advice on the process of successful secondary data ...

  9. 28 Secondary Data Analysis

    The analysis of existing data sets is routine in disciplines such as economics, political science, and sociology, but it is less well established in psychology (but see Brooks-Gunn & Chase-Lansdale, 1991; Brooks-Gunn, Berlin, Leventhal, & Fuligni, 2000). Moreover, biases against secondary data analysis in favor of primary research may be present in psychology (see McCall & Appelbaum, 1991).

  10. Sage Research Methods Foundations

    Secondary analysis is the analysis of data that have originally been collected either for a different purpose or by a different researcher or organisation. Because of the cost and complexity of primary data collection, and because of the opportunities offered by "found" data not originally collected for research purposes (e.g ...

  11. Using Secondary Research For Better Decisions: An Overview

    06/11/2024. Secondary research, also known as desk research or literature review, is a cornerstone of academic inquiry and professional investigation. It involves the analysis and synthesis of existing data, information, and knowledge collected by others, rather than gathering primary data firsthand. In essence, secondary research is akin to ...

  12. Sage Research Methods

    Volume 1: Using Secondary Sources and Secondary Analysis provides an overview of the theoretical underpinnings of secondary analysis in social research. Volume 2: Quantitative Approaches to Secondary Analysis covers the broad range of approaches adopted in quantitative secondary analysis research designs. Volume 3: Qualitative Data and Research ...

  13. Steps in Secondary Data Analysis

    Steps in Secondary Data Analysis. Stepping Your Way through Effective Secondary Data Analysis. Determine your research question - As indicated above, knowing exactly what you are looking for. Locating data - Knowing what is out there and whether you can gain access to it. A quick Internet search, possibly with the help of a librarian, will ...

  14. Use of secondary data analyses in research: Pros and Cons

    This empirical analysis based on secondary data primarily the Gender and Development monitor (2022) and other reports that unveil an acute dearth of women in top positions across all the sectors ...

  15. Secondary Qualitative Research Methodology Using Online Data within the

    In addition to the challenges of secondary research as mentioned in subsection Secondary Data and Analysis, in current research realm of secondary analysis, there is a lack of rigor in the analysis and overall methodology (Ruggiano & Perry, 2019). This has the pitfall of possibly exaggerating the effects of researcher bias (Thorne, 1994, 1998 ...

  16. Secondary Research for Your Dissertation: A Research Guide

    Secondary research plays a crucial role in dissertation writing, providing a foundation for your primary research. By leveraging existing data, you can gain valuable insights, identify research gaps, and enhance the credibility of your study. Unlike primary research, which involves collecting original data directly through experiments, surveys ...

  17. Secondary Data Analysis: Ethical Issues and Challenges

    Secondary data analysis. Secondary analysis refers to the use of existing research data to find answer to a question that was different from the original work ( 2 ). Secondary data can be large scale surveys or data collected as part of personal research. Although there is general agreement about sharing the results of large scale surveys, but ...

  18. Secondary Data Analysis: A Method of which the Time Has Come

    In a time where the large amounts of data being collected, compiled, and archived by researchers all over the world are now more easily accessible, the time has definitely come for secondary data analysis as a viable method for LIS research. References. Andrews, L., Higgins, A., Andrews, M. W., & Lalor, J. G. (2012).

  19. Secondary Data Analysis as an Efficient and Effective Approach to

    Secondary data analysis is one strategy to address this challenge. The use of existing data to test new hypotheses or answer new research questions has several advantages. It typically takes less time and resources, is low risk to participants, and allows access to large data sets and longitudinal data. Despite these advantages, limitations do ...

  20. Secondary analysis: theoretical, methodological, and practical ...

    Secondary analysis, which involves the use of existing data sets to answer new research questions, is an increasingly popular methodological choice among researchers who wish to investigate particular research questions but lack the resources to undertake primary data collections. Much time loss and considerable frustration may result, however ...

  21. Sage Research Methods Foundations

    Abstract. Secondary analysis is a research methodology in which preexisting data are used to investigate new questions or to verify the findings of previous work. It can be applied to both quantitative and qualitative data but is more established in relation to the former. Interest in the secondary analysis of qualitative data has grown since ...

  22. Secondary data analysis and combining with primary data

    Secondary data analysis is the process of analyzing data that was originally collected by another researcher, organization, or entity. This data can come from a variety of sources, such as government databases, academic studies, market research reports, or even internal company records. By repurposing this existing information, researchers can ...

  23. Benefits of Using Secondary Data Analysis for Your Research

    The Advantages of Secondary Data Analysis. One of the most noticeable advantages of using secondary data analysis is its cost effectiveness. Because someone else has already collected the data, the researcher does not need to invest any money, time, or effort into the data collection stages of his or her study.

  24. Secondary data for global health digitalisation

    Substantial opportunities for global health intelligence and research arise from the combined and optimised use of secondary data within data ecosystems. Secondary data are information being used for purposes other than those intended when they were collected. These data can be gathered from sources on the verge of widespread use such as the internet, wearables, mobile phone apps, electronic ...

  25. Extracting secondary data from citizen science images reveals host

    Researchers are not only using the primary data (i.e., the observed species together with its observation date and location), but the additional information captured with the citizen observations—the so-called secondary data (Callaghan et al., 2021). Secondary data comprise the information that can be extracted from the observation evidence ...

  26. A meta-analysis of the effects of design thinking on student learning

    Meta-analysis can integrate various empirical research results to calculate the overall effect value (Lipsey and Wilson, 2001). This research was conducted based on the process proposed by Field ...

  27. How to Conduct a Business Market Analysis

    These are the seven steps of conducting a market analysis: 1. Determine your purpose. There are many reasons you may be conducting a market analysis, such as to gauge your competition or to ...

  28. Secondary analysis of existing data: opportunities and implementation

    The secondary analysis of existing data has become an increasingly popular method of enhancing the overall efficiency of the health research enterprise. But this effort depends on governments, funding agencies, and researchers making the data collected in primary research studies and in health-related registry systems available to qualified ...

  29. Epidural analgesia during labour and severe maternal morbidity

    Objectives To determine the effect of labour epidural on severe maternal morbidity (SMM) and to explore whether this effect might be greater in women with a medical indication for epidural analgesia during labour, or with preterm labour. Design Population based study. Setting All NHS hospitals in Scotland. Participants 567 216 women in labour at 24+0 to 42+6 weeks' gestation between 1 ...

  30. Identification of key genes for triacylglycerol biosynthesis and

    A total of 4905 transcripts were related to lipid metabolism biosynthesis pathway, belonging to 28 enzymes. We use these data to identify 10 oleosin (OLE) and 5 diacylglycerol acyltransferase (DGAT) gene members after de-redundancy. The analysis of physicochemical properties and secondary structure showed them similarity in gene family ...
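Entry 26 in the list above notes that meta-analysis integrates empirical results from several studies to calculate an overall effect value (Lipsey and Wilson, 2001). As a hedged illustration only, the Python sketch below pools effect sizes with fixed-effect, inverse-variance weights. Each study is weighted by 1/SE², the pooled effect is the weighted mean, and Cochran's Q gives a first check for heterogeneity. The function name `fixed_effect_pool` and the example numbers are hypothetical. The design-thinking study cited in that entry may well have used a different model, such as random effects.

```python
import math
from typing import Sequence


def fixed_effect_pool(effects: Sequence[float], std_errors: Sequence[float]) -> dict:
    """Pool study effect sizes with fixed-effect inverse-variance weights.

    Hypothetical helper for illustration. Each study i gets weight
    w_i = 1 / SE_i**2, so more precise studies count for more. Returns the
    pooled effect, its standard error, a 95% confidence interval (normal
    approximation, z = 1.96), Cochran's Q and its degrees of freedom.
    """
    if not effects:
        raise ValueError("need at least one study")
    if len(effects) != len(std_errors):
        raise ValueError("need exactly one standard error per effect size")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * es for w, es in zip(weights, effects)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations of each study from the pooled mean.
    q = sum(w * (es - pooled) ** 2 for w, es in zip(weights, effects))
    return {
        "pooled": pooled,
        "se": pooled_se,
        "ci95": (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se),
        "q": q,
        "df": len(effects) - 1,
    }


if __name__ == "__main__":
    # Made-up standardized mean differences and standard errors for three
    # studies; these are not taken from any paper cited here.
    result = fixed_effect_pool([0.40, 0.55, 0.30], [0.10, 0.20, 0.15])
    print(result)
```

With real data, a Q that is large relative to its degrees of freedom suggests the studies disagree more than sampling error alone would explain. In that case a random-effects model is usually considered instead of the fixed-effect assumption made here.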