P-Value And Statistical Significance: What It Is & Why It Matters

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD, is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


The p-value in statistics quantifies the evidence against a null hypothesis. A low p-value suggests data is inconsistent with the null, potentially favoring an alternative hypothesis. Common significance thresholds are 0.05 or 0.01.

P-Value Explained in Normal Distribution

Hypothesis testing

When you perform a statistical test, a p-value helps you determine the significance of your results in relation to the null hypothesis.

The null hypothesis (H0) states no relationship exists between the two variables being studied (one variable does not affect the other). It states the results are due to chance and are not significant in supporting the idea being investigated. Thus, the null hypothesis assumes that whatever you try to prove did not happen.

The alternative hypothesis (Ha or H1) is the one you would believe if the null hypothesis is concluded to be untrue.

The alternative hypothesis states that the independent variable affected the dependent variable, and the results are significant in supporting the theory being investigated (i.e., the results are not due to random chance).

What a p-value tells you

A p-value, or probability value, is a number describing how likely it is that your data (or data more extreme) would have occurred by random chance if the null hypothesis were true.

The level of statistical significance is often expressed as a p-value between 0 and 1.

The smaller the p-value, the less likely the results occurred by random chance, and the stronger the evidence that you should reject the null hypothesis.

Remember, a p-value doesn’t tell you if the null hypothesis is true or false. It just tells you how likely you’d see the data you observed (or more extreme data) if the null hypothesis was true. It’s a piece of evidence, not a definitive proof.

Example: Test Statistic and p-Value

Suppose you’re conducting a study to determine whether a new drug has an effect on pain relief compared to a placebo. If the new drug has no impact, your test statistic will be close to the one predicted by the null hypothesis (no difference between the drug and placebo groups), and the resulting p-value will be close to 1. It may not be precisely 1 because real-world variations may exist. Conversely, if the new drug indeed reduces pain significantly, your test statistic will diverge further from what’s expected under the null hypothesis, and the p-value will decrease. The p-value will never reach zero because there’s always a slim possibility, though highly improbable, that the observed results occurred by random chance.
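To make this concrete, here is a minimal sketch in Python (using NumPy and SciPy); the group sizes and effect sizes are illustrative assumptions, not data from any real trial:

```python
# Minimal sketch of the drug-vs-placebo example; all numbers are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
placebo = rng.normal(loc=5.2, scale=0.8, size=50)  # simulated pain scores
drug = rng.normal(loc=3.5, scale=0.8, size=50)     # simulated pain scores

t_stat, p_value = stats.ttest_ind(drug, placebo)   # two-sample t-test
print(f"t = {t_stat:.2f}, p = {p_value:.2g}")
# A real group difference pushes t away from 0 and the p-value toward 0;
# with no true difference, p-values are spread roughly uniformly on (0, 1).
```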

P-value interpretation

The significance level (alpha) is a set probability threshold (often 0.05), while the p-value is the probability you calculate based on your study or analysis.

A p-value less than or equal to a predetermined significance level (often 0.05 or 0.01) is considered statistically significant, meaning the observed data provide strong evidence against the null hypothesis.

This suggests the effect under study likely represents a real relationship rather than just random chance.

For instance, if you set α = 0.05, you would reject the null hypothesis if your p-value ≤ 0.05.

This indicates strong evidence against the null hypothesis, because data at least this extreme would be expected less than 5% of the time if the null hypothesis were true.

Therefore, we reject the null hypothesis in favor of the alternative hypothesis.
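As a sketch of that decision rule (the alpha and p-value below are placeholders, not output from any particular test):

```python
# Compare a computed p-value against a significance level fixed in advance.
alpha = 0.05     # significance level, chosen before running the test
p_value = 0.03   # illustrative p-value from some analysis

if p_value <= alpha:
    print("Statistically significant: reject the null hypothesis")
else:
    print("Not significant: fail to reject the null hypothesis")
```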

Example: Statistical Significance

Upon analyzing the pain relief effects of the new drug compared to the placebo, the computed p-value is less than 0.01, which falls well below the predetermined alpha value of 0.05. Consequently, you conclude that there is a statistically significant difference in pain relief between the new drug and the placebo.

What does a p-value of 0.001 mean?

A p-value of 0.001 is highly statistically significant beyond the commonly used 0.05 threshold. It indicates strong evidence of a real effect or difference, rather than just random variation.

Specifically, a p-value of 0.001 means there is only a 0.1% chance of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is correct.

Such a small p-value provides strong evidence against the null hypothesis, leading to rejecting the null in favor of the alternative hypothesis.

A p-value greater than the significance level (typically p > 0.05) is not statistically significant. It does not provide strong evidence for the null hypothesis; it only means the data fail to provide strong evidence against it.

This means we retain the null hypothesis. Note that you cannot accept the null hypothesis; you can only reject it or fail to reject it.

Note: when the p-value is below your threshold of significance, it still does not mean that there is a 95% probability that the alternative hypothesis is true.

One-Tailed Test

[Figure: probability and statistical significance in an A/B test, with the rejection region in a single tail of the distribution]

Two-Tailed Test

[Figure: statistical significance in a two-tailed test, with rejection regions in both tails]

How do you calculate the p-value?

Most statistical software packages like R, SPSS, and others automatically calculate your p-value. This is the easiest and most common way.

Online resources and tables are available to estimate the p-value based on your test statistic and degrees of freedom.

These tables help you understand how often you would expect to see your test statistic under the null hypothesis.
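As a sketch of what such software or tables do under the hood, assuming Python with SciPy and an illustrative t statistic:

```python
# P-value from a test statistic and degrees of freedom, as a table would give.
from scipy import stats

t_stat, df = 2.5, 14                            # illustrative values
p_two_tailed = 2 * stats.t.sf(abs(t_stat), df)  # sf(x) = 1 - cdf(x)
print(f"p = {p_two_tailed:.4f}")                # ≈ 0.0254
```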

Understanding the Statistical Test:

Different statistical tests are designed to answer specific research questions or hypotheses. Each test has its own underlying assumptions and characteristics.

For example, you might use a t-test to compare means, a chi-squared test for categorical data, or a correlation test to measure the strength of a relationship between variables.

Be aware that the number of independent variables you include in your analysis can influence the magnitude of the test statistic needed to produce the same p-value.

This factor is particularly important to consider when comparing results across different analyses.

Example: Choosing a Statistical Test

If you’re comparing the effectiveness of just two different drugs in pain relief, a two-sample t-test is a suitable choice for comparing these two groups. However, when you’re examining the impact of three or more drugs, it’s more appropriate to employ an Analysis of Variance (ANOVA). Running multiple uncorrected pairwise comparisons in such cases inflates the chance of false positives, leading to an overestimation of the significance of differences between the drug groups.
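A minimal sketch of that choice in Python with SciPy (the drug data are simulated placeholders): ttest_ind for exactly two groups, f_oneway for three or more.

```python
# Two groups -> t-test; three or more groups -> one-way ANOVA.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
drug_a = rng.normal(4.0, 1.0, 30)  # simulated pain-relief scores
drug_b = rng.normal(4.5, 1.0, 30)
drug_c = rng.normal(5.0, 1.0, 30)

t_stat, p_ttest = stats.ttest_ind(drug_a, drug_b)         # two drugs
f_stat, p_anova = stats.f_oneway(drug_a, drug_b, drug_c)  # three drugs
print(f"t-test p = {p_ttest:.3f}, ANOVA p = {p_anova:.3f}")
```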

How to report

A statistically significant result cannot prove that a research hypothesis is correct (which implies 100% certainty).

Instead, we may state our results “provide support for” or “give evidence for” our research hypothesis (as there is still a slight probability that the results occurred by chance and the null hypothesis was correct – e.g., less than 5%).

Example: Reporting the results

In our comparison of the pain relief effects of the new drug and the placebo, we observed that participants in the drug group experienced a significant reduction in pain (M = 3.5, SD = 0.8) compared to those in the placebo group (M = 5.2, SD = 0.7), resulting in an average difference of 1.7 points on the pain scale (t(98) = -9.36, p < .001).

The 6th edition of the APA style manual (American Psychological Association, 2010) states the following on the topic of reporting p-values:

“When reporting p values, report exact p values (e.g., p = .031) to two or three decimal places. However, report p values less than .001 as p < .001.

The tradition of reporting p values in the form p < .10, p < .05, p < .01, and so forth, was appropriate in a time when only limited tables of critical values were available.” (p. 114)

  • Do not use 0 before the decimal point for the statistical value p, as it cannot be greater than 1. In other words, write p = .001 instead of p = 0.001.
  • Pay attention to issues of italics (p is always italicized) and spacing (a space on either side of the = sign).
  • p = .000 (as output by some statistical packages such as SPSS) is impossible and should be written as p < .001; a small formatting helper is sketched after this list.
  • The opposite of significant is “nonsignificant,” not “insignificant.”
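These rules are mechanical enough to automate. Below is a hypothetical Python helper (not part of any statistics package) that applies the formatting conventions above:

```python
# Hypothetical helper applying the APA rules above: exact values to three
# decimals, no leading zero, and "p < .001" for anything below .001.
def format_p(p: float) -> str:
    if p < 0.001:
        return "p < .001"
    return f"p = {p:.3f}".replace("0.", ".", 1)

print(format_p(0.0312))  # p = .031
print(format_p(0.0004))  # p < .001
print(format_p(0.0))     # p < .001 (never report p = .000)
```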

Why is the p-value not enough?

A lower p-value is sometimes interpreted as meaning there is a stronger relationship between two variables.

However, statistical significance only means that the observed data would be unlikely (e.g., less than 5% probable) if the null hypothesis were true; it says nothing about the size of the effect.

To understand the strength of the difference between the two groups (control vs. experimental), a researcher needs to calculate the effect size.

When do you reject the null hypothesis?

In statistical hypothesis testing, you reject the null hypothesis when the p-value is less than or equal to the significance level (α) you set before conducting your test. The significance level is the probability of rejecting the null hypothesis when it is true. Commonly used significance levels are 0.01, 0.05, and 0.10.

Remember, rejecting the null hypothesis doesn’t prove the alternative hypothesis; it just suggests that the alternative hypothesis may be plausible given the observed data.

The p-value is conditional upon the null hypothesis being true but is unrelated to the truth or falsity of the alternative hypothesis.

What does p-value of 0.05 mean?

If your p-value is less than or equal to 0.05 (the significance level), you would conclude that your result is statistically significant. This means the evidence is strong enough to reject the null hypothesis in favor of the alternative hypothesis.

Are all p-values below 0.05 considered statistically significant?

No, not all p-values below 0.05 are considered statistically significant. The threshold of 0.05 is commonly used, but it’s just a convention. Statistical significance depends on factors like the study design, sample size, and the magnitude of the observed effect.

A p-value below 0.05 means there is evidence against the null hypothesis, suggesting a real effect. However, it’s essential to consider the context and other factors when interpreting results.

Researchers also look at effect size and confidence intervals to determine the practical significance and reliability of findings.

How does sample size affect the interpretation of p-values?

Sample size can impact the interpretation of p-values. A larger sample size provides more reliable and precise estimates of the population, leading to narrower confidence intervals.

With a larger sample, even small differences between groups or effects can become statistically significant, yielding lower p-values. In contrast, smaller sample sizes may not have enough statistical power to detect smaller effects, resulting in higher p-values.

Therefore, a larger sample size increases the chances of finding statistically significant results when there is a genuine effect, making the findings more trustworthy and robust.

Can a non-significant p-value indicate that there is no effect or difference in the data?

No, a non-significant p-value does not necessarily indicate that there is no effect or difference in the data. It means that the observed data do not provide strong enough evidence to reject the null hypothesis.

There could still be a real effect or difference, but it might be smaller or more variable than the study was able to detect.

Other factors like sample size, study design, and measurement precision can influence the p-value. It’s important to consider the entire body of evidence and not rely solely on p-values when interpreting research findings.

Can p-values be exactly zero?

While a p-value can be extremely small, it can never be exactly zero. When a p-value is reported as p = 0.000, the actual p-value is simply too small for the software to display. This is often interpreted as strong evidence against the null hypothesis. P-values less than 0.001 should be reported as p < .001.

Further Information

  • P-values and significance tests (Khan Academy)
  • Hypothesis testing and p-values (Khan Academy)
  • Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond "p < 0.05". The American Statistician, 73(sup1), 1-19.
  • Criticism of using the "p < 0.05" threshold
  • Publication manual of the American Psychological Association
  • Statistics for Psychology Book Download


P-Value: What It Is, How to Calculate It, and Why It Matters


Yarilet Perez is an experienced multimedia journalist and fact-checker with a Master of Science in Journalism. She has worked in multiple cities covering breaking news, politics, education, and more. Her expertise is in personal finance and investing, and real estate.


In statistics, a p-value is a number that indicates how likely you are to obtain a result at least as extreme as the one actually observed, assuming the null hypothesis is correct.

The p-value serves as an alternative to rejection points to provide the smallest level of significance at which the null hypothesis would be rejected. A smaller p-value means stronger evidence in favor of the alternative hypothesis.

P-value is often used to promote credibility for studies or reports by government agencies. For example, the U.S. Census Bureau stipulates that any analysis with a p-value greater than 0.10 must be accompanied by a statement that the difference is not statistically different from zero. The Census Bureau also has standards in place stipulating which p-values are acceptable for various publications.

Key Takeaways

  • A p-value is a statistical measurement used to validate a hypothesis against observed data.
  • A p-value measures the probability of obtaining the observed results, assuming that the null hypothesis is true.
  • The lower the p-value, the greater the statistical significance of the observed difference.
  • A p-value of 0.05 or lower is generally considered statistically significant.
  • P-value can serve as an alternative to—or in addition to—preselected confidence levels for hypothesis testing.


P-values are usually found using p-value tables or spreadsheets/statistical software. These calculations are based on the assumed or known probability distribution of the specific statistic tested. P-values are calculated from the deviation between the observed value and a chosen reference value, given the probability distribution of the statistic, with a greater difference between the two values corresponding to a lower p-value.

Mathematically, the p-value is calculated using integral calculus from the area under the probability distribution curve for all values of statistics that are at least as far from the reference value as the observed value is, relative to the total area under the probability distribution curve.

The calculation for a p-value varies based on the type of test performed. The three test types describe the location on the probability distribution curve: lower-tailed test, upper-tailed test, or two-tailed test.

In a nutshell, the greater the difference between two observed values, the less likely it is that the difference is due to simple random chance, and this is reflected by a lower p-value.
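As a sketch of those tail-area calculations, assuming a standard normal reference distribution and an illustrative test statistic:

```python
# Tail areas under the standard normal curve for an observed statistic z.
from scipy import stats

z = 1.96  # illustrative observed test statistic

p_lower = stats.norm.cdf(z)        # lower-tailed test
p_upper = stats.norm.sf(z)         # upper-tailed test; sf(x) = 1 - cdf(x)
p_two = 2 * stats.norm.sf(abs(z))  # two-tailed test
print(p_lower, p_upper, p_two)     # ≈ 0.975, 0.025, 0.05
```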

The P-Value Approach to Hypothesis Testing

The p-value approach to hypothesis testing uses the calculated probability to determine whether there is evidence to reject the null hypothesis. The null hypothesis, also known as the conjecture, is the initial claim about a population (or data-generating process). The alternative hypothesis states whether the population parameter differs from the value of the population parameter stated in the conjecture.

In practice, the significance level is stated in advance to determine how small the p-value must be to reject the null hypothesis. Because different researchers use different levels of significance when examining a question, a reader may sometimes have difficulty comparing results from two different tests. P-values provide a solution to this problem.

Even a low p-value is not necessarily proof of statistical significance, since there is still a possibility that the observed data are the result of chance. Only repeated experiments or studies can confirm if a relationship is statistically significant.

For example, suppose a study comparing returns from two particular assets was undertaken by different researchers who used the same data but different significance levels. The researchers might come to opposite conclusions regarding whether the assets differ.

If one researcher used a confidence level of 90% and the other required a confidence level of 95% to reject the null hypothesis, and if the p-value of the observed difference between the two returns was 0.08 (corresponding to a confidence level of 92%), then the first researcher would find that the two assets have a difference that is statistically significant , while the second would find no statistically significant difference between the returns.

To avoid this problem, the researchers could report the p-value of the hypothesis test and allow readers to interpret the statistical significance themselves. This is called a p-value approach to hypothesis testing. Independent observers could note the p-value and decide for themselves whether that represents a statistically significant difference or not.

Example of P-Value

An investor claims that their investment portfolio’s performance is equivalent to that of the Standard & Poor’s (S&P) 500 Index . To determine this, the investor conducts a two-tailed test.

The null hypothesis states that the portfolio’s returns are equivalent to the S&P 500’s returns over a specified period, while the alternative hypothesis states that the portfolio’s returns and the S&P 500’s returns are not equivalent—if the investor conducted a one-tailed test , the alternative hypothesis would state that the portfolio’s returns are either less than or greater than the S&P 500’s returns.

The p-value hypothesis test does not necessarily make use of a preselected confidence level at which the investor should reject the null hypothesis that the returns are equivalent. Instead, it provides a measure of how much evidence there is to reject the null hypothesis. The smaller the p-value, the greater the evidence against the null hypothesis.

Thus, if the investor finds that the p-value is 0.001, there is strong evidence against the null hypothesis, and the investor can confidently conclude that the portfolio’s returns and the S&P 500’s returns are not equivalent.

Although this does not provide an exact threshold as to when the investor should accept or reject the null hypothesis, it does have another very practical advantage. P-value hypothesis testing offers a direct way to compare the relative confidence that the investor can have when choosing among multiple different types of investments or portfolios relative to a benchmark such as the S&P 500.

For example, for two portfolios, A and B, whose performance differs from the S&P 500 with p-values of 0.10 and 0.01, respectively, the investor can be much more confident that portfolio B, with a lower p-value, will actually show consistently different results.

Is a 0.05 P-Value Significant?

A p-value less than 0.05 is typically considered to be statistically significant, in which case the null hypothesis should be rejected. A p-value greater than 0.05 means that deviation from the null hypothesis is not statistically significant, and the null hypothesis is not rejected.

What Does a P-Value of 0.001 Mean?

A p-value of 0.001 indicates that if the null hypothesis tested were indeed true, then there would be a one-in-1,000 chance of observing results at least as extreme. This leads the observer to reject the null hypothesis because either a highly rare data result has been observed or the null hypothesis is incorrect.

How Can You Use P-Value to Compare 2 Different Results of a Hypothesis Test?

If you have two different results, one with a p-value of 0.04 and one with a p-value of 0.06, the result with a p-value of 0.04 will be considered more statistically significant than the p-value of 0.06. Beyond this simplified example, you could compare a 0.04 p-value to a 0.001 p-value. Both are statistically significant, but the 0.001 example provides an even stronger case against the null hypothesis than the 0.04.

The p-value is used to measure the significance of observational data. When researchers identify an apparent relationship between two variables, there is always a possibility that this correlation might be a coincidence. A p-value calculation helps determine if the observed relationship could arise as a result of chance.

U.S. Census Bureau. “ Statistical Quality Standard E1: Analyzing Data .”



Published: 26 November 2021

The clinician’s guide to p values, confidence intervals, and magnitude of effects

Mark R. Phillips, Charles C. Wykoff, Lehana Thabane, Mohit Bhandari & Varun Chaudhary, for the Retina Evidence Trials InterNational Alliance (R.E.T.I.N.A.) Study Group

Eye, volume 36, pages 341–342 (2022)


Introduction

There are numerous statistical and methodological considerations within every published study, and the ability of clinicians to appreciate the implications and limitations associated with these key concepts is critically important. These implications often have a direct impact on the applicability of study findings – which, in turn, often determine the appropriateness for the results to lead to modification of practice patterns. Because it can be challenging and time-consuming for busy clinicians to break down the nuances of each study, herein we provide a brief summary of 3 important topics that every ophthalmologist should consider when interpreting evidence.

p-values: what they tell us and what they don’t

Perhaps the most universally recognized statistic is the p-value. Most individuals understand the notion that (usually) a p-value <0.05 signifies a statistically significant difference between the two groups being compared. While this understanding is shared amongst most, it is far more important to understand what a p-value does not tell us. Attempting to inform clinical practice patterns through interpretation of p-values alone is overly simplistic and fraught with potential for misleading conclusions. A p-value represents the probability that the observed result (difference between the groups being compared), or one that is more extreme, would occur by random chance, assuming that the null hypothesis (the alternative scenario to the study’s hypothesis, namely that there are no differences between the groups being compared) is true. For example, a p-value of 0.04 would indicate that the difference between the groups compared would have a 4% chance of occurring by random chance. When this probability is small, it becomes less likely that the null hypothesis is accurate, or, alternatively, that the probability of a difference between groups is high [1]. Studies use a predefined threshold to determine when a p-value is sufficiently small to support the study hypothesis. This threshold is conventionally a p-value of 0.05; however, there are reasons and justifications for studies to use a different threshold if appropriate.

What a p-value cannot tell us is the clinical relevance or importance of the observed treatment effects [1]. Specifically, a p-value does not provide details about the magnitude of effect [2, 3, 4]. Despite a significant p-value, it is quite possible for the difference between the groups to be small. This phenomenon is especially common with larger sample sizes, in which comparisons may result in statistically significant differences that are actually not clinically meaningful. For example, a study may find a statistically significant difference (p < 0.05) between the visual acuity outcomes of two groups, while the difference between the groups may amount to one letter or less. While this may in fact be a statistically significant difference, it is likely not large enough to make a meaningful difference for patients. Thus, p-values lack vital information on the magnitude of effects for the assessed outcomes [2, 3, 4].

Overcoming the limitations of interpreting p-values: magnitude of effect

To overcome this limitation, it is important to consider both (1) whether or not the p-value of a comparison is significant according to the pre-defined statistical plan, and (2) the magnitude of the treatment effects (commonly reported as an effect estimate with 95% confidence intervals) [5]. The magnitude of effect is most often represented as the mean difference between groups for continuous outcomes, such as visual acuity on the logMAR scale, and the risk or odds ratio for dichotomous/binary outcomes, such as occurrence of adverse events. These measures indicate the observed effect that was quantified by the study comparison. As suggested in the previous section, understanding the actual magnitude of the difference in the study comparison provides an understanding of the results that an isolated p-value does not provide [4, 5]. Understanding the results of a study should shift from a binary interpretation of significant vs not significant, and instead focus on a more critical judgement of the clinical relevance of the observed effect [1].

There are a number of important metrics, such as the Minimally Important Difference (MID), which help to determine whether a difference between groups is large enough to be clinically meaningful [6, 7]. When a clinician is able to identify (1) the magnitude of effect within a study, and (2) the MID (the smallest change in the outcome that a patient would deem meaningful), they are far more capable of understanding the effects of a treatment and of articulating the pros and cons of a treatment option to patients with reference to treatment effects that can be considered clinically valuable.

The role of confidence intervals

Confidence intervals are estimates that provide a lower and upper bound on the estimate of the magnitude of effect. By convention, 95% confidence intervals are most typically reported. These intervals represent the range within which we can, with 95% confidence, expect the treatment effect to fall. For example, a mean difference in visual acuity of 8 (95% confidence interval: 6 to 10) suggests that the best estimate of the difference between the two study groups is 8 letters, and we have 95% certainty that the true value is between 6 and 10 letters. When interpreting this clinically, one can consider the different clinical scenarios at each end of the confidence interval: if the patient’s outcome were the most conservative, in this case an improvement of 6 letters, would its importance to the patient differ from the most optimistic outcome, 10 letters in this example? When the clinical value of the treatment effect does not change when considering the lower versus upper ends of the confidence interval, there is enhanced certainty that the treatment effect will be meaningful to the patient [4, 5]. In contrast, if the clinical merits of a treatment appear different when considering the lower versus the upper end of the confidence interval, one may be more cautious about the benefits to be anticipated with treatment [4, 5].
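As a sketch of how such an interval arises, here is a Python fragment; the standard error is an assumed placeholder chosen so the interval matches the 8-letter example above, not a value from any real study:

```python
# 95% confidence interval for a mean difference: estimate ± z * SE.
from scipy import stats

estimate = 8.0  # mean difference in letters (from the example above)
se = 1.02       # assumed standard error, chosen to land near (6, 10)

z = stats.norm.ppf(0.975)  # ≈ 1.96 for 95% confidence
lower, upper = estimate - z * se, estimate + z * se
print(f"95% CI: ({lower:.1f}, {upper:.1f})")  # ≈ (6.0, 10.0)
```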

There are a number of important details for clinicians to consider when interpreting evidence. Through this editorial, we hope to provide practical insights into fundamental methodological principles that can help guide clinical decision making. p-values are one small component to consider when interpreting study results; a much deeper appreciation of results is available when the treatment effects and associated confidence intervals are also taken into consideration.

Change history

19 January 2022

A Correction to this paper has been published: https://doi.org/10.1038/s41433-021-01914-2

1. Li G, Walter SD, Thabane L. Shifting the focus away from binary thinking of statistical significance and towards education for key stakeholders: revisiting the debate on whether it’s time to de-emphasize or get rid of statistical significance. J Clin Epidemiol. 2021;137:104-12. https://doi.org/10.1016/j.jclinepi.2021.03.033

2. Gagnier JJ, Morgenstern H. Misconceptions, misuses, and misinterpretations of p values and significance testing. J Bone Joint Surg Am. 2017;99:1598-603. https://doi.org/10.2106/JBJS.16.01314

3. Goodman SN. Toward evidence-based medical statistics. 1: the p value fallacy. Ann Intern Med. 1999;130:995-1004. https://doi.org/10.7326/0003-4819-130-12-199906150-00008

4. Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, et al. Statistical tests, p values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31:337-50. https://doi.org/10.1007/s10654-016-0149-3

5. Phillips M. Letter to the editor: editorial: threshold p values in orthopaedic research-we know the problem. What is the solution? Clin Orthop. 2019;477:1756-8. https://doi.org/10.1097/CORR.0000000000000827

6. Devji T, Carrasco-Labra A, Qasim A, Phillips MR, Johnston BC, Devasenapathy N, et al. Evaluating the credibility of anchor based estimates of minimal important differences for patient reported outcomes: instrument development and reliability study. BMJ. 2020;369:m1714. https://doi.org/10.1136/bmj.m1714

7. Carrasco-Labra A, Devji T, Qasim A, Phillips MR, Wang Y, Johnston BC, et al. Minimal important difference estimates for patient-reported outcomes: a systematic survey. J Clin Epidemiol. 2020. https://doi.org/10.1016/j.jclinepi.2020.11.024


Author information

Authors and affiliations

Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada

Mark R. Phillips, Lehana Thabane, Mohit Bhandari & Varun Chaudhary

Retina Consultants of Texas (Retina Consultants of America), Houston, TX, USA

Charles C. Wykoff

Blanton Eye Institute, Houston Methodist Hospital, Houston, TX, USA

Biostatistics Unit, St. Joseph’s Healthcare-Hamilton, Hamilton, ON, Canada

Lehana Thabane

Department of Surgery, McMaster University, Hamilton, ON, Canada

Mohit Bhandari & Varun Chaudhary

NIHR Moorfields Biomedical Research Centre, Moorfields Eye Hospital, London, UK

Sobha Sivaprasad

Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA

Peter Kaiser

Retinal Disorders and Ophthalmic Genetics, Stein Eye Institute, University of California, Los Angeles, CA, USA

David Sarraf

Department of Ophthalmology, Mayo Clinic, Rochester, MN, USA

Sophie J. Bakri

The Retina Service at Wills Eye Hospital, Philadelphia, PA, USA

Sunir J. Garg

Center for Ophthalmic Bioinformatics, Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA

Rishi P. Singh

Cleveland Clinic Lerner College of Medicine, Cleveland, OH, USA

Department of Ophthalmology, University of Bonn, Bonn, Germany

Frank G. Holz

Singapore Eye Research Institute, Singapore, Singapore

Tien Y. Wong

Singapore National Eye Centre, Duke-NUS Medical School, Singapore, Singapore

Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia

Robyn H. Guymer

Department of Surgery (Ophthalmology), The University of Melbourne, Melbourne, VIC, Australia

  • Varun Chaudhary
  • Mohit Bhandari
  • Charles C. Wykoff
  • Sobha Sivaprasad
  • Lehana Thabane
  • Peter Kaiser
  • David Sarraf
  • Sophie J. Bakri
  • Sunir J. Garg
  • Rishi P. Singh
  • Frank G. Holz
  • Tien Y. Wong
  • Robyn H. Guymer

Contributions

MRP was responsible for conception of idea, writing of manuscript and review of manuscript. VC was responsible for conception of idea, writing of manuscript and review of manuscript. MB was responsible for conception of idea, writing of manuscript and review of manuscript. CCW was responsible for critical review and feedback on manuscript. LT was responsible for critical review and feedback on manuscript.

Corresponding author

Correspondence to Varun Chaudhary .

Ethics declarations

Competing interests

MRP: Nothing to disclose. CCW: Consultant: Acuela, Adverum Biotechnologies, Inc, Aerpio, Alimera Sciences, Allegro Ophthalmics, LLC, Allergan, Apellis Pharmaceuticals, Bayer AG, Chengdu Kanghong Pharmaceuticals Group Co, Ltd, Clearside Biomedical, DORC (Dutch Ophthalmic Research Center), EyePoint Pharmaceuticals, Genentech/Roche, GyroscopeTx, IVERIC bio, Kodiak Sciences Inc, Novartis AG, ONL Therapeutics, Oxurion NV, PolyPhotonix, Recens Medical, Regeneron Pharmaceuticals, Inc, REGENXBIO Inc, Santen Pharmaceutical Co, Ltd, and Takeda Pharmaceutical Company Limited; Research funds: Adverum Biotechnologies, Inc, Aerie Pharmaceuticals, Inc, Aerpio, Alimera Sciences, Allergan, Apellis Pharmaceuticals, Chengdu Kanghong Pharmaceutical Group Co, Ltd, Clearside Biomedical, Gemini Therapeutics, Genentech/Roche, Graybug Vision, Inc, GyroscopeTx, Ionis Pharmaceuticals, IVERIC bio, Kodiak Sciences Inc, Neurotech LLC, Novartis AG, Opthea, Outlook Therapeutics, Inc, Recens Medical, Regeneron Pharmaceuticals, Inc, REGENXBIO Inc, Samsung Pharm Co, Ltd, Santen Pharmaceutical Co, Ltd, and Xbrane Biopharma AB – unrelated to this study. LT: Nothing to disclose. MB: Research funds: Pendopharm, Bioventus, Acumed – unrelated to this study. VC: Advisory Board Member: Alcon, Roche, Bayer, Novartis; Grants: Bayer, Novartis – unrelated to this study.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article was revised: In this article the middle initial in author name Sophie J. Bakri was missing.

Cite this article

Phillips, M.R., Wykoff, C.C., Thabane, L. et al. The clinician’s guide to p values, confidence intervals, and magnitude of effects. Eye 36 , 341–342 (2022). https://doi.org/10.1038/s41433-021-01863-w


Received: 11 November 2021

Revised: 12 November 2021

Accepted: 15 November 2021

Published: 26 November 2021

Issue Date: February 2022

DOI: https://doi.org/10.1038/s41433-021-01863-w



An Explanation of P-Values and Statistical Significance

In statistics, p-values are commonly used in hypothesis testing for t-tests, chi-square tests, regression analysis, ANOVAs, and a variety of other statistical methods.

Despite their ubiquity, p-values are frequently interpreted incorrectly, which can lead to errors when interpreting the findings from an analysis or a study.

This post explains how to understand and interpret p-values in a clear, practical way.

Hypothesis Testing

To understand p-values, we first need to understand the concept of hypothesis testing .

A  hypothesis test  is a formal statistical test we use to reject or fail to reject some hypothesis. For example, we may hypothesize that a new drug, method, or procedure provides some benefit over a current drug, method, or procedure. 

To test this, we can conduct a hypothesis test where we use a null and alternative hypothesis:

Null hypothesis – There is no effect or difference between the new method and the old method.

Alternative hypothesis – There does exist some effect or difference between the new method and the old method.

A p-value indicates how compatible the sample data are with the null hypothesis. Specifically, assuming the null hypothesis is true, the p-value tells us the probability of obtaining an effect at least as large as the one we actually observed in the sample data.

If the p-value of a hypothesis test is sufficiently low, we can reject the null hypothesis. Specifically, when we conduct a hypothesis test, we must choose a significance level at the outset. Common choices for significance levels are 0.01, 0.05, and 0.10.

If the p-value is less than our significance level, then we can reject the null hypothesis.

Otherwise, if the p-value is  equal to or greater than  our significance level, then we fail to reject the null hypothesis. 

How to Interpret a P-Value

The textbook definition of a p-value is:

A p-value is the probability of observing a sample statistic that is at least as extreme as your sample statistic, given that the null hypothesis is true.

For example, suppose a factory claims that they produce tires that have a mean weight of 200 pounds. An auditor hypothesizes that the true mean weight of tires produced at this factory is different from 200 pounds so he runs a hypothesis test and finds that the p-value of the test is 0.04. Here is how to interpret this p-value:

If the factory does indeed produce tires that have a mean weight of 200 pounds, then 4% of all audits will obtain the effect observed in the sample, or larger, because of random sampling error. This tells us that obtaining the sample data that the auditor did would be pretty rare if indeed the factory produced tires that have a mean weight of 200 pounds.

Depending on the significance level used in this hypothesis test, the auditor would likely reject the null hypothesis that the true mean weight of tires produced at this factory is indeed 200 pounds. The sample data that he obtained from the audit is not very consistent with the null hypothesis.
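A minimal sketch of the auditor's test in Python with SciPy, using simulated tire weights rather than real audit data:

```python
# One-sample t-test of H0: mean tire weight = 200 pounds.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
weights = rng.normal(loc=202, scale=6, size=40)  # simulated audit sample

t_stat, p_value = stats.ttest_1samp(weights, popmean=200)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A small p-value says data this extreme would be rare if the true mean
# really were 200 pounds; it does not give the probability H0 is true.
```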

How Not to Interpret a P-Value

The biggest misconception about p-values is that they are equivalent to the probability of making a mistake by rejecting a true null hypothesis (known as a Type I error).

There are two primary reasons that p-values can’t be the error rate:

1. P-values are calculated based on the assumption that the null hypothesis is true and that the difference between the sample data and the null hypothesis is simply caused by random chance. Thus, p-values can't tell you the probability that the null hypothesis is true or false, because the calculation already assumes that it is true.

2. Although a low p-value indicates that your sample data are unlikely assuming the null is true, a p-value still can’t tell you which of the following cases is more likely:

  • The null is false
  • The null is true but you obtained an odd sample

Returning to the previous example, here is a correct and an incorrect way to interpret the p-value:

  • Correct Interpretation: Assuming the factory does produce tires with a mean weight of 200 pounds, you would obtain the observed difference that you  did  obtain in your sample or a more extreme difference in 4% of audits due to random sampling error.
  • Incorrect Interpretation: If you reject the null hypothesis, there is a 4% chance that you are making a mistake.

Examples of Interpreting P-Values

The following examples illustrate correct ways to interpret p-values in the context of hypothesis testing.

A phone company claims that 90% of its customers are satisfied with their service. To test this claim, an independent researcher gathered a simple random sample of 200 customers and asked them if they are satisfied with their service, to which 85% responded yes. The p-value associated with this sample data turned out to be 0.018.

Correct interpretation of p-value: Assuming that 90% of the customers actually are satisfied with their service, the researcher would obtain the observed difference that he did obtain in his sample, or a more extreme difference, in 1.8% of samples due to random sampling error.

A company invents a new battery for phones. The company claims that this new battery will work for at least 10 minutes longer than the old battery. To test this claim, a researcher takes a simple random sample of 80 new batteries and 80 old batteries. The new batteries run for an average of 120 minutes with a standard deviation of 12 minutes and the old batteries run for an average of 115 minutes with a standard deviation of 15 minutes. The p-value that results from the test for a difference in population means is 0.011.

Correct interpretation of p-value:  Assuming that the new battery works for the same amount of time or less than the old battery, the researcher would obtain the observed difference or a more extreme difference in 1.1% of studies due to random sampling error.
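Because this example reports only summary statistics, a sketch can reproduce it with SciPy's ttest_ind_from_stats, which works from means, standard deviations, and sample sizes directly (shown here as a Welch one-sided test, one reasonable reading of the setup):

```python
# Two-sample t-test from summary statistics for the battery example.
from scipy import stats

res = stats.ttest_ind_from_stats(
    mean1=120, std1=12, nobs1=80,  # new batteries
    mean2=115, std2=15, nobs2=80,  # old batteries
    equal_var=False,               # Welch's t-test
    alternative="greater",         # H1: new mean > old mean
)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.3f}")  # p ≈ 0.011
```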


Hey there. My name is Zach Bobbitt. I have a Master of Science degree in Applied Statistics and I’ve worked on machine learning algorithms for professional businesses in both healthcare and retail. I’m passionate about statistics, machine learning, and data visualization and I created Statology to be a resource for both students and teachers alike. My goal with this site is to help you learn statistics through using simple terms, plenty of real-world examples, and helpful illustrations.



S.3.2 Hypothesis Testing (P-Value Approach)

The P -value approach involves determining "likely" or "unlikely" by determining the probability — assuming the null hypothesis was true — of observing a more extreme test statistic in the direction of the alternative hypothesis than the one observed. If the P -value is small, say less than (or equal to) \(\alpha\), then it is "unlikely." And, if the P -value is large, say more than \(\alpha\), then it is "likely."

If the P -value is less than (or equal to) \(\alpha\), then the null hypothesis is rejected in favor of the alternative hypothesis. And, if the P -value is greater than \(\alpha\), then the null hypothesis is not rejected.

Specifically, the four steps involved in using the P -value approach to conducting any hypothesis test are:

  • Specify the null and alternative hypotheses.
  • Using the sample data and assuming the null hypothesis is true, calculate the value of the test statistic. Again, to conduct the hypothesis test for the population mean μ, we use the t-statistic \(t^*=\frac{\bar{x}-\mu}{s/\sqrt{n}}\), which follows a t-distribution with n - 1 degrees of freedom.
  • Using the known distribution of the test statistic, calculate the P -value : "If the null hypothesis is true, what is the probability that we'd observe a more extreme test statistic in the direction of the alternative hypothesis than we did?" (Note how this question is equivalent to the question answered in criminal trials: "If the defendant is innocent, what is the chance that we'd observe such extreme criminal evidence?")
  • Set the significance level, \(\alpha\), the probability of making a Type I error to be small — 0.01, 0.05, or 0.10. Compare the P -value to \(\alpha\). If the P -value is less than (or equal to) \(\alpha\), reject the null hypothesis in favor of the alternative hypothesis. If the P -value is greater than \(\alpha\), do not reject the null hypothesis.

Example S.3.2.1: Mean GPA

In our example concerning the mean grade point average, suppose that our random sample of n = 15 students majoring in mathematics yields a test statistic t* equaling 2.5. Since n = 15, our test statistic t* has n - 1 = 14 degrees of freedom. Also, suppose we set our significance level α at 0.05 so that we have only a 5% chance of making a Type I error.

Right-Tailed

The P-value for conducting the right-tailed test H₀: μ = 3 versus Hₐ: μ > 3 is the probability that we would observe a test statistic greater than t* = 2.5 if the population mean μ really were 3. Recall that probability equals the area under the probability curve. The P-value is therefore the area under a \(t_{n-1} = t_{14}\) curve and to the right of the test statistic t* = 2.5. It can be shown using statistical software that the P-value is 0.0127. The graph depicts this visually.

[Figure: t-distribution curve showing the right-tail area beyond t* = 2.5]

The P-value, 0.0127, tells us it is "unlikely" that we would observe such an extreme test statistic t* in the direction of Hₐ if the null hypothesis were true. Therefore, our initial assumption that the null hypothesis is true must be incorrect. That is, since the P-value, 0.0127, is less than α = 0.05, we reject the null hypothesis H₀: μ = 3 in favor of the alternative hypothesis Hₐ: μ > 3.

Note that we would not reject H₀: μ = 3 in favor of Hₐ: μ > 3 if we lowered our willingness to make a Type I error to α = 0.01, as the P-value, 0.0127, is then greater than α = 0.01.

Left-Tailed

In our example concerning the mean grade point average, suppose that our random sample of n = 15 students majoring in mathematics yields a test statistic t* equaling -2.5 instead. The P-value for conducting the left-tailed test H₀: μ = 3 versus Hₐ: μ < 3 is the probability that we would observe a test statistic less than t* = -2.5 if the population mean μ really were 3. The P-value is therefore the area under a \(t_{n-1} = t_{14}\) curve and to the left of the test statistic t* = -2.5. It can be shown using statistical software that the P-value is 0.0127. The graph depicts this visually.

[Figure: t-distribution curve showing the left-tail area below t* = -2.5]

The P-value, 0.0127, tells us it is "unlikely" that we would observe such an extreme test statistic t* in the direction of Hₐ if the null hypothesis were true. Therefore, our initial assumption that the null hypothesis is true must be incorrect. That is, since the P-value, 0.0127, is less than α = 0.05, we reject the null hypothesis H₀: μ = 3 in favor of the alternative hypothesis Hₐ: μ < 3.

Note that we would not reject H₀: μ = 3 in favor of Hₐ: μ < 3 if we lowered our willingness to make a Type I error to α = 0.01, as the P-value, 0.0127, is then greater than α = 0.01.

Two-Tailed

In our example concerning the mean grade point average, suppose again that our random sample of n = 15 students majoring in mathematics yields a test statistic t* equaling -2.5. The P-value for conducting the two-tailed test H₀: μ = 3 versus Hₐ: μ ≠ 3 is the probability that we would observe a test statistic less than -2.5 or greater than 2.5 if the population mean μ really were 3. That is, the two-tailed test requires taking into account the possibility that the test statistic could fall into either tail (hence the name "two-tailed" test). The P-value is therefore the area under a \(t_{n-1} = t_{14}\) curve to the left of -2.5 plus the area to the right of 2.5. It can be shown using statistical software that the P-value is 0.0127 + 0.0127, or 0.0254. The graph depicts this visually.

[Figure: t-distribution curve showing both tail areas beyond t* = -2.5 and t* = 2.5]

Note that the P-value for a two-tailed test is always two times the P-value for either of the one-tailed tests. The P-value, 0.0254, tells us it is "unlikely" that we would observe such an extreme test statistic t* in the direction of Hₐ if the null hypothesis were true. Therefore, our initial assumption that the null hypothesis is true must be incorrect. That is, since the P-value, 0.0254, is less than α = 0.05, we reject the null hypothesis H₀: μ = 3 in favor of the alternative hypothesis Hₐ: μ ≠ 3.

Note that we would not reject H₀: μ = 3 in favor of Hₐ: μ ≠ 3 if we lowered our willingness to make a Type I error to α = 0.01, as the P-value, 0.0254, is then greater than α = 0.01.
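The three tail areas in this example are easy to verify with statistical software; for instance, a short SciPy check (the t* and degrees of freedom come from the example above):

```python
# Verify the worked example: tail areas of t(14) beyond t* = ±2.5.
from scipy import stats

t_star, df = 2.5, 14

p_right = stats.t.sf(t_star, df)         # right-tailed: P(T > 2.5)
p_left = stats.t.cdf(-t_star, df)        # left-tailed:  P(T < -2.5)
p_two = 2 * stats.t.sf(abs(t_star), df)  # two-tailed
print(p_right, p_left, p_two)            # ≈ 0.0127, 0.0127, 0.0254
```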

Now that we have reviewed the critical value and P-value approach procedures for each of the three possible hypotheses, let's look at three new examples: one of a right-tailed test, one of a left-tailed test, and one of a two-tailed test.

The good news is that, whenever possible, we will take advantage of the test statistics and P -values reported in statistical software, such as Minitab, to conduct our hypothesis tests in this course.

p-value Calculator


Welcome to our p-value calculator! You will never again have to wonder how to find the p-value, as here you can determine the one-sided and two-sided p-values from test statistics, following all the most popular distributions: normal, t-Student, chi-squared, and Snedecor's F.

P-values appear all over science, yet many people find the concept a bit intimidating. Don't worry – in this article, we will explain not only what the p-value is but also how to interpret p-values correctly . Have you ever been curious about how to calculate the p-value by hand? We provide you with all the necessary formulae as well!

🙋 If you want to revise some basics from statistics, our normal distribution calculator is an excellent place to start.

What is p-value?

Formally, the p-value is the probability that the test statistic will produce values at least as extreme as the value it produced for your sample. It is crucial to remember that this probability is calculated under the assumption that the null hypothesis H₀ is true!

More intuitively, p-value answers the question:

Assuming that I live in a world where the null hypothesis holds, how probable is it that, for another sample, the test I'm performing will generate a value at least as extreme as the one I observed for the sample I already have?

It is the alternative hypothesis that determines what "extreme" actually means, so the p-value depends on the alternative hypothesis that you state: left-tailed, right-tailed, or two-tailed. In the formulas below, S stands for a test statistic, x for the value it produced for a given sample, and Pr(event | H₀) is the probability of an event, calculated under the assumption that H₀ is true:

Left-tailed test: p-value = Pr(S ≤ x | H₀)

Right-tailed test: p-value = Pr(S ≥ x | H₀)

Two-tailed test: p-value = 2 × min{Pr(S ≤ x | H₀), Pr(S ≥ x | H₀)}

(By min{a, b}, we denote the smaller of the two numbers a and b.)

If the distribution of the test statistic under H₀ is symmetric about 0, then: p-value = 2 × Pr(S ≥ |x| | H₀), or, equivalently: p-value = 2 × Pr(S ≤ -|x| | H₀)

As a picture is worth a thousand words, let us illustrate these definitions. Here, we use the fact that the probability can be neatly depicted as the area under the density curve for a given distribution. We give two sets of pictures: one for a symmetric distribution and the other for a skewed (non-symmetric) distribution.

  • Symmetric case: normal distribution:

[Figure: p-values for a symmetric distribution under left-tailed, right-tailed, and two-tailed tests]

  • Non-symmetric case: chi-squared distribution:

[Figure: p-values for a non-symmetric (chi-squared) distribution under left-tailed, right-tailed, and two-tailed tests]

In the last picture (two-tailed p-value for skewed distribution), the area of the left-hand side is equal to the area of the right-hand side.

How do I calculate p-value from test statistic?

To determine the p-value, you need to know the distribution of your test statistic under the assumption that the null hypothesis is true. Then, with the help of the cumulative distribution function (cdf) of this distribution, we can express the probability of the test statistic being at least as extreme as its value x for the sample:

Left-tailed test: p-value = cdf(x)

Right-tailed test: p-value = 1 - cdf(x)

Two-tailed test: p-value = 2 × min{cdf(x), 1 - cdf(x)}

If the distribution of the test statistic under H₀ is symmetric about 0, then a two-sided p-value can be simplified to p-value = 2 × cdf(-|x|), or, equivalently, p-value = 2 - 2 × cdf(|x|).
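These three formulas translate directly into code. A generic sketch in Python (the chi-squared example value is illustrative):

```python
# Tail p-value from a cdf value, for the three kinds of alternative.
from scipy import stats

def p_value(cdf_at_x: float, tail: str) -> float:
    """cdf_at_x = cdf(x), evaluated under H0 at the observed statistic x."""
    if tail == "left":
        return cdf_at_x
    if tail == "right":
        return 1 - cdf_at_x
    return 2 * min(cdf_at_x, 1 - cdf_at_x)  # two-tailed

c = stats.chi2.cdf(7.81, df=3)  # illustrative chi-squared(3) statistic
print(p_value(c, "right"))      # ≈ 0.05
```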

The probability distributions that are most widespread in hypothesis testing tend to have complicated cdf formulae, and finding the p-value by hand may not be possible. You'll likely need to resort to a computer or to a statistical table, where people have gathered approximate cdf values.

Well, you now know how to calculate the p-value, but… why do you need to calculate this number in the first place? In hypothesis testing, the p-value approach is an alternative to the critical value approach. Recall that the latter requires researchers to pre-set the significance level, α, which is the probability of rejecting the null hypothesis when it is true (i.e., of making a Type I error). Once you have your p-value, you just need to compare it with any given α to quickly decide whether or not to reject the null hypothesis at that significance level, α. For details, check the next section, where we explain how to interpret p-values.

How to interpret p-value

As we have mentioned above, the p-value answers the question: assuming the null hypothesis holds, how probable is it that the test produces a value at least as extreme as the one observed for your sample?

What does that mean for you? Well, you've got two options:

  • A high p-value means that your data is highly compatible with the null hypothesis; and
  • A small p-value provides evidence against the null hypothesis , as it means that your result would be very improbable if the null hypothesis were true.

However, it may happen that the null hypothesis is true, but your sample is highly unusual! For example, imagine we studied the effect of a new drug and got a p-value of 0.03 . This means that in 3% of similar studies, random chance alone would still be able to produce the value of the test statistic that we obtained, or a value even more extreme, even if the drug had no effect at all!

The question "what is a p-value?" can also be answered as follows: the p-value is the smallest level of significance at which the null hypothesis would be rejected. So, if you now want to make a decision about the null hypothesis at some significance level α, just compare your p-value with α:

  • If p-value ≤ α, then you reject the null hypothesis and accept the alternative hypothesis; and
  • If p-value > α, then you don't have enough evidence to reject the null hypothesis.

Obviously, the fate of the null hypothesis depends on α. For instance, if the p-value was 0.03, we would reject the null hypothesis at a significance level of 0.05, but not at a level of 0.01. That's why the significance level should be stated in advance and not adapted conveniently after the p-value has been established! A significance level of 0.05 is the most common value, but there's nothing magical about it. Here, you can see what too strong a faith in the 0.05 threshold can lead to. It's always best to report the p-value and allow the reader to draw their own conclusions.
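In code, this comparison is a one-liner. Below is a tiny sketch in Python (the helper name decide is ours; the 0.03 example mirrors the paragraph above):

    # Compare a p-value with a pre-set significance level alpha.
    def decide(p, alpha=0.05):
        return "reject H0" if p <= alpha else "fail to reject H0"

    print(decide(0.03, alpha=0.05))   # reject H0
    print(decide(0.03, alpha=0.01))   # fail to reject H0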

Also, bear in mind that subject-area expertise (and common sense) is crucial. Otherwise, by mindlessly applying statistical principles, you can easily arrive at statistically significant conclusions that are nonetheless completely untrue.

As our p-value calculator is here at your service, you no longer need to wonder how to find the p-value from all those complicated test statistics! Here are the steps you need to follow:

1. Pick the alternative hypothesis: two-tailed, right-tailed, or left-tailed.

2. Tell us the distribution of your test statistic under the null hypothesis: is it N(0,1), t-Student, chi-squared, or Snedecor's F? If you are unsure, check the sections below, as they are devoted to these distributions.

3. If needed, specify the degrees of freedom of the test statistic's distribution.

4. Enter the value of the test statistic computed for your data sample.

Our calculator then determines the p-value from the test statistic and provides the decision to be made about the null hypothesis. The standard significance level is 0.05 by default.

Go to the advanced mode if you need to increase the precision with which the calculations are performed or to change the significance level.

In terms of the cumulative distribution function (cdf) of the standard normal distribution, which is traditionally denoted by Φ, the p-value is given by the following formulae:

Left-tailed z-test:

p-value = Φ(Z_score)

Right-tailed z-test:

p-value = 1 − Φ(Z_score)

Two-tailed z-test:

p-value = 2 × Φ(−|Z_score|)

or, equivalently:

p-value = 2 − 2 × Φ(|Z_score|)
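These formulas can be evaluated with any implementation of Φ; the snippet below is a small sketch using SciPy's norm.cdf as Φ (the z value is an arbitrary example):

    # Z-test p-values, with the standard normal cdf playing the role of Φ.
    from scipy.stats import norm

    z = 2.0
    p_left  = norm.cdf(z)             # Φ(z)
    p_right = 1 - norm.cdf(z)         # 1 - Φ(z)
    p_two   = 2 * norm.cdf(-abs(z))   # 2Φ(-|z|); valid because N(0,1) is symmetric
    print(p_right, p_two)             # about 0.0228 and 0.0455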

🙋 To learn more about Z-tests, head to Omni's Z-test calculator .

We use the Z-score if the test statistic approximately follows the standard normal distribution N(0,1). Thanks to the central limit theorem, you can count on this approximation if you have a large sample (say, at least 50 data points), in which case you can treat the distribution of your test statistic as normal.

A Z-test most often refers to testing the population mean, or the difference between two population means (in particular, between two proportions). You can also find Z-tests in maximum likelihood estimation.

The p-value from the t-score is given by the following formulae, in which cdf_{t,d} stands for the cumulative distribution function of the t-Student distribution with d degrees of freedom:

Left-tailed t-test:

p-value = cdf_{t,d}(t_score)

Right-tailed t-test:

p-value = 1 − cdf_{t,d}(t_score)

Two-tailed t-test:

p-value = 2 × cdf_{t,d}(−|t_score|)

or, equivalently:

p-value = 2 − 2 × cdf_{t,d}(|t_score|)
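As with the z-test, these formulas are easy to evaluate in software. Here is a sketch assuming SciPy, where cdf_{t,d} is scipy.stats.t with df = d (the t-score and degrees of freedom are made-up examples):

    # t-test p-values from a t-score with d degrees of freedom.
    from scipy.stats import t

    t_score, d = 2.1, 15
    p_right = t.sf(t_score, df=d)              # 1 - cdf_{t,d}(t_score)
    p_two   = 2 * t.cdf(-abs(t_score), df=d)   # uses the symmetry of the t distribution
    print(p_two)                               # about 0.053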

Use the t-score option if your test statistic follows the t-Student distribution. This distribution is similar in shape to N(0,1) (bell-shaped and symmetric) but has heavier tails; the exact shape depends on a parameter called the degrees of freedom. If the number of degrees of freedom is large (> 30), which generally happens for large samples, the t-Student distribution is practically indistinguishable from the normal distribution N(0,1).

The most common t-tests are those for population means with an unknown population standard deviation, or for the difference between means of two populations , with either equal or unequal yet unknown population standard deviations. There's also a t-test for paired (dependent) samples .

🙋 To get more insights into t-statistics, we recommend using our t-test calculator .

Use the χ²-score option when performing a test in which the test statistic follows the χ²-distribution .

This distribution arises if, for example, you take the sum of squared variables, each following the normal distribution N(0,1). Remember to check the number of degrees of freedom of the χ²-distribution of your test statistic!

How do you find the p-value from the χ²-score? You can do it with the help of the following formulae, in which cdf_{χ²,d} denotes the cumulative distribution function of the χ²-distribution with d degrees of freedom:

Left-tailed χ²-test:

p-value = cdf_{χ²,d}(χ²_score)

Right-tailed χ²-test:

p-value = 1 − cdf_{χ²,d}(χ²_score)

Remember that χ²-tests for goodness-of-fit and independence are right-tailed tests! (see below)

Two-tailed χ²-test:

p-value = 2 × min{cdf_{χ²,d}(χ²_score), 1 − cdf_{χ²,d}(χ²_score)}

(By min{a, b}, we denote the smaller of the numbers a and b.)
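The same pattern works for the χ²-distribution; here is a sketch with SciPy's chi2 standing in for cdf_{χ²,d} (the score and degrees of freedom are illustrative):

    # Chi-square p-values from a χ²-score with d degrees of freedom.
    from scipy.stats import chi2

    score, d = 16.9, 9
    p_right = chi2.sf(score, df=d)    # right-tailed (goodness-of-fit, independence)
    p_left  = chi2.cdf(score, df=d)   # left-tailed
    p_two   = 2 * min(p_left, p_right)
    print(p_right)                    # about 0.05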

The most popular tests which lead to a χ²-score are the following:

  • Testing whether the variance of normally distributed data has some predetermined value. In this case, the test statistic has the χ²-distribution with n − 1 degrees of freedom, where n is the sample size. This can be a one-tailed or two-tailed test.

  • The goodness-of-fit test checks whether the empirical (sample) distribution agrees with some expected probability distribution. In this case, the test statistic follows the χ²-distribution with k − 1 degrees of freedom, where k is the number of classes into which the sample is divided. This is a right-tailed test.

  • The independence test is used to determine whether there is a statistically significant relationship between two variables. In this case, the test statistic is based on the contingency table and follows the χ²-distribution with (r − 1)(c − 1) degrees of freedom, where r is the number of rows and c is the number of columns in the contingency table. This is also a right-tailed test.

Finally, the F-score option should be used when you perform a test in which the test statistic follows the F-distribution, also known as the Fisher–Snedecor distribution. The exact shape of an F-distribution depends on two degrees of freedom.

To see where those degrees of freedom come from, consider the independent random variables X and Y, which both follow χ²-distributions with d₁ and d₂ degrees of freedom, respectively. In that case, the ratio (X/d₁)/(Y/d₂) follows the F-distribution with (d₁, d₂) degrees of freedom. For this reason, the two parameters d₁ and d₂ are also called the numerator and denominator degrees of freedom.

The p-value from the F-score is given by the following formulae, where cdf_{F,d₁,d₂} denotes the cumulative distribution function of the F-distribution with (d₁, d₂) degrees of freedom:

Left-tailed F-test:

p-value = cdf_{F,d₁,d₂}(F_score)

Right-tailed F-test:

p-value = 1 − cdf_{F,d₁,d₂}(F_score)

Two-tailed F-test:

p-value = 2 × min{cdf_{F,d₁,d₂}(F_score), 1 − cdf_{F,d₁,d₂}(F_score)}
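Again, any implementation of the F cdf will do; below is a sketch using SciPy's f distribution for cdf_{F,d₁,d₂} (the F-score and the two degrees of freedom are arbitrary examples):

    # F-test p-values from an F-score with (d1, d2) degrees of freedom.
    from scipy.stats import f

    F_score, d1, d2 = 3.2, 4, 45
    p_right = f.sf(F_score, dfn=d1, dfd=d2)   # right-tailed, as in ANOVA and regression F-tests
    p_two   = 2 * min(f.cdf(F_score, dfn=d1, dfd=d2), p_right)
    print(p_right)                            # about 0.02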

Below we list the most important tests that produce F-scores. All of them are right-tailed tests.

  • A test for the equality of variances in two normally distributed populations. Its test statistic follows the F-distribution with (n − 1, m − 1) degrees of freedom, where n and m are the respective sample sizes.

  • ANOVA is used to test the equality of means in three or more groups that come from normally distributed populations with equal variances. It leads to the F-distribution with (k − 1, n − k) degrees of freedom, where k is the number of groups and n is the total sample size (across all groups).

  • A test for the overall significance of a regression analysis. The test statistic has an F-distribution with (k − 1, n − k) degrees of freedom, where n is the sample size and k is the number of variables (including the intercept).

Once the above test has established a linear relationship in your data sample, you can calculate the coefficient of determination, R², which quantifies the strength of this relationship. You can do it by hand or use our coefficient of determination calculator.

  • A test to compare two nested regression models. The test statistic follows the F-distribution with (k₂ − k₁, n − k₂) degrees of freedom, where k₁ and k₂ are the numbers of variables in the smaller and bigger models, respectively, and n is the sample size.

You may notice that the F-test of overall significance is a particular case of the F-test for comparing two nested models: it tests whether our model does significantly better than the model with no predictors (i.e., the intercept-only model).

Can p-value be negative?

No, the p-value cannot be negative. This is because probabilities cannot be negative, and the p-value is the probability of the test statistic satisfying certain conditions.

What does a high p-value mean?

A high p-value means that under the null hypothesis, there's a high probability that for another sample, the test statistic will generate a value at least as extreme as the one observed in the sample you already have. A high p-value doesn't allow you to reject the null hypothesis.

What does a low p-value mean?

A low p-value means that under the null hypothesis, there's little probability that for another sample, the test statistic will generate a value at least as extreme as the one observed for the sample you already have. A low p-value is evidence in favor of the alternative hypothesis – it allows you to reject the null hypothesis.


800 scientists say it’s time to abandon “statistical significance”

P-values and “statistical significance” are widely misunderstood. Here’s what they actually mean.


For too long, many scientists’ careers have been built around the pursuit of a single statistic: p<.05.

In many scientific disciplines, that’s the threshold beyond which study results can be declared “statistically significant,” which is often interpreted to mean that it’s unlikely the results were a fluke, a result of random chance.

Though this isn’t what it actually means in practice. “Statistical significance” is too often misunderstood — and misused. That’s why a trio of scientists writing in Nature this week are calling “for the entire concept of statistical significance to be abandoned.”

Their biggest argument: “Statistically significant” or “not statistically significant” is too often easily misinterpreted to mean either “the study worked” or “the study did not work.” A “true” effect can sometimes yield a p-value of greater than .05. And we know from recent years that science is rife with false-positive studies that achieved values of less than .05 (read my explainer on the replication crisis in social science for more).

The Nature commentary authors argue that the math is not the problem. Instead, it’s human psychology. Bucketing results into “statistically significant” and “statistically non-significant,” they write, leads to a too black-and-white approach to scrutinizing science.

More than 800 other scientists and statisticians across the world have signed on to this manifesto. For now, it seems more like a provocative argument than the start of a real sea change. “ Nature,” for one, “is not seeking to change how it considers statistical analysis in evaluation of papers at this time,” the journal noted.

But the tides may be rising against "statistical significance." This isn't the first time scientists and statisticians have challenged the status quo. In 2016, I wrote about how a large group of them called for tightening the threshold to .005, making it much harder to call a result "statistically significant." (Concurrently with the Nature commentary, the journal The American Statistician devoted an entire issue to the problem of "statistical significance.") There's a wide recognition that p-values can be problematic.

I suspect this proposal will be heavily debated (as is everything in science). At least this latest call for radical change does highlight an important fact plaguing science: Statistical significance is widely misunderstood. Let me walk you through it. I think it will help you understand this debate better, and help you see that there are a lot more ways to judge the merits of a scientific finding than p-values.

Wait, what is a p-value? What’s statistical significance?


Even the simplest definitions of p-values tend to get complicated, so bear with me as I break it down.

When researchers calculate a p-value, they’re putting to the test what’s known as the null hypothesis. First thing to know: This is not a test of the question the experimenter most desperately wants to answer.

Let’s say the experimenter really wants to know if eating one bar of chocolate a day leads to weight loss. To test that, they assign 50 participants to eat one bar of chocolate a day. Another 50 are commanded to abstain from the delicious stuff. Both groups are weighed before the experiment and then after, and their average weight change is compared.

The null hypothesis is the devil’s advocate argument. It states there is no difference in the weight loss of the chocolate eaters versus the chocolate abstainers.

Rejecting the null is a major hurdle scientists need to clear to prove their hypothesis. If the null stands, it means they haven’t eliminated a major alternative explanation for their results. And what is science if not a process of narrowing down explanations?

So how do they rule out the null? They calculate some statistics.

The researcher basically asks: How ridiculous would it be to believe the null hypothesis is the true answer, given the results we’re seeing?

Rejecting the null is kind of like the “innocent until proven guilty” principle in court cases, Regina Nuzzo, a mathematics professor at Gallaudet University, explained. In court, you start off with the assumption that the defendant is innocent. Then you start looking at the evidence: the bloody knife with his fingerprints on it, his history of violence, eyewitness accounts. As the evidence mounts, that presumption of innocence starts to look naive. At a certain point, jurors get the feeling, beyond a reasonable doubt, that the defendant is not innocent.

Null hypothesis testing follows a similar logic: If there are huge and consistent weight differences between the chocolate eaters and chocolate abstainers, the null hypothesis — that there are no weight differences — starts to look silly and you can reject it.

You might be thinking: Isn’t this a pretty roundabout way to prove an experiment worked?

You are correct!

Rejecting the null hypothesis is indirect evidence of an experimental hypothesis. It says nothing about whether your scientific conclusion is correct.

Sure, the chocolate eaters may lose some weight. But is it because of the chocolate? Maybe. Or maybe they felt extra guilty eating candy every day, and they knew they were going to be weighed by strangers wearing lab coats (weird!), so they skimped on other meals.

Rejecting the null doesn’t tell you anything about the mechanism by which chocolate causes weight loss. It doesn’t tell you if the experiment is well designed, or well controlled for, or if the results have been cherry-picked.

It just helps you understand how rare the results are.

But — and this is a tricky, tricky point — it’s not how rare the results of your experiment are. It’s how rare the results would be in the world where the null hypothesis is true. That is, it’s how rare the results would be if nothing in your experiment worked and the difference in weight was due to random chance alone.

Here’s where the p-value comes in: The p-value quantifies this rareness. It tells you how often you’d see the numerical results of an experiment — or even more extreme results — if the null hypothesis is true and there’s no difference between the groups.

If the p-value is very small, it means the numbers would rarely (but not never!) occur by chance alone. So when the p is small, researchers start to think the null hypothesis looks improbable. And they take a leap to conclude “their [experimental] data are pretty unlikely to be due to random chance,” Nuzzo explains.

Here’s another tricky point: Researchers can never completely rule out the null (just like jurors are not firsthand witnesses to a crime). So scientists instead pick a threshold where they feel pretty confident that they can reject the null. For many disciplines, that’s now set at less than .05.

Ideally, a p of .05 means if you ran the experiment 100 times — again, assuming the null hypothesis is true — you’d see these same numbers (or more extreme results) five times.

And one last, super-thorny concept that almost everyone gets wrong: A p<.05 does not mean there’s less than a 5 percent chance your experimental results are due to random chance. It does not mean there’s only a 5 percent chance you’ve landed on a false positive. Nope. Not at all.

Again: A p-value of less than .05 means that there is less than a 5 percent chance of seeing these results (or more extreme results), in the world where the null hypothesis is true. This sounds nitpicky, but it’s critical. It’s the misunderstanding that leads people to be unduly confident in p-values. The false-positive rate for experiments at p=.05 can be much higher than 5 percent .

Let’s repeat it: P-values don’t necessarily tell you if an experiment “worked” or not

Psychology PhD student Kristoffer Magnusson has designed a pretty cool interactive calculator that estimates the probability of obtaining a range of p-values for any given true difference between groups. I used it to create the following scenario.

Let’s say there’s a study where the actual difference between two groups is equal to half a standard deviation. (Yes, this is a nerdy way of putting it. But think of it like this: It means 69 percent of those in the experimental group show results higher than the mean of the control group. Researchers call this a “medium-size” effect.) And let’s say there are 50 people each in the experimental group and the control group.

In this scenario , you should only be able to obtain a p-value between .03 and .05 around 7.62 percent of the time.

If you ran this experiment over and over and over again, you’d actually expect to see a lot more p-values with a much lower number. That’s what the following chart shows. The x-axis is the specific p-values, and the y-axis is the frequency you’d find them repeating this experiment. Look how many p-values you’d find below .001.
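You don't have to take the calculator's word for it: the scenario is easy to simulate. The sketch below is ours, assuming NumPy and SciPy; it draws two groups of 50 with a true difference of half a standard deviation and tallies where the two-sample t-test p-values land, which should come out close to the frequencies quoted here.

    # Simulate the scenario: true effect = 0.5 SD, n = 50 per group.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    pvals = np.array([
        stats.ttest_ind(rng.normal(0.0, 1.0, 50),
                        rng.normal(0.5, 1.0, 50)).pvalue
        for _ in range(20_000)
    ])

    print(np.mean((pvals > 0.03) & (pvals < 0.05)))   # roughly 0.07 to 0.08
    print(np.mean((pvals > 0.05) & (pvals < 0.10)))   # roughly 0.10
    print(np.mean(pvals < 0.001))                     # a large share of all runs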


This is why many scientists get wary when they see too many results cluster around .05. It shouldn’t happen that often and raises red flags that the results have been cherry-picked, or, in science-speak, “p-hacked.” In science, it can be much too easy to game and tweak statistics to achieve significance.

And from this chart, you’ll see: Yes, you can obtain a p-value of greater than .05 when an experimental hypothesis is true. It just shouldn’t happen as often. In this case, around 9.84 percent of all p-values should fall between .05 and .1.

There are better, more nuanced approaches to evaluating science

Many scientists recognize there are more robust ways to evaluate a scientific finding. And they already engage in them. But they, somehow, don’t currently hold as much power as “statistical significance.” They are:

  • Concentrating on effect sizes (how big of a difference does an intervention make, and is it practically meaningful?)
  • Confidence intervals (what’s the range of doubt built into any given answer?)
  • Whether a result is a novel finding or a replication (put more weight on a theory that many labs have looked into)
  • Whether a study’s design was preregistered (so that authors can’t manipulate their results post-test), and that the underlying data is freely accessible (so anyone can check the math)
  • There are also alternative statistical techniques — like Bayesian analysis — that in some ways more directly evaluate a study’s results. (P-values ask the question “how rare are my results?” Bayes factors ask the question “what is the probability my hypothesis is the best explanation for the results we found?” Both approaches have trade-offs. )

The real problem isn’t with statistical significance; it’s with the culture of science

The authors of the latest Nature commentary aren’t calling for the end of p-values. They’d still like scientists to report them where appropriate, but not necessarily label them “significant” or not.

There’s likely to be argument around this strategy. Some might think it’s useful to have simple rules of thumb, or thresholds, to evaluate science. And we still need to have phrases in our language to describe scientific results. Erasing “statistical significance” might just confuse things.

In any case, changing the definition of statistical significance, or nixing it entirely, doesn’t address the real problem. And the real problem is the culture of science.

In 2016, Vox sent out a survey to more than 200 scientists asking, “If you could change one thing about how science works today, what would it be and why?” One of the clear themes in the responses: The institutions of science need to get better at rewarding failure.

One young scientist told us, “I feel torn between asking questions that I know will lead to statistical significance and asking questions that matter.”

She felt torn because young scientists need publications to get jobs. Under the status quo, in order to get publications, you need statistically significant results. Statistical significance alone didn't lead to the replication crisis. The institutions of science incentivized the behaviors that allowed it to fester.




What is a p value and what does it mean?

Volume 15, Issue 2

  • Dorothy Anne Forbes
  • Correspondence to Dorothy Anne Forbes Faculty of Nursing, University of Alberta, Level 3, Edmonton Clinic Health Academy, Edmonton, Alberta, T6G 1C9, Canada; dorothy.forbes{at}ualberta.ca

https://doi.org/10.1136/ebnurs-2012-100524


Researchers aim to make the strongest possible conclusions from limited amounts of data. To do this, they need to overcome two problems. First, important differences in the findings can be obscured by natural variability and experimental imprecision. Thus, it is difficult to distinguish real differences from random variability. Second, researchers' natural inclination is to conclude that differences are real and to minimise the contribution of random variability. Statistical probability minimises the chance of this happening. 1

Statistical probability or p values reveal whether the findings in a research study are statistically significant, meaning that the findings are unlikely to have occurred by chance. To understand the p value concept, it is important to understand its relationship with the α level. Before conducting a study, researchers specify the α level which is most often set at 0.05 (5%). This conventional level was based on the writings of Sir Ronald Fisher, an influential statistician, who in 1926 reported that he preferred the 0.05 cut-off for separating the probable from the improbable. 2 Researchers who set α at 0.05 are willing to accept that there is a 5% chance that their findings are wrong. However, researchers may adopt probability cut-offs that are more generous (eg, an α set at 0.10 means there is a 10% chance that the conclusions are wrong) or more stringent (eg, an α set at 0.01 means there is a 1% chance that the conclusions are wrong). The design of the study, purpose or intuition may influence the researcher's setting of the α level. 2

To illustrate how setting the α level may affect the conclusions of a study, let us examine a research study that compared the annual incomes of hospital based nurses and community based nurses. The mean annual income for hospital based nurses was reported to be $70 000 and for community based nurses to be $60 000. The p value of this study was 0.08. If the researchers set the α level at 0.05, they would conclude that there was no significant difference between the annual incomes of hospital and community-based nurses, since the p value of 0.08 exceeded the α level of 0.05. However, if the α level had been set at 0.10, the p value of 0.08 would be less than the α level and the researchers would conclude that there was a significant difference between the annual incomes of hospital and community based nurses. Two very different conclusions. 3

It is easy to read far too much into the word significant because the statistical use of the word has a meaning entirely distinct from its usual meaning. Just because a difference is statistically significant does not mean that it is important or interesting. In the example above, at the 0.10 α level, although the findings are statistically significant, results due to chance occur 1 out of 10 times. Thus, chance of conclusion error is higher than when the α level is set at 0.05 and results due to chance occur 5 out of 100 times or 1 in 20 times. In the end, the reader must decide if the researchers selected the appropriate α level and whether the conclusions are meaningful or not.

  • Graphpad. What is a p value? 2011. http://www.graphpad.com/articles/pvalue.htm (accessed 10 Dec 2011).
  • Munroe BH, Jacobsen BS.
  • El-Masri MM.
Competing interests None.


P-Value: A Complete Guide

Published by Owen Ingram on August 31st, 2021; revised on August 3, 2023

You might have come across this term many times in hypothesis testing. Can you tell what a p-value is and how to calculate it? For those who are new to this term, sit back and read this guide to find out all the answers. If you are already familiar with it, continue reading, because you might get a chance to dig deeper into the p-value and its significance in statistics.

Before we start with what a p-value is, there are a few other terms you must be clear about. And these are the null hypothesis and the alternative hypothesis.

What are the Null Hypothesis and Alternative Hypothesis?

The alternative hypothesis is your first hypothesis, predicting a relationship between different variables. On the contrary, the null hypothesis predicts that there is no relationship between the variables you are playing with.

For instance, suppose you want to check the impact of two fertilizers on the growth of two sets of plants. Group A of plants is given fertilizer A, while group B is given fertilizer B. By using a two-tailed t-test, you can then find out whether the two fertilizers lead to a difference in growth.

Null Hypothesis : There is no difference in growth between the two sets of plants.

Alternative Hypothesis: There is a difference in growth between the two groups.

What is the P-value?

The p-value in statistics is the probability of getting outcomes at least as extreme as the outcomes of a statistical hypothesis test, assuming the null hypothesis to be correct. To put it in simpler words, it is a calculated number from a statistical test that shows how likely you are to have found a set of observations if the null hypothesis were plausible.

This means that p-values are used as alternatives to rejection points for providing the smallest level of significance at which the null hypothesis can be rejected. If the p-value is small, the evidence in favour of the alternative hypothesis is stronger. Similarly, if the value is large, the evidence in favour of the alternative hypothesis is weaker.

How is the P-value Calculated?

You can either use the p-value tables or statistical software to calculate the p-value. The calculated numbers are based on the known probability distribution of the statistic being tested.

The online p-value tables depict how frequently you can expect to see given test statistics under the null hypothesis. The p-value also depends on which statistical test one uses to test a hypothesis.

  • Different statistical tests make different predictions and hence produce different test statistics. Researchers can choose a statistical test depending on what best suits their data and the effect they want to test
  • The number of independent variables in your test determines how large or small the test statistic must be to produce the same p-value


When is a P-value Statistically Significant?

Before we talk about when a p-value is statistically significant, let's first find out what it means to be statistically significant.

Any guesses?

To be statistically significant is another way of saying that a p-value is so small that it can lead to rejecting the null hypothesis.

Now the question is how small?

If a p-value is smaller than 0.05, then it is statistically significant. This means that the evidence against the null hypothesis is strong. Since there would be less than a 5 per cent chance of seeing results this extreme if the null hypothesis were correct, we reject the null hypothesis and accept the alternative hypothesis.

Nevertheless, if the p-value is less than the threshold of significance, the null hypothesis can be rejected, but that does not mean there is a 95 per cent probability of the alternative hypothesis being true. Note that the p-value is conditioned on the null hypothesis being true; it is not directly related to the truth or falsity of the alternative hypothesis.

When the p-value is greater than 0.05, it is not statistically significant. It indicates that the evidence against the null hypothesis is weak. In this case, the null hypothesis is retained. An important thing to keep in mind here is that you still cannot accept the null hypothesis: you can only reject it or fail to reject it.

Here is a table showing hypothesis interpretations:

  • p-value ≤ 0.05: statistically significant; reject the null hypothesis in favour of the alternative
  • p-value > 0.05: not statistically significant; fail to reject the null hypothesis

Is it clear now? We thought so! Let's move on to the next heading, then.

How to Use P-value in Hypothesis Testing?

Follow these three simple steps to use p-value in hypothesis testing .

Step 1: Find the level of significance. Make sure to choose the significance level during the initial steps of the design of a hypothesis test. It is usually 0.10, 0.05, or 0.01.

Step 2: Now calculate the p-value. As we discussed earlier, there are two ways of calculating it. A simple way out would be using Microsoft Excel, which allows p-value calculation with Data Analysis ToolPak .

Step 3: Start comparing the p-value with the significance level and deduce conclusions accordingly. Following the general rule, if the value is less than the level of significance, there is enough evidence to reject the null hypothesis of an experiment.
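If you prefer code to spreadsheets, the three steps look like this in Python (a sketch only: SciPy is assumed, and the two samples are invented numbers for the fertilizer example from earlier):

    # Step 1: set the significance level before looking at the data.
    from scipy import stats

    alpha = 0.05

    # Hypothetical growth measurements for the two fertilizer groups.
    group_a = [4.1, 5.0, 4.8, 5.6, 4.9, 5.2]
    group_b = [5.9, 6.3, 5.7, 6.8, 6.1, 6.4]

    # Step 2: calculate the p-value (two-tailed two-sample t-test).
    p = stats.ttest_ind(group_a, group_b).pvalue

    # Step 3: compare the p-value with the significance level.
    if p < alpha:
        print(f"p = {p:.4f} < {alpha}: reject the null hypothesis")
    else:
        print(f"p = {p:.4f} >= {alpha}: fail to reject the null hypothesis")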

FAQs About P-Value

What is a null hypothesis?

It is a statistical theory suggesting that there is no relationship between a set of variables .

What is an alternative hypothesis?

The alternative hypothesis is your first hypothesis predicting a relationship between different variables .

What is the p-value?

The p-value in statistics is the probability of getting outcomes at least as extreme as the outcomes of a statistical hypothesis test, assuming the null hypothesis to be correct. It is a calculated number from a statistical test that shows how likely you are to have found a set of observations if the null hypothesis were plausible.

What is the level of significance?

To be statistically significant is another way of saying that a p-value is so small that it can lead to rejecting the null hypothesis. A p-value at or below the chosen significance level (typically 0.05) is considered significant.

Open access. Published: 24 June 2020.

P-values – a chronic conundrum

Jian Gao (ORCID: orcid.org/0000-0001-8101-740X)

BMC Medical Research Methodology, volume 20, Article number: 167 (2020)


In medical research and practice, the p-value is arguably the most often used statistic and yet it is widely misconstrued as the probability of the type I error, which comes with serious consequences. This misunderstanding can greatly affect the reproducibility in research, treatment selection in medical practice, and model specification in empirical analyses. By using plain language and concrete examples, this paper is intended to elucidate the p-value confusion from its root, to explicate the difference between significance and hypothesis testing, to illuminate the consequences of the confusion, and to present a viable alternative to the conventional p-value.

The confusion with p-values has plagued the research community and medical practitioners for decades. However, efforts to clarify it have been largely futile, in part because intuitive yet mathematically rigorous educational materials are scarce. Additionally, the lack of a practical alternative to the p-value for guarding against randomness also plays a role. The p-value confusion is rooted in the misconception of significance and hypothesis testing. Most, including many statisticians, are unaware that the p-values and significance testing formed by Fisher are incomparable to the hypothesis testing paradigm created by Neyman and Pearson. And most otherwise great statistics textbooks tend to cobble the two paradigms together and make no effort to elucidate the subtle but fundamental differences between them. The p-value is a practical tool gauging the "strength of evidence" against the null hypothesis. It informs investigators that a p-value of 0.001, for example, is stronger than 0.05. However, p-values produced in significance testing are not the probabilities of type I errors, as commonly misconceived. For a p-value of 0.05, the chance a treatment does not work is not 5%; rather, it is at least 28.9%.

Conclusions

A long-overdue effort to understand p-values correctly is much needed. However, in medical research and practice, just banning significance testing and accepting uncertainty are not enough. Researchers, clinicians, and patients alike need to know the probability that a treatment will or will not work. Thus, the calibrated p-values (the probability that a treatment does not work) should be reported in research papers.


Without any exaggeration, humankind's wellbeing is profoundly affected by p-values: health depends on prevention and intervention, ascertaining their efficacies relies on research, and research findings hinge on p-values. The p-value is a sine qua non for deciding if a research finding is real or by chance, a treatment is effective or even harmful, a paper will be accepted or rejected, a grant will be funded or declined, or if a drug will be approved or denied by the U.S. Food & Drug Administration (FDA).

Yet the misconception of p-values is pervasive and virtually universal. "The P value is probably the most ubiquitous and at the same time, misunderstood, misinterpreted, and occasionally miscalculated index in all of biomedical research [1]." Even "among statisticians there is a near ubiquitous misinterpretation of p values as frequentist error probabilities [2]."

The extent of the p-value confusion is well illustrated by a survey of medical residents published in the Journal of the American Medical Association (JAMA). In the article, 88% of the residents expressed fair to complete confidence in understanding p-values, but 100% of them had the p-value interpretation wrong [1, 3]. Make no mistake: they are the future experts and leaders in clinical research who will affect public health policies, treatment options, and ultimately people's health.

The survey published in JAMA used a multiple-choice format with four potential answers for a correct interpretation of p > 0.05 [3]:

a. The chances are greater than 1 in 20 that a difference would be found again if the study were repeated.

b. The probability is less than 1 in 20 that a difference this large could occur by chance alone.

c. The probability is greater than 1 in 20 that a difference this large could occur by chance alone.

d. The chance is 95% that the study is correct.

How could it be possible that 100% of the residents selected incorrect answers when one of the possible choices was supposed to be correct? As reported in the paper [3], 58.8% of the residents selected choice c, which was designated by the authors as the correct answer. The irony is that choice c is not correct either. In fact, none of the four choices are correct. So not only were the residents who picked choice c wrong, but so were the authors. Keep in mind, the paper was peer-reviewed and published by one of the most prestigious medical journals in the world.

This is no coincidence: most otherwise great statistics textbooks make no effort or fail to clarify the massive confusion about p-values, and even provide outright wrong interpretations. The confusion is near-universal among medical researchers and clinicians [4, 5, 6].

Unfortunately, the misunderstanding of p-values is not inconsequential. For a p-value of 0.05, the chance a treatment doesn't work is not 5%; rather, it is at least 28.9% [7].

After decades of misunderstanding and inaction, the pendulum of p-values finally started to swing in 2014, when the American Statistical Association (ASA) was taunted by two pairs of questions and answers on its discussion forum:

Q: Why do so many colleges and grad schools teach p = 0.05?

A: Because that's still what the scientific community and journal editors use.

Q: Why do so many people still use p = 0.05?

A: Because that's what they were taught in college or grad school.

The questions and answers, posted by George Cobb, Professor Emeritus of Mathematics & Statistics at Mount Holyoke College, spurred the ASA Board into action. In 2015, for the first time, the ASA board decided to take on the challenge of developing a policy statement on p-values, much like the American Heart Association (AHA) policy statement on dietary fat and heart disease. After months of preparation, in October 2015, a group of 20 experts gathered at the ASA Office in Alexandria, Virginia and laid out the roadmap during a two-day meeting. Over the next three months, multiple drafts of the p-value statement were produced. On January 29, 2016, the ASA Executive Committee approved the p-value statement, with six principles listed on what p-values are or are not [8].

Although the statement hardly made any ripples in medical journals, it grabbed many statisticians' attention and ignited a rebellion against p-values among some scientists. In March 2019, Nature published a comment with over 800 signatories calling for an end to significance testing with p < 0.05 [9]. At the same time, The American Statistician, which carried the ASA's p-value statement, published a special issue with 43 articles exploring ways to report results without statistical significance testing. Unfortunately, no consensus was reached on a better alternative for gauging the reliability of studies, and the authors even disagreed on whether the p-value should continue to be used or be abandoned. The only agreement reached was the abolishment of significance testing, as summarized in the special issue's editorial: "statistically significant" – don't say it and don't use it [10].

So, for researchers, practitioners, and journals in the medical field, what will replace significance testing? And what is significance testing anyway? Is it different from hypothesis testing? Should p-values be banned too? If not, how should p-values be used and interpreted? In healthcare or medicine, we must accept uncertainty, as the editorial of the special issue urged, but do we not need to know how likely a given treatment is to work?

To answer these questions, we must get to the bottom of the misconception and confusion, and we must identify a practical alternative(s). However, despite numerous publications on this topic, few studies aimed at these goals are understandable to non-statisticians while retaining mathematical rigor at the same time. This paper is intended to fill this gap by using plain language and concrete examples to elucidate the p-value confusion from its root, to intuitively describe the true meaning of p-values, to illuminate the consequences of the confusion, and to present a viable alternative to the conventional p-value.

The root of confusion

The p-value confusion began 100 years ago, when the father of modern statistics, Ronald Aylmer Fisher, formed the paradigm of significance testing. But it should be noted that Fisher bears no blame for the misconception; it is the users who tend to muddle Fisher's significance testing with the hypothesis testing developed by Jerzy Neyman and Egon Pearson. To clarify the confusion, this section uses concrete examples and plain language to illustrate the essence of significance and hypothesis testing and to explicate the difference between the p-value and the type I error.

  • Significance testing

Suppose a painkiller has a proven track record of lasting for 24 h, and now another drug manufacturer claims its new over-the-counter painkiller lasts longer. An investigator wants to test if the claim is true. Instead of collecting data from all the patients who took the new medication, which is often infeasible, the investigator decides to randomly survey 50 patients and gather data on how long (in hours) the new painkiller lasts. Thus, the investigator now has a random variable X̄, the average hours from a sample of 50 patients. This is a random variable because the 50 patients are randomly selected, and nobody knows what value this variable will take before conducting the survey and calculating the average. Nevertheless, each survey does produce a fixed number, x̄, which itself is not a random variable; rather, it is a realization or observation of the random variable X̄ (hereafter, X̄ denotes the random variable and x̄ a fixed value, an observation of X̄).

Intuitively, if the survey yielded a value (average hours the painkiller lasts) very close to 24, say, 23 or 25, the investigator would not believe the new painkiller is worse or better. If the survey came to an average of 32 h, the investigator would believe it indeed lasts longer. However, it would be hard to form an opinion if the survey showed an average of 22 or 26 h. Does the new painkiller really last a shorter or longer time, or is the difference due to random chance (after all, only 50 patients were surveyed)?

This is where the significance test formulated by Fisher in the 1920s comes in. Note that although modern significance testing began with the Student's t-test in 1908, it was Fisher who extended the test to the testing of two samples and regression coefficients, as well as analysis of variance, and created the paradigm of significance testing.

In Fisher's significance testing, the Central Limit Theorem (CLT) plays a vital role. It states that given a population with a mean of μ and a variance of σ², regardless of the shape of its distribution, the sample mean statistic X̄ has a normal distribution with the same mean μ and variance σ²/n; equivalently, (X̄ − μ)/(σ/√n) has a standard normal distribution with a mean of 0 and a variance of 1, as long as the sample size n is large enough. In practice, the distribution of the study population is often unknown, and n ≥ 30 is considered sufficient for the sample mean statistic to have an approximately normal distribution.

In conducting the significance test, a null hypothesis is first formed, i.e., there is no difference between the new and old painkillers, or the new painkiller also lasts for 24 h (the mean of X̄ is μ = 24). Under this assumption and based on the CLT, X̄ is normally distributed with a mean of 24 and a variance of σ²/50. Assume σ² = 200 (typically σ² is unknown but can be estimated); then X̄ has the normal distribution N(24, 2), or Z = (X̄ − 24)/2 has a standard normal distribution with a mean of 0 and a standard deviation of 1 (Z is a standardized random variable). The next step is to calculate z = |x̄ − 24|/2 from the survey data and then find the p-value, the probability of |Z| > z, from a normal distribution table (z is a fixed value, an observation of Z). Fisher suggested that if the p-value is smaller than 0.05, the hypothesis is rejected. He argued that the farther the sample mean x̄ is from the population mean μ, the smaller the p-value, and the less likely the null hypothesis is true. Just as Fisher stated, "Either an exceptionally rare chance has occurred or the theory [H₀] is not true [11]."

Based on this paradigm, if the survey came back with an average of 26 h, i.e., x̄ = 26, then z = 1 and p = 0.3173; as a result, the investigator accepts the null hypothesis (orthodoxically, fails to reject the null hypothesis), i.e., the new painkiller does not last longer, and the difference between 24 and 26 h is due to chance or random factors. On the other hand, if the survey revealed an average of 28 h, i.e., x̄ = 28, then z = 2 and p = 0.0455; thus the null hypothesis is rejected. In other words, the new painkiller is deemed to last longer.
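These two numbers are easy to verify. Here is a short sketch (ours, assuming SciPy) that reproduces them from the normal cdf:

    # Two-sided p-value for the painkiller survey: Z = (x_bar - 24)/2 under H0.
    from scipy.stats import norm

    def survey_p(x_bar, mu=24.0, se=2.0):
        z = abs(x_bar - mu) / se
        return 2 * (1 - norm.cdf(z))   # P(|Z| > z)

    print(survey_p(26))   # z = 1, so p is about 0.3173
    print(survey_p(28))   # z = 2, so p is about 0.0455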

Now, can the p-value, 0.0455, be interpreted as the probability of the type I error; as only a 4.55% chance that the new painkiller does not last longer (no difference); as the probability that the difference between 24 and 28 h is due to chance; or as meaning the investigator could make a mistake by rejecting the null hypothesis but would be wrong only about 4.55% of the time? The answer is No.

So, what is a p-value? Precisely, a p-value tells us how often we would see a difference as extreme as or more extreme than what is observed if there really were no difference. Drawing a bell curve with the p-value marked on it readily delineates this definition or concept.

In the example above, if the new painkiller also lasts for 24 h, the p-value of 0.0455 means there is a 4.55% chance that the investigator would observe x̄ ≤ 20 or x̄ ≥ 28; it is not a 4.55% chance that the new painkiller also lasts for 24 h. It is categorically wrong to believe the p-value is the probability of the null hypothesis being true (there is no difference), or that 1 − p is the probability of the null hypothesis being false (there is a difference), because the p-value is deduced based on the premise that the null hypothesis is true. The p-value, a conditional probability given that H₀ is true, is totally invalidated if the null hypothesis is deemed not true.

In addition, p-values are data-dependent: each test (survey) produces a different p-value; for the same analysis, it is illogical to say the error rate or the type I error is 31.73% based on one sample (survey) and 4.55% based on another. There is no theoretical or empirical basis for such frequency interpretations. In fact, Fisher himself was fully aware that his p-value, a relative measure of evidence against the null hypothesis, does not bear any interpretation as a long-term error rate. When the p-value was misinterpreted, he protested that the p-value was not the type I error rate, had no long-run frequentist characteristics, and should not be explained as a frequency of error if the test was repeated [12].

Interestingly, Fisher was an abstract thinker at the highest level, but he often developed solutions and tests without a solid theoretical foundation. He was an obstinate proponent of inductive inference, i.e., reasoning from the specific to the general, or from sample to population, which is reflected in his significance testing.

  • Hypothesis testing

On the contrary, mathematicians Jerzy Neyman and Egon Pearson dismissed the idea of inductive inference altogether and insisted reasoning should be deductive, i.e., from the general to the specific. In 1928, they published the landmark paper on the theoretical foundation for a statistical inference method that they called the "hypothesis test" [12]. In the paper, they introduced the concepts of the alternative hypothesis H₁ and of type I and type II errors, which were groundbreaking. The Neyman and Pearson hypothesis test is deductive in nature, i.e., it reasons from the general to the particular. The type I and type II errors, which must be set ahead of time, formulate a "rule of behavior" such that "in the long run of experience, we shall not be too often wrong," as stated by Neyman and Pearson [13].

The hypothesis test can be illustrated by a four-step process with the painkiller example.

The first step is to lay out what the investigator seeks to test, i.e., to establish a null hypothesis, H₀, and an alternative hypothesis, H₁ (here, H₀: μ = 24 h versus H₁: μ ≠ 24 h).

The second step is to set the criteria for the decision, or to specify an acceptable rate of mistakes if the test is conducted many times. Specifically, that is to set the probability of the type I error, α, and the probability of the type II error, β.

A type I error refers to the mistake of rejecting the null hypothesis when it is true (claiming the treatment works or the new drug lasts longer when actually it does not). Conventionally and almost universally, the probability of the type I error, α, is set to 0.05, which means 5% of the time one will be wrong if carrying out the test many times. A type II error is the failure to reject the null hypothesis when it is not true; the probability of the type II error, β, is conventionally set to 0.2, which is equivalent to a power of 0.8, the probability of detecting the difference if it exists. Table 1 summarizes the type I and type II errors:

  • Reject H₀ when H₀ is true: type I error (probability α)
  • Reject H₀ when H₀ is false: correct decision (power, 1 − β)
  • Fail to reject H₀ when H₀ is true: correct decision
  • Fail to reject H₀ when H₀ is false: type II error (probability β)

The third step is to select a statistic and the associated distribution for the test. For the painkiller example, the statistic is Z = (X̄ − 24)/2, and the distribution is the standard normal. Because the type I error has been set to 0.05 and Z has a standard normal distribution under the null hypothesis, as shown in Fig. 1, 1.96 becomes the critical value, −1.96 ≤ z ≤ 1.96 becomes the acceptance region, and z < −1.96 or z > 1.96 become the rejection regions.

Fig. 1 Standard normal distribution with critical value 1.96

The final step is to calculate the z value and make a decision. Similar to significance testing, if the survey resulted in \( \overline{x}=26, \) then z = 1 < 1.96 and the investigator accepts the null hypothesis; if the survey revealed \( \overline{x}=28, \) then z = 2 > 1.96 and the investigator rejects the null hypothesis and accepts the alternative hypothesis. It is interesting to note, in significance testing, “one can never accept the null hypothesis, only failed to reject it,” while that is not the case in hypothesis testing.

Unlike Fisher’s significance test, the hypothesis test possesses a nice frequency explanation: one can be wrong by rejecting the null hypothesis but cannot be wrong more than 5% of the time in the long run if the test is performed many times. Quite intuitively, every time the null hypothesis is rejected (when z < − 1.96 or z > 1. 96) there is a chance that the null hypothesis is true, and a mistake is made. When the null hypothesis is true, Z = ( \( \overline{X}-24 \) )/2 is a random variable with a standard normal distribution as shown in Fig.  1 , thus 5% of the time z = ( \( \overline{x}-24 \) )/2 would fall outside (− 1.96, 1.96) and the decision will be wrong 5% of the time. Of course, when the null hypothesis is not true, rejecting it is not a mistake.

Noticeably, the p -value plays no role in hypothesis testing under the framework of the Neyman-Pearson paradigm [ 12 , 14 ]. However, most, including many statisticians, are unaware that p -values and significance testing created by Fisher are incomparable to the hypothesis testing paradigm created by Neyman and Pearson [ 14 , 15 ], and many statistics textbooks tend to cobble them together [ 2 , 14 ]. The near-universal confusion is, at least in part, caused by the subtle similarities and differences between the two tests:

Both the significance and hypothesis tests use the same statistic and distribution, for example, Z = ( \( \overline{X}-24 \) )/2 and N (0, 1).

The hypothesis test compares the observed z with the critical value 1.96, while the significance test compares the p -value (based on z) to 0.05, which are linked by P (| Z | > 1.96) = 0.05.

The hypothesis test sets the type I error α at 0.05, while the significance test also uses 0.05 as the significance level.

One of the key differences is that, for the p-value to be meaningful in significance testing, the null hypothesis must be true, while this is not the case for the critical value in hypothesis testing. Although the critical value is derived from α based on the null hypothesis, rejecting the null hypothesis is not a mistake when it is not true; when it is true, there is a 5% chance that z = ( \( \overline{x}-24 \) )/2 will fall outside (−1.96, 1.96), and the investigator will be wrong 5% of the time (bear in mind, the null hypothesis is either true or false when a decision is made). In addition, the type I error and the resulting critical value are set in advance and fixed, while the p-value is a moving “target” varying from sample to sample even for the same test.

As if this were not confusing enough, the understanding and interpretation of p-values are further complicated by non-experimental studies, where model misspecification and even p-hacking are common and often mislead the audience into believing that a model and its findings are valid because the p-values are small [ 16 ]. In fact, p-values have little value in assessing whether the relationship between an outcome and exposure(s) is causal or just an artifact of confounding; one cannot claim the use of smartphones causes gun violence even if the p-value for their correlation is close to zero. To see the p-value problem at its core and to elucidate the confusion, the discussion of p-values should be set in the context of experimental designs such as randomized controlled trials, where the model relating the outcome and exposure(s) is correctly specified.

The Link between P-values and Type I Errors

The p-value fallacy can be readily quantified under a Bayesian framework [ 7 , 17 , 18 ]. However, “those ominous words [Bayes theorem], with their associations of hazy prior probabilities and abstruse mathematical formulas, strike fear into the hearts of most of us, clinician, researcher, and editor alike [ 19 ],” as Frank Davidoff, former Editor of the Annals of Internal Medicine, wrote. It is understandable but still unfortunate that Bayesian methods such as Bayes factors, despite their merit, are still considered exotic by the medical research community.

Thanks to Sellke, Bayarri, and Berger, the difference between the p-value and the type I error has been quantified [ 7 ]. Based on the conditional frequentist approach, which was formalized by Kiefer and further developed by others [ 20 , 21 , 22 , 23 ], Berger and colleagues established the lower bound of the error rate P(H0 | |Z| > z0), the type I error given the p-value [ 7 ]:

\( P\left({H}_0\ \middle|\ |Z|>{z}_0\right)\ge {\left(1+{\left[-e\,p\ln p\right]}^{-1}\right)}^{-1},\quad p<1/e \)

As shown, the lower bound equation is mathematically straightforward. Noteworthy is that its derivation is also ingeniously elegant (a simplified proof is provided in the Supplementary File for interested readers). The relationship between p-values and type I errors (lower bound) can be readily seen from Table 2, which shows some commonly reported results [ 7 ].

As seen in Table 2, the difference between p-values and the error probabilities (lower bound) is striking. A p-value of 0.05, commonly misinterpreted as meaning there is only a 5% chance the treatment does not work, seems to offer strong evidence against the null hypothesis; however, the true probability that the treatment does not work is at least 0.289. Keep in mind that this relationship between the p-value and the type I error is a lower bound; in fact, many prefer to report the upper bound [ 6 , 7 ].
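The bound is simple to compute. A short sketch (the function name is ours) reproducing the calibration for some commonly reported p-values:

```python
import numpy as np

def calibrated_lower_bound(p):
    """Sellke-Bayarri-Berger lower bound on the type I error given p; valid for p < 1/e."""
    return 1.0 / (1.0 + 1.0 / (-np.e * p * np.log(p)))

for p in (0.1, 0.05, 0.01, 0.001):
    print(f"p = {p}: lower bound = {calibrated_lower_bound(p):.3f}")
# p = 0.1:   lower bound = 0.385
# p = 0.05:  lower bound = 0.289
# p = 0.01:  lower bound = 0.111
# p = 0.001: lower bound = 0.018
```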

The discrepancy between the p-value and the lower-bound error rate explains the big puzzle of why so many wonder drugs and treatments worldwide lose their amazing power outside clinical trials [ 24 , 25 , 26 ]. This discrepancy likely also contributes to the frequently reported contradictory findings on risk factors and health outcomes in observational studies. For example, an early study published in the New England Journal of Medicine found drinking coffee was associated with a higher risk of pancreatic cancer [ 27 ]. The finding became a big headline in The New York Times [ 28 ], and the lead author, and probably many frightened readers, stopped drinking coffee. Later studies, however, concluded the finding was a fluke [ 29 , 30 ]. Likewise, the p-value fallacy may also contribute to the ongoing confusion over dietary fat intake and heart disease. On the one hand, a meta-analysis published in Annals of Internal Medicine in 2014 concluded “Current evidence does not clearly support cardiovascular guidelines that encourage high consumption of polyunsaturated fatty acids and low consumption of total saturated fats [ 31 ].” On the other hand, in its 2017 recommendation, the American Heart Association (AHA) stated “Taking into consideration the totality of the scientific evidence, satisfying rigorous criteria for causality, we conclude strongly that lowering intake of saturated fat and replacing it with unsaturated fats, especially polyunsaturated fats, will lower the incidence of CVD [ 32 ].”

In short, the misunderstanding and misinterpretation of the relationship between the p-value and the type I error all too often exaggerate the true effects of treatments and risk factors, which in turn leads to conflicting findings with real public health consequences.

The future of P-values

It is readily apparent that the p-value conundrum poses a serious challenge to researchers and practitioners alike in healthcare, with real-life consequences. To address the p-value complications, some believe the use of p-values should be banned or discouraged [ 33 , 34 ]. In fact, since 2015, Basic and Applied Social Psychology has officially banned significance tests and p-values [ 35 ], and Epidemiology has a longstanding policy discouraging the use of significance testing and p-values [ 36 , 37 ]. On the other hand, many are against a total ban [ 38 , 39 ]. P-values do possess practical utility: they offer insight into what is observed and are the first line of defense against being fooled by randomness. You would be more suspicious that a coin is unfair if nine heads turned up in ten flips than if, say, seven did. Similarly, you would want to see how strong the evidence is against the null hypothesis: say, whether the p-value is 0.0499 or 0.0001.
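The coin intuition is exactly a binomial p-value; a quick sketch with SciPy:

```python
from scipy.stats import binomtest

# Two-sided p-values for the null hypothesis that the coin is fair (p = 0.5):
print(binomtest(9, n=10, p=0.5).pvalue)  # about 0.021: nine heads is suspicious
print(binomtest(7, n=10, p=0.5).pvalue)  # about 0.344: seven heads is unremarkable
```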

“It is hopeless to expect users to change their reliance on p-values unless they are offered an alternative way of judging the reliability of their conclusions [ 40 ].” Rather than banning the use of p-values, many believe the conventional significance level of 0.05 should be lowered for better research reproducibility [ 41 ]. In 2018, 72 statisticians and scientists made the case for changing p < 0.05 to p < 0.005 [ 42 ]. Inevitably, like most medical treatments, the proposed change is accompanied by side effects: for instance, to achieve the same power of 80%, α = 0.005 requires a 70% larger sample size than α = 0.05, which could lead to fewer studies being conducted with limited resources [ 43 ].
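The 70% figure can be recovered from the standard two-sided z-test sample-size formula, in which n is proportional to (z of α/2 plus z of power) squared; a quick sketch:

```python
from scipy.stats import norm

def n_factor(alpha, power=0.8):
    """Sample-size scale factor for a two-sided z-test: n is proportional to this."""
    return (norm.ppf(1 - alpha / 2) + norm.ppf(power)) ** 2

print(n_factor(0.005) / n_factor(0.05))  # about 1.70: roughly 70% more subjects
```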

Other alternatives (e.g., second-generation p-values [ 44 ] and the analysis of credibility [ 45 ]) have been proposed in the special issue of the American Statistician; however, no consensus was reached. As a result, instead of recommending a ban on p-values, the accompanying editorial of the special issue called for an end to statistical significance testing [ 46 ]: “‘statistically significant’ – don’t say it and don’t use it [ 10 ].”

Will researchers and medical journals heed the “mandate” banning significance testing? It does not seem likely, at least so far. Even if they do, it would be little more than a quibble: a significance test is effectively done whenever a p-value is produced or reported, since anyone seeing the result would know whether the p-value is greater or less than 0.05; the only difference is “Don’t ask, don’t tell.”

In any event, it is the right call to end dichotomizing the p-value and using it as the sole criterion for judging results [ 47 ]. There is no practical difference between p = 0.049 and p = 0.051, and “God loves the .06 nearly as much as the .05 [ 48 ].” Furthermore, not all results with a p-value close to 0.05 are valueless. Doctors and patients need to put p-values into context when making treatment choices, which can be illustrated by a hypothetical but not unrealistic example. Suppose a study finds a new procedure (a kind of spine surgery) is effective in relieving debilitating neck and back pain with a p-value of 0.05, but when the procedure fails, it cripples the patient. If the patient believes there is only a 5% chance the procedure does not work or fails, he or she would probably take the chance. However, after learning that the actual chance of failure is nearly 30% or higher based on the calibrated p-value, he or she would probably think twice. On the other hand, even if the p-value is 0.1 and the real chance of failure is nearly 40% or higher, a patient would probably still give the procedure a try if it causes no serious harm when it fails.

Taken together, in medicine and healthcare, the use of p-values needs more context (the balance of harms and benefits) than thresholds. However, banning significance testing and accepting uncertainty, as called for by the editorial of the special issue, are not enough [ 10 ]. When making treatment decisions, what researchers, practitioners, and patients alike need to know is the probability that a treatment does or does not work (the type I error). In this regard, the calibrated p-value, compared with other proposals [ 44 , 45 ], offers several advantages: (1) it provides a lower bound, (2) it is fully frequentist although it can have a Bayesian interpretation, (3) it is easy to understand, and (4) it is easy to implement. Of course, other recommendations for improving the use of p-values may work well under different circumstances, such as improving research reproducibility [ 49 , 50 ].

In medical research and practice, the p-value produced from significance testing has been widely misconstrued as the probability of a type I error, or the probability that a treatment does not work. This misunderstanding comes with serious consequences: poor research reproducibility and inflated medical treatment effects. For a p-value of 0.05, the type I error, the chance that a treatment does not work, is not 5%; rather, it is at least 28.9%. Nevertheless, banning significance testing and accepting uncertainty, albeit well justified in many circumstances, offer little to apprise clinicians and patients of the probability that a treatment will or will not work. In this respect, the calibrated p-value, a link between the p-value and the type I error, is practical and instructive.

In short, a long-overdue effort to understand p-values correctly is urgently needed, and better education in statistical reasoning, including Bayesian methods, is desirable [ 15 ]. Meanwhile, a rational step that medical journals can take is to require authors to report both conventional p-values and calibrated ones in research papers.

Availability of data and materials

Not applicable.

Abbreviations

FDA: U.S. Food & Drug Administration

JAMA: Journal of the American Medical Association

ASA: American Statistical Association

CLT: Central Limit Theorem

AHA: American Heart Association

Goodman S. A dirty dozen: twelve p-value misconceptions. Semin Hematol. 2008;45(3):135–40.

Hubbard R, Bayarri MJ. Confusion over measures of evidence (p's) versus errors (α's) in classical statistical testing. Am Stat. 2003;57(3):171–82.

Windish DM, Huot SJ, Green ML. Medicine residents’ understanding of the biostatistics and results in the medical literature. JAMA. 2007;298:1010–22.

Berger JO, Sellke T. Testing a point null hypothesis: the irreconcilability of p-values and evidence (with discussions). J Am Stat Assoc. 1987;82(397):112–39.

Schervish MJ. P values: what they are and what they are not. Am Stat. 1996;50(3):203–6.

Berger JO. Could Fisher, Jeffreys and Neyman have agreed on testing? Stat Sci. 2003;18:1–32.

Sellke T, Bayarri MJ, Berger JO. Calibration of p value for testing precise null hypothesis. Am Stat. 2001;55(1):62–71.

Wasserstein RL, Lazar NA. The ASA's statement on p-values: context, process, and purpose. Am Stat. 2016;70(2):129–33.

Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature. 2019;567(7748):305–7.

Wasserstein RL, Schirm AL, Lazar NA. Moving to a world beyond “p < 0.05”. Am Stat. 2019;73(S1):1–19.

Fisher RA. Statistical methods and scientific inference (2nd edition). Edinburgh: Oliver and Boyd; 1959.

Lehmann EL. Neyman's statistical philosophy. Probab Math Stat. 1995;15:29–36.

Neyman J, Pearson E. On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika. 1928;20:175–240.

Lehmann EL. The Fisher, Neyman-Pearson theories of testing hypotheses: one theory or two? J Am Stat Assoc. 1993;88(424):1242–9.

Gigerenzer G. Mindless statistics. J Socio-Econ. 2004;33:587–606.

Greenland S, Senn SJ, Rothman KJ, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31(4):337–50.

Goodman SN. Toward evidence-based medical statistics. 1: the P value fallacy. Ann Intern Med. 1999;130(12):995–1004.

Goodman SN. Toward evidence-based medical statistics. 2: the Bayes factor. Ann Intern Med. 1999;130(12):1005–13.

Davidoff F. Standing statistics right side up. Ann Intern Med. 1999;130(12):1019–21.

Kiefer J. Admissibility of conditional confidence procedures. Ann Stat. 1976;4:836–65.

Kiefer J. Conditional confidence statements and confidence estimators (with discussion). J Am Stat Assoc. 1977;72:789–827.

Berger JO, Brown LD, Wolpert RL. A unified conditional frequentist and Bayesian test for fixed and sequential simple hypothesis testing. Ann Stat. 1994;22:1787–807.

Berger JO, Boukai B, Wang Y. Unified frequentist and Bayesian testing of a precise hypothesis (with discussion). Stat Sci. 1997;12:133–60.

Matthews R. The great health hoax. The Sunday Telegraph; 1998. Archived: http://junksciencearchive.com/sep98/matthews.html . Accessed 6 June 2020.

Ioannidis JP. Contradicted and initially stronger effects in highly cited clinical research. JAMA. 2005;294(2):218–28.

Ioannidis JP. Why most discovered true associations are inflated. Epidemiology. 2008;19(5):640–8.

MacMahon B, Yen S, Trichopoulos D, Warren K, Nardi G. Coffee and cancer of the pancreas. N Engl J Med. 1981;304(11):630–3.

http://www.nytimes.com/1981/03/12/us/study-links-coffee-use-to-pancreas-cancer.html . Accessed 2 May 2020.

Hsieh CC, MacMahon B, Yen S, Trichopoulos D, Warren K, Nardi G. Coffee and pancreatic cancer (chapter 2). N Engl J Med. 1986;315(9):587–9.

Turati F, Galeone C, Edefonti V. A meta-analysis of coffee consumption and pancreatic cancer. Ann Oncol. 2012;23(2):311–8.

Chowdhury R, Warnakula S, Kunutsor S, et al. Association of dietary, circulating, and supplement fatty acids with coronary risk: a systematic review and meta-analysis. Ann Intern Med. 2014;160:398–406.

Sacks FM, Lichtenstein AH, Wu JHY, et al. Dietary fats and cardiovascular disease: a presidential advisory from the American Heart Association. Circulation. 2017;136(3):e1–e23.

Goodman SN. Why is getting rid of p-values so hard? Musings on science and statistics. Am Stat. 2019;73(S1):26–30.

Tong C. Statistical inference enables bad science; statistical thinking enables good science. Am Stat. 2019;73(S1):246–61.

Trafimow D, Marks M. Editorial. Basic Appl Soc Psychol. 2015;37:1–2.

Lang JM, Rothman KJ, Cann CI. That confounded P-value. Epidemiology. 1998;9(1):7–8.

http://journals.lww.com/epidem/Pages/informationforauthors.aspx . Accessed 2 May 2020.

Krueger JI, Heck PR. Putting the p-value in its place. Am Stat. 2019;73(S1):122–8.

Greenland S. Valid p-values behave exactly as they should: some misleading criticisms of p-values and their resolution with s-values. Am Stat. 2019;73(S1):106–14.

Colquhoun D. The false positive risk: a proposal concerning what to do about p-values. Am Stat. 2019;73(S1):192–201.

Johnson VE. Revised standards for statistical evidence. Proc Natl Acad Sci U S A. 2013;110(48):19313–7.

Benjamin DJ, Berger JO, Johannesson M, et al. Redefine statistical significance. Nat Hum Behav. 2018;2:6–10.

Lakens D, Adolfi FG, Albers CJ, et al. Justify your alpha. Nat Hum Behav. 2018;2:168–71.

Blume JD, Greevy RA, Welty VF, et al. An introduction to second-generation p-values. Am Stat. 2019;73(S1):157–67.

Matthews RAJ. Moving towards the post p < 0.05 era via the analysis of credibility. Am Stat. 2019;73(S1):202–12.

McShane BB, Gal D, Gelman A, et al. Abandon statistical significance. Am Stat. 2019;73(S1):235–45.

Liao JM, Stack CB, Goodman S. Annals understanding clinical research: interpreting results with large p values. Ann Intern Med. 2018;169(7):485–6.

Rosnow RL, Rosenthal R. Statistical procedures and the justification of knowledge in psychological science. Am Psychol. 1989;44:1276–84.

Benjamin D, Berger JO. Three recommendations for improving the use of p-values. Am Stat. 2019;73(S1):186–91.

Held L, Ott M. On p-values and Bayes factors. Annu Rev Stat Appl. 2018;5:393–419.

Acknowledgements

This material is based upon work supported (or supported in part) by the Department of Veterans Affairs, Veterans Health Administration, Office of Research and Development. The author is indebted to Mr. Fred Malphurs, a retired senior healthcare executive, a visionary leader, who devoted his entire 38-year career to Veterans healthcare, for his unwavering support of research to improve healthcare efficiency and effectiveness. The author is also grateful to the Reviewers and Editorial Board members for their insightful and constructive comments and advice. The author would also like to thank Andrew Toporek and an anonymous reviewer for their helpful suggestions and assistance.

Author information

Authors and Affiliations

Department of Veterans Affairs, Office of Productivity, Efficiency and Staffing (OPES, RAPID), Albany, USA

Contributions

JG conceived/designed the study and wrote the manuscript. The author read and approved the final manuscript.

Author’s information

Director of Analytical Methodologies, Office of Productivity, Efficiency and Staffing, RAPID, U.S. Department of Veterans Affairs.

Corresponding author

Correspondence to Jian Gao .

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Gao, J. P-values – a chronic conundrum. BMC Med Res Methodol 20 , 167 (2020). https://doi.org/10.1186/s12874-020-01051-6

Received : 21 February 2020

Accepted : 12 June 2020

Published : 24 June 2020

DOI : https://doi.org/10.1186/s12874-020-01051-6

Keywords

  • Type I error
  • Research reproducibility
  • Calibrated P-values

BMC Medical Research Methodology

ISSN: 1471-2288

The P value: What it really means

As nurses, we must administer nursing care based on the best available scientific evidence. But for many nurses, critical appraisal, the process used to determine the best available evidence, can seem intimidating. To make critical appraisal more approachable, let’s examine the P value and make sure we know what it is and what it isn’t.

Defining P value

The P value is the probability of obtaining results at least as extreme as those observed if chance alone were operating, that is, if the null hypothesis were true. To better understand this definition, consider the role of chance.

The concept of chance is illustrated with every flip of a coin. The true probability of obtaining heads on any single flip is 0.5, meaning that over many flips, heads should come up half the time and tails the other half. But if you were to flip a coin 10 times, you likely would not obtain exactly five heads and five tails. You’d be more likely to see a seven-to-three split or a six-to-four split. Chance is responsible for this variation in results.
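This variation can be quantified with the binomial distribution; a brief sketch (SciPy assumed):

```python
from scipy.stats import binom

# Probability of exactly k heads in 10 flips of a fair coin:
for k in (5, 6, 7):
    print(k, round(binom.pmf(k, 10, 0.5), 3))
# 5 0.246  -> an even five-five split happens only about a quarter of the time
# 6 0.205
# 7 0.117
```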

Just as chance plays a role in determining the flip of a coin, it plays a role in the sampling of a population for a scientific study. When subjects are selected, chance may produce an unequal distribution of a characteristic that can affect the outcome of the study. Statistical inquiry and the P value are designed to help us determine just how large a role chance plays in study results. We begin a study with the assumption that there will be no difference between the experimental and control groups. This assumption is called the null hypothesis. When the results of the study indicate that there is a difference, the P value helps us determine the likelihood that the difference is attributed to chance.

Competing hypotheses

In every study, researchers put forth two kinds of hypotheses: the research or alternative hypothesis and the null hypothesis. The research hypothesis reflects what the researchers hope to show—that there is a difference between the experimental group and the control group. The null hypothesis directly competes with the research hypothesis. It states that there is no difference between the experimental group and the control group.

It may seem logical that researchers would test the research hypothesis, that is, test what they hope to prove. But probability theory requires that they test the null hypothesis instead. To support the research hypothesis, the data must contradict the null hypothesis. By demonstrating a difference between the two groups, the data contradict the null hypothesis.

Testing the null hypothesis

Now that you know why we test the null hypothesis, let’s look at how we test the null hypothesis.

After formulating the null and research hypotheses, researchers decide on a test statistic they can use to determine whether to accept or reject the null hypothesis. They also propose a fixed-level P value. The fixed-level P value is often set at .05 and serves as the threshold against which the test-generated P value is compared. (See Why .05?)

A comparison of the two P values determines whether the null hypothesis is rejected or accepted. If the P value associated with the test statistic is less than the fixed-level P value, the null hypothesis is rejected because there’s a statistically significant difference between the two groups. If the P value associated with the test statistic is greater than the fixed-level P value, the null hypothesis is accepted because there’s no statistically significant difference between the groups.

The decision to use .05 as the threshold in testing the null hypothesis is completely arbitrary. The researchers credited with establishing this threshold warned against strictly adhering to it.

Remember that warning when appraising a study in which the test statistic is greater than .05. The savvy reader will consider other important measurements, including effect size, confidence intervals, and power analyses when deciding whether to accept or reject scientific findings that could influence nursing practice.

Real-world hypothesis testing

How does this play out in real life? Let’s assume that you and a nurse colleague are conducting a study to find out if patients who receive backrubs fall asleep faster than patients who do not receive backrubs.

1. State your null and research hypotheses

Your null hypothesis will be that there will be no difference in the average amount of time it takes patients in each group to fall asleep. Your research hypothesis will be that patients who receive backrubs fall asleep, on average, faster than those who do not receive backrubs. You will be testing the null hypothesis in hopes of supporting your research hypothesis.

2. Propose a fixed-level P value

Although you can choose any value as your fixed-level P value, you and your research colleague decide you’ll stay with the conventional .05. If you were testing a new medical product or a new drug, you would choose a much smaller P value (perhaps as small as .0001). That’s because you would want to be as sure as possible that any difference you see between groups is attributed to the new product or drug and not to chance. A fixed-level P value of .0001 would mean that the difference between the groups was attributed to chance only 1 time out of 10,000. For a study on backrubs, however, .05 seems appropriate.

3. Conduct hypothesis testing to calculate a probability value

You and your research colleague agree that a randomized controlled study will help you best achieve your research goals, and you design the process accordingly. After consenting to participate in the study, patients are randomized to one of two groups:

  • the experimental group that receives the intervention—the backrub group
  • the control group—the non-backrub group.

After several nights of measuring the number of minutes it takes each participant to fall asleep, you and your research colleague find that on average, the backrub group takes 19 minutes to fall asleep and the non-backrub group takes 24 minutes to fall asleep.

Now the question is: Would you have the same results if you conducted the study using two different groups of people? That is, what role did chance play in helping the backrub group fall asleep 5 minutes faster than the non-backrub group? To answer this, you and your colleague will use an independent samples t-test to calculate a probability value.

An independent samples t-test is a kind of hypothesis test that compares the mean values of two groups (backrub and non-backrub) on a given variable (time to fall asleep).
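A minimal sketch of this analysis in Python (SciPy assumed); the minutes-to-sleep data below are simulated stand-ins, not real study results:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)

# Hypothetical minutes-to-sleep for each group (means echo the example above).
backrub = rng.normal(loc=19, scale=6, size=40)      # experimental group
non_backrub = rng.normal(loc=24, scale=6, size=40)  # control group

t_stat, p_value = ttest_ind(backrub, non_backrub)
print(f"t = {t_stat:.2f}, P = {p_value:.4f}")  # reject the null if P < .05
```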

Hypothesis testing is really nothing more than testing the null hypothesis. In this case, the null hypothesis is that the amount of time needed to fall asleep is the same for the experimental group and the control group. The hypothesis test addresses this question: If there’s really no difference between the groups, what is the probability of observing a difference of 5 minutes or more, say 10 minutes or 15 minutes?

We can define the P value as the probability that the observed time difference resulted from chance. Some find it easier to understand the P value when they think of it in relationship to error. In this case, the P value is defined as the probability of committing a Type 1 error. (Type 1 error occurs when a true null hypothesis is incorrectly rejected.)

4. Compare and interpret the P value

Early on in your study, you and your colleague selected a fixed-level P value of .05, meaning that you were willing to accept that 5% of the time, your results might be caused by chance. Also, you used an independent samples t-test to arrive at a probability value that will help you determine the role chance played in obtaining your results. Let’s assume, for the sake of this example, that the probability value generated by the independent samples t-test is .01 (P = .01). Because the P value associated with the test statistic is less than the fixed-level P value (.01 < .05), you can reject the null hypothesis. By doing so, you declare that there is a statistically significant difference between the experimental and control groups. (See Putting the P value in context.)

In effect, you’re saying that the chance of observing a difference of 5 minutes or more, when in fact there is no difference, is less than 5 in 100. Had the P value associated with the test statistic been greater than .05, you would have accepted the null hypothesis, meaning that there is no statistically significant difference between the control and experimental groups. Accepting the null hypothesis would mean that a difference of 5 minutes or more between the two groups would occur more than 5 times in 100 by chance alone.

Putting the P value in context

Although the P value helps you interpret study results, keep in mind that many factors can influence the P value—and your decision to accept or reject the null hypothesis. These factors include the following:

  • Insufficient power. The study may not have been designed appropriately to detect an effect of the independent variable on the dependent variable. Therefore, a real effect may have occurred without your knowing it, causing you to incorrectly accept the null hypothesis.
  • Unreliable measures. Instruments that don’t meet consistency or reliability standards may have been used to measure a particular phenomenon.
  • Threats to internal validity. Various biases, such as selection bias, regression to the mean, history, and testing bias, may unduly influence study outcomes.

A decision to accept or reject study findings should focus not only on P value but also on other metrics including the following:

  • Confidence intervals (an estimated range of values with a high probability of including the true population value of a given parameter)
  • Effect size (a value that measures the magnitude of a treatment effect)

Remember, P value tells you only whether a difference exists between groups. It doesn’t tell you the magnitude of the difference.
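For instance, here is a short sketch computing Cohen's d and a 95% confidence interval for a difference in means, using the standard pooled-variance formulas (the helper name is ours). Applied to the backrub data above, it reports how large the 5-minute difference is, not merely whether it is statistically significant:

```python
import numpy as np
from scipy.stats import t

def effect_size_and_ci(a, b, conf=0.95):
    """Cohen's d and a confidence interval for the mean difference of two samples."""
    a, b = np.asarray(a), np.asarray(b)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    pooled_sd = np.sqrt(pooled_var)
    diff = a.mean() - b.mean()
    d = diff / pooled_sd                       # magnitude of the effect
    se = pooled_sd * np.sqrt(1 / na + 1 / nb)  # SE of the mean difference
    crit = t.ppf((1 + conf) / 2, df=na + nb - 2)
    return d, (diff - crit * se, diff + crit * se)
```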

5. Communicate your findings

The final step in hypothesis testing is communicating your findings. When sharing research findings in writing or discussion, understand that they are statements about relationships or differences in populations, and that findings are never proved or disproved. Scientific findings are always subject to change. But each study leads to better understanding and, ideally, better outcomes for patients.

Key concepts

The P value isn’t the only concept you need to understand to analyze research findings. But it is a very important one. And chances are that understanding the P value will make it easier to understand other key analytical concepts.

Kenneth J. Rempher, PhD, RN, MBA, CCRN, APRN, BC, is Director, Professional Nursing Practice at Sinai Hospital of Baltimore (Md.). Kathleen Urquico, BSN, RN, is a Direct Care Nurse in the Rubin Institute of Advanced Orthopedics at Sinai Hospital of Baltimore.



  • Open access
  • Published: 27 April 2024

Associations between trans fatty acids and systemic immune-inflammation index: a cross-sectional study

  • Xiao-Feng Zhu 1 ,
  • Yu-Qi Hu 2 ,
  • Zhi-Cheng Dai 3 ,
  • Xiu-Juan Li 2 &
  • Jing Zhang 4  

Lipids in Health and Disease, volume 23, Article number: 122 (2024)

Previous studies have demonstrated that intake of trans fatty acids (TFAs) is linked to an increased risk of chronic diseases. As a novel systemic inflammatory biomarker, the clinical value and efficacy of the systemic immune-inflammation index (SII) have been widely explored. However, the association between TFAs and SII is still unclear. Therefore, this study aims to investigate the connection between TFAs and SII in US adults.

The study retrieved data from the National Health and Nutrition Examination Survey (NHANES) for the years 1999–2000 and 2009–2010. Following the exclusion of ineligible participants, the study encompassed a total of 3047 individuals. The research employed a multivariate linear regression model to investigate the connection between circulating TFAs and SII. Furthermore, the restricted cubic spline (RCS) model was utilized to evaluate the potential nonlinear association. Subgroup analysis was also conducted to investigate the latent interactive factors.

In this investigation, participants exhibited a mean age of 47.40 years, and 53.91% were female. Using a multivariate linear regression model, independent positive associations were noted between the log2-transformed palmitelaidic acid, log2-transformed vaccenic acid, log2-transformed elaidic acid, log2-transformed linolelaidic acid, and log2-transformed total sum of TFAs and the SII (all P  < 0.05). In the RCS analysis, no nonlinear relationship was observed between the log2-transformed palmitelaidic acid, vaccenic acid, elaidic acid, linolelaidic acid, or total sum of TFAs and the SII (all P for nonlinearity > 0.05). In the stratified analysis, the relationship between circulating TFAs and the SII differed by obesity status and smoking status.

Conclusions

A positive association was identified between three types of TFA, the sum of TFAs, and the SII in the US population. Additional rigorously designed studies are needed to verify these results and explore the potential mechanism.

Introduction

Trans fatty acids (TFAs) are a specific type of unsaturated fatty acid, both naturally occurring and artificially produced. In the U.S., dietary TFAs account for 2–3% of energy intake, primarily from processed foods, including baked products and packaged snacks [ 1 ]. However, TFAs are not essential to the human body and are detrimental to health. Earlier investigations have established that the intake of TFAs is associated with an increase in lipid levels [ 2 , 3 ], which may lead to an increased prevalence of cardiovascular diseases [ 4 ]. Moreover, studies based on in vivo and in vitro models found that TFAs could not only modulate the microbiome in mice but also induce inflammation and oxidative stress [ 5 , 6 ], which are associated with the risk of some common chronic diseases [ 7 ].

It has been proposed that inflammation is a major factor in the development of disease. To better evaluate the systemic inflammation of patients in clinical practice, a novel blood inflammation biomarker called the systemic immune-inflammation index (SII) has been proposed, which can be calculated from three types of blood cells (lymphocytes, neutrophils, and platelets) [ 8 ]. As an easily accessible indicator, its prognostic value has been investigated and confirmed in diabetes, lung cancer, and the general population [ 9 , 10 , 11 ]. A study of 6003 Chinese adults discovered that the SII was significantly associated with hypertension over a long-term period [ 12 ]. In addition, recent studies have found that elevated SII may increase the risk of diabetic retinopathy and cognitive impairment, as well as the severity of carotid artery stenosis [ 13 , 14 , 15 ].

Some studies have reported that a few dietary factors, including dietary fiber, vitamin D and selenium, may influence systemic inflammation in humans [ 16 , 17 , 18 ]. However, information on the association between TFAs and systemic inflammation is limited. Given the widespread use of TFAs and the excellent efficacy of SII, exploring the relationship between circulating TFAs and SII may provide some novel insights into the adverse effects of TFAs on inflammation. Hence, National Health and Nutrition Examination Survey (NHANES) data collected during the years 1999–2000 and 2009–2010 were used in the study to explore the connections between plasma TFAs and SII among U.S. adults.

Study population

NHANES is a large database that can be freely accessed by researchers around the globe. The Centers for Disease Control and Prevention (CDC) conducts the NHANES project on a two-year cycle to evaluate the nutritional and medical status of non-institutionalized individuals living in the U.S. Approximately 5000 community-dwelling civilians are selected in each cycle, and a complex, multi-stage sampling methodology is used to generate nationally representative data.

The research selected participants’ data from two survey cycles of the database (1999–2000 and 2009–2010), for which the level of circulating TFAs was available. In this study, a total of 20,502 participants aged ≥ 20 years were first extracted. Then, we excluded 13,642 samples with missing data on TFAs in the second step and 29 samples with missing data on SII in the third step. Furthermore, 3784 participants with missing data on the covariates were also regarded as ineligible. Finally, 3047 eligible U.S. adults from the NHANES were included to conduct a cross-sectional study. The flowchart of the inclusion and exclusion criteria is shown in Fig.  1 . The protocol was approved by the Ethical Review Committee of the National Health Council, and each individual gave written informed consent.

Figure 1. Flow chart of participant selection. Abbreviations: NHANES, National Health and Nutrition Examination Survey; SII, systemic immune-inflammation index

Measurement of circulating TFA

Previous studies have reported detailed methods and approaches to evaluate the level of plasma TFA [ 19 , 20 ]. In brief, participants’ blood samples were obtained in the morning after a fasting period following the protocol outlined by the CDC. Subsequently, TFA isomers were identified by their chromatographic retention times and specific mass-to-charge ratios. Quantification of metabolites was conducted using established standard solutions, incorporating stable isotope-labeled fatty acids as internal standards. The total amount of TFAs was determined as follows: Sum TFAs = vaccenic acid + linoelaidic acid + palmitelaidic acid + elaidic acid.

Identification of SII

The study derived the SII by multiplying the neutrophil count by the platelet count and then dividing by the lymphocyte count. The complete blood cell counts are expressed as ×10³ cells/µL and were assessed on blood analysis equipment operated by professional laboratory staff.
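As a sketch, the index is a one-line computation; the example counts below are hypothetical and in the same ×10³ cells/µL units:

```python
def sii(neutrophils, platelets, lymphocytes):
    """Systemic immune-inflammation index: platelets * neutrophils / lymphocytes."""
    return platelets * neutrophils / lymphocytes

# Hypothetical counts (x10^3 cells/uL): neutrophils 4.2, platelets 250, lymphocytes 2.1
print(sii(4.2, 250.0, 2.1))  # 500.0
```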

Potential confounding factors were included in the study on clinical grounds. Demographic factors, including age, gender, race, education, poverty income ratio (PIR), and marital status, were evaluated through a questionnaire conducted at the mobile examination center. Race was categorized into five groups: Mexican American, non-Hispanic Black, non-Hispanic White, other Hispanic, and other races. Marital status was categorized as married/living with a partner, widowed/divorced/separated, or never married. Smoking status was defined based on lifetime cigarette consumption, with categories for never smoked, ever smoked, and current smoker. Alcohol consumption was determined by the mean alcohol intake over a two-day dietary recall. Education level was stratified into three groups: less than high school, high school graduate, and more than high school. Trained medical personnel measured and calculated participants’ body mass index (BMI) during interviews. Information on cardiovascular disease (CVD), hypertension, cancer, and diabetes mellitus (DM) was collected through questionnaires; participants were classified as CVD patients based on criteria from previous studies [ 21 , 22 , 23 ]. Lipid levels were measured with direct immunoassay equipment. Serum uric acid levels were measured using the colorimetric method in laboratory tests, and the estimated glomerular filtration rate (eGFR) was calculated following established research protocols [ 24 ].

Statistical analysis

Following the CDC guideline, all analyses in the study took clustering, multi-stage sampling, and sample weights into consideration. Given the skewed distribution of TFAs, a log2 transformation was applied for the regression analysis. The baseline characteristics of participants were stratified by tertiles of sum TFAs. Continuous variables were presented as mean ± standard error using weighted linear regression models, while categorical variables were expressed as percentages using the Rao-Scott chi-square test. Subsequently, the research employed a multivariate linear regression model to examine the relationship between TFAs and SII. The effect size (β) and 95% confidence intervals (CI) were calculated for statistical assessment. Model 1 was unadjusted, Model 2 was adjusted for age, gender, and race, and Model 3 was adjusted for all latent confounders included in the present investigation to verify the robustness of the results. Additionally, the restricted cubic spline (RCS) model was utilized to investigate potential nonlinear associations involving the four main types of TFAs, the sum TFAs, and SII. Furthermore, subgroup analysis and interaction P values were used to probe potential interaction effects among stratified variables. All analyses were conducted using R software (version 4.2.1).
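The core model is an ordinary least squares regression of SII on the log2-transformed exposure plus covariates. A minimal Python sketch on simulated data follows; the study itself used R with NHANES survey weights, which this sketch omits, and all variable names and numbers are illustrative:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500

# Simulated stand-in for an NHANES extract; values are illustrative only.
df = pd.DataFrame({
    "sum_tfa": rng.lognormal(mean=3.9, sigma=0.5, size=n),  # µmol/L, right-skewed
    "age": rng.uniform(20, 80, size=n),
    "female": rng.integers(0, 2, size=n),
})
df["sii"] = 450 + 40 * np.log2(df["sum_tfa"]) + rng.normal(0, 150, size=n)

# Skewed exposure -> log2 transform inside the formula, then OLS.
model = smf.ols("sii ~ np.log2(sum_tfa) + age + female", data=df).fit()
print(model.params["np.log2(sum_tfa)"])           # beta for the exposure
print(model.conf_int().loc["np.log2(sum_tfa)"])   # its 95% CI
```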

Baseline characteristics of the study participants

Table 1 presents the weighted baseline characteristics of the 3047 individuals. In the study population, the average age was 47.40 years, and 53.91% were female. The mean levels of circulating palmitelaidic acid, vaccenic acid, elaidic acid, and linolelaidic acid were 5.05 µmol/L, 25.87 µmol/L, 20.99 µmol/L, and 2.07 µmol/L, respectively. After classification by sum TFA tertiles, individuals with higher circulating TFAs were more likely to be older, non-Hispanic White, less educated, married or living with a partner, and current smokers, and to have lower alcohol consumption, lower eGFR, and higher SII. However, no statistically significant differences were observed for gender, PIR, uric acid, CVD, hypertension, DM, or cancer across the three groups. Interestingly, BMI was highest in the T2 group, with an average of 29.24 kg/m², and the T2 group was also the oldest, with an average age of 48.49 years.

Relationship between TFAs and SII

The multivariate linear regression results are shown in Table 2. In the crude model (Model 1), the four types of TFA and the sum of TFAs were significantly and positively related to SII. After adjusting for age, sex, and race (Model 2), the relationship was attenuated. After adjusting for all covariates included in the study (Model 3), the connections between the log2-transformed palmitelaidic acid (β = 56.84, 95% CI = 30.93, 82.74, P  < 0.001), vaccenic acid (β = 32.28, 95% CI = 14.99, 49.57, P  = 0.002), elaidic acid (β = 40.31, 95% CI = 23.09, 57.54, P  < 0.001), linolelaidic acid (β = 27.04, 95% CI = 6.10, 47.97, P  = 0.016), and sum TFAs (β = 40.33, 95% CI = 21.29, 59.38, P  < 0.001) and SII remained robust. Compared with the T1 group, individuals in the T3 group of palmitelaidic acid (β = 75.19, 95% CI = 25.38, 125.00, P  = 0.007), vaccenic acid (β = 62.02, 95% CI = 11.02, 113.02, P  = 0.022), elaidic acid (β = 84.43, 95% CI = 34.80, 134.07, P  = 0.003), and sum TFAs (β = 78.08, 95% CI = 31.74, 124.41, P  = 0.003) had significantly higher SII. However, the T3 group of linolelaidic acid was not observed to have a higher SII ( P  > 0.05).

Furthermore, the study performed the RCS analysis for the four main types of TFA and the sum of TFAs, as shown in Fig. 2. Judging from the results, no significant nonlinear correlation was observed between the four main types of TFAs or the sum TFAs and SII (all P for nonlinearity > 0.05).

Figure 2. Restricted cubic spline analysis of the association between log2-palmitelaidic acid (A), log2-vaccenic acid (B), log2-elaidic acid (C), log2-linolelaidic acid (D), log2-sum TFAs (E) and SII. Abbreviations: TFAs, trans fatty acids; SII, systemic immune-inflammation index

Subgroup analysis

Stratified analysis was used to explore potential interactive factors in the relationship between TFAs and SII; the results are shown in Tables 3, 4, 5, 6 and 7. The associations of circulating palmitelaidic acid, vaccenic acid, elaidic acid, and the sum TFAs with SII were more pronounced in never smokers (all P for interaction < 0.05). Additionally, linolelaidic acid was more positively related to the SII in individuals with lower BMI and in those who had never smoked ( P for interaction < 0.05).

To our knowledge, there is currently limited research investigating the association between TFAs and SII. Therefore, we employed various advanced statistical models to comprehensively evaluate the influence of TFAs on SII levels. These findings revealed a positive correlation between palmitelaidic acid, vaccenic acid, elaidic acid, the total sum of TFAs, and SII in fully adjusted models. Notably, significant interactions were observed between smoking and certain TFAs.

SII is increasingly recognized as a potential biomarker for conditions such as gastrointestinal malignancies, prostate cancer, cardiovascular diseases, and others [ 25 , 26 , 27 ]. In a cross-sectional study involving 730 healthy women from the Nurses' Health Study I cohort, Lopez-Garcia et al. noted a positive correlation between TFA intake and plasma concentrations of C-reactive protein (CRP), sE-selectin, sICAM-1, tumor necrosis factor-α receptor 2, and sVCAM-1 [ 28 ]. These findings are consistent with other interventional and observational studies suggesting that consumption of TFAs can elevate inflammatory markers in the blood, such as CRP, interleukin-1β, chemokine ligand 2, and interleukin-6 (IL-6) [ 27 , 29 , 30 ]. Further evidence from in vitro tests and animal models shows that TFAs can activate macrophages and promote their accumulation, as well as activate NF-κB and enhance osteopontin production in the liver [ 31 , 32 , 33 , 34 ].

Another possible explanation for the correlation between TFAs and SII is the increased proportion of gram-negative sulfate-reducing bacteria after a diet high in TFAs, reported by Ge et al. [ 35 ]. The resulting overproduction of hydrogen sulfide (H2S) by these bacteria may be a factor in inflammatory bowel disease and other inflammation-linked bowel illnesses [ 36 ]. By reducing the disulfide bonds in the mucus network, H2S promotes the breakdown of the mucus barrier and increases the permeability of the mucus layer [ 37 ]. When the mucus barrier is breached, bacteria and toxins can come into direct contact with the colonic epithelium, which can lead to inflammation [ 37 ]. Through these inflammatory pathways, excessive consumption of TFAs with pro-inflammatory properties is a conceivable biological process resulting in greater SII.

The subgroup analyses and interaction tests conducted in this study revealed a noteworthy positive correlation between total TFAs and SII within subgroups categorized by smoking status, while a similar association between linolelaidic acid and SII was observed within subgroups categorized by BMI and smoking status. According to these findings, the positive association between TFAs and SII was stronger among never smokers. Previous studies have demonstrated that inflammation is frequently involved in the pathogenesis of illnesses associated with cigarette smoking [ 38 ]. The subgroup findings further imply that the association between SII and TFAs varies with BMI: participants with a BMI under 30 kg/m² showed a stronger correlation between TFAs and SII. Previous studies have connected TFA intake to higher BMI levels [ 39 ], and BMI, a risk factor for various cancers, is associated with an elevation in SII [ 40 ]. Collectively, these results imply that people with high levels of circulating TFAs should be closely monitored for elevated SII, especially those without harmful lifestyle habits, which is consistent with previous findings [ 41 , 42 ]. Nevertheless, additional investigations are necessary to clarify the specific mechanisms involved.

Strengths and limitations

The research offers some fresh perspectives in this area. First, the study assessed the connection between TFAs and SII in U.S. adults for the first time. In addition, subgroup analyses were carried out to check the consistency of the results, and a wide range of potential confounding factors was taken into account. Furthermore, after controlling for these confounders, the study found no evidence of nonlinearity in the dose-response relationships of SII with each type of TFA or with the sum of TFAs. However, some limitations of the investigation must be acknowledged. First, owing to regulatory changes over the past decade, findings derived from data collected in 1999–2000 and 2009–2010 may not precisely depict current TFA intake among US adults, nor the present dietary and lifestyle habits and circulating TFA levels of Americans. Nevertheless, these results can establish a foundational reference point for subsequent analyses, given that they are grounded in the most recent data available for the entire adult US population. Second, although the research employed a blood cell count-based composite index as a biomarker of systemic immune inflammation, more research is necessary to determine the relationship between TFA exposure and other biomarkers, including CRP and IL-6. Third, given the cross-sectional study design, the investigation cannot establish causation from these findings. Consequently, even though many covariates were taken into account, measurement error and uncontrolled confounding might have influenced the results.

In this cross-sectional study, circulating TFAs were found to be positively associated with SII, with no evidence of a nonlinear relationship. Notably, these associations may be attenuated or more pronounced in different subgroups. Briefly, the findings of the study emphasize the potential role of TFAs in the severity of systemic inflammation and provide new insights into controlling systemic inflammation levels in the US general population from a dietary health perspective. Nevertheless, additional research is essential to explore the cause-and-effect relationship and to elucidate the specific underlying mechanisms.

Data availability

The study utilized data from the National Health and Nutrition Examination Survey (NHANES), which is publicly available in the NHANES repository, https://www.cdc.gov/nchs/nhanes.

References

Micha R, Mozaffarian D. Trans fatty acids: effects on cardiometabolic health and implications for policy. Prostaglandins Leukot Essent Fatty Acids. 2008;79:147–52.


Liska DJ, Cook CM, Wang DD, Gaine PC, Baer DJ. Trans fatty acids and cholesterol levels: an evidence map of the available science. Food Chem Toxicol. 2016;98:269–81.


Verneque BJF, Machado AM, de Abreu Silva L, Lopes ACS, Duarte CK. Ruminant and industrial trans-fatty acids consumption and cardiometabolic risk markers: a systematic review. Crit Rev Food Sci Nutr. 2020;62:2050–60.


Mozaffarian D, Katan MB, Ascherio A, Stampfer MJ, Willett WC. Trans fatty acids and cardiovascular disease. N Engl J Med. 2006;354:1601–13.

Liu H, Nan B, Yang C, Li X, Yan H, Yuan Y. Elaidic acid induced NLRP3 inflammasome activation via ERS-MAPK signaling pathways in Kupffer cells. Biochim Biophys Acta Mol Cell Biol Lipids. 2022;1867.

Mohammadi F, Green M, Tolsdorf E, Greffard K, Leclercq M, Bilodeau J-F, Droit A, Foster J, Bertrand N, Rudkowska I. Industrial and ruminant trans-fatty acids-enriched diets differentially modulate the microbiome and fecal metabolites in C57BL/6 mice. Nutrients. 2023;15.

Islam MA, Amin MN, Siddiqui SA, Hossain MP, Sultana F, Kabir MR. Trans fatty acids and lipid profile: a serious risk factor to cardiovascular disease, cancer and diabetes. Diabetes Metab Syndr. 2019;13:1643–7.


Hu B, Yang X-R, Xu Y, Sun Y-F, Sun C, Guo W, Zhang X, Wang W-M, Qiu S-J, Zhou J, Fan J. Systemic immune-inflammation index predicts prognosis of patients after curative resection for hepatocellular carcinoma. Clin Cancer Res. 2014;20:6212–22.

Sun W, Zhang P, Ye B, Situ M-Y, Wang W, Yu Y. Systemic immune-inflammation index predicts survival in patients with resected lung invasive mucinous adenocarcinoma. Transl Oncol. 2024;40.

Wang H, Nie H, Bu G, Tong X, Bai X. Systemic immune-inflammation index (SII) and the risk of all-cause, cardiovascular, and cardio-cerebrovascular mortality in the general population. Eur J Med Res. 2023;28.

Yang C, Yang Q, Xie Z, Peng X, Liu H, Xie C. Association of systemic immune-inflammation index with all-cause and cause-specific mortality among type 2 diabetes: a population-based cohort study. Endocrine. 2023.

Ma L-L, Xiao H-B, Zhang J, Liu Y-H, Hu L-K, Chen N, Chu X, Dong J, Yan Y-X. Association between systemic immune inflammatory/inflammatory response index and hypertension: a cohort study of functional community. Nutr Metab Cardiovasc Dis. 2023.

Kelesoglu S, Yilmaz Y, Elcik D, Bireciklioglu F, Ozdemir F, Balcı F, Tuncay A, Kalay N. Increased serum systemic immune-inflammation index is independently associated with severity of carotid artery stenosis. Angiology. 2022;74:790–7.

Li JQ, Zhang YR, Wang HF, Guo Y, Shen XN, Li MM, Song JH, Tan L, Xie AM, Yu JT. Exploring the links among peripheral immunity, biomarkers, cognition, and neuroimaging in Alzheimer's disease. Alzheimers Dement (Amst). 2023;15.

Wang S, Pan X, Jia B, Chen S. Exploring the correlation between the systemic immune-inflammation index (SII), systemic inflammatory response index (SIRI), and type 2 diabetic retinopathy. Diabetes Metab Syndr Obes. 2023;16:3827–36.

Gonçalves de Carvalho CMR, Ribeiro SML. Aging, low-grade systemic inflammation and vitamin D: a mini-review. Eur J Clin Nutr. 2016;71:434–40.

Qi X, Li Y, Fang C, Jia Y, Chen M, Chen X, Jia J. The associations between dietary fibers intake and systemic immune and inflammatory biomarkers, a multi-cycle study of NHANES 2015–2020. Front Nutr. 2023;10.

Rayman MP. The importance of selenium to human health. Lancet. 2000;356:233–41.

Kuiper HC, Wei N, McGunigale SL, Vesper HW. Quantitation of trans-fatty acids in human blood via isotope dilution-gas chromatography-negative chemical ionization-mass spectrometry. J Chromatogr B. 2018;1076:35–43.


Vesper HW, Caudill SP, Kuiper HC, Yang Q, Ahluwalia N, Lacher DA, Pirkle JL. Plasma trans-fatty acid concentrations in fasting adults declined from NHANES 1999–2000 to 2009–2010. Am J Clin Nutr. 2017;105:1063–9.

Chen F, Song Y, Li W, Xu H, Dan H, Chen Q. Association between periodontitis and mortality of patients with cardiovascular diseases: a cohort study based on NHANES. J Periodontol. 2024;95:175–84.

Chen Y, Lin W, Fu L, Liu H, Jin S, Ye X, Pu S, Xue Y. Muscle quality index and cardiovascular disease among US population-findings from NHANES 2011–2014. BMC Public Health. 2023;23:2388.


Dang K, Wang X, Hu J, Zhang Y, Cheng L, Qi X, Liu L, Ming Z, Tao X, Li Y. The association between triglyceride-glucose index and its combination with obesity indicators and cardiovascular disease: NHANES 2003–2018. Cardiovasc Diabetol. 2024;23:8.

Levey AS, Stevens LA, Schmid CH, Zhang Y, Castro AF, Feldman HI, Kusek JW, Eggers P, Van Lente F, Greene T, Coresh J. A new equation to estimate glomerular filtration rate. Ann Intern Med. 2009;150.

Ye Z, Hu T, Wang J, Xiao R, Liao X, Liu M, Sun Z. Systemic immune-inflammation index as a potential biomarker of cardiovascular diseases: a systematic review and meta-analysis. Front Cardiovasc Med. 2022;9:933913.

Meng L, Yang Y, Hu X, Zhang R, Li X. Prognostic value of the pretreatment systemic immune-inflammation index in patients with prostate cancer: a systematic review and meta-analysis. J Transl Med. 2023;21:79.

Zhang Y, Lin S, Yang X, Wang R, Luo L. Prognostic value of pretreatment systemic immune-inflammation index in patients with gastrointestinal cancers. J Cell Physiol. 2019;234:5555–63.

Lopez-Garcia E, Schulze MB, Meigs JB, Manson JE, Rifai N, Stampfer MJ, Willett WC, Hu FB. Consumption of trans fatty acids is related to plasma biomarkers of inflammation and endothelial dysfunction. J Nutr. 2005;135:562–6.

Mozaffarian D, Pischon T, Hankinson SE, Rifai N, Joshipura K, Willett WC, Rimm EB. Dietary intake of trans fatty acids and systemic inflammation in women. Am J Clin Nutr. 2004;79:606–12.

Baer DJ, Judd JT, Clevidence BA, Tracy RP. Dietary fatty acids affect plasma markers of inflammation in healthy men fed controlled diets: a randomized crossover study. Am J Clin Nutr. 2004;79:969–73.

Machado RM, Nakandakare ER, Quintao ECR, Cazita PM, Koike MK, Nunes VS, Ferreira FD, Afonso MS, Bombo RPA, Machado-Lima A, et al. Omega-6 polyunsaturated fatty acids prevent atherosclerosis development in LDLr-KO mice, in spite of displaying a pro-inflammatory profile similar to trans fatty acids. Atherosclerosis. 2012;224:66–74.

Afonso MS, Lavrador MSF, Koike MK, Cintra DE, Ferreira FD, Nunes VS, Castilho G, Gioielli LA, Paula Bombo R, Catanozi S, et al. Dietary interesterified fat enriched with palmitic acid induces atherosclerosis by impairing macrophage cholesterol efflux and eliciting inflammation. J Nutr Biochem. 2016;32.

Hu X, Tanaka N, Guo R, Lu Y, Nakajima T, Gonzalez FJ, Aoyama T. PPARα protects against trans-fatty-acid-containing diet-induced steatohepatitis. J Nutr Biochem. 2017;39:77–85.

Larner DP, Morgan SA, Gathercole LL, Doig CL, Guest P, Weston C, Hazeldine J, Tomlinson JW, Stewart PM, Lavery GG. Male 11β-HSD1 knockout mice fed trans-fats and fructose are not protected from metabolic syndrome or nonalcoholic fatty liver disease. Endocrinology. 2016;157:3493–504.

Ge Y, Liu W, Tao H, Zhang Y, Liu L, Liu Z, Qiu B, Xu T. Effect of industrial trans-fatty acids-enriched diet on gut microbiota of C57BL/6 mice. Eur J Nutr. 2019;58:2625–38.

Figliuolo VR, Dos Santos LM, Abalo A, Nanini H, Santos A, Brittes NM, Bernardazzi C, de Souza HSP, Vieira LQ, Coutinho-Silva R, Coutinho CMLM. Sulfate-reducing bacteria stimulate gut immune responses and contribute to inflammation in experimental colitis. Life Sci. 2017;189:29–38.

Ijssennagger N, van der Meer R, van Mil SWC. Sulfide as a mucus barrier-breaker in inflammatory bowel disease? Trends Mol Med. 2016;22:190–9.

Bhalla DK, Hirata F, Rishi AK, Gairola CG. Cigarette smoke, inflammation, and lung injury: a mechanistic perspective. J Toxicol Environ Health B Crit Rev. 2009;12:45–64.

Hastert TA, de Oliveira Otto MC, Lê-Scherban F, Steffen BT, Steffen LM, Tsai MY, Jacobs DR, Baylin A. Association of plasma phospholipid polyunsaturated and trans fatty acids with body mass index: results from the multi-ethnic study of atherosclerosis. Int J Obes. 2018;42:433–40.

Iyengar NM, Gucalp A, Dannenberg AJ, Hudis CA. Obesity and cancer mechanisms: tumor microenvironment and inflammation. J Clin Oncol. 2016;34:4270–6.

Li H, Wu X, Bai Y, Wei W, Li G, Fu M, Jie J, Wang C, Guan X, Feng Y, et al. Physical activity attenuates the associations of systemic immune-inflammation index with total and cause-specific mortality among middle-aged and older populations. Sci Rep. 2021;11:12532.

You Y, Chen Y, Fang W, Li X, Wang R, Liu J, Ma X. The association between sedentary behavior, exercise, and sleep disturbance: a mediation analysis of inflammatory biomarkers. Front Immunol. 2022;13:1080782.


Acknowledgements

The authors appreciate the time and effort given by participants during the data collection phase of the NHANES project.

The study did not receive any funding.

Author information

Authors and Affiliations

Department of Clinical Medicine, The Nanshan College of Guangzhou Medical University, Guangzhou, 511436, China

Xiao-Feng Zhu

Department of Clinical Medicine, The Third Clinical School of Guangzhou Medical University, Guangzhou, 511436, China

Yu-Qi Hu & Xiu-Juan Li

Department of Orthopedics, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 201600, China

Zhi-Cheng Dai

Second Department of Infectious Disease, Shanghai Fifth People’s Hospital, Fudan University, Shanghai, 201100, China

Jing Zhang

Contributions

ZXF contributed to data collection, analysis, study design, and manuscript writing. HYQ, DZC, and LXJ contributed to the writing of the manuscript, and ZJ contributed to the project design and administration. All authors have granted their approval for the manuscript.

Corresponding author

Correspondence to Jing Zhang .

Ethics declarations

Ethical approval and consent to participate

The ethical review committee of the National Center for Health Statistics approved all NHANES protocols, and written informed consent was obtained from all participants. All additional materials, including protocol numbers, are available at https://www.cdc.gov/nchs/nhanes/about_nhanes.htm. The authors confirm that the entire study was conducted in accordance with the approved protocols, which are available at https://www.cdc.gov/nchs/nhanes.htm.

Consent for publication

Not applicable.

Conflicts of interest

The authors declare no conflicts of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Zhu, XF., Hu, YQ., Dai, ZC. et al. Associations between trans fatty acids and systemic immune-inflammation index: a cross-sectional study. Lipids Health Dis 23 , 122 (2024). https://doi.org/10.1186/s12944-024-02109-w


Received : 22 January 2024

Accepted : 15 April 2024

Published : 27 April 2024

DOI : https://doi.org/10.1186/s12944-024-02109-w


Keywords:
  • Systemic immunity inflammation index
  • Cross-sectional study
  • National health and nutrition examination survey
  • Trans fatty acids

Lipids in Health and Disease

ISSN: 1476-511X



Meditation: A simple, fast way to reduce stress

Meditation can wipe away the day's stress, bringing with it inner peace. See how you can easily learn to practice meditation whenever you need it most.

If stress has you anxious, tense and worried, you might try meditation. Spending even a few minutes in meditation can help restore your calm and inner peace.

Anyone can practice meditation. It's simple and doesn't cost much. And you don't need any special equipment.

You can practice meditation wherever you are. You can meditate when you're out for a walk, riding the bus, waiting at the doctor's office or even in the middle of a business meeting.

Understanding meditation

Meditation has been around for thousands of years. Early meditation was meant to help deepen understanding of the sacred and mystical forces of life. These days, meditation is most often used to relax and lower stress.

Meditation is a type of mind-body complementary medicine. Meditation can help you relax deeply and calm your mind.

During meditation, you focus on one thing. You get rid of the stream of thoughts that may be crowding your mind and causing stress. This process can lead to better physical and emotional well-being.

Benefits of meditation

Meditation can give you a sense of calm, peace and balance that can benefit your emotional well-being and your overall health. You also can use it to relax and cope with stress by focusing on something that calms you. Meditation can help you learn to stay centered and keep inner peace.

These benefits don't end when your meditation session ends. Meditation can help take you more calmly through your day. And meditation may help you manage symptoms of some medical conditions.

Meditation and emotional and physical well-being

When you meditate, you may clear away the information overload that builds up every day and contributes to your stress.

The emotional and physical benefits of meditation can include:

  • Giving you a new way to look at things that cause stress.
  • Building skills to manage your stress.
  • Making you more self-aware.
  • Focusing on the present.
  • Reducing negative feelings.
  • Helping you be more creative.
  • Helping you be more patient.
  • Lowering resting heart rate.
  • Lowering resting blood pressure.
  • Helping you sleep better.

Meditation and illness

Meditation also might help if you have a medical condition. This is most often true if you have a condition that stress makes worse.

A lot of research shows that meditation is good for health. But some experts believe there's not enough research to prove that meditation helps.

With that in mind, some research suggests that meditation may help people manage symptoms of conditions such as:

  • Chronic pain.
  • Depression.
  • Heart disease.
  • High blood pressure.
  • Irritable bowel syndrome.
  • Sleep problems.
  • Tension headaches.

Be sure to talk to your healthcare professional about the pros and cons of using meditation if you have any of these or other health conditions. Sometimes, meditation might worsen symptoms linked to some mental health conditions.

Meditation doesn't replace medical treatment. But it may help to add it to other treatments.

Types of meditation

Meditation is an umbrella term for the many ways to get to a relaxed state. There are many types of meditation and ways to relax that use parts of meditation. All share the same goal of gaining inner peace.

Ways to meditate can include:

  • Guided meditation. This is sometimes called guided imagery or visualization. With this method of meditation, you form mental images of places or things that help you relax. You try to use as many senses as you can. These include things you can smell, see, hear and feel. You may be led through this process by a guide or teacher.
  • Mantra meditation. In this type of meditation, you repeat a calming word, thought or phrase to keep out unwanted thoughts.
  • Mindfulness meditation. This type of meditation is based on being mindful. This means being more aware of the present. In mindfulness meditation, you focus on one thing, such as the flow of your breath. You can notice your thoughts and feelings. But let them pass without judging them.
  • Qigong. This practice most often combines meditation, relaxation, movement and breathing exercises to restore and maintain balance. Qigong (CHEE-gung) is part of Chinese medicine.
  • Tai chi. This is a form of gentle Chinese martial arts training. In tai chi (TIE-CHEE), you do a series of postures or movements in a slow, graceful way. And you do deep breathing with the movements.
  • Yoga. You do a series of postures with controlled breathing. This helps give you a more flexible body and a calm mind. To do the poses, you need to balance and focus. That helps you to focus less on your busy day and more on the moment.

Parts of meditation

Each type of meditation may include certain features to help you meditate. These may vary depending on whose guidance you follow or who's teaching a class. Some of the most common features in meditation include:

  • Focused attention. Focusing your attention is one of the most important elements of meditation. Focusing your attention is what helps free your mind from the many things that cause stress and worry. You can focus your attention on things such as a certain object, an image, a mantra or even your breathing.
  • Relaxed breathing. This technique involves deep, even-paced breathing using the muscle between your chest and your belly, called the diaphragm muscle, to expand your lungs. The purpose is to slow your breathing, take in more oxygen, and reduce the use of shoulder, neck and upper chest muscles while breathing so that you breathe better.
  • A quiet setting. If you're a beginner, meditation may be easier if you're in a quiet spot. Aim to have fewer things that can distract you, including no television, computers or cellphones. As you get more skilled at meditation, you may be able to do it anywhere. This includes high-stress places, such as a traffic jam, a stressful work meeting or a long line at the grocery store. This is when you can get the most out of meditation.
  • A comfortable position. You can practice meditation whether you're sitting, lying down, walking, or in other positions or activities. Just try to be comfortable so that you can get the most out of your meditation. Aim to keep good posture during meditation.
  • Open attitude. Let thoughts pass through your mind without judging them.

Everyday ways to practice meditation

Don't let the thought of meditating the "right" way add to your stress. If you choose to, you can attend special meditation centers or group classes led by trained instructors. But you also can practice meditation easily on your own. There are apps to use too.

And you can make meditation as formal or informal as you like. Some people build meditation into their daily routine. For example, they may start and end each day with an hour of meditation. But all you really need is a few minutes a day for meditation.

Here are some ways you can practice meditation on your own, whenever you choose:

  • Breathe deeply. This is good for beginners because breathing is a natural function. Focus all your attention on your breathing. Feel your breath and listen to it as you inhale and exhale through your nostrils. Breathe deeply and slowly. When your mind wanders, gently return your focus to your breathing.
  • Scan your body. When using this technique, focus attention on each part of your body. Become aware of how your body feels. That might be pain, tension, warmth or relaxation. Mix body scanning with breathing exercises and think about breathing heat or relaxation into and out of the parts of your body.
  • Repeat a mantra. You can create your own mantra. It can be religious or not. Examples of religious mantras include the Jesus Prayer in the Christian tradition, the holy name of God in Judaism, or the om mantra of Hinduism, Buddhism and other Eastern religions.
  • Walk and meditate. Meditating while walking is a good and healthy way to relax. You can use this technique anywhere you're walking, such as in a forest, on a city sidewalk or at the mall. When you use this method, slow your walking pace so that you can focus on each movement of your legs or feet. Don't focus on where you're going. Focus on your legs and feet. Repeat action words in your mind such as "lifting," "moving" and "placing" as you lift each foot, move your leg forward and place your foot on the ground. Focus on the sights, sounds and smells around you.
  • Pray. Prayer is the best known and most widely used type of meditation. Spoken and written prayers are found in most faith traditions. You can pray using your own words or read prayers written by others. Check the self-help section of your local bookstore for examples. Talk with your rabbi, priest, pastor or other spiritual leader about possible resources.
  • Read and reflect. Many people report that they benefit from reading poems or sacred texts and taking a few moments to think about their meaning. You also can listen to sacred music, spoken words, or any music that relaxes or inspires you. You may want to write your thoughts in a journal or discuss them with a friend or spiritual leader.
  • Focus your love and kindness. In this type of meditation, you think of others with feelings of love, compassion and kindness. This can help increase how connected you feel to others.

Building your meditation skills

Don't judge how you meditate. That can increase your stress. Meditation takes practice.

It's common for your mind to wander during meditation, no matter how long you've been practicing meditation. If you're meditating to calm your mind and your mind wanders, slowly return to what you're focusing on.

Try out ways to meditate to find out what types of meditation work best for you and what you enjoy doing. Adapt meditation to your needs as you go. Remember, there's no right way or wrong way to meditate. What matters is that meditation helps you reduce your stress and feel better overall.





IMAGES

  1. Understanding P-Values and Statistical Significance

    what is the p value in research

  2. P-Value: What It Is, How to Calculate It, and Why It Matters

    what is the p value in research

  3. What is p-value: How to Calculate It and Statistical Significance

    what is the p value in research

  4. The p value

    what is the p value in research

  5. P-value Question Example

    what is the p value in research

  6. Understanding P-values in Data Science

    what is the p value in research

VIDEO

  1. Calculating a P Value for a One Sample Mean t dist Hypothesis test

  2. What is P-value in hypothesis testing

  3. SPSS for newbies: the p-value made simple

  4. P-VALUE CONCEPT AND EXAMPLE #shorts #statistics #data #datanalysis #analysis #mean

  5. P value explained

  6. Write p-value as p less than 0.001 instead of p =.000. Why?

COMMENTS

  1. Understanding P-values

    Reporting p values. P values of statistical tests are usually reported in the results section of a research paper, along with the key information needed for readers to put the p values in context - for example, correlation coefficient in a linear regression, or the average difference between treatment groups in a t-test.. Example: Reporting the results In our comparison of mouse diet A and ...

  2. Understanding P-Values and Statistical Significance

    A p-value, or probability value, is a number describing how likely it is that your data would have occurred by random chance (i.e., that the null hypothesis is true). The level of statistical significance is often expressed as a p-value between 0 and 1. The smaller the p -value, the less likely the results occurred by random chance, and the ...

  3. Hypothesis Testing, P Values, Confidence Intervals, and Significance

    P Values. P values are used in research to determine whether the sample estimate is significantly different from a hypothesized value. The p-value is the probability that the observed effect within the study would have occurred by chance if, in reality, there was no true effect. Conventionally, data yielding a p<0.05 or p<0.01 is considered ...

  4. p-value

    The p -value is used in the context of null hypothesis testing in order to quantify the statistical significance of a result, the result being the observed value of the chosen statistic . [note 2] The lower the p -value is, the lower the probability of getting that result if the null hypothesis were true. A result is said to be statistically ...

  5. P-Value: What It Is, How to Calculate It, and Why It Matters

    P-Value: The p-value is the level of marginal significance within a statistical hypothesis test representing the probability of the occurrence of a given event. The p-value is used as an ...

  6. How to Find the P value: Process and Calculations

    In this case, our t-value of 2.289 produces a p value between 0.02 and 0.05 for a two-tailed test. Our results are statistically significant, and they are consistent with the calculator's more precise results. Displaying the P value in a Chart. In the example above, you saw how to calculate a p-value starting with the sample statistics.

  7. The clinician's guide to p values, confidence intervals, and magnitude

    What a p-value cannot tell us, ... Phillips M. Letter to the editor: editorial: threshold p values in orthopaedic research-we know the problem. What is the solution? Clin Orthop. 2019;477:1756-8.

  8. Interpreting P values

    The particular value of the p-value provides additional information. When the p-value is less than 0.05, it's significant, but there's a vast difference if the p-value is 0.045 or 0.001. When the p-value is near 0.05, it can be significant but the evidence against the null is fairly weak.

  9. An Explanation of P-Values and Statistical Significance

    The textbook definition of a p-value is: A p-value is the probability of observing a sample statistic that is at least as extreme as your sample statistic, given that the null hypothesis is true. For example, suppose a factory claims that they produce tires that have a mean weight of 200 pounds. An auditor hypothesizes that the true mean weight ...

  10. S.3.2 Hypothesis Testing (P-Value Approach)

    The P -value is, therefore, the area under a tn - 1 = t14 curve to the left of -2.5 and to the right of 2.5. It can be shown using statistical software that the P -value is 0.0127 + 0.0127, or 0.0254. The graph depicts this visually. Note that the P -value for a two-tailed test is always two times the P -value for either of the one-tailed tests.

  11. P-Value in Statistical Hypothesis Tests: What is it?

    A p value is used in hypothesis testing to help you support or reject the null hypothesis. The p value is the evidence against a null hypothesis. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis. P values are expressed as decimals although it may be easier to understand what they are if you convert ...

  12. What is p-value: How to Calculate It and Statistical Significance

    What is a p-value. The p-value, or probability value, is the probability that your results occurred randomly given that the null hypothesis is true. P-values are used in hypothesis testing to find evidence that differences in values or groups exist. P-values are determined through the calculation of the test statistic for the test you are using ...

  13. What the P values really tell us

    The P value means the probability, for a given statistical model that, when the null hypothesis is true, the statistical summary would be equal to or more extreme than the actual observed results [ 2 ]. Therefore, P values only indicate how incompatible the data are with a specific statistical model (usually with a null-hypothesis).

  14. P-value: What is and what is not

    The p-value is the probability of the observed data given that the null hypothesis is true, which is a probability that measures the consistency between the data and the hypothesis being tested if, and only if, the statistical model used to compute the p-value is correct ( 9 ). The smaller the p-value the greater the discrepancy: "If p is ...

  15. An Easy Introduction to Statistical Significance (With Examples)

    The p value determines statistical significance. An extremely low p value indicates high statistical significance, while a high p value means low or no statistical significance. Example: Hypothesis testing. To test your hypothesis, you first collect data from two groups. The experimental group actively smiles, while the control group does not.

  16. p-value Calculator

    Welcome to our p-value calculator! You will never again have to wonder how to find the p-value, as here you can determine the one-sided and two-sided p-values from test statistics, following all the most popular distributions: normal, t-Student, chi-squared, and Snedecor's F.. P-values appear all over science, yet many people find the concept a bit intimidating.

  17. P-values and "statistical significance": what they actually mean

    A "true" effect can sometimes yield a p-value of greater than .05. And we know from recent years that science is rife with false-positive studies that achieved values of less than .05 ...

  18. What is a p value and what does it mean?

    Statistical probability or p values reveal whether the findings in a research study are statistically significant, meaning that the findings are unlikely to have occurred by chance. To understand the p value concept, it is important to understand its relationship with the α level. Before conducting a study, researchers specify the α level ...

  19. P-Value: A Complete Guide

    The p-value in statistics is the probability of getting outcomes at least as extreme as the outcomes of a statistical hypothesis test, assuming the null hypothesis to be correct. To put it in simpler words, it is a calculated number from a statistical test that shows how likely you are to have found a set of observations if the null hypothesis ...

  20. P-values

    The p -value is a practical tool gauging the "strength of evidence" against the null hypothesis. It informs investigators that a p -value of 0.001, for example, is stronger than 0.05. However, p -values produced in significance testing are not the probabilities of type I errors as commonly misconceived.

  21. The P Value and Statistical Significance: Misunderstandings

    The calculation of a P value in research and especially the use of a threshold to declare the statistical significance of the P value have both been challenged in recent years. There are at least two important reasons for this challenge: research data contain much more meaning than is summarized in a P value and its statistical significance, and these two concepts are frequently misunderstood ...

  22. The P value: What it really means

    The P value is the probability that the results of a study are caused by chance alone. To better understand this definition, consider the role of chance. The concept of chance is illustrated with every flip of a coin. The true probability of obtaining heads in any single flip is 0.5, meaning that heads would come up in half of the flips and ...

  23. Met Gala 2024: The Most Talked About Moments Online

    This year's Met Gala sparked many online discussions, with over 488k unique authors generating over 2.21m mentions during the event. This year's event was slightly more popular in terms of online conversation than last year's Met Gala, which generated 2.1 million mentions. 65% of emotion-categorized mentions around the Met Gala expressed joy.

  24. Home

    Zillow Research aims to be the most open, authoritative source for timely and accurate housing data and unbiased insight. ... The Numbers MARCH 2024 U.S. Typical Home Value (Zillow Home Value Index) $355,696 (4.6% YoY) MARCH 2024 U.S. Typical Rent (Zillow Observed Rent Index) $1,983 (3.6% YOY) MARCH 2024 Change in Typical Home Value Since Pre ...

  25. Global trends and scenarios for terrestrial biodiversity and ...

    During the past century, humans have caused biodiversity loss at rates that are 30 to 120 times higher than the mean extinction rates in the Cenozoic fossil record ().Although multiple proximate causes drive this loss, ultimately, a growing human population and economy have demanded increasing land and natural resources, causing habitat conversion and loss ().

  26. Associations between trans fatty acids and systemic immune-inflammation

    Background Previous studies have demonstrated that trans fatty acids (TFAs) intake was linked to an increased risk of chronic diseases. As a novel systemic inflammatory biomarker, the clinical value and efficacy of the systemic immune-inflammation index (SII) have been widely explored. However, the association between TFAs and SII is still unclear. Therefore, the study aims to investigate the ...

  27. Meditation: A simple, fast way to reduce stress

    Meditation is a type of mind-body complementary medicine. Meditation can help you relax deeply and calm your mind. During meditation, you focus on one thing. You get rid of the stream of thoughts that may be crowding your mind and causing stress. This process can lead to better physical and emotional well-being.

  28. The Value of p-Value in Biomedical Research

    The p -value is one of the most widely used statistical terms in decision making in biomedical research, which assists the investigators to conclude about the significance of a research consideration. Up today, most researchers base their decision on the value of the probability p. However, the term p -value is often miss- or over- interpreted ...

  29. P

    The threshold value, P < 0.05 is arbitrary. As has been said earlier, it was the practice of Fisher to assign P the value of 0.05 as a measure of evidence against null effect. One can make the "significant test" more stringent by moving to 0.01 (1%) or less stringent moving the borderline to 0.10 (10%).