Inferential Statistics – Types, Methods and Examples

Inferential statistics is a branch of statistics that involves making predictions or inferences about a population based on a sample of data taken from that population. It is used to analyze the probabilities, assumptions, and outcomes of a hypothesis.

The basic steps of inferential statistics typically involve the following:

  • Define a Hypothesis: This is often a statement about a parameter of a population, such as the population mean or population proportion.
  • Select a Sample: In order to test the hypothesis, you’ll select a sample from the population. This should be done randomly and should be representative of the larger population in order to avoid bias.
  • Collect Data: Once you have your sample, you’ll need to collect data. This data will be used to calculate statistics that will help you test your hypothesis.
  • Perform Analysis: The collected data is then analyzed using statistical tests such as the t-test, chi-square test, or ANOVA, to name a few. These tests help to determine the likelihood that the results of your analysis occurred by chance.
  • Interpret Results: The analysis can provide a probability, called a p-value, which represents the likelihood that the results occurred by chance. If this probability is below a certain level (commonly 0.05), you may reject the null hypothesis (the statement that there is no effect or relationship) in favor of the alternative hypothesis (the statement that there is an effect or relationship). The code sketch after this list walks through these steps.
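To make these steps concrete, here is a minimal sketch in Python using SciPy's one-sample t-test. The scores and the hypothesized mean of 70 are made up purely for illustration.

```python
import numpy as np
from scipy import stats

# 1. Define a hypothesis: H0: population mean = 70, H1: population mean != 70
hypothesized_mean = 70

# 2-3. Select a sample and collect data (made-up scores for illustration)
sample = np.array([72, 68, 75, 71, 69, 74, 77, 70, 73, 76])

# 4. Perform analysis: a one-sample t-test against the hypothesized mean
t_stat, p_value = stats.ttest_1samp(sample, popmean=hypothesized_mean)

# 5. Interpret results: compare the p-value to the significance level
alpha = 0.05
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```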

Inferential Statistics Types

Inferential statistics can be broadly categorized into two types: parametric and nonparametric. The selection of type depends on the nature of the data and the purpose of the analysis.

Parametric Inferential Statistics

These are statistical methods that assume the data comes from a known type of probability distribution and make inferences about the parameters of that distribution. Common parametric methods include:

  • T-tests: Used when comparing the means of two groups to see if they’re significantly different.
  • Analysis of Variance (ANOVA): Used to compare the means of more than two groups.
  • Regression Analysis: Used to predict the value of one variable (dependent) based on the value of another variable (independent).
  • Chi-square test for independence: Used to test if there is a significant association between two categorical variables.
  • Pearson’s correlation: Used to test if there is a significant linear relationship between two continuous variables.

Nonparametric Inferential Statistics

These are methods used when the data does not meet the requirements necessary to use parametric statistics, such as when data is not normally distributed. Common nonparametric methods include the following (the code sketch after this list shows how each can be run):

  • Mann-Whitney U Test: Nonparametric equivalent of the independent samples t-test.
  • Wilcoxon Signed-Rank Test: Nonparametric equivalent of the paired samples t-test.
  • Kruskal-Wallis Test: Nonparametric equivalent of the one-way ANOVA.
  • Spearman’s rank correlation: Nonparametric equivalent of the Pearson correlation.
  • Chi-square test for goodness of fit: Used to test if the observed frequencies for a categorical variable match the expected frequencies.
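For readers who want to try these out, the sketch below shows how each of the tests above might be invoked with SciPy. All of the data arrays are made up for illustration only.

```python
import numpy as np
from scipy import stats

group_a = np.array([3.1, 2.8, 4.0, 3.5, 2.9, 3.7])
group_b = np.array([2.5, 2.2, 3.0, 2.8, 2.6, 2.4])
group_c = np.array([4.2, 3.9, 4.5, 4.1, 3.8, 4.4])

# Mann-Whitney U: two independent groups
print(stats.mannwhitneyu(group_a, group_b))

# Wilcoxon signed-rank: paired measurements (e.g., before/after)
print(stats.wilcoxon(group_a, group_b))

# Kruskal-Wallis: three or more independent groups
print(stats.kruskal(group_a, group_b, group_c))

# Spearman's rank correlation: monotonic association between two variables
print(stats.spearmanr(group_a, group_b))

# Chi-square goodness of fit: observed vs. expected category frequencies
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]
print(stats.chisquare(f_obs=observed, f_exp=expected))
```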

Inferential Statistics Formulas

Inferential statistics use various formulas and statistical tests to draw conclusions or make predictions about a population based on a sample from that population. Here are a few key formulas commonly used:

Confidence Interval for a Mean:

When you have a sample and want to make an inference about the population mean (µ), you might use a confidence interval.

The formula for a confidence interval around a mean is:

[Sample Mean] ± [Z-score or T-score] * (Standard Deviation / sqrt[n]) where:

  • Sample Mean is the mean of your sample data
  • Z-score or T-score is the value from the Z or T distribution corresponding to the desired confidence level (Z is used when the population standard deviation is known or the sample size is large, otherwise T is used)
  • Standard Deviation is the standard deviation of the sample
  • sqrt[n] is the square root of the sample size
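As a quick sketch of this formula, the snippet below computes a 95% confidence interval for a small made-up sample, using a T-score since the population standard deviation is unknown.

```python
import numpy as np
from scipy import stats

sample = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9])
n = len(sample)
mean = sample.mean()
sd = sample.std(ddof=1)           # sample standard deviation
se = sd / np.sqrt(n)              # standard error of the mean

# 95% confidence: T-score with n - 1 degrees of freedom
t_score = stats.t.ppf(0.975, df=n - 1)
lower, upper = mean - t_score * se, mean + t_score * se
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```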

Hypothesis Testing:

Hypothesis testing often involves calculating a test statistic, which is then compared to a critical value to decide whether to reject the null hypothesis.

A common test statistic for a test about a mean is the Z-score:

Z = (Sample Mean - Hypothesized Population Mean) / (Standard Deviation / sqrt[n])

where all variables are as defined above.
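A minimal sketch of this calculation, assuming made-up numbers and a known population standard deviation (which is what justifies using Z rather than T):

```python
import numpy as np
from scipy import stats

sample_mean = 102.5
hypothesized_mean = 100
population_sd = 15        # assumed known for a z-test
n = 64

z = (sample_mean - hypothesized_mean) / (population_sd / np.sqrt(n))
# Two-sided p-value from the standard normal distribution
p_value = 2 * stats.norm.sf(abs(z))
print(f"z = {z:.3f}, p = {p_value:.3f}")
```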

Chi-Square Test:

The Chi-Square Test is used when dealing with categorical data.

The formula is:

χ² = Σ [ (Observed-Expected)² / Expected ]

  • Observed is the actual observed frequency
  • Expected is the frequency we would expect if the null hypothesis were true
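To illustrate, the snippet below applies this formula to hypothetical die-roll counts and cross-checks the result against SciPy's built-in goodness-of-fit test.

```python
import numpy as np
from scipy import stats

# Hypothetical die-roll counts: do they fit a fair die?
observed = np.array([8, 12, 9, 11, 10, 10])
expected = np.full(6, observed.sum() / 6)   # 10 per face under H0

chi2 = ((observed - expected) ** 2 / expected).sum()
df = len(observed) - 1
p_value = stats.chi2.sf(chi2, df)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")

# Cross-check with SciPy's built-in test (uniform expected by default)
print(stats.chisquare(observed))
```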

T-test:

The t-test is used to compare the means of two groups. The formula for the independent samples t-test is:

t = (mean1 - mean2) / sqrt[ (sd1²/n1) + (sd2²/n2) ] where:

  • mean1 and mean2 are the sample means
  • sd1 and sd2 are the sample standard deviations
  • n1 and n2 are the sample sizes
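The sketch below applies this formula to two made-up samples and checks the result against SciPy's unequal-variance (Welch) t-test, which computes the same statistic.

```python
import numpy as np
from scipy import stats

group1 = np.array([5.1, 4.8, 5.6, 5.3, 4.9, 5.2])
group2 = np.array([4.5, 4.2, 4.9, 4.4, 4.6, 4.3])

mean1, mean2 = group1.mean(), group2.mean()
sd1, sd2 = group1.std(ddof=1), group2.std(ddof=1)
n1, n2 = len(group1), len(group2)

# Apply the formula above directly
t_manual = (mean1 - mean2) / np.sqrt(sd1**2 / n1 + sd2**2 / n2)

# SciPy's unequal-variance (Welch) t-test uses the same statistic
t_scipy, p = stats.ttest_ind(group1, group2, equal_var=False)
print(f"manual t = {t_manual:.4f}, scipy t = {t_scipy:.4f}, p = {p:.4f}")
```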

Inferential Statistics Examples

Inferential statistics are used when making predictions or inferences about a population from a sample of data. Here are a few real-world examples:

  • Medical Research: Suppose a pharmaceutical company is developing a new drug and they’re currently in the testing phase. They gather a sample of 1,000 volunteers to participate in a clinical trial. They find that 700 out of these 1,000 volunteers reported a significant reduction in their symptoms after taking the drug. Using inferential statistics, they can infer that the drug would likely be effective for the larger population.
  • Customer Satisfaction: Suppose a restaurant wants to know if its customers are satisfied with their food. They could survey a sample of their customers and ask them to rate their satisfaction on a scale of 1 to 10. If the average rating was 8.5 from a sample of 200 customers, they could use inferential statistics to infer that the overall customer population is likely satisfied with the food.
  • Political Polling: A polling company wants to predict who will win an upcoming presidential election. They poll a sample of 10,000 eligible voters and find that 55% prefer Candidate A, while 45% prefer Candidate B. Using inferential statistics, they infer that Candidate A has a higher likelihood of winning the election.
  • E-commerce Trends: An e-commerce company wants to improve its recommendation engine. They analyze a sample of customers’ purchase history and notice a trend that customers who buy kitchen appliances also frequently buy cookbooks. They use inferential statistics to infer that recommending cookbooks to customers who buy kitchen appliances would likely increase sales.
  • Public Health: A health department wants to assess the impact of a health awareness campaign on smoking rates. They survey a sample of residents before and after the campaign. If they find a significant reduction in smoking rates among the surveyed group, they can use inferential statistics to infer that the campaign likely had an impact on the larger population’s smoking habits.

Applications of Inferential Statistics

Inferential statistics are extensively used in various fields and industries to make decisions or predictions based on data. Here are some applications of inferential statistics:

  • Healthcare: Inferential statistics are used in clinical trials to analyze the effect of a treatment or a drug on a sample population and then infer the likely effect on the general population. This helps in the development and approval of new treatments and drugs.
  • Business: Companies use inferential statistics to understand customer behavior and preferences, market trends, and to make strategic decisions. For example, a business might sample customer satisfaction levels to infer the overall satisfaction of their customer base.
  • Finance: Banks and financial institutions use inferential statistics to evaluate the risk associated with loans and investments. For example, inferential statistics can help in determining the risk of default by a borrower based on the analysis of a sample of previous borrowers with similar credit characteristics.
  • Quality Control: In manufacturing, inferential statistics can be used to maintain quality standards. By analyzing a sample of the products, companies can infer the quality of all products and decide whether the manufacturing process needs adjustments.
  • Social Sciences: In fields like psychology, sociology, and education, researchers use inferential statistics to draw conclusions about populations based on studies conducted on samples. For instance, a psychologist might use a survey of a sample of people to infer the prevalence of a particular psychological trait or disorder in a larger population.
  • Environment Studies: Inferential statistics are also used to study and predict environmental changes and their impact. For instance, researchers might measure pollution levels in a sample of locations to infer overall pollution levels in a wider area.
  • Government Policies: Governments use inferential statistics in policy-making. By analyzing sample data, they can infer the potential impacts of policies on the broader population and thus make informed decisions.

Purpose of Inferential Statistics

The purposes of inferential statistics include:

  • Estimation of Population Parameters: Inferential statistics allows for the estimation of population parameters. This means that it can provide estimates about population characteristics based on sample data. For example, you might want to estimate the average weight of all men in a country by sampling a smaller group of men.
  • Hypothesis Testing: Inferential statistics provides a framework for testing hypotheses. This involves making an assumption (the null hypothesis) and then testing this assumption to see if it should be rejected or not. This process enables researchers to draw conclusions about population parameters based on their sample data.
  • Prediction: Inferential statistics can be used to make predictions about future outcomes. For instance, a researcher might use inferential statistics to predict the outcomes of an election or forecast sales for a company based on past data.
  • Relationships Between Variables: Inferential statistics can also be used to identify relationships between variables, such as correlation or regression analysis. This can provide insights into how different factors are related to each other.
  • Generalization: Inferential statistics allows researchers to generalize their findings from the sample to the larger population. It helps in making broad conclusions, given that the sample is representative of the population.
  • Variability and Uncertainty: Inferential statistics also deal with the idea of uncertainty and variability in estimates and predictions. Through concepts like confidence intervals and margins of error, it provides a measure of how confident we can be in our estimations and predictions.
  • Error Estimation: It provides measures of possible errors (known as margins of error), which allow us to know how much our sample results may differ from the population parameters.

Limitations of Inferential Statistics

Inferential statistics, despite its many benefits, does have some limitations. Here are some of them:

  • Sampling Error: Inferential statistics are often based on the concept of sampling, where a subset of the population is used to infer about the population. There’s always a chance that the sample might not perfectly represent the population, leading to sampling errors.
  • Misleading Conclusions: If assumptions for statistical tests are not met, it could lead to misleading results. This includes assumptions about the distribution of data, homogeneity of variances, independence, etc.
  • False Positives and Negatives: There’s always a chance of a Type I error (rejecting a true null hypothesis, or a false positive) or a Type II error (not rejecting a false null hypothesis, or a false negative).
  • Dependence on Quality of Data: The accuracy and validity of inferential statistics depend heavily on the quality of data collected. If data are biased, inaccurate, or collected using flawed methods, the results won’t be reliable.
  • Limited Predictive Power: While inferential statistics can provide estimates and predictions, these are based on the current data and may not fully account for future changes or variables not included in the model.
  • Complexity: Some inferential statistical methods can be quite complex and require a solid understanding of statistical principles to implement and interpret correctly.
  • Influenced by Outliers: Inferential statistics can be heavily influenced by outliers. If these extreme values aren’t handled properly, they can lead to misleading results.
  • Over-reliance on P-values: There’s a tendency in some fields to overly rely on p-values to determine significance, even though p-values have several limitations and are often misunderstood.



What Is Inferential Statistics?


Inferential statistics help us draw conclusions about how a hypothesis will play out or determine a general parameter about a larger population. We often use this process to compare two groups of subjects to make greater generalizations about a larger overall population.


What Are Inferential Statistics Used For?

Inferential statistics are generally used in two ways: to set parameters about a group and then create hypotheses about how data will perform when scaled.

Inferential statistics are among the most useful tools for making educated predictions about how a set of data will scale when applied to a larger population of subjects. These statistics help set a benchmark for hypothesis testing, as well as a general idea of where specific parameters will land when scaled to a larger data set, such as the larger set’s mean.

This process can determine a population’s z-score (where a subject will land on a bell curve) and set data up for further testing.

What’s the Difference Between Descriptive and Inferential Statistics?

Descriptive statistics are meant to illustrate data exactly as it is presented, meaning no predictions or generalizations should be used in the presentation of this data. More detailed descriptive statistics will present factors like the mean of a sample, the standard deviation of a sample or describe the sample’s probability shape.

Inferential statistics, on the other hand, rely on the use of generalizations based on data acquired from subjects. These statistics use the same sample of data as descriptive statistics, but exist to make assumptions about how a larger group of subjects will perform based on the performance of the existing subjects, with scalability factors to account for variations in larger groups.

Inferential statistics essentially do one of two things: estimate a population’s parameter, such as the mean or average, or set a hypothesis for further analysis.

What Is an Example of Inferential Statistics?

Any situation where data is extracted from a group of subjects and then used to make inferences about a larger group is an example of inferential statistics at work.

Though data sets may have a tendency to become large and have many variables, inferential statistics do not have to be complicated equations. For example, if you poll 100 people on whether or not they enjoy coffee, and 85 of those 100 people answer yes, while 15 answer no, the data will show that 85 percent of the sample enjoy coffee. Using that data, you might then infer that 85 percent of the general population enjoy coffee, while 15 percent of people do not.
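To put a margin of error on that 85 percent figure, a normal-approximation confidence interval for a proportion could be computed as follows (numbers taken from the example above):

```python
import numpy as np

# 85 of 100 polled people enjoy coffee (the example above)
successes, n = 85, 100
p_hat = successes / n

# Normal-approximation 95% confidence interval for the population proportion
se = np.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"Sample proportion: {p_hat:.2f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The interval makes explicit that the sample supports a range of plausible population values, not exactly 85 percent.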


Quant Analysis 101: Inferential Statistics

Everything You Need To Get Started (With Examples)

By: Derek Jansen (MBA) | Reviewers: Kerryn Warren (PhD) | October 2023

If you’re new to quantitative data analysis, one of the many terms you’re likely to hear being thrown around is inferential statistics. In this post, we’ll provide an introduction to inferential stats, using straightforward language and loads of examples.


At the simplest level, inferential statistics allow you to test whether the patterns you observe in a sample are likely to be present in the population – or whether they’re just a product of chance.

In stats-speak, this “Is it real or just by chance?” assessment is known as statistical significance. We won’t go down that rabbit hole in this post, but this ability to assess statistical significance means that inferential statistics can be used to test hypotheses and, in some cases, they can even be used to make predictions.

That probably sounds rather conceptual – let’s look at a practical example.

Let’s say you surveyed 100 people (this would be your sample) in a specific city about their favourite type of food. Reviewing the data, you found that 70 people selected pizza (i.e., 70% of the sample). You could then use inferential statistics to test whether that number is just due to chance, or whether it is likely representative of preferences across the entire city (this would be your population).

PS – you’d use a chi-square test for this example, but we’ll get to that a little later.
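Jumping ahead slightly, here is a minimal sketch of how that chi-square test might look for this example. The 50/50 expected split is an illustrative null hypothesis chosen for this sketch, not something specified in the original example.

```python
from scipy import stats

# 70 of 100 respondents chose pizza. As an illustrative null hypothesis,
# assume an even 50/50 split between pizza and everything else.
observed = [70, 30]
expected = [50, 50]
chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
# A very small p suggests the 70% figure is unlikely to be chance alone
print(f"chi2 = {chi2:.2f}, p = {p:.5f}")
```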

Inferential statistics help you understand whether the patterns you observe in a sample are likely to be present in the population.

Inferential vs Descriptive

At this point, you might be wondering how inferentials differ from descriptive statistics. At the simplest level, descriptive statistics summarise and organise the data you already have (your sample), making it easier to understand.

Inferential statistics, on the other hand, allow you to use your sample data to assess whether the patterns contained within it are likely to be present in the broader population, and potentially, to make predictions about that population.

It’s example time again…

Let’s imagine you’re undertaking a study that explores shoe brand preferences among men and women. If you just wanted to identify the proportions of those who prefer different brands, you’d only require descriptive statistics.

However, if you wanted to assess whether those proportions differ between genders in the broader population (and that the difference is not just down to chance), you’d need to utilise inferential statistics.

In short, descriptive statistics describe your sample, while inferential statistics help you understand whether the patterns in your sample are likely to be reflected in the population.


Let’s look at some inferential tests

Now that we’ve defined inferential statistics and explained how they differ from descriptive statistics, let’s take a look at some of the most common tests within the inferential realm. It’s worth highlighting upfront that there are many different types of inferential tests and this is most certainly not a comprehensive list – just an introductory list to get you started.

A t-test is a way to compare the means (averages) of two groups to see if they are meaningfully different, or if the difference is just by chance. In other words, to assess whether the difference is statistically significant. This is important because comparing two means side-by-side can be very misleading if one has a high variance and the other doesn’t (if this sounds like gibberish, check out our descriptive statistics post here).

As an example, you might use a t-test to see if there’s a statistically significant difference between the exam scores of two mathematics classes taught by different teachers. This might then lead you to infer that one teacher’s teaching method is more effective than the other.

It’s worth noting that there are a few different types of t-tests. In this example, we’re referring to the independent t-test, which compares the means of two groups, as opposed to the mean of one group at different times (i.e., a paired t-test). Each of these tests has its own set of assumptions and requirements, as do all of the tests we’ll discuss here – but we’ll save assumptions for another post!

Comparing two means (averages) side-by-side can be very misleading if one mean has a high variance and the other mean doesn't.

While a t-test compares the means of just two groups, an ANOVA (which stands for Analysis of Variance) can compare the means of more than two groups at once. Again, this helps you assess whether the differences in the means are statistically significant or simply a product of chance.

For example, if you want to know whether students’ test scores vary based on the type of school they attend – public, private, or homeschool – you could use ANOVA to compare the average standardised test scores of the three groups.

Similarly, you could use ANOVA to compare the average sales of a product across multiple stores. Based on this data, you could make an inference as to whether location is related to (affects) sales.

In these examples, we’re specifically referring to what’s called a one-way ANOVA, but as always, there are multiple types of ANOVAs for different applications. So, be sure to do your research before opting for any specific test.
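A minimal sketch of such a one-way ANOVA, with made-up scores for the three school types:

```python
import numpy as np
from scipy import stats

# Hypothetical standardised test scores for three school types
public = np.array([72, 75, 71, 70, 74, 73])
private = np.array([78, 80, 77, 79, 81, 76])
homeschool = np.array([74, 73, 76, 75, 72, 77])

# One-way ANOVA: are the group means plausibly all equal?
f_stat, p = stats.f_oneway(public, private, homeschool)
print(f"F = {f_stat:.2f}, p = {p:.4f}")
```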


While t-tests and ANOVAs test for differences in the means across groups, the Chi-square test is used to see if there’s a difference in the proportions of various categories. In stats speak, the Chi-square test assesses whether there’s a statistically significant relationship between two categorical variables (i.e., nominal or ordinal data). If you’re not familiar with these terms, check out our explainer video here.

As an example, you could use a Chi-square test to check if there’s a link between gender (e.g., male and female) and preference for a certain category of car (e.g., sedans or SUVs). Similarly, you could use this type of test to see if there’s a relationship between the type of breakfast people eat (cereal, toast, or nothing) and their university major (business, math or engineering).
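Here is how the gender-and-car-preference example might look as a chi-square test of independence; the counts in the contingency table are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical counts: rows are gender, columns are car preference
#                 sedans  SUVs
table = np.array([[40,    60],    # male
                  [55,    45]])   # female

chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
```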

Correlation analysis looks at the relationship between two numerical variables (like height or weight) to assess whether they “move together” in some way. In stats-speak, correlation assesses whether a statistically significant relationship exists between two variables that are interval or ratio in nature.

For example, you might find a correlation between hours spent studying and exam scores. This would suggest that generally, the more hours people spend studying, the higher their scores are likely to be.

Similarly, a correlation analysis may reveal a negative relationship between time spent watching TV and physical fitness (represented by VO2 max levels), where the more time spent in front of the television, the lower the physical fitness level.

When running a correlation analysis, you’ll be presented with a correlation coefficient (also known as an r-value), which is a number between -1 and 1. A value close to 1 means that the two variables move in the same direction, while a number close to -1 means that they move in opposite directions. A correlation value of zero means there’s no clear relationship between the two variables.

What’s important to highlight here is that while correlation analysis can help you understand how two variables are related, it doesn’t prove that one causes the other. As the adage goes, correlation is not causation.
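A short sketch of the studying-and-scores example with made-up data; Pearson's r is computed here, and stats.spearmanr could be swapped in for rank-based data.

```python
import numpy as np
from scipy import stats

hours_studied = np.array([2, 4, 5, 7, 8, 10, 11, 12])
exam_scores = np.array([52, 58, 60, 68, 71, 80, 83, 86])

# Pearson correlation: r near +1 means the variables move together
r, p = stats.pearsonr(hours_studied, exam_scores)
print(f"r = {r:.3f}, p = {p:.4f}")
```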


While correlation allows you to see whether there’s a relationship between two numerical variables, regression takes it a step further by allowing you to make predictions about the value of one variable (called the dependent variable) based on the value of one or more other variables (called the independent variables).

For example, you could use regression analysis to predict house prices based on the number of bedrooms, location, and age of the house. The analysis would give you an equation that lets you plug in these factors to estimate a house’s price. Similarly, you could potentially use regression analysis to predict a person’s weight based on their height, age, and daily calorie intake.

It’s worth noting that in these examples, we’ve been talking about multiple regression, as there are multiple independent variables. While this is a popular form of regression, there are many others, including simple linear, logistic and multivariate. As always, be sure to do your research before selecting a specific statistical test.

As with correlation, keep in mind that regression analysis alone doesn’t prove causation. While it can show that variables are related and help you make predictions, it can’t prove that one variable causes another to change. Other factors that you haven’t included in your model could be influencing the results. To establish causation, you’d typically need a very specific research design that allows you to control all (or at least most) variables.
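As a sketch of the house-price example, the snippet below fits a multiple regression with statsmodels; the bedroom counts, ages, and prices are all invented for illustration.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical house data: bedrooms and age (years) -> price (thousands)
bedrooms = np.array([2, 3, 3, 4, 4, 5, 2, 3])
age = np.array([30, 15, 40, 10, 5, 8, 50, 20])
price = np.array([250, 340, 300, 420, 450, 520, 210, 330])

# Stack the predictors and add an intercept column
X = sm.add_constant(np.column_stack([bedrooms, age]))
model = sm.OLS(price, X).fit()
print(model.params)   # intercept and one coefficient per predictor

# Predict the price of a hypothetical 3-bedroom, 12-year-old house
print(model.predict([[1, 3, 12]]))
```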

Let’s Recap

We’ve covered quite a bit of ground. Here’s a quick recap of the key takeaways:

  • Inferential stats allow you to assess whether patterns in your sample are likely to be present in your population.
  • Some common inferential statistical tests include t-tests, ANOVA, chi-square, correlation and regression.
  • Inferential statistics alone do not prove causation. To identify and measure causal relationships, you need a very specific research design.


Quantitative analysis: Inferential statistics

Inferential statistics are the statistical procedures that are used to reach conclusions about associations between variables. They differ from descriptive statistics in that they are explicitly designed to test hypotheses. Numerous statistical procedures fall into this category—most of which are supported by modern statistical software such as SPSS and SAS. This chapter provides a short primer on only the most basic and frequent procedures. Readers are advised to consult a formal text on statistics or take a course on statistics for more advanced procedures.

Basic concepts

British philosopher Karl Popper said that theories can never be proven, only disproven. As an example, how can we prove that the sun will rise tomorrow? Popper said that just because the sun has risen every single day that we can remember does not necessarily mean that it will rise tomorrow, because inductively derived theories are only conjectures that may or may not be predictive of future phenomena. Instead, he suggested that we may assume a theory that the sun will rise every day without necessarily proving it, and if the sun does not rise on a certain day, the theory is falsified and rejected. Likewise, we can only reject hypotheses based on contrary evidence, but can never truly accept them because the presence of evidence does not mean that we will not observe contrary evidence later. Because we cannot truly accept a hypothesis of interest (alternative hypothesis), we formulate a null hypothesis as the opposite of the alternative hypothesis, and then use empirical evidence to reject the null hypothesis to demonstrate indirect, probabilistic support for our alternative hypothesis.

A second problem with testing hypothesised relationships in social science research is that the dependent variable may be influenced by an infinite number of extraneous variables and it is not plausible to measure and control for all of these extraneous effects. Hence, even if two variables may seem to be related in an observed sample, they may not be truly related in the population, and therefore inferential statistics are never certain or deterministic, but always probabilistic.

How do we know whether a relationship between two variables in an observed sample is significant, and not a matter of chance? Sir Ronald A. Fisher, one of the most prominent statisticians in history, established the basic guidelines for significance testing. He said that a statistical result may be considered significant if it can be shown that the probability of it being rejected due to chance is 5% or less. In inferential statistics, this probability is called the p-value, 5% is called the significance level (α), and the desired relationship between the p-value and α is denoted as p ≤ 0.05. The significance level is the maximum level of risk that we are willing to accept as the price of our inference from the sample to the population. If the p-value is less than 0.05 or 5%, it means that we have a 5% chance of being incorrect in rejecting the null hypothesis, or committing a Type I error. If p > 0.05, we do not have enough evidence to reject the null hypothesis or accept the alternative hypothesis.

We must also understand three related statistical concepts: sampling distribution, standard error, and confidence interval. A sampling distribution is the theoretical distribution of an infinite number of samples from the population of interest in your study. However, because a sample is never identical to the population, every sample always has some inherent level of error, called the standard error. If this standard error is small, then statistical estimates derived from the sample (such as the sample mean) are reasonably good estimates of the population. The precision of our sample estimates is defined in terms of a confidence interval (CI). A 95% CI is defined as a range of plus or minus two standard deviations of the mean estimate, as derived from different samples in a sampling distribution. Hence, when we say that our observed sample estimate has a CI of 95%, what we mean is that we are confident that 95% of the time, the population parameter is within two standard deviations of our observed sample estimate. Jointly, the p-value and the CI give us a good idea of the probability of our result and how close it is to the corresponding population parameter.

General linear model

Most inferential statistical procedures in social science research are derived from a general family of statistical models called the general linear model (GLM). A model is an estimated mathematical equation that can be used to represent a set of data, and linear refers to a straight line. Hence, a GLM is a system of equations that can be used to represent linear patterns of relationships in observed data.

Two-variable linear model

The simplest GLM is a two-variable linear model that examines the relationship between one independent variable (the cause or predictor) and one dependent variable (the effect or outcome). It can be represented as:

\[ y = \beta_{0} + \beta_{1}x + \varepsilon \,,\]

where \( \beta_{0} \) is the intercept, \( \beta_{1} \) is the slope of the line, and \( \varepsilon \) is the error term capturing the variation in \( y \) not explained by \( x \).

Two-group comparison

The t-test is used for such comparisons, with the t statistic computed as:

\[ t = \frac{\overline{X}_{1}-\overline{X}_{2}}{s_{\overline{X}_{1}-\overline{X}_{2}}}\,, \]

where the numerator is the difference in sample means between the treatment group (Group 1) and the control group (Group 2) and the denominator is the standard error of the difference between the two groups, which in turn, can be estimated as:

\[ s_{\overline{X}_{1}-\overline{X}_{2}} = \sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}} }\,.\]

Factorial designs

Factorial designs cross two or more treatment variables, the simplest case being the \( 2 \times 2 \) design, in which two treatments are each administered at two levels; such designs are analysed with factorial ANOVA models.

Other quantitative analysis

There are many other useful inferential statistical techniques—based on variations in the GLM—that are briefly mentioned here. Interested readers are referred to advanced textbooks or statistics courses for more information on these techniques:

Factor analysis is a data reduction technique that is used to statistically aggregate a large number of observed measures (items) into a smaller set of unobserved (latent) variables called factors based on their underlying bivariate correlation patterns. This technique is widely used for assessment of convergent and discriminant validity in multi-item measurement scales in social science research.

Discriminant analysis is a classificatory technique that aims to place a given observation in one of several nominal categories based on a linear combination of predictor variables. The technique is similar to multiple regression, except that the dependent variable is nominal. It is popular in marketing applications, such as for classifying customers or products into categories based on salient attributes as identified from large-scale surveys.

Logistic regression (or logit model) is a GLM in which the outcome variable is binary (0 or 1) and is presumed to follow a logistic distribution, and the goal of the regression analysis is to predict the probability of the successful outcome by fitting data into a logistic curve. An example is predicting the probability of heart attack within a specific period, based on predictors such as age, body mass index, exercise regimen, and so forth. Logistic regression is extremely popular in the medical sciences. Effect size estimation is based on an ‘odds ratio’, representing the odds of an event occurring in one group versus the other.

Probit regression (or probit model) is a GLM in which the outcome variable can vary between 0 and 1 (or can assume the discrete values 0 and 1) and is presumed to follow a standard normal distribution, and the goal of the regression is to predict the probability of each outcome. This is a popular technique for predictive analysis in actuarial science, financial services, insurance, and other industries, for applications such as credit scoring based on a person’s credit rating, salary, debt, and other information from their loan application. Probit and logit regression tend to demonstrate similar regression coefficients in comparable applications (binary outcomes); however, the logit model is easier to compute and interpret.

Path analysis is a multivariate GLM technique for analysing directional relationships among a set of variables. It allows for examination of complex nomological models where the dependent variable in one equation is the independent variable in another equation, and is widely used in contemporary social science research.

Time series analysis is a technique for analysing time series data, or variables that continually change with time. Examples of applications include forecasting stock market fluctuations and urban crime rates. This technique is popular in econometrics, mathematical finance, and signal processing. Special techniques are used to correct for autocorrelation, or correlation within values of the same variable across time.

Social Science Research: Principles, Methods and Practices (Revised edition) Copyright © 2019 by Anol Bhattacherjee is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Basic Inferential Statistics: Theory and Application


The heart of statistics is inferential statistics. Descriptive statistics are typically straightforward and easy to interpret. Unlike descriptive statistics, inferential statistics are often complex and may have several different interpretations.

The goal of inferential statistics is to discover some property or general pattern about a large group by studying a smaller group of people in the hopes that the results will generalize to the larger group. For example, we may ask residents of New York City their opinion about their mayor. We would probably poll a few thousand individuals in New York City in an attempt to find out how the city as a whole views their mayor. The following section examines how this is done.

A population is the entire group of people you would like to know something about. In our previous example of New York City, the population is all of the people living in New York City. It should not include people from England, visitors in New York, or even people who know a lot about New York City.

A sample is a subset of the population. Just like you may sample different types of ice cream at the grocery store, a sample of a population should be just a smaller version of the population.

It is extremely important to understand how the sample being studied was drawn from the population. The sample should be as representative of the population as possible. There are several valid ways of creating a sample from a population, but inferential statistics works best when the sample is drawn at random from the population. Given a large enough sample, drawing at random ensures a fair and representative sample of a population.

Comparing two or more groups

Much of statistics, especially in medicine and psychology, is used to compare two or more groups and attempts to figure out if the two groups are different from one another.

Example: Drug X

Let us say that a drug company has developed a pill which they think shortens the recovery time from the common cold. How would they actually find out if the pill works or not? What they might do is get two groups of people from the same population (say, people from a small town in Indiana who had just caught a cold) and administer the pill to one group, and give the other group a placebo. They could then measure how many days each group took to recover (typically, one would calculate the mean of each group). Let's say that the mean recovery time for the group with the new drug was 5.4 days, and the mean recovery time for the group with the placebo was 5.8 days.

The question becomes, is this difference due to random chance, or does taking the pill actually help you recover from the cold faster? The means of the two groups alone do not help us determine the answer to this question. We need additional information.

Sample Size

If our example study only consisted of two people (one from the drug group and one from the placebo group) there would be so few participants that we would not have much confidence that there is a difference between the two groups. That is to say, there is a high probability that chance explains our results (any number of explanations might account for this, for example, one person might be younger, and thus have a better immune system). However, if our sample consisted of 1,000 people in each group, then the results become much more robust (while it might be easy to say that one person is younger than another, it is hard to say that 1,000 random people are younger than another 1,000 random people). If the sample is drawn at random from the population, then these 'random' variations in participants should be approximately equal in the two groups, given that the two groups are large. This is why inferential statistics works best when there are lots of people involved.

Be wary of statistics that have small sample sizes, unless they are in a peer-reviewed journal. Professional statisticians can interpret results correctly from small sample sizes, and often do, but not everyone is a professional, and novice statisticians often incorrectly interpret results. Also, if your author has an agenda, they may knowingly misinterpret results. If your author does not give a sample size, then he or she is probably not a professional, and you should be wary of the results. Sample sizes are required information in almost all peer-reviewed journals, and therefore, should be included in anything you write as well.

Variability

Even if we have a large enough sample size, we still need more information to reach a conclusion. What we need is some measure of variability. We know that the typical person takes about 5-6 days to recover from a cold, but does everyone recover around 5-6 days, or do some people recover in 1 day, and others recover in 10 days? Understanding the spread of the data will tell us how effective the pill is. If everyone in the placebo group takes exactly 5.8 days to recover, then it is clear that the pill has a positive effect, but if people have a wide variability in their length of recovery (and they probably do) then the picture becomes a little fuzzy. Only when the mean, sample size, and variability have been calculated can a proper conclusion be made. In our case, if the sample size is large, and the variability is small, then we would receive a small p-value (probability-value). Small p-values are good, and this term is prominent enough to warrant further discussion.
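The following simulation sketches the point: with the same 0.4-day difference in means, tightly clustered data will typically produce a much smaller p-value than widely spread data. Everything besides the two means is an assumption made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

# Simulated recovery times (days) matching the means in the example above
drug = rng.normal(loc=5.4, scale=0.5, size=20)      # low variability
placebo = rng.normal(loc=5.8, scale=0.5, size=20)
t_stat, p = stats.ttest_ind(drug, placebo)
print(f"Low variability:  p = {p:.3f}")

# Same means, much wider spread: the same 0.4-day difference
# will typically provide far weaker evidence against the null
drug_wide = rng.normal(loc=5.4, scale=3.0, size=20)
placebo_wide = rng.normal(loc=5.8, scale=3.0, size=20)
t_stat, p = stats.ttest_ind(drug_wide, placebo_wide)
print(f"High variability: p = {p:.3f}")
```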

In classic inferential statistics, we make two hypotheses before we start our study, the null hypothesis, and the alternative hypothesis.

Null Hypothesis: States that the two groups we are studying are the same.

Alternative Hypothesis: States that the two groups we are studying are different.

The goal in classic inferential statistics is to prove the null hypothesis wrong. The logic says that if the two groups aren't the same, then they must be different. A low p-value indicates a low probability that the null hypothesis is correct (thus, providing evidence for the alternative hypothesis).

Remember: It's good to have low p-values.


Research Methods Knowledge Base


Inferential Statistics

With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population might think. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study. Thus, we use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what’s going on in our data.

Here, I concentrate on inferential statistics that are useful in experimental and quasi-experimental research design or in program outcome evaluation. Perhaps one of the simplest inferential tests is used when you want to compare the average performance of two groups on a single measure to see if there is a difference. You might want to know whether eighth-grade boys and girls differ in math test scores or whether a program group differs on the outcome measure from a control group. Whenever you wish to compare the average performance between two groups you should consider the t-test for differences between groups.

Most of the major inferential statistics come from a general family of statistical models known as the General Linear Model. This includes the t-test, Analysis of Variance (ANOVA), Analysis of Covariance (ANCOVA), regression analysis, and many of the multivariate methods like factor analysis, multidimensional scaling, cluster analysis, discriminant function analysis, and so on. Given the importance of the General Linear Model, it’s a good idea for any serious social researcher to become familiar with its workings. The discussion of the General Linear Model here is very elementary and only considers the simplest straight-line model. However, it will get you familiar with the idea of the linear model and help prepare you for the more complex analyses described below.

One of the keys to understanding how groups are compared is embodied in the notion of the “dummy” variable. The name doesn’t suggest that we are using variables that aren’t very smart or, even worse, that the analyst who uses them is a “dummy”! Perhaps these variables would be better described as “proxy” variables. Essentially, a dummy variable is one that uses discrete numbers, usually 0 and 1, to represent different groups in your study. Dummy variables are a simple idea that enables some pretty complicated things to happen. For instance, by including a simple dummy variable in a model, I can model two separate lines (one for each treatment group) with a single equation. To see how this works, check out the discussion on dummy variables.

One of the most important analyses in program outcome evaluations involves comparing the program and non-program group on the outcome variable or variables. How we do this depends on the research design we use. Research designs are divided into two major types: experimental and quasi-experimental. Because the analyses differ for each, they are presented separately.

Experimental Analysis

The simple two-group posttest-only randomized experiment is usually analyzed with the simple t-test or one-way ANOVA. The factorial experimental designs are usually analyzed with the Analysis of Variance (ANOVA) model. Randomized Block Designs use a special form of ANOVA blocking model that uses dummy-coded variables to represent the blocks. The Analysis of Covariance Experimental Design uses, not surprisingly, the Analysis of Covariance statistical model.

Quasi-Experimental Analysis

The quasi-experimental designs differ from the experimental ones in that they don’t use random assignment to assign units (e.g. people) to program groups. The lack of random assignment in these designs tends to complicate their analysis considerably. For example, to analyze the Nonequivalent Groups Design (NEGD) we have to adjust the pretest scores for measurement error in what is often called a Reliability-Corrected Analysis of Covariance model. In the Regression-Discontinuity Design, we need to be especially concerned about curvilinearity and model misspecification. Consequently, we tend to use a conservative analysis approach that is based on polynomial regression that starts by overfitting the likely true function and then reducing the model based on the results. The Regression Point Displacement Design has only a single treated unit. Nevertheless, the analysis of the RPD design is based directly on the traditional ANCOVA model.

When you’ve investigated these various analytic models, you’ll see that they all come from the same family – the General Linear Model. An understanding of that model will go a long way to introducing you to the intricacies of data analysis in applied and social research contexts.


Data Analysis in Quantitative Research

By Yong Moon Jung

Quantitative data analysis serves as part of an essential process of evidence-making in health and social sciences. It is adopted for any type of research question and design, whether descriptive, explanatory, or causal. However, compared with its qualitative counterpart, quantitative data analysis has less flexibility. Conducting quantitative data analysis requires a prerequisite understanding of statistical knowledge and skills. It also requires rigor in the choice of an appropriate analysis model and in the interpretation of the analysis outcomes. Basically, the choice of appropriate analysis techniques is determined by the type of research question and the nature of the data. In addition, different analysis techniques require different assumptions about the data. This chapter provides introductory guides to assist readers with informed decision-making in choosing the correct analysis models. To this end, it begins with a discussion of the levels of measurement: nominal, ordinal, and scale. Some commonly used analysis techniques in univariate, bivariate, and multivariate data analysis are then presented with practical examples. Example analysis outcomes are produced using SPSS (Statistical Package for the Social Sciences).


Source: Jung, Y.M. (2019). Data Analysis in Quantitative Research. In: Liamputtong, P. (ed.) Handbook of Research Methods in Health Social Sciences. Springer, Singapore. https://doi.org/10.1007/978-981-10-5251-4_109


Come to the Right Conclusion with Inferential Analysis

March 23, 2020

by Mara Calvello


In this post:

  • Descriptive analysis vs. inferential analysis
  • Types of inferential analysis tests: linear regression analysis, correlation analysis, analysis of variance, analysis of covariance, confidence interval, chi-square test
  • Advantages of inferential analysis
  • Limitations of inferential analysis

We’re all guilty of jumping to conclusions from time to time.

Whether it's convincing yourself that no one is going to buy a ticket for the conference you’ve worked so hard to plan or that arriving at the airport two hours in advance simply isn’t enough time, we’ve all done it.

Outside of our daily lives, it’s easy to jump to inaccurate conclusions at work, no matter the industry. When we do this, we’re essentially generalizing, but what if you could make these generalizations more accurately? It’s possible when you run inferential analysis tests.

What is inferential analysis?

Inferential analysis is used to draw and measure the reliability of conclusions about a population that is based on information gathered from a sample of the population. Since inferential analysis doesn’t sample everyone in a population, the results will always contain some level of uncertainty.

When diving into statistical analysis, the population we want to analyze is often so large that studying everyone is impossible. In these cases, data is collected from random samples of individuals within that population, and inferential analysis is applied to the sample data to reach conclusions about the overall population. Essentially, inferential analysis is used to infer from a sample what the population might think or show.

There are two main ways of going about this:

  • Estimating parameters: Taking a statistic from a data sample (like the sample mean) and using it to conclude something about the population (the population mean).
  • Hypothesis tests: The use of data samples to answer specific research questions.

In estimating parameters, a statistic computed from the sample is used to estimate a value that describes the entire population, and a confidence interval is reported alongside the estimate to convey its uncertainty.

In hypothesis testing, data is used to determine if it is strong enough to support or reject an assumption.

The two main types of statistical analysis that people use most often are descriptive analysis and inferential analysis. Because of this, it’s not uncommon for the two to be confused for each other, even though they provide data analysts with different insights into the data that is collected.

While neither can show the whole picture on its own, when used together they provide a powerful tool for data visualization and predictive analytics, since they rely on the same set of data.

Descriptive statistical analysis gives information that describes the data in some way. This is sometimes done with charts and graphs made with data visualization software to explain what the data presents. This method of statistical analysis isn’t used to draw conclusions, only to summarize the information.

Inferential statistical analysis is the method that will be used to draw the conclusions. It allows users to infer or conclude trends about a larger population based on the samples that are analyzed. Basically, it takes data from a sample and then makes conclusions about a larger population or group.

This type of statistical analysis is often used to study the relationship between variables within a sample, allowing for conclusions and generalizations that accurately represent the population. And unlike descriptive analysis, businesses can test a hypothesis and come up with various conclusions from this data.

Descriptive analysis vs inferential analysis

Let’s think of it this way. You’re at a baseball game and ask a sample of 100 fans if they like hotdogs. You could make a bar graph of yes or no answers, which would be descriptive analysis. Or you could use your research to conclude that 93% of the population (all baseball fans in all the baseball stadiums) like hotdogs, which would be inferential analysis.

Types of inferential analysis tests

There are many types of inferential analysis tests in the statistics field. Which one you choose will depend on your sample size, the hypothesis you're trying to test, and the size of the population being examined.

Linear regression analysis is used to understand the relationship between two variables (X and Y) in a data set, as a way to estimate the unknown variable and make future projections about events and goals.

The main objective of regression analysis is to estimate the values of the dependent variable (Y) based on the values of the known (or fixed) independent variable (X). This is typically represented by a scatter plot, like the one below.

(Figure: scatter plot with a fitted regression line.)

One key advantage of using regression within your analysis is that it provides a detailed look at data and includes an equation that can be used for predictive analytics and optimizing data in the future.

The formula for simple linear regression is:

y = a + bx

where a is the y-intercept (the value of y when x = 0) and b is the slope (rise over run).
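As a minimal sketch of how these coefficients are estimated in practice (the data values here are invented for illustration), Python's scipy library can fit the line directly:

```python
# Minimal sketch of simple linear regression with invented data.
from scipy import stats

x = [1, 2, 3, 4, 5, 6]                    # independent variable
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]       # dependent variable

result = stats.linregress(x, y)

# y = a + bx: intercept (a) and slope (b) estimated from the sample
print(f"a (intercept) = {result.intercept:.2f}")
print(f"b (slope)     = {result.slope:.2f}")
print(f"predicted y at x = 7: {result.intercept + result.slope * 7:.2f}")
```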

Another inferential analysis test is correlation analysis, which is used to understand the extent to which two variables are dependent on one another. This analysis essentially tests the strength of the relationship between two variables, and if their correlation is strong or weak.

The correlation between two variables can be negative or positive, depending on the variables. Variables are considered “uncorrelated” when a change in one does not affect the other.

An example of positively correlated variables would be price and demand: an increase in demand causes a corresponding increase in price, because more consumers want something and are willing to pay more for it.

Overall, the objective of correlation analysis is to find the numerical value that shows the relationship between the two variables and how they move together. Like regression, this is typically done by utilizing data visualization software to create a graph.

(Figure: scatter plot illustrating the correlation between two variables.)
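A hedged sketch of how this numerical value can be computed, again on made-up numbers, uses Pearson's correlation coefficient from scipy:

```python
# Sketch: measuring the strength of the relationship between two
# variables with Pearson's r; the data are invented.
from scipy import stats

ad_spend = [10, 15, 20, 25, 30, 35]
sales    = [110, 135, 155, 190, 205, 230]

r, p_value = stats.pearsonr(ad_spend, sales)
print(f"r = {r:.2f}")        # close to +1 suggests a strong positive link
print(f"p = {p_value:.4f}")  # small p suggests the link is not chance
```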


The analysis of variance (ANOVA) statistical method is used to test and analyze the differences between two or more means from a data set. This is done by examining the amount of variation between the samples.

In simplest terms, ANOVA provides a statistical test of whether two or more population means are equal, in addition to generalizing the t-test between two means.

Learn more: A t-test is used to show how significant the differences between two groups are. Essentially, it helps you understand whether differences in means (averages) could have happened by chance.

This method will allow for the testing of groups to see if there’s a difference between them. For example, you may test students at two different high schools who take the same exam to see if one high school tests higher than the other.

ANOVA can also be broken down into two types:

  • One-way: Only one independent variable, with two or more levels. An example would be brand of peanut butter.
  • Two-way: Two independent variables, each of which can have multiple levels. An example would be brand of peanut butter and calorie content.

A level is simply one of the different groups within a variable. So, using the same example as above, the levels of the brand variable might be Jif, Skippy, or Peter Pan, and the levels of the calorie variable might be low, medium, or high.
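As a sketch of the high-school example above (all scores invented), a one-way ANOVA can be run with scipy:

```python
# Sketch: one-way ANOVA comparing invented exam scores from three schools.
from scipy import stats

school_a = [78, 85, 90, 73, 88, 81]
school_b = [72, 79, 69, 75, 80, 74]
school_c = [85, 91, 88, 82, 94, 87]

f_stat, p_value = stats.f_oneway(school_a, school_b, school_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# p < 0.05 would suggest at least one school's mean score differs
```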

Analysis of covariance (ANCOVA) is a unique blend of analysis of variance (ANOVA) and regression. ANCOVA can show what additional information is available when considering one independent variable, or factor, at a time, without influencing others.

It is often used:

  • For an extension of multiple regression as a way to compare multiple regression lines
  • To control covariates (other variables) that aren’t the main focus of your study
  • For an extension of the analysis of variance
  • To study combinations of other variables of interest
  • To control for factors that cannot be randomized but that can be measured

ANCOVA can also be used in pretest/posttest analyses when regression to the mean would otherwise affect your posttest measurement of the statistic.

As an example, let's say your business creates new pharmaceuticals that lower blood pressure. You may conduct a study that monitors four treatment groups and one control group.

If you use ANOVA, you’ll be able to tell if the treatment does, in fact, lower blood pressure. When you incorporate ANCOVA, you can control other factors that might influence the outcome, like family life, occupation, or other prescription drug use.
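A rough sketch of that blood-pressure study (column names and data are hypothetical) could use the statsmodels formula API, which fits ANCOVA as a linear model with the covariate added:

```python
# Hypothetical ANCOVA sketch: does treatment lower blood pressure
# once age (a covariate) is controlled for?
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "bp_change": [-12, -9, -15, -3, -1, -4, -11, -2],
    "treatment": ["drug", "drug", "drug", "placebo",
                  "placebo", "placebo", "drug", "placebo"],
    "age":       [54, 61, 48, 59, 45, 52, 63, 50],
})

# C(treatment) is the factor of interest; age enters as a covariate
model = smf.ols("bp_change ~ C(treatment) + age", data=df).fit()
print(model.summary())
```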

A confidence interval is a tool that is used in inferential analysis that estimates a parameter, usually the mean, of an entire population. Essentially, it’s how much uncertainty there is with any particular statistic and is typically used with a margin of error.

The confidence interval is expressed with a number that reflects how sure you are that the results of the survey or poll are what you’d expect if it were possible to survey the entire population.

For instance, if the results of a poll or survey are reported with a 98% confidence interval, this defines the range of values that you can be 98% certain contains the population mean. To come to this conclusion, three pieces of information are needed (a short computational sketch follows the list):

  • Confidence level : Describes the uncertainty associated with a sampling method
  • Statistic: Data collected from the survey or poll
  • Margin of error : How many percentage points your results will differ from the real population value
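Here is a minimal sketch (sample values invented) of computing a 95% confidence interval for a mean using the t distribution:

```python
# Sketch: 95% confidence interval for a population mean, from an
# invented sample.
import numpy as np
from scipy import stats

sample = np.array([4.2, 5.1, 4.8, 5.6, 4.9, 5.3, 4.7, 5.0])

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean
low, high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```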

A chi-square test, otherwise known as a χ² test, is used to identify differences between groups when all of the variables are nominal (variables whose values carry no numerical meaning), like gender, political affiliation, and so on.

These tests are typically used with specific contingency tables that group observations based on common characteristics.

Questions that the chi-square test could answer might be:

  • Are education level and marital status related for all people in the United States?
  • Is there a relationship between voter intent and political party membership?
  • Does gender affect which holiday people favor?

Usually, these tests are done by using simple random sampling to collect data from a specific sample in order to come to an accurate conclusion. For the first question listed above, the data would be arranged in a contingency table cross-tabulating education level against marital status.

These contingency tables serve as the starting point for organizing the data collected through simple random sampling.
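A sketch of the test itself, run on a small invented contingency table, looks like this with scipy:

```python
# Sketch: chi-square test of independence on an invented contingency
# table (education level vs. marital status).
from scipy.stats import chi2_contingency

#                married  single
observed = [[120,  90],   # degree
            [110, 140]]   # no degree

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, dof = {dof}")
# p < 0.05 would suggest education level and marital status are related
```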

There are many advantages to using inferential analysis, mainly that it provides a surplus of detailed information – much more than you’d have after running a descriptive analysis test.

This information provides researchers and analysts with comprehensive insights into the relationships between variables. It can also point toward cause and effect and support predictions regarding trends and patterns across industries.

Plus, since it is so widely used in the business world as well as academia, it’s a universally accepted method of statistical analysis.

When it comes to inferential statistics, there are two main limitations.

The first limitation comes from the fact that since the data being analyzed is from a population that hasn’t been fully measured, data analysts can’t ever be 100% sure that the statistics being calculated are correct. Since inferential analysis is based on the process of using values measured in a sample to conclude the values that would be measured from the total population, there will always be some level of uncertainty regarding the results.

The second limitation is that some inferential tests require the analyst or researcher to make an educated guess based on theories to run the tests. Similar to the first limitation, there will be uncertainty surrounding these guesses, which will also mean some repercussions on the reliability of the results of some statistical tests.

Don’t jump to conclusions

Before you jump to a potentially inaccurate conclusion regarding data, make sure to take advantage of the information that awaits within an inferential analysis test.

No matter the type of conclusion you’re looking to come to, or the hypothesis you start with, you may be surprised by the results an inferential analysis test can bring.

Looking for statistical analysis software to better interpret all of your data sets? Or maybe a tool that makes even the most complex statistical analysis simple and conclusive? Check out our list of unbiased reviews on G2!


Descriptive and Inferential Statistics

When analysing data, such as the marks achieved by 100 students for a piece of coursework, it is possible to use both descriptive and inferential statistics in your analysis of their marks. Typically, in most research conducted on groups of people, you will use both descriptive and inferential statistics to analyse your results and draw conclusions. So what are descriptive and inferential statistics? And what are their differences?

Descriptive Statistics

Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way such that, for example, patterns might emerge from the data. Descriptive statistics do not, however, allow us to make conclusions beyond the data we have analysed or reach conclusions regarding any hypotheses we might have made. They are simply a way to describe our data.

Descriptive statistics are very important because if we simply presented our raw data it would be hard to visualize what the data was showing, especially if there was a lot of it. Descriptive statistics therefore enables us to present the data in a more meaningful way, which allows simpler interpretation of the data. For example, if we had the results of 100 pieces of students' coursework, we may be interested in the overall performance of those students. We would also be interested in the distribution or spread of the marks. Descriptive statistics allow us to do this. How to properly describe data through statistics and graphs is an important topic and discussed in other Laerd Statistics guides. Typically, there are two general types of statistic that are used to describe data:

  • Measures of central tendency: these are ways of describing the central position of a frequency distribution for a group of data. In this case, the frequency distribution is simply the distribution and pattern of marks scored by the 100 students from the lowest to the highest. We can describe this central position using a number of statistics, including the mode, median, and mean. You can learn more in our guide: Measures of Central Tendency .
  • Measures of spread: these are ways of summarizing a group of data by describing how spread out the scores are. For example, the mean score of our 100 students may be 65 out of 100. However, not all students will have scored 65 marks. Rather, their scores will be spread out. Some will be lower and others higher. Measures of spread help us to summarize how spread out these scores are. To describe this spread, a number of statistics are available to us, including the range, quartiles, absolute deviation, variance and standard deviation .

When we use descriptive statistics it is useful to summarize our group of data using a combination of tabulated description (i.e., tables), graphical description (i.e., graphs and charts) and statistical commentary (i.e., a discussion of the results).

Inferential Statistics

We have seen that descriptive statistics provide information about our immediate group of data. For example, we could calculate the mean and standard deviation of the exam marks for the 100 students and this could provide valuable information about this group of 100 students. Any group of data like this, which includes all the data you are interested in, is called a population . A population can be small or large, as long as it includes all the data you are interested in. For example, if you were only interested in the exam marks of 100 students, the 100 students would represent your population. Descriptive statistics are applied to populations, and the properties of populations, like the mean or standard deviation, are called parameters as they represent the whole population (i.e., everybody you are interested in).

Often, however, you do not have access to the whole population you are interested in investigating, but only a limited number of data instead. For example, you might be interested in the exam marks of all students in the UK. It is not feasible to measure all exam marks of all students in the whole of the UK so you have to measure a smaller sample of students (e.g., 100 students), which are used to represent the larger population of all UK students. Properties of samples, such as the mean or standard deviation, are not called parameters, but statistics . Inferential statistics are techniques that allow us to use these samples to make generalizations about the populations from which the samples were drawn. It is, therefore, important that the sample accurately represents the population. The process of achieving this is called sampling (sampling strategies are discussed in detail in the section, Sampling Strategy , on our sister site). Inferential statistics arise out of the fact that sampling naturally incurs sampling error and thus a sample is not expected to perfectly represent the population. The methods of inferential statistics are (1) the estimation of parameter(s) and (2) testing of statistical hypotheses .
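To make the sample/population distinction concrete, here is a small sketch (marks invented) computing sample statistics and the standard error that quantifies sampling error:

```python
# Sketch: sample statistics used to estimate population parameters;
# the exam marks are invented.
import numpy as np
from scipy import stats

sample_marks = np.array([62, 71, 58, 65, 80, 69, 74, 60, 67, 73])

print(f"sample mean (a statistic): {sample_marks.mean():.1f}")
print(f"sample std deviation:      {sample_marks.std(ddof=1):.1f}")
# The standard error quantifies the expected sampling error of the mean
print(f"standard error of mean:    {stats.sem(sample_marks):.2f}")
```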



Data Analysis in Research: Types & Methods


Content Index

  • What is data analysis in research?
  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

What is data analysis in research?

Definition of data analysis in research: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process. The first is data organization. The second is data reduction through summarization and categorization, which together help find patterns and themes in the data for easy identification and linking. The third is data analysis itself, which researchers do in both top-down and bottom-up fashion.


On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that “the data analysis and data interpretation is a process representing the application of deductive and inductive logic to the research and data analysis.”

Researchers rely heavily on data as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But, what if there is no question to ask? Well! It is possible to explore data even without a problem – we call it ‘Data Mining’, which often reveals some interesting patterns within the data that are worth exploring.

Regardless of the type of data researchers explore, their mission and their audience's vision guide them in finding the patterns that shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and remain unbiased toward unexpected patterns, expressions, and results. Remember, sometimes data analysis tells the most unforeseen yet exciting stories that were not expected when the analysis began. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research.


Every kind of data describes something once a specific value is assigned to it. For analysis, these values need to be organized, processed, and presented in a given context to make them useful. Data can take different forms; here are the primary data types.

  • Qualitative data: When the data presented consists of words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze in research, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is considered qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews, qualitative observation, or open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: questions about age, rank, cost, length, weight, scores, etc. all produce this type of data. You can present such data in graphical format or charts, or apply statistical analysis methods to it. The OMS (Outcomes Measurement Systems) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: This is data presented in groups; however, an item included in the categorical data cannot belong to more than one group. Example: a person responding to a survey about their living situation, marital status, smoking habit, or drinking habit provides categorical data. A chi-square test is a standard method used to analyze this data.


Data analysis in qualitative research

Qualitative data analysis works a little differently from the analysis of numerical data, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insight from such complicated information is a complicated process; hence it is typically used for exploratory research and data analysis.

Although there are several ways to find patterns in textual information, word-based methods are the most relied-upon and widely used techniques for research and data analysis. Notably, the data analysis process in qualitative research is largely manual: the researchers usually read through the available data and find repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find  “food”  and  “hunger” are the most commonly used words and will highlight them for further analysis.


The keyword context is another widely used word-based technique. In this method, the researcher tries to understand the concept by analyzing the context in which the participants use a particular keyword.  

For example , researchers conducting research and data analysis for studying the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’

The scrutiny-based technique is another highly recommended text analysis method used to identify patterns in qualitative data. Compare and contrast is the most widely used method under this technique, used to identify how a specific text is similar to or different from others.

For example: To find out the “importance of a resident doctor in a company,” the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast is the best method for analyzing polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable Partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations from the enormous data.


There are several techniques to analyze the data in qualitative research, but here are some commonly used methods,

  • Content Analysis: Widely accepted and the most frequently employed technique for data analysis in research methodology. It can be used to analyze documented information from text, images, and sometimes physical items. The research questions determine when and where to use this method.
  • Narrative Analysis: This method is used to analyze content gathered from various sources such as personal interviews, field observation, and surveys. Most of the time, the stories or opinions shared by people are examined to find answers to the research questions.
  • Discourse Analysis: Similar to narrative analysis, discourse analysis is used to analyze interactions with people. However, this particular method considers the social context under which, or within which, the communication between researcher and respondent takes place. In addition, discourse analysis also considers lifestyle and day-to-day environment when deriving any conclusion.
  • Grounded Theory: When you want to explain why a particular phenomenon happened, grounded theory is the best resort for analyzing qualitative data. Grounded theory is applied to study data about a host of similar cases occurring in different settings. When researchers use this method, they may alter their explanations or produce new ones until they arrive at a conclusion.


Data analysis in quantitative research

The first stage in quantitative research and data analysis is to prepare the data for analysis, so that raw data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to check whether the collected data sample meets the pre-set standards or is a biased sample. It is divided into four stages:

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent answered all the questions in an online survey, or that the interviewer asked all the questions devised in the questionnaire.

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in fields incorrectly or skip them accidentally. Data editing is a process wherein the researchers confirm that the provided data is free of such errors. They conduct the necessary checks, including outlier checks, to edit the raw data and make it ready for analysis.

Phase III: Data Coding

Out of all three, this is the most critical phase of data preparation, associated with grouping and assigning values to the survey responses. If a survey is completed with a sample size of 1,000, the researcher might create age brackets to distinguish respondents based on their age. It then becomes easier to analyze small data buckets rather than deal with the massive data pile.


After the data is prepared for analysis, researchers can use different research and data analysis methods to derive meaningful insights. Statistical analysis is by far the most favored way to analyze numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. The methods are classified into two groups: first, ‘descriptive statistics’, used to describe data; second, ‘inferential statistics’, which help in comparing the data.

Descriptive statistics

This method is used to describe the basic features of versatile types of data in research. It presents the data in such a meaningful way that patterns in the data start to make sense. Nevertheless, descriptive analysis does not support conclusions beyond the data at hand; any conclusions remain tied to the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods.

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.

Measures of Central Tendency

  • Mean, Median, Mode
  • The method is widely used to describe the central position of a distribution.
  • Researchers use this method when they want to showcase the most commonly or averagely indicated response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range is the difference between the highest and lowest scores.
  • The variance and standard deviation reflect the average difference between each observed score and the mean.
  • These are used to identify the spread of scores by stating intervals.
  • Researchers use this method to show how spread out the data is, since the spread directly affects how representative the mean is.

Measures of Position

  • Percentile ranks, Quartile ranks
  • These rely on standardized scores, helping researchers identify where a given score stands relative to other scores.
  • They are often used when researchers want to compare scores against the average.

For quantitative research, descriptive analysis often gives absolute numbers, but such analysis is never sufficient on its own to demonstrate the rationale behind those numbers. Nevertheless, it is necessary to think of the best method for research and data analysis suited to your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students' average scores in schools. It is better to rely on descriptive statistics when researchers intend to keep the research or outcome limited to the provided sample without generalizing it: for example, when you want to compare average voting in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.
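As a small sketch (survey scores invented), the four families of descriptive measures described above can be computed in Python as follows:

```python
# Sketch: frequency, central tendency, dispersion, and position
# measures for an invented set of survey scores.
import numpy as np
from collections import Counter

scores = np.array([3, 4, 4, 5, 2, 4, 3, 5, 4, 1])

print("frequency:", Counter(scores.tolist()))                 # frequency
print("mean:", scores.mean(), "median:", np.median(scores))   # central tendency
print("range:", scores.max() - scores.min(),
      "std:", round(scores.std(ddof=1), 2))                   # dispersion
print("quartiles:", np.percentile(scores, [25, 50, 75]))      # position
```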

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of a sample representing that population. For example, you could ask a hundred-odd audience members at a movie theater whether they like the movie they are watching. Researchers would then use inferential statistics on the collected sample to reason that about 80-90% of people like the movie.

Here are two significant areas of inferential statistics.

  • Estimating parameters: It takes statistics from the sample research data and demonstrates something about the population parameter.
  • Hypothesis test: It's about using sample research data to answer the survey research questions. For example, researchers might want to understand whether a newly launched shade of lipstick is good or not, or whether multivitamin capsules help children perform better at games.

These are sophisticated analysis methods used to showcase the relationship between different variables instead of describing a single variable. It is often used when researchers want something beyond absolute numbers to understand the relationship between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but are interested in understanding the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables,  cross-tabulation  is used to analyze the relationship between multiple variables.  Suppose provided data has age and gender categories presented in rows and columns. A two-dimensional cross-tabulation helps for seamless data analysis and research by showing the number of males and females in each age category.
  • Regression analysis: For understanding the strong relationship between two variables, researchers do not look beyond the primary and commonly used regression analysis method, which is also a type of predictive analysis used. In this method, you have an essential factor called the dependent variable. You also have multiple independent variables in regression analysis. You undertake efforts to find out the impact of independent variables on the dependent variable. The values of both independent and dependent variables are assumed as being ascertained in an error-free random manner.
  • Frequency tables: These display how often each value or category of a variable occurs, providing a simple summary of the distribution of responses before more sophisticated tests are applied.
  • Analysis of variance: A statistical procedure used to test the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation means the research findings were significant. In many contexts, ANOVA testing and variance analysis are similar.
  • Researchers must have the necessary research skills to analyze and manipulate the data, and should be trained to demonstrate a high standard of research practice. Ideally, researchers should possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Research and data analytics projects usually differ by scientific discipline; therefore, getting statistical advice at the beginning of the analysis helps in designing a survey questionnaire, selecting data collection methods, and choosing samples.


  • The primary aim of research data analysis is to derive insights that are unbiased. Any mistake in, or any bias brought to, collecting data, selecting an analysis method, or choosing an audience sample is likely to lead to a biased inference.
  • No degree of sophistication in research data analysis is enough to rectify poorly defined objectives or outcome measurements. Whether the design is at fault or the intentions are unclear, lack of clarity can mislead readers, so avoid the practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find ways to deal with everyday challenges like outliers, missing data, data alteration, data mining, and developing graphical representations.

The sheer amount of data generated daily is staggering, especially now that data analysis has taken center stage. In 2018 alone, the total data supply amounted to 2.8 trillion gigabytes. Hence, it is clear that enterprises willing to survive in this hypercompetitive world must possess an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.


QuestionPro is an online survey platform that empowers organizations in data analysis and research and provides them a medium to collect data by creating appealing surveys.


Chapter 15 Quantitative Analysis: Inferential Statistics

Inferential statistics are the statistical procedures that are used to reach conclusions about associations between variables. They differ from descriptive statistics in that they are explicitly designed to test hypotheses. Numerous statistical procedures fall in this category, most of which are supported by modern statistical software such as SPSS and SAS. This chapter provides a short primer on only the most basic and frequent procedures; readers are advised to consult a formal text on statistics or take a course on statistics for more advanced procedures.

Basic Concepts

British philosopher Karl Popper said that theories can never be proven, only disproven. As an example, how can we prove that the sun will rise tomorrow? Popper said that just because the sun has risen every single day that we can remember does not necessarily mean that it will rise tomorrow, because inductively derived theories are only conjectures that may or may not be predictive of future phenomena. Instead, he suggested that we may assume a theory that the sun will rise every day without necessarily proving it, and if the sun does not rise on a certain day, the theory is falsified and rejected. Likewise, we can only reject hypotheses based on contrary evidence but can never truly accept them, because the presence of evidence does not mean that we will not observe contrary evidence later. Because we cannot truly accept a hypothesis of interest (the alternative hypothesis), we formulate a null hypothesis as its opposite, and then use empirical evidence to reject the null hypothesis in order to demonstrate indirect, probabilistic support for our alternative hypothesis.

A second problem with testing hypothesized relationships in social science research is that the dependent variable may be influenced by an infinite number of extraneous variables and it is not plausible to measure and control for all of these extraneous effects. Hence, even if two variables may seem to be related in an observed sample, they may not be truly related in the population, and therefore inferential statistics are never certain or deterministic, but always probabilistic.

How do we know whether a relationship between two variables in an observed sample is significant, and not a matter of chance? Sir Ronald A. Fisher, one of the most prominent statisticians in history, established the basic guidelines for significance testing. He said that a statistical result may be considered significant if it can be shown that the probability of it having occurred by chance is 5% or less. In inferential statistics, this probability is called the p-value, 5% is called the significance level (α), and the desired relationship between the p-value and α is denoted as: p ≤ 0.05. The significance level is the maximum level of risk that we are willing to accept as the price of our inference from the sample to the population. If the p-value is less than 0.05 or 5%, it means that we accept at most a 5% risk of incorrectly rejecting the null hypothesis, that is, of committing a Type I error. If p > 0.05, we do not have enough evidence to reject the null hypothesis or accept the alternative hypothesis.

We must also understand three related statistical concepts: sampling distribution, standard error, and confidence interval. A sampling distribution is the theoretical distribution of an infinite number of samples from the population of interest in your study. However, because a sample is never identical to the population, every sample always has some inherent level of error, called the standard error. If this standard error is small, then statistical estimates derived from the sample (such as the sample mean) are reasonably good estimates of the population. The precision of our sample estimates is defined in terms of a confidence interval (CI). A 95% CI is defined as a range of plus or minus two standard errors of the mean estimate, as derived from different samples in a sampling distribution. Hence, when we say that our observed sample estimate has a 95% CI, what we mean is that we are confident that 95% of the time the population parameter is within two standard errors of our observed sample estimate. Jointly, the p-value and the CI give us a good idea of the probability of our result and how close it is to the corresponding population parameter.

General Linear Model

Most inferential statistical procedures in social science research are derived from a general family of statistical models called the general linear model (GLM). A model is an estimated mathematical equation that can be used to represent a set of data, and linear refers to a straight line. Hence, a GLM is a system of equations that can be used to represent linear patterns of relationships in observed data.


Figure 15.1. Two-variable linear model.

The simplest type of GLM is a two-variable linear model that examines the relationship between one independent variable (the cause or predictor) and one dependent variable (the effect or outcome). Let us assume that these two variables are age and self-esteem respectively. The bivariate scatterplot for this relationship is shown in Figure 15.1, with age (predictor) along the horizontal or x-axis and self-esteem (outcome) along the vertical or y-axis. From the scatterplot, it appears that individual observations representing combinations of age and self-esteem generally seem to be scattered around an imaginary upward sloping straight line. We can estimate parameters of this line, such as its slope and intercept from the GLM. From high-school algebra, recall that straight lines can be represented using the mathematical equation y = mx + c, where m is the slope of the straight line (how much does y change for unit change in x) and c is the intercept term (what is the value of y when x is zero). In GLM, this equation is represented formally as:

\(y = \beta_{0} + \beta_{1}x + \varepsilon\)

where \(\beta_{0}\) is the intercept term, \(\beta_{1}\) is the slope, and \(\varepsilon\) is the error term. \(\varepsilon\) represents the deviation of actual observations from their estimated values, since most observations are close to the line but do not fall exactly on it (i.e., the GLM is not perfect). Note that a linear model can have more than two predictors. To visualize a linear model with two predictors, imagine a three-dimensional cube, with the outcome (y) along the vertical axis and the two predictors (say, \(x_{1}\) and \(x_{2}\)) along the two horizontal axes at the base of the cube. A line that describes the relationship between two or more variables is called a regression line, \(\beta_{0}\) and \(\beta_{1}\) (and other beta values) are called regression coefficients, and the process of estimating regression coefficients is called regression analysis. The GLM for regression analysis with n predictor variables is:

\(y = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \beta_{3}x_{3} + \ldots + \beta_{n}x_{n} + \varepsilon\)

In the above equation, the predictor variables \(x_{i}\) may represent independent variables or covariates (control variables). Covariates are variables that are not of theoretical interest but may have some impact on the dependent variable y and should be controlled, so that the residual effects of the independent variables of interest are detected more precisely. Covariates capture systematic errors in a regression equation, while the error term (\(\varepsilon\)) captures random errors. Though most variables in the GLM tend to be interval- or ratio-scaled, this does not have to be the case. Some predictor variables may even be nominal variables (e.g., gender: male or female), which are coded as dummy variables. These are variables that can assume one of only two possible values: 0 or 1 (in the gender example, “male” may be designated as 0 and “female” as 1, or vice versa). A nominal variable with n levels is represented using n-1 dummy variables. For instance, industry sector, consisting of the agriculture, manufacturing, and service sectors, may be represented using a combination of two dummy variables (\(x_{1}\), \(x_{2}\)), with (0, 0) for agriculture, (1, 0) for manufacturing, and (0, 1) for service. It does not matter which level of a nominal variable is coded as 0 and which as 1, because the 0 and 1 values are treated as two distinct groups (such as treatment and control groups in an experimental design), rather than as numeric quantities, and the statistical parameters of each group are estimated separately.

The GLM is a very powerful statistical tool because it is not one single statistical method, but rather a family of methods that can be used to conduct sophisticated analysis with different types and quantities of predictor and outcome variables. If we have a dummy predictor variable, and we are comparing the effects of the two levels (0 and 1) of this dummy variable on the outcome variable, we are doing an analysis of variance (ANOVA). If we are doing ANOVA while controlling for the effects of one or more covariates, we have an analysis of covariance (ANCOVA). We can also have multiple outcome variables (e.g., \(y_{1}\), \(y_{2}\), …, \(y_{n}\)), which are represented using a “system of equations” consisting of a different equation for each outcome variable (each with its own unique set of regression coefficients). If multiple outcome variables are modeled as being predicted by the same set of predictor variables, the resulting analysis is called multivariate regression. If we are doing ANOVA or ANCOVA analysis with multiple outcome variables, the resulting analysis is a multivariate ANOVA (MANOVA) or multivariate ANCOVA (MANCOVA), respectively. If we model the outcome in one regression equation as a predictor in another equation in an interrelated system of regression equations, then we have a very sophisticated type of analysis called structural equation modeling. The most important problem in GLM is model specification, i.e., how to specify a regression equation (or a system of equations) to best represent the phenomenon of interest. Model specification should be based on theoretical considerations about the phenomenon being studied, rather than what fits the observed data best. The role of data is in validating the model, not in its specification.
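As a minimal sketch of the age and self-esteem example with a dummy-coded nominal predictor added (all data invented), the statsmodels formula API fits such a GLM directly:

```python
# Sketch: GLM with one continuous predictor (age) and one dummy-coded
# nominal predictor (gender); the data are invented.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "self_esteem": [3.1, 3.8, 4.0, 4.6, 3.4, 4.2, 4.8, 5.0],
    "age":         [18, 25, 30, 42, 21, 33, 47, 55],
    "gender":      ["m", "f", "f", "m", "f", "m", "m", "f"],
})

# C(gender) expands the nominal variable into a 0/1 dummy variable
model = smf.ols("self_esteem ~ age + C(gender)", data=df).fit()
print(model.params)  # intercept, dummy coefficient, slope for age
```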

Two-Group Comparison

One of the simplest inferential analyses is comparing the post-test outcomes of treatment and control group subjects in a randomized post-test only control group design, such as whether students enrolled in a special mathematics program perform better than those in a traditional math curriculum. In this case, the predictor variable is a dummy variable (1 = treatment group, 0 = control group), and the outcome variable, performance, is ratio-scaled (e.g., the score on a math test following the special program). The analytic technique for this simple design is a one-way ANOVA (one-way because it involves only one predictor variable), and the statistical test used is called a Student's t-test (or t-test, in short).

The t-test was introduced in 1908 by William Sealy Gosset, a chemist working for the Guinness Brewery in Dublin, Ireland, to monitor the quality of stout, a dark beer popular with 19th-century porters in London. Because his employer did not want to reveal that it was using statistics for quality control, Gosset published the test in Biometrika under the pen name “Student”. Hence the name Student's t-test, although Student's identity was known to fellow statisticians.

The t-test examines whether the means of two groups are statistically different from each other (non-directional or two-tailed test), or whether one group has a statistically larger (or smaller) mean than the other (directional or one-tailed test). In our example, if we wish to examine whether students in the special math curriculum perform better than those in traditional curriculum, we have a one-tailed test. This hypothesis can be stated as:

\(H_{0}: \mu_{1} \leq \mu_{2}\) (null hypothesis)

\(H_{1}: \mu_{1} > \mu_{2}\) (alternative hypothesis)

where \(\mu_{1}\) represents the mean population performance of students exposed to the special curriculum (treatment group) and \(\mu_{2}\) is the mean population performance of students with the traditional curriculum (control group). Note that the null hypothesis is always the one with the “equal” sign, and the goal of all statistical significance tests is to reject the null hypothesis.
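Here is a sketch of this one-tailed comparison with invented test scores (scipy reports a two-tailed p by default, so it is halved for the directional test):

```python
# Sketch: one-tailed two-sample t-test comparing invented math scores
# for the special-curriculum (treatment) and traditional (control) groups.
from scipy import stats

treatment = [82, 88, 75, 91, 84, 79, 86]
control   = [74, 70, 78, 72, 69, 77, 73]

t_stat, p_two_sided = stats.ttest_ind(treatment, control)
p_one_sided = p_two_sided / 2  # directional test: mu1 > mu2
print(f"t = {t_stat:.2f}, one-tailed p = {p_one_sided:.4f}")
# Reject H0 in favor of H1 if p <= 0.05 and t > 0
```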


Hypothesis Testing

Hypothesis testing is a type of inferential statistics that is used to test assumptions and draw conclusions about the population from the available sample data. It involves setting up a null hypothesis and an alternative hypothesis, followed by conducting a statistical test of significance. A conclusion is drawn based on the value of the test statistic, the critical value, and the confidence interval. A hypothesis test can be left-tailed, right-tailed, or two-tailed. Given below are certain important hypothesis tests that are used in inferential statistics.

Z Test: A z test is used on data that follows a normal distribution and has a sample size greater than or equal to 30. It is used to test if the means of the sample and population are equal when the population variance is known. The right tailed hypothesis can be set up as follows:

Null Hypothesis: \(H_{0}\) : \(\mu = \mu_{0}\)

Alternate Hypothesis: \(H_{1}\) : \(\mu > \mu_{0}\)

Test Statistic: z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\). \(\overline{x}\) is the sample mean, \(\mu\) is the population mean, \(\sigma\) is the population standard deviation and n is the sample size.

Decision Criteria: If the z statistic > z critical value then reject the null hypothesis.

T Test: A t test is used when the data follows a Student's t distribution and the sample size is less than 30. It is used to compare the sample and population mean when the population variance is unknown. The hypotheses are set up in the same way as for the z test, and the test statistic for inferential statistics is given as follows:

Test Statistic: t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\), where s is the sample standard deviation.

Decision Criteria: If the t statistic > t critical value then reject the null hypothesis.

F Test: An f test is used to check if there is a difference between the variances of two samples or populations. The right tailed f hypothesis test can be set up as follows:

Null Hypothesis: \(H_{0}\) : \(\sigma_{1}^{2} = \sigma_{2}^{2}\)

Alternate Hypothesis: \(H_{1}\) : \(\sigma_{1}^{2} > \sigma_{2}^{2}\)

Test Statistic: f = \(\frac{\sigma_{1}^{2}}{\sigma_{2}^{2}}\), where \(\sigma_{1}^{2}\) is the variance of the first population and \(\sigma_{2}^{2}\) is the variance of the second population.

Decision Criteria: If the f test statistic > f test critical value then reject the null hypothesis.

Confidence Interval: A confidence interval helps in estimating the parameters of a population. For example, a 95% confidence interval indicates that if a test is conducted 100 times with new samples under the same conditions then the estimate can be expected to lie within the given interval 95 times. Furthermore, a confidence interval is also useful in calculating the critical value in hypothesis testing.

Apart from these tests, other tests used in inferential statistics are the ANOVA test, Wilcoxon signed-rank test, Mann-Whitney U test, Kruskal-Wallis H test, etc.
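To make the formulas above concrete, here is a minimal sketch computing the z, t, and f statistics directly (all numbers invented):

```python
# Sketch: z, t, and f test statistics from the formulas above,
# computed on invented numbers.
import numpy as np

# z test (population standard deviation known)
x_bar, mu, sigma, n = 52.0, 50.0, 6.0, 36
z = (x_bar - mu) / (sigma / np.sqrt(n))

# t test (population standard deviation unknown, small sample)
s, n_t = 5.5, 16
t = (x_bar - mu) / (s / np.sqrt(n_t))

# f test (ratio of two sample variances)
var1, var2 = 108.0, 72.0
f = var1 / var2

print(f"z = {z:.2f}, t = {t:.2f}, f = {f:.2f}")
# Each statistic is compared against its critical value to decide
# whether to reject the null hypothesis.
```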

Regression Analysis

Regression analysis is used to quantify how one variable changes with respect to another. There are many types of regression available, such as simple linear, multiple linear, nominal, logistic, and ordinal regression. The most commonly used regression in inferential statistics is linear regression, which checks the effect of a unit change in the independent variable on the dependent variable. Some important formulas used in inferential statistics for regression analysis are as follows:

Regression Coefficients :

The straight line equation is given as y = \(\alpha\) + \(\beta x\), where \(\alpha\) and \(\beta\) are regression coefficients.

\(\beta = \frac{\sum_{1}^{n}\left ( x_{i}-\overline{x} \right )\left ( y_{i}-\overline{y} \right )}{\sum_{1}^{n}\left ( x_{i}-\overline{x} \right )^{2}}\)

\(\beta = r_{xy}\frac{\sigma_{y}}{\sigma_{x}}\)

\(\alpha = \overline{y}-\beta \overline{x}\)

Here, \(\overline{x}\) is the mean and \(\sigma_{x}\) the standard deviation of the independent variable x; similarly, \(\overline{y}\) is the mean and \(\sigma_{y}\) the standard deviation of the dependent variable y. \(r_{xy}\) is the correlation coefficient between x and y.
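
The following Python sketch computes the regression coefficients directly from these formulas on a small hypothetical dataset and cross-checks the result against numpy's least-squares fit:

```python
import numpy as np

# Hypothetical paired observations: x is independent, y is dependent.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# beta = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
beta = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# alpha = y_bar - beta * x_bar
alpha = y_bar - beta * x_bar

print(f"y = {alpha:.3f} + {beta:.3f} x")

# Cross-check against numpy's degree-1 polynomial fit (returns slope, intercept).
beta_np, alpha_np = np.polyfit(x, y, 1)
assert np.allclose([alpha, beta], [alpha_np, beta_np])
```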

Inferential Statistics Examples

Inferential statistics is very useful and cost-effective as it can make inferences about the population without collecting data on the complete population. Some inferential statistics examples are given below:

  • Suppose the mean marks of a sample of 100 students in a particular country are known. Using this sample, the mean marks of all students in the country can be estimated using inferential statistics.
  • Suppose a coach wants to find out how many cartwheels sophomores at his college can do on average without stopping. A sample of a few students will be asked to perform cartwheels and the average will be calculated. Inferential statistics will use this data to draw a conclusion about how many cartwheels sophomores can perform on average.

Inferential Statistics vs Descriptive Statistics

Descriptive and inferential statistics are used to describe data and make generalizations about the population from samples. The key difference is that descriptive statistics summarizes the features of a known dataset, whereas inferential statistics analyzes a sample in order to draw conclusions about the larger population.

Related Articles:

  • Probability and Statistics
  • Data Handling
  • Summary Statistics

Important Notes on Inferential Statistics

  • Inferential statistics makes use of analytical tools to draw statistical conclusions regarding the population data from a sample.
  • Hypothesis testing and regression analysis are the two main types of inferential statistics.
  • Sampling techniques are used in inferential statistics to determine representative samples of the entire population.
  • The z test, t test, and linear regression are among the analytical tools used in inferential statistics.

Examples on Inferential Statistics

Example 1: After a new sales training is given to employees, the average sale goes up to $150 (a sample of 25 employees was examined) with a standard deviation of $12. Before the training, the average sale was $100. Check if the training helped at \(\alpha\) = 0.05.

Solution: The t test in inferential statistics is used to solve this problem.

\(\overline{x}\) = 150, \(\mu\) = 100, s = 12, n = 25

\(H_{0}\) : \(\mu = 100\)

\(H_{1}\) : \(\mu > 100\)

t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}} = \frac{150-100}{\frac{12}{\sqrt{25}}} = \frac{50}{2.4} \approx 20.83\)

The degrees of freedom are given by n - 1 = 25 - 1 = 24.

Using the t table at \(\alpha\) = 0.05, the critical value is T(0.05, 24) = 1.71

As 20.83 > 1.71, the null hypothesis is rejected and it is concluded that the training helped in increasing the average sales.

Answer: Reject Null Hypothesis.
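
The computation can be verified with a couple of lines of Python:

```python
import math

t = (150 - 100) / (12 / math.sqrt(25))  # test statistic
print(round(t, 2), t > 1.71)            # 20.83 True -> reject H0
```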

Example 2: A test was conducted with variance = 108 and n = 8. Certain changes were made in the test and it was conducted again with variance = 72 and n = 6. At a 0.05 significance level, was there any improvement in the test results?

Solution: The f test in inferential statistics will be used.

\(H_{0}\) : \(\sigma_{1}^{2} = \sigma_{2}^{2}\)

\(H_{1}\) : \(\sigma_{1}^{2} > \sigma_{2}^{2}\)

\(n_{1}\) = 8, \(n_{2}\) = 6

\(df_{1}\) = 8 - 1 = 7

\(df_{2}\) = 6 - 1 = 5

\(s_{1}^{2}\) = 108, \(s_{2}^{2}\) = 72

The f test formula is given as follows:

F = \(\frac{s_{1}^{2}}{s_{2}^{2}}\) = 108 / 72 = 1.5

Now from the F table, the critical value is F(0.05, 7, 5) = 4.88.

As 1.5 < 4.88, we fail to reject the null hypothesis and conclude that there is not enough evidence to suggest that the test results improved.

Answer: Fail to reject the null hypothesis.
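
As a quick check in Python, with SciPy's F distribution supplying the critical value:

```python
from scipy.stats import f

F = 108 / 72                 # = 1.5
f_crit = f.ppf(0.95, 7, 5)   # ~4.88
print(F > f_crit)            # False -> fail to reject H0
```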

Example 3: After a new sales training is given to employees, the average sale goes up to $150 (a sample of 49 employees was examined). Before the training, the average sale was $100 with a standard deviation of $12. Check if the training helped at \(\alpha\) = 0.05.

Solution: This is similar to Example 1. However, as the sample size is 49 and the population standard deviation is known, the z test in inferential statistics is used.

\(\overline{x}\) = 150, \(\mu\) = 100, \(\sigma\) = 12, n = 49

z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}} = \frac{150-100}{\frac{12}{\sqrt{49}}} = \frac{50}{12/7} \approx 29.17\)

From the z table at \(\alpha\) = 0.05, the critical value is 1.645.

As 29.17 > 1.645, the null hypothesis is rejected and it is concluded that the training was useful in increasing the average sales.

Answer: Reject the null hypothesis.
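
Again, the arithmetic can be checked quickly:

```python
import math

z = (150 - 100) / (12 / math.sqrt(49))  # ~29.17
print(z > 1.645)                        # True -> reject H0
```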

FAQs on Inferential Statistics

What is the Meaning of Inferential Statistics?

Inferential statistics is a field of statistics that uses several analytical tools to draw inferences and make generalizations about population data from sample data.

What are the Types of Inferential Statistics?

There are two main types of inferential statistics that use different methods to draw conclusions about the population data. These are regression analysis and hypothesis testing.

What are the Different Sampling Methods Used in Inferential Statistics?

It is necessary to choose the correct sample from the population so as to represent it accurately. Some important sampling strategies used in inferential statistics are simple random sampling, stratified sampling, cluster sampling, and systematic sampling.

What are the Different Types of Hypothesis Tests In Inferential Statistics?

The most frequently used hypothesis tests in inferential statistics are parametric tests such as the z test, f test, ANOVA test, and t test, as well as certain non-parametric tests such as the Wilcoxon signed-rank test.

What is Inferential Statistics Used For?

Inferential statistics is used to compare the parameters of two or more samples and to make generalizations about the larger population based on these samples.

Is Z Score a Part of Inferential Statistics?

Yes, the z score is a fundamental part of inferential statistics as it helps determine whether a sample is representative of its population. Furthermore, it also underlies the test statistic of the z test.

What is the Difference Between Descriptive and Inferential Statistics?

Descriptive statistics is used to describe the features of some known dataset whereas inferential statistics analyzes a sample in order to draw conclusions regarding the population.
