7.1: Basics of Hypothesis Testing


  • Kathryn Kozak
  • Coconino Community College


To understand the process of a hypothesis test, you first need to understand what a hypothesis is: an educated guess about a parameter. Once you have the hypothesis, you collect data and use the data to determine whether there is enough evidence to show that the hypothesis is true. However, in hypothesis testing you actually assume something else is true, and then you look at your data to see how likely an outcome like yours would be under that assumption. If the outcome is very unusual, then you might think that your assumption is actually false. If you are able to say the assumption is false, then your hypothesis must be true. This is known as a proof by contradiction: you assume the opposite of your hypothesis is true and show that it can't be true, so your hypothesis must be true. All hypothesis tests go through the same process. Once you have the process down, the concept is much easier. It is easier to see the process by looking at an example, and the concepts that are needed will be detailed in this example.

Example \(\PageIndex{1}\) basics of hypothesis testing

Suppose a manufacturer of the XJ35 battery claims the mean life of the battery is 500 days with a standard deviation of 25 days. You are the buyer of this battery and you think this claim is inflated. You would like to test your belief because without a good reason you can’t get out of your contract.

What do you do?

Well first, you should know what you are trying to measure. Define the random variable.

Let x = life of an XJ35 battery

Now you are not just trying to find different x values. You are trying to find what the true mean is. Since you are trying to find it, it must be unknown. You don’t think it is 500 days. If you did, you wouldn’t be doing any testing. The true mean, \(\mu\), is unknown. That means you should define that too.

Let \(\mu\) = mean life of an XJ35 battery

You may want to collect a sample. What kind of sample?

You could ask the manufacturer to give you batteries, but there is a chance that there could be some bias in the batteries they pick. To reduce the chance of bias, it is best to take a random sample.

How big should the sample be?

A sample of size 30 or more means that you can use the central limit theorem. Pick a sample of size 30.

Table \(\PageIndex{1}\) contains the data for the sample you collected.

Now what should you do? Looking at the data set, you see some of the times are above 500 and some are below. But looking at all of the numbers is too difficult. It might be helpful to calculate the mean for this sample.

The sample mean is \(\overline{x} = 490\) days. Looking at the sample mean, one might think that you are right. However, the standard deviation and the sample size also play a role, so maybe you are wrong.

Before going any further, it is time to formalize a few definitions.

You have a guess that the mean life of a battery is less than 500 days. This is opposed to what the manufacturer claims. There really are two hypotheses, which are just guesses here – the one that the manufacturer claims and the one that you believe. It is helpful to have names for them.

Definition \(\PageIndex{1}\)

Null Hypothesis: the historical value, claim, or product specification. The symbol used is \(H_{o}\).

Definition \(\PageIndex{2}\)

Alternative Hypothesis: what you want to prove. This is what you want to accept as true when you reject the null hypothesis. There are two symbols that are commonly used for the alternative hypothesis: \(H_{A}\) or \(H_{1}\). The symbol \(H_{A}\) will be used in this book.

In general, the hypotheses look something like this:

\(H_{o} : \mu=\mu_{o}\)

\(H_{A} : \mu<\mu_{o}\)

where \(\mu_{o}\) just represents the value that the claim says the population mean is actually equal to.

Also, \(H_{A}\) can be less than, greater than, or not equal to.

For this problem:

\(H_{o} : \mu=500\) days, since the manufacturer says the mean life of a battery is 500 days.

\(H_{A} : \mu<500\) days, since you believe that the mean life of the battery is less than 500 days.

Now back to the mean. You have a sample mean of 490 days. Is this small enough to believe that you are right and the manufacturer is wrong? How small does it have to be?

If you calculated a sample mean of 235, you would definitely believe the population mean is less than 500. But even if you had a sample mean of 435 you would probably believe that the true mean was less than 500. What about 475? Or 483? There is some point where you would stop being so sure that the population mean is less than 500. That point separates the values of where you are sure or pretty sure that the mean is less than 500 from the area where you are not so sure. How do you find that point?

Well, it depends on how much error you are willing to make. Of course you don't want to make any errors, but unfortunately that is unavoidable in statistics. What you can do is figure out how likely your sample result would be if the manufacturer were right. Take the sample mean, and find the probability of getting a sample mean less than it, assuming for the moment that the manufacturer is right. The idea behind this is that you want to know the chance that you could have come up with your sample mean even if the population mean really is 500 days.

You want to find \(P\left(\overline{x}<490 | H_{o} \text { is true }\right)=P(\overline{x}<490 | \mu=500)\)

To compute this probability, you need to know how the sample mean is distributed. Since the sample size is at least 30, you know the sample mean is approximately normally distributed. Remember \(\mu_{\overline{x}}=\mu\) and \(\sigma_{\overline{x}}=\dfrac{\sigma}{\sqrt{n}}\).
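
These two facts can be checked by simulation. The following R sketch (R matches the pnorm commands used later in this section; the assumption that individual battery lives are normally distributed with mean 500 and standard deviation 25 is made only for this illustration) draws many samples of size 30 and looks at how the sample means behave:

# simulate the sampling distribution of the mean for the battery example
# (illustrative assumption: individual lives ~ Normal(500, 25))
set.seed(1)
n <- 30
reps <- 10000
sample_means <- replicate(reps, mean(rnorm(n, mean = 500, sd = 25)))

mean(sample_means)   # close to mu = 500
sd(sample_means)     # close to sigma/sqrt(n) = 25/sqrt(30), about 4.56

The simulated means center on \(\mu\) and their spread is close to \(\sigma/\sqrt{n}\), which is what the formulas above say.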

A picture is always useful.


Before calculating the probability, it is useful to see how many standard deviations away from the mean the sample mean is. Using the formula for the z-score from chapter 6, you find

\(z=\dfrac{\overline{x}-\mu_{o}}{\sigma / \sqrt{n}}=\dfrac{490-500}{25 / \sqrt{30}}=-2.19\)

This sample mean is more than two standard deviations away from the mean. That seems pretty far, but you should look at the probability too.

On TI-83/84:

\(P(\overline{x}<490 | \mu=500)=\text { normalcdf }(-1 \mathrm{E} 99,490,500,25 \div \sqrt{30}) \approx 0.0142\)

On R:

\(P(\overline{x}<490 | \mu=500)=\operatorname{pnorm}(490,500,25 / \operatorname{sqrt}(30)) \approx 0.0142\)
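
The same numbers can be reproduced with a few lines of base R (a sketch; the values are the ones from this example):

# one-sample z calculation for the battery example
xbar  <- 490                 # sample mean
mu0   <- 500                 # claimed mean under Ho
sigma <- 25                  # population standard deviation (assumed known)
n     <- 30                  # sample size

z <- (xbar - mu0) / (sigma / sqrt(n))                     # about -2.19
p_value <- pnorm(xbar, mean = mu0, sd = sigma / sqrt(n))  # about 0.0142
# pnorm(z) gives the same left-tail probability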

There is a 1.42% chance that you could find a sample mean less than 490 when the population mean is 500 days. This is really small, so the chances are that the assumption that the population mean is 500 days is wrong, and you can reject the manufacturer’s claim. But how do you quantify really small? Is 5% or 10% or 15% really small? How do you decide?

Before you answer that question, a couple more definitions are needed.

Definition \(\PageIndex{3}\)

Test Statistic: \(z=\dfrac{\overline{x}-\mu_{o}}{\sigma / \sqrt{n}}\), since it is calculated as part of the testing of the hypothesis.

Definition \(\PageIndex{4}\)

p-value: the probability that the test statistic will take on values as extreme as, or more extreme than, the observed test statistic, given that the null hypothesis is true. It is the probability that was calculated above.

Now, how small is small enough? To answer that, you really want to know the types of errors you can make.

There are actually only two errors that can be made. The first error is if you say that \(H_{o}\) is false when in fact it is true. This means you reject \(H_{o}\) when \(H_{o}\) is true. The second error is if you say that \(H_{o}\) is true when in fact it is false. This means you fail to reject \(H_{o}\) when \(H_{o}\) is false. The following table organizes this for you:

Types of errors:

                           \(H_{o}\) is true       \(H_{o}\) is false
Reject \(H_{o}\)           Type I error            Correct decision
Fail to reject \(H_{o}\)   Correct decision        Type II error

Definition \(\PageIndex{5}\)

Type I Error is rejecting \(H_{o}\) when \(H_{o}\) is true, and

Definition \(\PageIndex{6}\)

Type II Error is failing to reject \(H_{o}\) when \(H_{o}\) is false.

Since these are the errors, one can define the probabilities attached to each error.

Definition \(\PageIndex{7}\)

\(\alpha\) = P(type I error) = P(rejecting \(H_{o} | H_{o}\) is true)

Definition \(\PageIndex{8}\)

\(\beta\) = P(type II error) = P(failing to reject \(H_{o} | H_{o}\) is false)

\(\alpha\) is also called the level of significance.

Another common concept is the power of the test: Power = \(1-\beta\).

Now there is a relationship between \(\alpha\) and \(\beta\). They are not complements of each other. How are they related?

If \(\alpha\) increases that means the chances of making a type I error will increase. It is more likely that a type I error will occur. It makes sense that you are less likely to make type II errors, only because you will be rejecting \(H_{o}\) more often. You will be failing to reject \(H_{o}\) less, and therefore, the chance of making a type II error will decrease. Thus, as \(\alpha\) increases, \(\beta\) will decrease, and vice versa. That makes them seem like complements, but they aren’t complements. What gives? Consider one more factor – sample size.

If you have a larger sample that is representative of the population, it makes sense that you have more accuracy than with a smaller sample. Think of it this way: which would you trust more, a sample mean of 490 from a sample of size 35 or from a sample of size 350 (assuming a representative sample in both cases)? Of course the 350, because there are more data points and so more accuracy. If you are more accurate, then there is less chance that you will make any error. By increasing the sample size of a representative sample, you decrease both \(\alpha\) and \(\beta\).

Summary of all of this:

  • For a certain sample size, n, if \(\alpha\) increases, \(\beta\) decreases.
  • For a certain level of significance, \(\alpha\), if n increases, \(\beta\) decreases.

Now how do you find \(\alpha\) and \(\beta\)? Well, \(\alpha\) is actually chosen. There are only three values that are usually picked for \(\alpha\): 0.01, 0.05, and 0.10. \(\beta\) is very difficult to find, so usually it isn't computed. If you want to make sure it is small, you take as large a sample as you can afford, provided it is a representative sample. This is one use of the power: you want \(\beta\) to be small, which means the power of the test is large.
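
\(\beta\) can only be worked out for a specific alternative value of the mean. As an illustration only (the true mean of 490 days is an assumption made for this sketch, not something given in the example), here is how \(\beta\) and the power could be found in R for the battery test with \(\alpha = 0.10\):

# beta and power for the battery test, assuming (for illustration) the true mean is 490
mu0     <- 500
sigma   <- 25
n       <- 30
alpha   <- 0.10
mu_true <- 490                      # assumed true mean, chosen only for this sketch

# reject Ho when the sample mean falls below this cutoff
cutoff <- qnorm(alpha, mean = mu0, sd = sigma / sqrt(n))          # about 494.2

beta  <- 1 - pnorm(cutoff, mean = mu_true, sd = sigma / sqrt(n))  # P(fail to reject | mu = 490)
power <- 1 - beta                                                 # about 0.82 here
# increasing n shrinks sigma/sqrt(n), which lowers beta for the same alpha

Rerunning the sketch with a larger n or a larger \(\alpha\) shows the relationships summarized above.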

Which \(\alpha\) do you pick? Well, that depends on what you are working on. Remember, in this example you are the buyer who is trying to get out of a contract to buy these batteries. If you make a type I error, you say that the batteries are bad when they aren't, and most likely the manufacturer will sue you. You want to avoid this, so you might pick \(\alpha\) to be 0.01. This way you have a small chance of making a type I error. Of course, this means you have more of a chance of making a type II error. No big deal, right? But what if the batteries are used in pacemakers? If you make a type II error, you say that the batteries do last 500 days when they actually last less, and you tell a patient that their pacemaker's batteries are good for 500 days when they aren't; then you have the possibility of killing someone. You certainly do not want to do this. In that case you might want to pick \(\alpha\) as 0.10. If both errors are equally bad, then pick \(\alpha\) as 0.05.

The above discussion is why the choice of \(\alpha\) depends on what you are researching. As the researcher, you are the one who needs to decide what \(\alpha\) level to use based on your analysis of the consequences of making each error.

If a type I error is really bad, then pick \(\alpha\) = 0.01.

If a type II error is really bad, then pick \(\alpha\) = 0.10.

If neither error is bad, or both are equally bad, then pick \(\alpha\) = 0.05.

The main thing is to always pick the \(\alpha\) before you collect the data and start the test.

The above discussion was long, but it is really important information. If you don’t know what the errors of the test are about, then there really is no point in making conclusions with the tests. Make sure you understand what the two errors are and what the probabilities are for them.

Now it is time to go back to the example and put this all together. This is the basic structure of testing a hypothesis, usually called a hypothesis test. Since this one has a test statistic involving z, it is also called a z-test. And since there is only one sample, it is usually called a one-sample z-test.

Example \(\PageIndex{2}\) battery example revisited

  • State the random variable and the parameter in words.
  • State the null and alternative hypotheses and the level of significance.
  • State and check the assumptions for a hypothesis test:
      • A random sample of size n is taken.
      • The population standard deviation is known.
      • The sample size is at least 30 or the population of the random variable is normally distributed.
  • Find the sample statistic, test statistic, and p-value.
  • Conclusion
  • Interpretation

1. x = life of an XJ35 battery

\(\mu\) = mean life of an XJ35 battery

2. \(H_{o} : \mu=500\) days

\(H_{A} : \mu<500\) days

\(\alpha = 0.10\) (from above discussion about consequences)

3. Every hypothesis test has some assumptions that must be met to make sure that the results of the test are valid. The assumptions are different for each test. This test has the following assumptions.

  • This occurred in this example, since it was stated that a random sample of 30 battery lives was taken.
  • This is true, since it was given in the problem.
  • The sample size was 30, so this condition is met.

4. The test statistic depends on how many samples there are, what parameter you are testing, and assumptions that need to be checked. In this case, there is one sample and you are testing the mean. The assumptions were checked above.

Sample statistic:

\(\overline{x} = 490\)

Test statistic:

\(z=\dfrac{\overline{x}-\mu_{o}}{\sigma / \sqrt{n}}=\dfrac{490-500}{25 / \sqrt{30}} \approx -2.19\)

p-value:

Using TI-83/84:

\(P(\overline{x}<490 | \mu=500)=\text { normalcdf }(-1 \mathrm{E} 99,490,500,25 / \sqrt{30}) \approx 0.0142\)

Using R:

\(P(\overline{x}<490 | \mu=500)=\operatorname{pnorm}(490,500,25 / \operatorname{sqrt}(30)) \approx 0.0142\)

5. Now what? Well, the p-value is 0.0142. This is a lot smaller than the amount of error you said you would accept in the problem, \(\alpha = 0.10\). That means that finding a sample mean less than 490 days would be unusual if \(H_{o}\) were true. This should make you think that \(H_{o}\) is not true. You should reject \(H_{o}\).

In fact, in general:

Reject \(H_{o}\) if the p-value < \(\alpha\) and

Fail to reject \(H_{o}\) if the p-value \(\geq \alpha\).
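
This decision rule is easy to express in code; a minimal sketch in R, continuing with the numbers from this example:

alpha    <- 0.10
p_value  <- 0.0142
decision <- if (p_value < alpha) "Reject Ho" else "Fail to reject Ho"
decision   # "Reject Ho" for this example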

6. Since you rejected \(H_{o}\), what does this mean in the real world? That is what goes in the interpretation. Since you rejected the claim by the manufacturer that the mean life of the batteries is 500 days, then you now can believe that your hypothesis was correct. In other words, there is enough evidence to show that the mean life of the battery is less than 500 days.

Now that you know that the batteries last less than 500 days, should you cancel the contract? Statistically, there is evidence that the batteries do not last as long as the manufacturer says they should. However, based on this sample, the batteries last only about ten days less on average. There may not be practical significance in this case; ten days does not seem like a large difference. In reality, if the batteries are used in pacemakers, then you would probably tell the patient to have the batteries replaced every year, so you have a large buffer whether the batteries last 490 days or 500 days. It seems that it might not be worth it to break the contract over ten days. What if the ten days were practically significant? Are there any other things you should consider? You might look at the business relationship with the manufacturer. You might also look at how much it would cost to find a new manufacturer. These are also questions to consider before making any changes. What this discussion should show you is that just because a result has statistical significance does not mean it has practical significance. The hypothesis test is just one part of a research process, and there are other pieces that you need to consider.

That’s it. That is what a hypothesis test looks like. All hypothesis tests are done with the same six steps. Those general six steps are outlined below.

  • State the random variable and the parameter in words. This is where you define the unknowns in this problem: x = the random variable and \(\mu\) = the mean of the random variable, if the parameter of interest is the mean. There are other parameters you can test, and you would use the appropriate symbol for that parameter.
  • State the null and alternative hypotheses and the level of significance: \(H_{o} : \mu=\mu_{o}\), where \(\mu_{o}\) is the claimed value of the mean, together with the appropriate one of \(H_{A} : \mu<\mu_{o}\), \(H_{A} : \mu>\mu_{o}\), or \(H_{A} : \mu \neq \mu_{o}\) for your problem. Also, state your \(\alpha\) level here.
  • State and check the assumptions for a hypothesis test. Each hypothesis test has its own assumptions. They will be stated when the different hypothesis tests are discussed.
  • Find the sample statistic, test statistic, and p-value. This depends on what parameter you are working with, how many samples there are, and the assumptions of the test. The p-value depends on your \(H_{A}\). If \(H_{A}\) uses less than, then it is a left-tailed test, and you find the probability of being in that left tail. If \(H_{A}\) uses greater than, then it is a right-tailed test, and you find the probability of being in the right tail. If \(H_{A}\) uses not equal to, then you are doing a two-tailed test, and you find the probability of being in both tails. Because of symmetry, you can find the probability in one tail and double this value to find the probability in both tails. (A small code sketch after this list collects these computations.)
  • Conclusion This is where you write reject \(H_{o}\) or fail to reject \(H_{o}\). The rule is: if the p-value < \(\alpha\), then reject \(H_{o}\). If the p-value \(\geq \alpha\), then fail to reject \(H_{o}\).
  • Interpretation This is where you interpret in real world terms the conclusion to the test. The conclusion for a hypothesis test is that you either have enough evidence to show \(H_{A}\) is true, or you do not have enough evidence to show \(H_{A}\) is true.
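
As mentioned in step 4, these computations can be collected into a small helper. The following R function is only a sketch (a hypothetical helper written for this discussion, not part of the text's materials); it assumes \(\sigma\) is known and that the assumptions in step 3 have already been checked:

# one-sample z-test: a minimal sketch
one_sample_z <- function(xbar, mu0, sigma, n,
                         alternative = c("less", "greater", "two.sided"),
                         alpha = 0.05) {
  alternative <- match.arg(alternative)
  z <- (xbar - mu0) / (sigma / sqrt(n))
  p <- switch(alternative,
              less      = pnorm(z),
              greater   = 1 - pnorm(z),
              two.sided = 2 * pnorm(-abs(z)))
  list(z = z, p_value = p,
       decision = if (p < alpha) "Reject Ho" else "Fail to reject Ho")
}

# battery example: left-tailed test at alpha = 0.10
one_sample_z(xbar = 490, mu0 = 500, sigma = 25, n = 30,
             alternative = "less", alpha = 0.10)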

Sorry, one more concept about the conclusion and interpretation. First, the conclusion is that you reject \(H_{o}\) or you fail to reject \(H_{o}\). Why was it said like this? It is because you never accept the null hypothesis. If you wanted to accept the null hypothesis, then why do the test in the first place? In the interpretation, you either have enough evidence to show \(H_{A}\) is true, or you do not have enough evidence to show \(H_{A}\) is true. You wouldn’t want to go to all this work and then find out you wanted to accept the claim. Why go through the trouble? You always want to show that the alternative hypothesis is true. Sometimes you can do that and sometimes you can’t. It doesn’t mean you proved the null hypothesis; it just means you can’t prove the alternative hypothesis. Here is an example to demonstrate this.

Example \(\PageIndex{3}\) conclusion in hypothesis tests

In the U.S. court system a jury trial can be set up as a hypothesis test. To really help you see how this works, let's use OJ Simpson as an example. In the court system, a person is presumed innocent until proven guilty, and this is your null hypothesis. OJ Simpson was a football player in the 1970s. In 1994 his ex-wife and her friend were killed. OJ Simpson was accused of the crime, and in 1995 the case was tried. The prosecutors wanted to prove OJ was guilty of killing his wife and her friend, and that is the alternative hypothesis:

\(H_{0}\): OJ is innocent of killing his wife and her friend

\(H_{A}\): OJ is guilty of killing his wife and her friend

In this case, a verdict of not guilty was given. That does not mean that he is innocent of this crime. It means there was not enough evidence to prove he was guilty. Many people believe that OJ was guilty of this crime, but the jury did not feel that the evidence presented was enough to show there was guilt. The verdict in a jury trial is always guilty or not guilty!

The same is true in a hypothesis test. There is either enough or not enough evidence to show that alternative hypothesis. It is not that you proved the null hypothesis true.

When identifying hypotheses, it is important to state your random variable and the appropriate parameter you want to make a decision about. If you count something, then the random variable is the number of whatever you counted, and the parameter is the proportion of what you counted. If the random variable is something you measured, then the parameter is the mean of what you measured. (Note: there are other parameters you can calculate, and some analysis of those will be presented in later chapters.)

Example \(\PageIndex{4}\) stating hypotheses

Identify the hypotheses necessary to test the following statements:

  • The average salary of a teacher is more than $30,000.
  • The proportion of students who like math is less than 10%.
  • The average age of students in this class differs from 21.

a. x = salary of teacher

\(\mu\) = mean salary of teacher

The guess is that \(\mu>\$ 30,000\) and that is the alternative hypothesis.

The null hypothesis has the same parameter and number with an equal sign.

\(\begin{array}{l}{H_{0} : \mu=\$ 30,000} \\ {H_{A} : \mu>\$ 30,000}\end{array}\)

b. x = number of students who like math

p = proportion of students who like math

The guess is that p < 0.10 and that is the alternative hypothesis.

\(\begin{array}{l}{H_{0} : p=0.10} \\ {H_{A} : p<0.10}\end{array}\)

c. x = age of students in this class

\(\mu\) = mean age of students in this class

The guess is that \(\mu \neq 21\) and that is the alternative hypothesis.

\(\begin{array}{c}{H_{0} : \mu=21} \\ {H_{A} : \mu \neq 21}\end{array}\)

Example \(\PageIndex{5}\) Stating Type I and II Errors and Picking Level of Significance

  • The plant-breeding department at a major university developed a new hybrid raspberry plant called YumYum Berry. Based on research data, the claim is made that from the time shoots are planted 90 days on average are required to obtain the first berry with a standard deviation of 9.2 days. A corporation that is interested in marketing the product tests 60 shoots by planting them and recording the number of days before each plant produces its first berry. The sample mean is 92.3 days. The corporation wants to know if the mean number of days is more than the 90 days claimed. State the type I and type II errors in terms of this problem, consequences of each error, and state which level of significance to use.
  • A concern was raised in Australia that the percentage of deaths of Aboriginal prisoners was higher than the percent of deaths of non-indigenous prisoners, which is 0.27%. State the type I and type II errors in terms of this problem, consequences of each error, and state which level of significance to use.

a. x = time to first berry for YumYum Berry plant

\(\mu\) = mean time to first berry for YumYum Berry plant

\(\begin{array}{l}{H_{0} : \mu=90} \\ {H_{A} : \mu>90}\end{array}\)

Type I error: The corporation says that the plants take longer than 90 days to produce their first berry when they actually don't. They probably will not want to market the plants if they think the plants take longer, so they will not market them even though, in reality, the plants do produce in 90 days. They may lose some future earnings, but that is all.

Type II error: The corporation does not say that the plants take longer than 90 days to produce when they actually do take longer. Most likely they will market the plants, the plants will take longer than advertised, customers might get upset, and the company would get a bad reputation. This would be really bad for the company.

Level of significance: It appears that the corporation would not want to make a type II error. Pick a 10% level of significance, \(\alpha = 0.10\).

b. x = number of Aboriginal prisoners who have died

p = proportion of Aboriginal prisoners who have died

\(\begin{array}{l}{H_{o} : p=0.27 \%} \\ {H_{A} : p>0.27 \%}\end{array}\)

Type I error: Rejecting that the proportion of Aboriginal prisoners who died was 0.27%, when in fact it was 0.27%. This would mean you would say there is a problem when there isn’t one. You could anger the Aboriginal community, and spend time and energy researching something that isn’t a problem.

Type II error: Failing to reject that the proportion of Aboriginal prisoners who died was 0.27%, when in fact it is higher than 0.27%. This would mean that you wouldn’t think there was a problem with Aboriginal prisoners dying when there really is a problem. You risk causing deaths when there could be a way to avoid them.

Level of significance: It appears that both errors may be issues in this case. You wouldn’t want to anger the Aboriginal community when there isn’t an issue, and you wouldn’t want people to die when there may be a way to stop it. It may be best to pick a 5% level of significance, \(\alpha = 0.05\).

Hypothesis testing is really easy if you follow the same recipe every time. The only differences in the various problems are the assumptions of the test and the test statistic you calculate so you can find the p-value. Do the same steps, in the same order, with the same words, every time and these problems become very easy.

Exercise \(\PageIndex{1}\)

For the problems in this section, a question is being asked. This is to help you understand what the hypotheses are. You are not to run any hypothesis tests and come up with any conclusions in this section.

  • Eyeglassomatic manufactures eyeglasses for different retailers. They test to see how many defective lenses they made in a given time period and found that 11% of all lenses had defects of some type. Looking at the type of defects, they found in a three-month time period that out of 34,641 defective lenses, 5865 were due to scratches. Are there more defects from scratches than from all other causes? State the random variable, population parameter, and hypotheses.
  • According to the February 2008 Federal Trade Commission report on consumer fraud and identity theft, 23% of all complaints in 2007 were for identity theft. In that year, Alaska had 321 complaints of identity theft out of 1,432 consumer complaints ("Consumer fraud and," 2008). Does this data provide enough evidence to show that Alaska had a lower proportion of identity theft than 23%? State the random variable, population parameter, and hypotheses.
  • The Kyoto Protocol was signed in 1997, and required countries to start reducing their carbon emissions. The protocol became enforceable in February 2005. In 2004, the mean CO2 emission was 4.87 metric tons per capita. Is there enough evidence to show that the mean CO2 emission is lower in 2010 than in 2004? State the random variable, population parameter, and hypotheses.
  • The FDA regulates that fish that is consumed is allowed to contain at most 1.0 mg/kg of mercury. In Florida, bass fish were collected in 53 different lakes to measure the amount of mercury in the fish. The data for the average amount of mercury in each lake are in Example \(\PageIndex{5}\) ("Multi-disciplinary niser activity," 2013). Do the data provide enough evidence to show that the fish in Florida lakes have more mercury than the allowable amount? State the random variable, population parameter, and hypotheses.
  • Eyeglassomatic manufactures eyeglasses for different retailers. They test to see how many defective lenses they made in a given time period and found that 11% of all lenses had defects of some type. Looking at the type of defects, they found in a three-month time period that out of 34,641 defective lenses, 5865 were due to scratches. Are there more defects from scratches than from all other causes? State the type I and type II errors in this case, consequences of each error type for this situation from the perspective of the manufacturer, and the appropriate alpha level to use. State why you picked this alpha level.
  • According to the February 2008 Federal Trade Commission report on consumer fraud and identity theft, 23% of all complaints in 2007 were for identity theft. In that year, Alaska had 321 complaints of identity theft out of 1,432 consumer complaints ("Consumer fraud and," 2008). Does this data provide enough evidence to show that Alaska had a lower proportion of identity theft than 23%? State the type I and type II errors in this case, consequences of each error type for this situation from the perspective of the state of Alaska, and the appropriate alpha level to use. State why you picked this alpha level.
  • The Kyoto Protocol was signed in 1997, and required countries to start reducing their carbon emissions. The protocol became enforceable in February 2005. In 2004, the mean CO2 emission was 4.87 metric tons per capita. Is there enough evidence to show that the mean CO2 emission is lower in 2010 than in 2004? State the type I and type II errors in this case, consequences of each error type for this situation from the perspective of the agency overseeing the protocol, and the appropriate alpha level to use. State why you picked this alpha level.
  • The FDA regulates that fish that is consumed is allowed to contain at most 1.0 mg/kg of mercury. In Florida, bass fish were collected in 53 different lakes to measure the amount of mercury in the fish. The data for the average amount of mercury in each lake are in Example \(\PageIndex{5}\) ("Multi-disciplinary niser activity," 2013). Do the data provide enough evidence to show that the fish in Florida lakes have more mercury than the allowable amount? State the type I and type II errors in this case, consequences of each error type for this situation from the perspective of the FDA, and the appropriate alpha level to use. State why you picked this alpha level.

1. \(H_{o} : p=0.11, H_{A} : p>0.11\)

3. \(H_{o} : \mu=4.87 \text { metric tons per capita, } H_{A} : \mu<4.87 \text { metric tons per capita }\)

5. See solutions

7. See solutions


Introduction

  • Make an observation.
  • Ask a question.
  • Form a hypothesis , or testable explanation.
  • Make a prediction based on the hypothesis.
  • Test the prediction.
  • Iterate: use the results to make new hypotheses or predictions.

Scientific method example: Failure to toast

1. Make an observation.

  • Observation: the toaster won't toast.

2. Ask a question.

  • Question: Why won't my toaster toast?

3. Propose a hypothesis.

  • Hypothesis: Maybe the outlet is broken.

4. Make predictions.

  • Prediction: If I plug the toaster into a different outlet, then it will toast the bread.

5. Test the predictions.

  • Test of prediction: Plug the toaster into a different outlet and try again.
  • If the toaster does toast, then the hypothesis is supported—likely correct.
  • If the toaster doesn't toast, then the hypothesis is not supported—likely wrong.

6. Iterate.

  • Iteration time!
  • If the hypothesis was supported, we might do additional tests to confirm it, or revise it to be more specific. For instance, we might investigate why the outlet is broken.
  • If the hypothesis was not supported, we would come up with a new hypothesis. For instance, the next hypothesis might be that there's a broken wire in the toaster.


Evidence can reflect on an idea in many different ways — supporting it, contradicting it, suggesting it should be revised, suggesting the revision of an assumption, or suggesting a new research question.

Reviewing test results

Scientists typically weigh multiple competing ideas about how something works and try to figure out which of those is most accurate based on the evidence .

  • Evidence may lend support to one hypothesis over others. For example, drilling into coral atolls and discovering a layer of coral thousands of feet thick clearly lent support to the idea that coral atolls form around subsiding volcanic islands. Of course, many other lines of evidence also helped support that idea over competing explanations.
  • Evidence may help rule out some hypotheses. Similarly, the results of the atoll drilling project helped refute a different idea — that atolls grow atop underwater mountains built up by oceanic debris, which would have fit with the observation of a thin layer of coral.
  • Evidence may lead to the revision of a hypothesis. For example, experiments and observations had long supported the idea that light consists of waves, but in 1905, Einstein showed that a well known (and previously unexplained) phenomenon — the photoelectric effect — made perfect sense if light consisted of discrete particles. This led physicists to modify their ideas about the nature of light: light was both wave-like and particle-like.
  • Evidence may reveal a faulty assumption, causing the scientist to revise his or her assumptions and possibly redesign the test. For example, in the 1970s, geologists tried to test ideas about the timing of the transition between the Cretaceous and Tertiary periods by measuring the amount of iridium in the transitional rock layer. The test relied on the assumption that iridium was deposited at a low but constant rate. However, to their surprise, the rock layer contained unusually large amounts of iridium, indicating that their original test design had been based on the false assumption of a low and constant deposition rate.

  • Evidence may inspire a wholly new hypothesis or new research question. For example, the unexpected discovery of large amounts of iridium at the Cretaceous-Tertiary boundary eventually inspired a new hypothesis about a different topic — that the end-Cretaceous mass extinction was triggered by a catastrophic asteroid impact.
  • Evidence may be inconclusive, failing to support any particular explanation over another. For example, many biologists and chemists have investigated the origins of life trying to figure out in what environment this occurred. So far, the evidence has not been conclusive, leaving the open question of whether life started in hydrothermal vents, freshwater pools, or somewhere else. Scientists continue to collect more evidence in order to resolve the question.

New evidence can feed back into the process of science in many ways. Most importantly, new evidence helps us evaluate ideas. To learn more about how science evaluates ideas, read on…

In the real world, test results aren’t always clear cut. Often, results end up somewhat supporting or arguing against a particular hypothesis. Learn more about how scientists deal with fuzzy outcomes in  Real world results .


Hypothesis Testing (cont...)

The null and alternative hypothesis

In order to undertake hypothesis testing you need to express your research hypothesis as a null and alternative hypothesis. The null hypothesis and alternative hypothesis are statements regarding the differences or effects that occur in the population. You will use your sample to test which statement (i.e., the null hypothesis or alternative hypothesis) is most likely (although technically, you test the evidence against the null hypothesis). So, with respect to our teaching example, the null and alternative hypothesis will reflect statements about all statistics students on graduate management courses.

The null hypothesis is essentially the "devil's advocate" position. That is, it assumes that whatever you are trying to prove did not happen (hint: it usually states that something equals zero). For example, the two different teaching methods did not result in different exam performances (i.e., zero difference). Another example might be that there is no relationship between anxiety and athletic performance (i.e., the slope is zero). The alternative hypothesis states the opposite and is usually the hypothesis you are trying to prove (e.g., the two different teaching methods did result in different exam performances). Initially, you can state these hypotheses in more general terms (e.g., using terms like "effect", "relationship", etc.), as in the teaching methods example.

How you want to "summarize" the exam performances will determine how you write a more specific null and alternative hypothesis. For example, you could compare the mean exam performance of each group (i.e., the "seminar" group and the "lectures-only" group). This is what we will demonstrate here, but other options include comparing the distributions or medians, amongst other things. As such, we can state the null and alternative hypotheses in terms of the mean exam performance of the two groups.

Now that you have identified the null and alternative hypotheses, you need to find evidence and develop a strategy for declaring your "support" for either the null or alternative hypothesis. We can do this using some statistical theory and some arbitrary cut-off points. Both these issues are dealt with next.

Significance levels

The level of statistical significance is often expressed as the so-called p -value . Depending on the statistical test you have chosen, you will calculate a probability (i.e., the p -value) of observing your sample results (or more extreme) given that the null hypothesis is true . Another way of phrasing this is to consider the probability that a difference in a mean score (or other statistic) could have arisen based on the assumption that there really is no difference. Let us consider this statement with respect to our example where we are interested in the difference in mean exam performance between two different teaching methods. If there really is no difference between the two teaching methods in the population (i.e., given that the null hypothesis is true), how likely would it be to see a difference in the mean exam performance between the two teaching methods as large as (or larger than) that which has been observed in your sample?

So, you might get a p-value such as 0.03 (i.e., p = .03). This means that there is a 3% chance of finding a difference as large as (or larger than) the one in your study given that the null hypothesis is true. However, you want to know whether this is "statistically significant". Typically, if there was a 5% or less chance (5 times in 100 or less) that the difference in the mean exam performance between the two teaching methods (or whatever statistic you are using) would be as large as that observed given the null hypothesis is true, you would reject the null hypothesis and accept the alternative hypothesis. Alternately, if the chance was greater than 5% (5 times in 100 or more), you would fail to reject the null hypothesis and would not accept the alternative hypothesis. As such, in this example where p = .03, we would reject the null hypothesis and accept the alternative hypothesis: a difference this large would occur only about 3% of the time by chance alone, which is too rare for us to believe that chance, rather than the two teaching methods, produced the effect on exam performance.

Whilst there is relatively little justification why a significance level of 0.05 is used rather than 0.01 or 0.10, for example, it is widely used in academic research. However, if you want to be particularly confident in your results, you can set a more stringent level of 0.01 (a 1% chance or less; 1 in 100 chance or less).


One- and two-tailed predictions

When considering whether we reject the null hypothesis and accept the alternative hypothesis, we need to consider the direction of the alternative hypothesis statement, as in the alternative hypothesis that was stated earlier.

The alternative hypothesis tells us two things. First, what predictions did we make about the effect of the independent variable(s) on the dependent variable(s)? Second, what was the predicted direction of this effect? Let's use our example to highlight these two points.

Sarah predicted that her teaching method (independent variable: teaching method), whereby she required her students to attend not only lectures but also seminars, would have a positive effect on (that is, increase) students' performance (dependent variable: exam marks). If an alternative hypothesis has a direction (and this is how you want to test it), the hypothesis is one-tailed. That is, it predicts the direction of the effect. If the alternative hypothesis had stated that the effect was expected to be negative, this would also be a one-tailed hypothesis.

Alternatively, a two-tailed prediction means that we do not make a choice over the direction that the effect of the experiment takes. Rather, it simply implies that the effect could be negative or positive. If Sarah had made a two-tailed prediction, the alternative hypothesis would simply have stated that there is an effect, without specifying a direction.

In other words, we simply take out the word "positive", which implies the direction of our effect. In our example, making a two-tailed prediction may seem strange. After all, it would be logical to expect that "extra" tuition (going to seminar classes as well as lectures) would either have a positive effect on students' performance or no effect at all, but certainly not a negative effect. However, this is just our opinion (and hope), and it certainly does not mean that we will get the effect we expect. Generally speaking, making a one-tailed prediction (and testing for it this way) is frowned upon, as it usually reflects the hope of a researcher rather than any certainty that it will happen. Notable exceptions to this rule are when there is only one possible way in which a change could occur. This can happen, for example, when biological activity/presence is measured. That is, a protein might be "dormant" and the stimulus you are using can only possibly "wake it up" (i.e., it cannot possibly reduce the activity of a "dormant" protein). In addition, for some statistical tests, one-tailed tests are not possible.
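
In terms of the calculation, the only difference between one- and two-tailed tests is which tail(s) of the distribution the p-value comes from. A generic sketch in R (the z value here is arbitrary and chosen only for illustration):

z <- 1.8                       # an arbitrary test statistic, for illustration only
p_right <- 1 - pnorm(z)        # one-tailed, predicting a positive effect
p_left  <- pnorm(z)            # one-tailed, predicting a negative effect
p_two   <- 2 * pnorm(-abs(z))  # two-tailed, no direction predicted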

Rejecting or failing to reject the null hypothesis

Let's return finally to the question of whether we reject or fail to reject the null hypothesis.

If our statistical analysis shows that the significance level is below the cut-off value we have set (e.g., either 0.05 or 0.01), we reject the null hypothesis and accept the alternative hypothesis. Alternatively, if the significance level is above the cut-off value, we fail to reject the null hypothesis and cannot accept the alternative hypothesis. You should note that you cannot accept the null hypothesis, but only find evidence against it.


How to Write a Strong Hypothesis | Guide & Examples

Published on 6 May 2022 by Shona McCombes.

A hypothesis is a statement that can be tested by scientific research. If you want to test a relationship between two or more variables, you need to write hypotheses before you start your experiment or data collection.


A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess – it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations, and statistical analysis of data).

Variables in hypotheses

Hypotheses propose a relationship between two or more variables . An independent variable is something the researcher changes or controls. A dependent variable is something the researcher observes and measures.

For example, in the hypothesis that more exposure to the sun makes people happier, the independent variable is exposure to the sun (the assumed cause) and the dependent variable is the level of happiness (the assumed effect).


Step 1: Ask a question

Writing a hypothesis begins with a research question that you want to answer. The question should be focused, specific, and researchable within the constraints of your project.

Step 2: Do some preliminary research

Your initial answer to the question should be based on what is already known about the topic. Look for theories and previous studies to help you form educated assumptions about what your research will find.

At this stage, you might construct a conceptual framework to identify which variables you will study and what you think the relationships are between them. Sometimes, you’ll have to operationalise more complex constructs.

Step 3: Formulate your hypothesis

Now you should have some idea of what you expect to find. Write your initial answer to the question in a clear, concise sentence.

Step 4: Refine your hypothesis

You need to make sure your hypothesis is specific and testable. There are various ways of phrasing a hypothesis, but all the terms you use should have clear definitions, and the hypothesis should contain:

  • The relevant variables
  • The specific group being studied
  • The predicted outcome of the experiment or analysis

Step 5: Phrase your hypothesis in three ways

To identify the variables, you can write a simple prediction in if … then form. The first part of the sentence states the independent variable and the second part states the dependent variable.

In academic research, hypotheses are more commonly phrased in terms of correlations or effects, where you directly state the predicted relationship between variables.

If you are comparing two groups, the hypothesis can state what difference you expect to find between them.

Step 6: Write a null hypothesis

If your research involves statistical hypothesis testing, you will also have to write a null hypothesis. The null hypothesis is the default position that there is no association between the variables. The null hypothesis is written as \(H_{0}\), while the alternative hypothesis is \(H_{1}\) or \(H_{a}\).

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis is not just a guess. It should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations, and statistical analysis of data).

A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (‘ x affects y because …’).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses. In a well-designed study , the statistical hypotheses correspond logically to the research hypothesis.


McCombes, S. (2022, May 06). How to Write a Strong Hypothesis | Guide & Examples. Scribbr. Retrieved 27 May 2024, from https://www.scribbr.co.uk/research-methods/hypothesis-writing/


Crafting the Methods to Test Hypotheses


James Hiebert, Jinfa Cai, Stephen Hwang, Anne K. Morris & Charles Hohensee

Part of the book series: Research in Mathematics Education (RME)


If you have carefully worked through the ideas in the previous chapters, the many questions researchers often ask about what methods to use boil down to one central question: How can I best test my hypotheses? The answers to questions such as "Should I do an ethnography or an experiment?" and "Should I use qualitative data or quantitative data?" are quite clear if you make explicit predictions for what you will find and fully develop rationales for why you made these predictions. Then you need only worry about how to find out in what ways your predictions are right and in what ways they are wrong. There is a lot to know about different research designs and methods because these provide the tools you can use to test your hypotheses. But as you learn these details, keep in mind they are means to an end, not an end in themselves.


Part I. What Does It Mean to Test Your Hypotheses?

From the beginning, we have talked about formulating and testing hypotheses. We will briefly review relevant points from the first three chapters and then consider some additional issues you will encounter as you craft the methods you will use to test your hypotheses.

In Chap. 1 , we proposed a distinction between hypotheses and predictions. Predictions are guesses you make about answers to your research questions; hypotheses are the predictions plus the reasons, or rationales, for your predictions. We tied together predictions and rationales as constituent parts of hypotheses because it is beneficial to keep them connected throughout the process of scientific inquiry. When we talk about testing hypotheses , we mean gathering information (data) to see how close your predictions were to being correct and then assessing the soundness of your rationales. So, testing hypotheses is really a two-step process: (1) comparing predictions with empirical observations or data, and (2) assessing the soundness of the rationales that justified these predictions.

In Chap. 2 , we suggested that making predictions and explaining why you made them should happen at the same time. Along with your first guesses about the answers to your research questions, you should write out your explanations for why you think the answers will be accurate. This will be a back-and-forth process because you are likely to revise your predictions as you think through the reasons you are making them. In addition, we suggested asking how you could test your predictions. This often leads to additional revisions in your predictions.

We also noted that, because education is filled with complexities, answers to substantive questions can seldom be predicted with complete accuracy. Consequently, testing predictions does not mean deciding whether or not they were correct but rather how you can revise them to improve their correctness. In addition, testing predictions means reexamining your rationales to improve the soundness of your reasoning. In other words, testing predictions involves gathering the kind of information that guides revisions to your hypotheses.

As a final reminder from Chap. 2 , we asked you to imagine how you could test your hypotheses. This involves anticipating what information (data) would best show how accurate your predictions were and would inform revisions to your rationales. Imagining the best ways to test hypotheses is essential for moving through the early cycles of scientific inquiry. In this chapter, we extend the process by crafting the actual methods you will use to test your hypotheses.

In Chap. 3 , you considered further the multiple cycles of asking questions, articulating your predictions, developing your rationales, imagining testing your predictions and rationales, adjusting your rationales, revising your predictions, and so on. You learned that a significant consequence of repeating this cycle many times is the increasingly clear, justifiable, and complete rationales that turn into the theoretical framework for your study. This comes, in large part, from the clear descriptions of the variables you will attend to and the mechanisms you conjecture are at work. The theoretical framework allows you to imagine with greater confidence, and in more detail, the kind of data you will need to test your hypotheses and how you could collect them.

In this chapter, we will examine many of the issues you must consider as you choose and adapt methods to fit your study. By “methods,” we mean the entire set of procedures you will use, including the basic design of the study, measures for collecting data, and analytic approaches. As in previous chapters, we will focus on issues that are critical for conducting scientific inquiry but often are not sufficiently discussed in more standard methods textbooks. We will also cite sources where you can find more information. For example, the Institute of Education Sciences and the National Science Foundation (2013) jointly developed guidelines for researchers about the different methods that can be used for different types of research. These guidelines are meant to inform researchers who seek funding from these agencies.

Exercise 4.1

Choose a published empirical study that includes clearly stated research questions, explicit hypotheses (predictions about the answers to the research questions plus the rationales for the predictions), and the methods used. Identify the variables studied and describe the mechanisms embedded in the hypotheses that are conjectured to create the predicted answers. Analyze the appropriateness of the methods used to answer the research questions (i.e., test the predictions). Notes: (1) you might have trouble finding a clear statement of the hypotheses; if so, imagine what the researchers had in mind; and (2) although we have not discussed all of the information you might need to complete this exercise in detail, writing out your response in as much detail as possible will prepare you to make sense of this chapter.

Part II. What Are the Best Methods for Your Study?

The best methods for your study are the procedures that give you the richest information about how near your predictions were to the actual findings and how they could be adjusted to be more accurate. Said another way, choose the methods that provide the clearest answers to your research questions. There are many decisions you will need to make about which methods to use, and it is likely that, at a detailed level, there are different combinations of decisions that would be equally effective. So, we will not assume there is a single best combination. Rather, from this point on we will talk about appropriate methods.


Most research questions in education are too complicated to be fully answered by conducting only one study using one set of methods. Different methods offer different perspectives and reveal different aspects of educational phenomena. “Science becomes more certain in its progression if it has the benefits of a wide array of methods and information. Science is not improved by subtracting but by adding methods” (Sechrest et al., 1993 , p. 230). You will need to craft one set of methods for your study but be aware that, in the future, other researchers could use another set of methods to test similar hypotheses and report somewhat different findings that would lead to further revisions of the hypotheses. The methods you craft should be aligned with your theoretical framework, as noted earlier, but there are likely to be other sets of methods that are aligned as well.

A useful organizational scheme for crafting your methods divides the process into three phases: choosing the design of your study, developing the measures and procedures for gathering the data, and choosing methods to analyze the data (in order to compare your findings to your predictions). We will not repeat most of what you can find in textbooks on research methods. Rather, we will focus on the issues within each phase of crafting your methods that are often difficult for beginning researchers. In addition, we will identify areas that manuscript reviewers for JRME often say are inadequately developed or described. Reviewers’ concerns are based on what they read, so the problems they identify could be with the study itself or the way it is reported. We will deal first with issues of conducting the study and then talk about related issues with communicating your study to others.

Choosing the Design for Your Study

One of the first decisions you need to make is what design you will use. By design we mean the overall strategy you choose to integrate the different components of the study in a coherent and logical way. The design offers guidelines for the sampling procedure, the development of measures, the collection of data, and the analysis of data. Depending on the textbook you consult, there are different classification schemes that identify different designs. One common scheme is to distinguish between experimental, correlational, and descriptive research.

In our view, each design is tailored to explain different features of phenomena. Experiments, as we define them, are tailored to explain changes in phenomena. Correlations are tailored to explain relationships between two or more phenomena. And descriptions are tailored to explain phenomena as they exist. We unpack these ideas in the following discussions.

In education, most experiments take the form of intervention studies. They are conducted to test the effects of an intervention designed to change something (e.g., students’ achievement). If you choose an experimental design, your research questions probably ask whether an intervention will improve certain outcomes. For example: “Will professional development that engages teachers in analyzing videos of teaching help them teach more conceptually? If so, under what conditions does this occur?” There are several good sources to read about designing experiments in education research (e.g., Cook et al., 2002 ; Gall et al., 2007 ; Kelly & Lesh, 2000 ). We will focus our attention on several specific issues.

Many experiments aim to determine if something causes something else. This is another way of saying the aim is to produce change in something and explain why the change occurred. In education, experiments often try to explain whether and why an intervention is “effective,” or whether and why intervention A is more effective than intervention B. Effective usually means the treatment causes or explains the outcomes of interest. If your investigation is situated in an actual classroom or another authentic educational setting, it is usually difficult to claim causal effects. There are many reasons for this, most tied to the complicated nature of educational settings. You should consider the following three issues when designing an experiment.

First, in education, the strict requirements for an experimental design are rarely met. For example, usually students, teachers, schools, and so forth, cannot be randomly assigned to receive one or the other of the interventions that are being compared. In addition, it is almost impossible to double-blind education experiments (that is, to ensure that the participants do not know which treatment they are receiving and that the researchers do not know which participants are receiving which treatment—like in medical drug trials). These design constraints limit your ability to claim causal effects of an intervention because they make it difficult to explain the reasons for the changes. Consequently, many studies that are called experiments are better labeled “quasi-experiments.” See Campbell et al. ( 1963 ) and Gopalan et al. ( 2020 ) for more details.

Second, even when you are aware of these constraints and consider your study a quasi-experiment, it is still tempting to make causal claims not supported by your findings. Suppose you are testing your prediction that a specially designed five-lesson unit will help students understand adding and subtracting fractions with unlike denominators. Suppose you are fortunate enough to randomly assign many classrooms to your intervention and an equal number to a common textbook unit. Suppose students in the experimental classrooms perform significantly better on a valid measure of understanding fraction addition and subtraction. Can you claim your treatment caused the better outcomes?

Before making this basic causal claim, you should ask yourself, “What, exactly, was the treatment? To what do I attribute better performance?” When implemented in actual, real classrooms, your intervention will have included many (interacting) elements, some of which you might not even be aware of. That is, in practice, the “treatment” may no longer be defined precisely enough to make a strong claim about the effects of the treatment you planned. And, because each classroom operates under different conditions (e.g., different groups of students, different expectations), the aspects of the intervention that really mattered in each classroom might not be apparent. An average effect over all classrooms may mask aspects of the intervention that matter in some classrooms but not others.

Despite the challenges outlined above with making causal claims, it remains important for education researchers to pursue a greater understanding of the causes behind effects. As the National Research Council ( 2002 ) says: “An area of research that, for example, does not advance beyond the descriptive phase toward more precise scientific investigation of causal effects and mechanisms for a long period of time is clearly not contributing as much to knowledge as one that builds on prior work and moves toward more complete understanding of the causal structure” (NRC, 2002 , p. 101).

Many of the problems with developing convincing explanations for changes and making causal claims have become more visible as researchers find it difficult to replicate findings (Makel & Plucker, 2014 ; Open Science Collaboration, 2015 ). And if findings cannot be replicated, it is impossible to accumulate knowledge—a hallmark of scientific inquiry (Campbell, 1961 ). Even when efforts are made to implement a particular intervention in another setting with as much fidelity as possible, the findings usually look different. The real challenge is to identify the conditions under which the intervention works as it did.

This leads to a third issue. Be sure to consider the nature of data that will best help you establish connections between interventions and outcomes. Quantitative data often are the data of choice because analyses can be applied to detect the probability the outcomes occurred as a consequence of the intervention. This information is important, but it does not, by itself, explain why the connections exist. Along with Maxwell ( 2004 ), we recommend that qualitative data also play a role in establishing causation. Qualitative data can provide insights into the mechanisms that are responsible for the connections between interventions and outcomes. Identifying mechanisms that explain changes in outcomes is key to making causal claims. Whereas quantitative data are helpful in showing whether an intervention could have caused particular outcomes, qualitative data can explain how or why this could have occurred.

Beyond Causation

Do the challenges of using experiments mean experimental designs should be avoided? No. There are a number of considerations that can make experimental designs informative. Remember that the overriding purpose of research is to understand what you are studying. We equate this to explaining why what you found might look like it does (see Chaps. 1 and 2 ). Experiments that simply compare one treatment with another or with “business as usual” do not help you understand what you are studying because the data do not help you explain why the differences occurred. They do not help you refine your predictions and revise your rationales. However, experiments do not need to be conducted simply to determine the winner of two treatments.

If you are conducting an experiment to increase the accuracy of your predictions and the adequacy of your rationales, your research questions will almost certainly ask about the conditions under which your predicted outcomes will occur. Your predictions will likely focus on the ways in which the outcomes are significantly different from before the intervention to after the intervention, and on how the intervention plus the conditions might explain or have caused these changes. Your experiment will be designed to test the effects of these conditions on the outcomes. Testing conditions is a direct way of trying to understand the reasons for the outcomes, to explain why you found what you did. In fact, understanding under what conditions an intervention works or does not work is the essence of scientific inquiry that follows an experimental design.

By providing as much detail as you can in the hypotheses, by making your predictions as precise as possible, you can set boundaries on how and to what you will generalize your findings. Making hypotheses precise often requires including the conditions under which you believe the intervention might work best, the conditions under which your predictions will be true.

Another way of saying this is that you should subject your hypotheses to severe tests. The more precise your predictions, the more severe your tests. Consider a meteorologist predicting, a month in advance, that it will rain in the State of Delaware in April. This is not a precise hypothesis, so the test is not severe. No one would be surprised if the prediction was true. Suppose she predicts it will rain in the city of Newark, Delaware, during the second week in April. The hypothesis is more precise, the test is more severe, and her colleagues will be a bit more interested in her rationale (why she made the prediction). Now suppose she predicts it will rain on the University of Delaware campus on April 16. This is a very precise prediction, the test would be considered very severe, and lots of people will be interested in understanding her rationale (even before April 16).

In education, making precise predictions about the conditions under which a classroom intervention might cause changes in particular learning outcomes and subjecting your predictions to severe tests often requires gathering lots of data at high levels of detail or small grain sizes. Graham Nuthall (2004, 2005) provides a useful analysis of the challenges involved in designing a study with the grain size of data he believes is essential. Your study will probably not be as ambitious as that described by Nuthall (2005), but the lesson is to think carefully about the grain size of data you need to test your (precise) predictions.

Additional Considerations

Although you can find details about experimental designs in several sources, some issues might not be emphasized in these sources even though they deserve attention.

First, if you are comparing the changes that occurred during your intervention to the changes that occurred during a control condition, your interpretation of the effectiveness of your intervention is only as useful as the quality of the control condition. That is, if the control condition is not expected to produce much change, and if your analyses are designed primarily to show statistical differences in outcomes, then your claim about the better effects of your intervention is not very interesting or educationally important.

Second, the significance of the size of the changes from before to after the intervention is usually reported using values that describe the probability that the changes would have occurred by chance (statistical significance). But these values are affected by factors other than the size of the change, such as the size of the sample. Recently, journals have started encouraging or requiring researchers to report the size of the changes in more meaningful ways, both in terms of what the statistical result really means and in terms of the educational importance of the changes. “Effect size” is often used for these purposes. See Bakker et al. ( 2019 ) for further discussion of effect size and related issues.
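For example, here is a minimal sketch (with hypothetical scores, not data from any study discussed here) of reporting a p-value alongside a standardized effect size such as Cohen’s d:

```python
import numpy as np
from scipy import stats

# Hypothetical post-test scores for an intervention group and a control group.
intervention = np.array([78, 82, 90, 85, 74, 88, 91, 79, 84, 86])
control      = np.array([72, 75, 80, 70, 68, 77, 83, 74, 79, 71])

# Statistical significance: how likely a difference this large is by chance alone.
t_stat, p_value = stats.ttest_ind(intervention, control)

# Effect size (Cohen's d): the size of the difference in pooled standard-deviation units.
n1, n2 = len(intervention), len(control)
pooled_sd = np.sqrt(((n1 - 1) * intervention.var(ddof=1) +
                     (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
cohens_d = (intervention.mean() - control.mean()) / pooled_sd

print(f"t = {t_stat:.2f}, p = {p_value:.3f}, d = {cohens_d:.2f}")
```

The p-value speaks to whether the difference could plausibly be chance; d expresses how large the difference is, which is closer to the question of educational importance.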

Third, you should consider what “better performance” means when you compare interventions. Did all the students in the experimental classrooms outperform their peers in the control classrooms, or was the better average performance due to some students performing much better to make up for some students performing worse? Do you want to claim the intervention was effective when some students found it less effective than the control condition?

Fourth, you need to consider how fully you can describe the nature of the intervention. Because you want to explain changes in outcomes by referencing aspects of the intervention, you need to describe the intervention in enough detail to provide meaningful explanations. Describing the intervention means describing how it was implemented, not how it was planned. The degree to which the intervention was implemented as planned is sometimes referred to as fidelity of implementation (O’Donnell, 2008 ). Fidelity of implementation is especially critical when an intervention is implemented by multiple teachers in different contexts.

Based on our experience as an editorial team, there are a few additional considerations you should keep in mind. These considerations concern inadequacies that were often commented on by reviewers, so they are about the research paper and not always about the study itself. But many of them can be traced back to decisions the authors made about their research methods.

Sample is not big enough to conduct the analyses presented. If you are planning to use quantitative methods, we strongly recommend conducting a statistical power analysis. This is a method of determining if your sample is large enough to detect the anticipated effects of an intervention (a brief sketch of such an analysis follows this list).

Measures used do not appear to assess what the authors claim they assess.

Methods (including coding rubrics) are not described in enough detail. (A good rule of thumb for “enough” is that readers should be able to replicate the study if they wish.)

Methods are different from those expected based on the theoretical framework presented in the paper.
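
To follow up on the first of these concerns, a statistical power analysis estimates, before data collection, how large a sample is needed to have a reasonable chance of detecting an effect of a given size. Here is a minimal sketch using statsmodels; the effect size, significance level, and power are illustrative planning values we have assumed, not values from the text:

```python
from statsmodels.stats.power import TTestIndPower

# Assumed planning values: a medium-small effect (d = 0.4),
# the conventional 5% significance level, and 80% power.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.4, alpha=0.05, power=0.80,
                                   alternative='two-sided')
print(f"Participants needed per group: {n_per_group:.0f}")  # roughly 100 per group
```

If the required sample is larger than you can realistically recruit, it is far better to learn that at the design stage than from a reviewer.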

Special Experimental Designs

Three designs that fit under the general category of experiments are specially crafted to examine the possible reasons for changes observed before and after an intervention. Sometimes, these designs are used to explore the conditions under which changes occur before conducting a larger study. These designs are defined somewhat differently by different researchers. Our goal is to introduce the designs but not to settle the differences in the definitions.

Because these designs include features that fall outside the conventional experiment, researchers face some unique challenges both in conducting and in reporting these studies. One such feature is the repeated implementation of an intervention, with each implementation containing small revisions based on the previous outcomes, in order to improve the intervention during the study. There are no agreed-upon practices for reporting these studies. Should every trial and every small change in outcomes and subsequent interventions be reported? Should all the revised versions of the hypotheses that guided the next trial be reported? Keep these challenges in mind as you consider the following designs.

Teaching Experiments

During the 1980s, mathematics educators began focusing closely on how students changed their thinking during instruction (Cobb & Steffe, 1983 ; Steffe & Thompson, 2000 ). The aim was to describe these changes in considerable detail and to explain how the instructional activities prompted them. Teaching experiments were developed as a design to follow changes in students’ thinking as they received small, well-defined episodes of teaching. In some cases, mapping developmental changes in student thinking was of primary interest; instruction was simply used to induce and accelerate these changes.

Most teaching experiments can be described as a sequence of teaching episodes designed for testing hypotheses about how students learn and reason. A premium is placed on getting to know students well, so the number of students is usually small, and the teacher is the researcher. Predictions are made before each episode about how students’ (often each student’s) thinking will change based on the features of the teaching activity. Data are gathered at a small grain size to test the predictions and revise the hypotheses for the next episode. Until they gain the insights they seek, researchers often continue the following cycle of activities: teaching to test hypotheses, collecting data, analyzing data to compare with predictions, revising predictions and rationales, teaching to test the revised hypotheses, and so on.

Design-Based Research

Following the introduction of teaching experiments, the concept was elaborated and expanded into an approach called design-based research (Akker et al., 2006 ; Cobb et al., 2017 ; Collins, 1992 ; Design-Based Research Collaborative, 2003 ; Puntambekar, 2018 ). There are many forms of this research design but most of them are tailored to developing topic-specific instructional theories that can be shared with teachers and educational designers.

Like teaching experiments, design-based research consists of continuous cycles of formulating hypotheses that connect instructional activities with changes in learning, designing the learning environment to test the hypotheses, implementing instruction, gathering and analyzing data on changes in learning, and revising the hypotheses. The grain size of data matches the needs of teachers to make day-to-day instructional decisions. Often, this research is carried out through researcher–teacher partnerships, with researchers focused on developing theories (systematic explanations for changes in students’ learning) and teachers focused on implementing and testing theories. In addition, unlike many teaching experiments, design-based research has the design of instructional products as one of its goals.

These designs initially aimed to develop full explanations or theories of the learning processes through which students develop understanding of a topic, complemented with theories of the instructional activities that support such processes. The design was quickly expanded to study learning situations of all kinds, including, for example, teacher professional development (Gravemeijer & van Eerde, 2009 ).

Other forms of design-based research have also emerged, each with the same basic principles but with different emphases. For example, “Design-Based Implementation Research” (Fishman & Penuel, 2018) focuses on improving the implementation of promising instructional approaches for meeting the needs of diverse students in diverse classrooms. Researcher–teacher partnerships produce adaptations that are scalable and sustainable through cycles of formulating, testing, and revising hypotheses.

Continuous Improvement Research

An approach to research that shares features with design-based research but focuses more directly on improving professional practices is often called either continuous improvement, improvement science, or implementation science. This approach has shown considerable promise outside of education in fields such as medicine and industry and could be adapted to educational settings (Bryk et al., 2015 ; Morris & Hiebert, 2011). A special issue of the American Psychologist in 2020 explored the possibilities of implementation science to address the challenge posed in its first sentence, “Reducing the gap between science and practice is the great challenge of our time” (Stirman & Beidas, 2020 , p. 1033).

The cycles of formulating, testing, and revising hypotheses in the continuous improvement model are characterized by four features (Morris & Hiebert, 2011). First, the research problems are drawn from practice because the aim is to improve these practices. Second, the outcome is a concrete product that holds the knowledge gained from the research. For example, an annotated lesson plan could serve as a product of research directed toward improving instructional practice of a particular concept or skill. Third, the interventions test a series of small changes to the product, each built on the previous version, by collecting just enough data to tell whether the change was an improvement. Finally, the research process involves the users as well as the researchers. If the goal is to improve practice, practitioners must be an integral part of the process.

Shared Goals of Useful Education Experiments

All experimental designs that we recommend have two things in common. One is that they try to change something and then study the possible mechanisms for the change and the conditions under which the change occurred. Experimental designs that study the reasons and conditions for a change offer greater understanding of the phenomena they are studying. The noted psychologist Kurt Lewin said, “If you want truly to understand something, try to change it” (quoted in Tolman et al., 1996 , p. 31). Recall that understanding phenomena was one of the basic descriptors of scientific inquiry we introduced in Chap. 1 .

In our view, a second feature of useful experiments in education is that they formulate, test, and revise hypotheses at a grain size that matches the needs of educators to make decisions that improve the learning opportunities for all students. Often, research questions that motivate useful experiments address instructional problems that teachers face in their classrooms. We will return to these two features in Chap. 5 .

Correlation

Correlational designs investigate and explain the relationship between two or more variables. Researchers who use this design might ask questions like Martha’s: “What is the relationship between how well teachers analyze videos of teaching and how conceptually they teach?”

Notice the difference between this research question and the earlier one posed for an experimental design (“Will professional development that engages teachers in analyzing videos of teaching help them teach more conceptually? If so, under what conditions does this occur?”). In the experimental case, researchers hypothesized that analyzing videos of teaching would cause more conceptual teaching; in the correlational case they are acknowledging they are not ready to make this prediction. However, they believe there is a sufficiently strong rationale (theoretical framework) to predict a relationship between the two. In other words, although predicting that one event causes another cannot be justified, a rationale can be developed for predicting a relationship between the events.

Correlations in Education Are Rarely Simple

When two or more events appear related, the explanation might be quite complicated. It might be that one event causes another, but there are many more possibilities. Recall Martha’s research question: “What are the relationships between learning to analyze videos of teaching in particular ways (specified from prior research) and teaching for conceptual understanding?” Her research question fits a correlational design because she could not develop a clear rationale explaining why one event (learning to analyze videos) should cause changes in another (changes in teaching conceptually).

Martha could imagine three reasons for a relationship: (1) an underlying factor could be responsible for both events varying together (maybe developing more pedagogical content knowledge is the underlying factor that enables teachers to both analyze videos more insightfully and teach more conceptually); (2) there could be a causal relation but in the reverse direction (maybe teachers who already teach quite conceptually build on students’ thinking, which then helps them analyze videos of teaching in particular ways); or (3) analyzing videos well could lead to more conceptual teaching but through a complicated path (maybe analyzing video helps focus teachers’ attention on key learning moments during a lesson which, in turn, helps them plan lessons with these moments in mind which, in turn, shifts their emphasis to engaging students in these moments which, in turn, results in more conceptual instruction).

Simple correlational designs involve investigating and explaining relationships between just two variables. But simple correlations can get complicated quickly. Researchers might, for example, hypothesize the relationship exists only under particular conditions—when other factors are controlled. In these situations, researchers often remove the effect of these variables and investigate the “partial correlations” between the two variables of primary interest. Many sophisticated statistical techniques have been developed for investigating more complicated relationships between multiple variables (e.g., exploratory and confirmatory factor analysis, Gorsuch, 2014 ).
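
To make the idea of a partial correlation concrete, here is a minimal sketch with simulated, hypothetical variables (the names only echo the running example and are not anyone’s data). The effect of a control variable z is removed by regressing it out of each variable of interest and correlating the residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: z is a background factor that influences both x and y.
n = 200
z = rng.normal(size=n)                        # e.g., pedagogical content knowledge
x = 0.8 * z + rng.normal(scale=0.5, size=n)   # e.g., video-analysis score
y = 0.8 * z + rng.normal(scale=0.5, size=n)   # e.g., conceptual-teaching score

# Simple (zero-order) correlation between x and y.
r_xy, _ = stats.pearsonr(x, y)

# Partial correlation: regress z out of x and y, then correlate the residuals.
x_resid = x - np.polyval(np.polyfit(z, x, 1), z)
y_resid = y - np.polyval(np.polyfit(z, y, 1), z)
r_xy_given_z, _ = stats.pearsonr(x_resid, y_resid)

print(f"r(x, y)     = {r_xy:.2f}")
print(f"r(x, y | z) = {r_xy_given_z:.2f}")  # near zero if z explains the relationship
```

When the zero-order correlation is sizable but the partial correlation is near zero, the background factor, rather than a direct link between the two variables, is the more plausible explanation.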

Correlational Designs We Recommend

The correlational designs we recommend are those that involve collecting data to test your predictions about the extent of the relationship between two (or more) variables and assess how well your rationales (theoretical framework) explain why these relationships exist. By predicting the extent of the relationships and formulating rationales for the degree of the relationships, the findings will help you adjust your predictions and revise your rationales.

Because correlations often involve multiple variables, your rationales might have proposed which variables are most important for, or best explain, the relationship. The findings could help you revise your thinking about the roles of different variables in determining the observed relationship.

For example, analyzing videos insightfully could be unpacked into separate variables, such as the nature of the video, the aspects of the video that could be attended to, and the knowledge needed to comment on each aspect. Teaching conceptually could also be unpacked into many individual variables. To explain or understand the predicted relationship, you would need to study which variables are most responsible for the relationship.

Some researchers suggest that correlational designs precede experimental designs (Sloane, 2008 ). The logic is that correlational research can document that relationships exist and can reveal the key variables. This information can enable the development of rationales for why changes in one construct or variable might cause changes in another construct or variable.

Description

In some ways, descriptions are the most basic design. They are tailored to describe a phenomenon and then explain why it exists as it does. If the research questions ask about the status of a situation or about the nature of a phenomenon and there is no interest, at the moment, in trying to change something or to relate one thing with another, then a descriptive design is appropriate. For example, researchers might be interested in describing the ways in which teachers analyze video clips of classroom instruction or in describing the nature of conceptual teaching in a particular school district.

In this type of research, researchers would predict what they expect to find, and rationales would explain why these findings are expected. As an example, consider the case above of researchers describing the ways teachers analyze video clips of classroom instruction. If Martha had access to such a description and an explanation for why teachers analyzed videos in this way, she could have used this information to formulate her hypotheses regarding the relationship between analysis of videos and conceptual teaching (see Chap. 3 ). Based on the literature describing what teachers notice when observing classroom instruction (e.g., Sherin et al., 2001 ) and on the researchers’ experience working with teachers to explain why they notice particular features, researchers might predict that many teachers will focus more on specific pedagogical skills of the teacher, such as classroom management and organization, and less on the nature of the content being discussed and the strategies students use to solve problems. If these predictions are partially confirmed, the predictions and their rationales would support the rationale for Martha’s hypothesis of a growing relationship between analyzing videos and conceptual teaching as teachers move from focusing on pedagogical skills to focusing on the way in which students are interacting with the content.

In some research programs, descriptive studies logically precede correlational studies (Sloane, 2008 ). Until researchers know they can describe, say, conceptual teaching, there is no point in asking how such teaching relates to other variables (e.g., analyzing videos of teaching) or how to improve the level of conceptual teaching.

As with other designs, there are several types of descriptive studies. We encourage you to read more about the details of each (in, e.g., Miles et al., 2014 ; de Freitas et al., 2017 ).

A case study is usually defined as the in-depth study of a particular instance or of a single unit or case. The instance must be identifiable with clear boundaries and must be sufficiently meaningful to warrant detailed observation, data collection, and analysis. At the outset, you need to describe what the case is a case of. The goal is to understand the case—how it works, what it means, why it looks like it does—within the context in which it functions. To describe conceptual teaching more fully, for example, researchers might investigate a case of one teacher teaching several lessons conceptually.

Some researchers use a case study to show something exists. For example, suppose a researcher notices that students change the way they think about two-dimensional geometric figures after studying three-dimensional objects. The researcher might propose a concept of backward transfer (Hohensee, 2014 ) and design a case study with a small group of students and a targeted set of instructional activities to study this phenomenon in detail. The goal is to determine whether this effect exists and to explain its existence by identifying some of the conditions under which it occurs. Notice that this example also could be considered a “teaching experiment.” There are overlaps between some designs, and the boundaries between them are not always clear.

Ethnography

The term “ethnography” often is used to name a variety of research approaches that provide detailed and comprehensive accounts of educational phenomena. The approaches include participant observation, fieldwork, and even case studies. For a useful example, see Weisner et al. ( 2001 ). For further descriptions of ethnographic research from various perspectives, see Atkinson et al. ( 2007 ) and Denzin and Lincoln ( 2017 ).

Survey designs are used to gather information from groups of participants, often large groups that fit specific criteria (e.g., fourth-grade teachers in Delaware), to learn about their characteristics, opinions, attitudes, and so on. Usually, surveys are conducted by administering a questionnaire, either written or oral. The responses to the questions form the data for the study. See Wolf et al. ( 2016 ) for more complete descriptions of survey methodology.

As with the previous designs, we recommend that each of these designs be used to test predictions about what will be found and to assess the soundness of the rationales for these predictions. In all these settings, the goal remains to understand and explain what you are studying.

Developing Measures and Procedures for Gathering Data

This is a critical phase of crafting your methods because your study is only as good as the quality of the data you gather. And the quality of your data is determined by the measures you use. “Measures” means tests, questionnaires, observation instruments, and anything else that generates data. The research methods textbooks and other resources we cited above include lots of detail about this phase. However, we will note a few issues that journal reviewers often raise and that we have found are problematic for beginning researchers.

Craft Measures That Produce Data at an Appropriate Grain Size

A critical step in the scientific inquiry process is comparing the results you find with those you predicted based on your rationales. Thinking ahead about this part of the process (see Chap. 3 ) helps you see that, for this comparison to be useful for revising your hypotheses, the predictions you make must be at the same level of detail, or grain size, as the results. If your predictions are at too general a level, you will not be able to make this comparison in a meaningful way. After making predictions, you must craft measures that generate data at the same grain size as your predictions.

To illustrate, we return to Martha, the doctoral student investigating “What are the relationships between learning to analyze videos of teaching in particular ways (specified from prior research) and teaching for conceptual understanding?” In Chap. 3 , one of Martha’s predictions was: “Of the video analysis skills that will be assessed, the two that will show the strongest relationship are spontaneously describing (1) the mathematics that students are struggling with and (2) useful suggestions for how to improve the conceptual learning opportunities for students.” To test this prediction, Martha will need to craft measures that assess separately different kinds of responses when analyzing the videos. Notice that in her case, the predictions are precise enough to specify the nature and grain size of the data that must be collected (i.e., the measures must yield information on the teachers’ spontaneous descriptions of the mathematics that students are struggling with plus their suggestions for how to improve conceptual learning opportunities for students).

Develop Your Own Measures or Borrow from Others?

When crafting the measures for gathering data, weigh carefully the benefits and costs of designing your own measures versus using measures designed and already used by other researchers.

The benefits of developing your own measures come mostly from targeting your measures to assess exactly what you need so you can test your predictions. Sometimes, creating your own measures is critical for the success of your study.


However, there also are costs to consider. One is convincing others that your measures are both reliable and valid. In general, reliability of a measure refers to how consistently it will yield the same outcomes; validity refers to how accurately the measure assesses what you say you are measuring (see Gournelos et al., 2019 ). Establishing reliability and validity for new measures can be challenging and expensive in terms of time and resources.
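As one concrete illustration of a reliability index (our example; the authors do not single out a particular statistic), Cronbach’s alpha summarizes the internal consistency of a multi-item measure. A minimal sketch with hypothetical item scores:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses: 5 respondents x 4 items on the same scale.
scores = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
])
print(f"Cronbach's alpha: {cronbach_alpha(scores):.2f}")
```

Validity, by contrast, cannot be established by a single statistic; it rests on the argument that the measure captures the construct named in your framework.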

A second cost of creating your own measures is not being able to compare your data to those of other researchers who have studied similar phenomena. Knowledge accumulates as researchers build on the work of others and extend and refine hypotheses. This is partially enabled by comparing results across different studies that have addressed similar research questions. When you formulate hypotheses that extend previous research, it is often natural (and even obvious) to borrow measures that were used in previous studies. Consider Martha’s predictions described in Chap. 3 , one of which is presented above. Because the prediction builds directly on previous work, testing the predictions would almost require Martha to use the same measures used previously.

If you find it necessary to design your own measures, you should ask yourself whether you are reaching too far beyond previous work. Maybe you could tie your work more closely to past research by tweaking your research questions and hypotheses so existing, validated measures are what you need to test your predictions. In other words, use the time when you are crafting measures as a chance to ask whether you are extending previous research in the most productive way. If you decide to keep your original research questions and design new measures, we recommend considering a combination of previously validated measures and your own custom-made measures.

Whichever approach you choose, be sure to describe your measures in enough detail that others can use them if they are studying related phenomena or if they would like to replicate your study. Also, if you use measures developed by others, be sure to credit them.

Using Data that Already Exist

Most educational researchers collect their own data as part of the study. We have written the previous sections assuming this is the case. Is it possible to conduct an important study using data that have been collected by someone else? Yes. But we suggest you consider the following issues if you are planning a study using an existing set of data.

First, we recommend that your study begin with a hypothesis or research question, just like for a study in which you collect your own data. A common warning about choosing research methods is that you should not choose a method (e.g., hierarchical linear modeling) and then look for a research question. Your hypotheses, or research questions, should drive everything else. Similarly for choosing data to analyze. The data should be chosen because they are the best data to test your hypothesis, not because they exist.

Of course, you might be familiar with a data set and wonder what it would tell you about a particular research problem. Even in this case, however, you should formulate a hypothesis that is important on its own merits. It is easy to tell whether this is true by sharing your hypothesis with colleagues who are not aware of the existing data set and asking them to comment on the value of testing the hypothesis. Would a tested and revised hypothesis make a contribution to the field?


A second issue to consider when using existing data is the alignment of the methods used to collect the data and your theoretical framework. Although you didn’t choose the methods, you need to be familiar with the methods that were used and be able to justify the appropriateness of the methods, just as you would with methods you craft. Justifying the appropriateness of methods is another way of saying you need to convince others you are using the best data possible to test your hypotheses. As you read the remaining sections of this chapter, think about what you would need to do if you use existing data. Could you satisfy the same expectations as researchers who are collecting their own data?

Exercise 4.2

There are several large data sets that are available to researchers for secondary analyses, including data from the National Assessment of Educational Progress (NAEP), the Programme for International Student Assessment (PISA), and the Trends in International Mathematics and Science Study (TIMSS). Locate a published empirical study that uses an existing data set and clearly states explicit hypotheses or research questions. How do the authors justify their use of the existing data set to address their hypotheses or research questions? What advantages do you think the authors gained by choosing to use existing data? What constraints do you think that choice placed on them?

Choosing Methods to Analyze Data and Compare with Predictions

As with the first two phases of crafting your methods, there are a number of sources that describe issues to think about when putting together your data analysis strategies (e.g., de Freitas et al., 2017 ; Sloane & Wilkins, 2017 ). Beyond what you will read in these sources, or to emphasize some things you might read, we identify a few issues that you should attend to with extra care.

Create Coding Rubrics

Frequently, research in education involves collecting data in the form of interview responses by participants (students, teachers, teacher educators, etc.) or written responses to tasks, problems, or questionnaires, as well as in other forms that researchers must interpret before conducting analyses. This interpretation process is often referred to as coding data, and coding requires developing a rubric that describes, in detail, how the responses will be coded.

There are two main reasons to create a rubric. First, you must code responses that have the same meaning in the same way. This is sometimes called intracoder reliability: an individual coder is coding similar responses consistently. Second, you must communicate to readers and other researchers exactly how you coded the responses. This helps them interpret your data and make their own decisions about whether your claims are warranted. Recall from Chap. 1 an implication of the third descriptor of scientific inquiry, which pointed to the public nature of research: “It is a public practice that occurs in the open and is available for others to see and learn from.”

As you code, you will almost always realize that the initial definitions you created for your codes are insufficient to make borderline judgments, and you will need to revise and elaborate the coding rubric. For example, you might decide to split a code into several codes because you realize that the responses you were coding as similar are not as similar as you initially thought. Or you might decide to combine codes that at first seemed to describe different kinds of responses but you now realize are too hard to distinguish reliably. This process helps you clarify for yourself exactly what your codes mean and what the data are telling you.

Determine Intercoder Reliability

In addition to ensuring that you are coding consistently with yourself, you must make sure others would code the same way if they followed your rubric. Determining intercoder reliability involves training someone else to use your rubric to code the same responses and then comparing codes for agreement. There are several ways to calculate intercoder reliability (see, e.g., Stemler, 2004 ).
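
One widely used statistic is Cohen’s kappa (our illustration, not a prescription from the text), which reports agreement corrected for the level expected by chance. A minimal sketch with hypothetical codes from two coders:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical codes assigned to the same ten responses by two coders.
coder_1 = ["conceptual", "procedural", "conceptual", "other", "procedural",
           "conceptual", "procedural", "other", "conceptual", "procedural"]
coder_2 = ["conceptual", "procedural", "other", "other", "procedural",
           "conceptual", "conceptual", "other", "conceptual", "procedural"]

# Raw percent agreement (can look high simply because some codes are common).
agreement = sum(a == b for a, b in zip(coder_1, coder_2)) / len(coder_1)

# Cohen's kappa corrects the agreement for chance.
kappa = cohen_kappa_score(coder_1, coder_2)

print(f"Percent agreement: {agreement:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```

Reporting both numbers is common, because raw agreement alone can be inflated when one code dominates the data.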

There are two main reasons to determine intercoder reliability. First, it is important to convince readers that the rubric holds all the information you used to code the responses. It is easy to use lots of implicit knowledge to code responses, especially if you are familiar with the data (e.g., if you conducted the interviews). Using implicit knowledge to code responses hides from others why you are coding responses as you are. This creates bias that interferes with the principles of scientific inquiry (being open and transparent). Establishing acceptable levels of intercoder reliability shows others that the knowledge made explicit in the rubric is all that was needed to code the responses.

A second reason to determine intercoder reliability is that doing so improves the completeness and specificity of the definitions for the codes. As you compare your coding with that of another coder, you will realize that your definitions were not as clear as you thought. You can learn what needs to be added or revised so the definition is clearer; sometimes this includes examples to help clarify the boundary between one code and another. As you reach sufficient levels of agreement, your rubric will reach its final version. This is the version that you will likely include as an appendix in a written report of your study. It tells the reader what each code means.

Beyond the Three Phases

We have discussed three phases of crafting methods (choosing the design of your study, developing the measures and procedures you need to gather the data, and selecting the analysis procedures to compare your findings with your predictions). There are some issues that cut across all three phases. You will read about some of these in the sources we suggested, but several could benefit from special attention.

Quantitative and Qualitative Data

For some time, educators have debated the value of quantitative versus qualitative data (Hart et al., 2008 ). As the labels suggest, quantitative data refers to data that can be expressed with numbers (frequencies, amounts, etc.). Most of the common statistical analyses require quantitative data. Qualitative data are not automatically transformed into numbers. Coding of qualitative data, as described above, can produce numbers (e.g., frequencies) but the data themselves are often words—written or spoken. Corresponding to these two forms of data, some types of research are referred to as quantitative research and some types as qualitative. As an easy reference point, experimental and correlational designs often foreground quantitative data and descriptive designs often foreground qualitative data. We recommend keeping several things in mind when reading about these two types of research.

First, it is best not to begin developing a study by saying you want to do a quantitative study or a qualitative study. We recommend, as we did earlier, that you begin with questions or hypotheses that are of most interest and then decide whether the methods that will best test your predictions require collecting quantitative or qualitative data.

Second, many hypotheses in education are best examined using both kinds of data. You are not limited to using one or the other. Often, studies that use both are referred to as mixed methods studies. Our guess is that if you are investigating an important hypothesis, your study could take advantage of, and benefit from, mixed methods (Hay, 2016 ; Weis et al. 2019a ). As we noted earlier, different methods offer different perspectives so multiple methods are more likely to tell a more complete story (Sechrest et al., 1993 ). Some useful resources for reading about quantitative, qualitative, and mixed methods are Miles et al. ( 2014 ); de Freitas et al. ( 2017 ); Weis et al. ( 2019b ); Small ( 2011 ); and Sloane and Wilkins ( 2017 ).

Defining a Unit of Analysis

The unit of analysis in your study is the “who” or the “what” that you are analyzing and want to make claims about. There are several ways in which this term is used. Your unit of analysis could be an individual student, a group of students, an individual task, a classroom, and so forth. It is important to understand that, in these cases, your unit of analysis might not be the same as your unit of observation. For example, you might gather data about individual students (unit of observation) but then compare the averages among groups of students, say in classrooms or schools (unit of analysis).
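
For instance, here is a minimal sketch (with hypothetical column names and values) of moving from student-level observations to a classroom-level unit of analysis:

```python
import pandas as pd

# Hypothetical student-level data: each row is one observation (unit of observation).
students = pd.DataFrame({
    "classroom": ["A", "A", "A", "B", "B", "B"],
    "score":     [72, 85, 90, 65, 70, 78],
})

# Aggregate to the classroom level (unit of analysis) before comparing groups.
classroom_means = students.groupby("classroom")["score"].mean()
print(classroom_means)
```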

Unit of analysis can also refer to what is coded when you analyze qualitative data. For example, when analyzing the transcript of an interview or a classroom lesson, you might want to break up the transcript into segments that focus on different topics, into turns that each speaker takes, into sentences or utterances, or into other chunks. Again, the unit of analysis might not be the same as your unit of observation (the unit in which your findings are presented).

We recommend keeping two things in mind when you consider the unit of analysis. First, it is not uncommon to use more than one unit of analysis in a study. For example, when conducting a textbook analysis, you might use “page” as a unit of analysis (i.e., you treat each page as a single, separate object to examine), and you might also use “instructional task” as a unit of analysis (i.e., you treat each instructional task as a single object to examine, whether it takes up less than one page or many pages). Second, when the data collected have a nested nature (e.g., students nested in classrooms nested in schools), it is necessary to determine what is the most appropriate unit of analysis. Readers can refer to Sloane and Wilkins ( 2017 ) for a more detailed discussion of such analyses.
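
If you keep the student-level data but need to respect the nesting, one common option is a multilevel (mixed-effects) model of the kind discussed in sources such as Sloane and Wilkins ( 2017 ). A minimal sketch, again with hypothetical variable names:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: student scores nested within classrooms,
# with a classroom-level treatment indicator.
df = pd.DataFrame({
    "score":     [72, 85, 90, 65, 70, 78, 88, 92, 75, 80, 69, 74],
    "treatment": [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    "classroom": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"],
})

# A random intercept for classroom accounts for the nested structure
# instead of treating students as independent observations.
model = smf.mixedlm("score ~ treatment", data=df, groups="classroom")
result = model.fit()
print(result.summary())
```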

Ensuring Your Methods Are Fair to All Students

Regardless of which methods you use, remember they need to help you fulfill the purpose of your study. Suppose, as we suggested in earlier chapters, the purpose furthers the goal of understanding how educators can improve the learning opportunities for all students. It is worth thinking, separately, about whether the methods you are using are fully inclusive and are not (unintentionally) leading you to draw conclusions that systematically ignore groups of students with specific characteristics—race, ethnicity, gender, sexual orientation, and special education needs.

For example, if you want to investigate the correlation between students’ participation in class and their sense of efficacy for the subject, you need to include students at different levels of achievement, with different demographics, with different entry efficacy levels, and so on. Your hypotheses should be perfectly clear about which variables that might influence this correlation are being included in your design. This issue is also directly related to our concern about generalizability: it would be inappropriate to generalize to populations or conditions that you have not accounted for in your study.

Researchers in education and psychology have also considered methodological approaches to ensure that research does not unfairly marginalize groups of students. For example, researchers have made use of back translation to ensure the translation equivalency of measures when a study involves students using different languages. Jonson and Geisinger ( 2022 ) and Zieky ( 2013 ) discuss ways to help ensure the fairness of educational assessments.

Part III. Crafting the Most Appropriate Methods

With the background we developed in Part II, we can now consider how to craft the methods you will use. In Chap. 3 , we discussed how the theoretical framework you create does lots of work for you: (1) it helps you refine your predictions and backs them up with sound reasons or explanations; (2) it provides the parameters within which you craft your methods by providing clear rationales for some methods but not others; (3) it ensures that you can interpret your results appropriately by comparing them with your predictions; and (4) it describes how your results connect with the prior research you used to build the rationales for your hypotheses. In this part of Chap. 4 , we will explore the ways in which your theoretical framework guides, and even determines, the methods you craft for your study.

In Chap. 3 , we described a cyclical process that produced the theoretical framework: asking questions, articulating predictions, developing rationales, imagining testing predictions, revising questions, adjusting rationales, revising predictions, and so on, and so on. We now extend this process beyond imagining how you could test your predictions.

The best way to craft appropriate methods that you will use is to try them out. Instead of only imagining how you could test your predictions, the cyclical process we described in Chap. 3 will be extended to trying out the methods you think you will use. This means trying out the measures you plan to use, the coding rubric (if you are coding data), the ways in which you will collect data, and how you will analyze data. By “try out” we mean a range of activities.

Write Out Your Methods

The first way you should try out your methods is by writing them out for yourself ( actually writing them out ) and then asking yourself two main questions. First, do the reasons or rationales in the theoretical framework point to using these specific measures, this coding rubric, and so forth? In other words, would anyone who reads your theoretical framework be the least bit surprised that you plan to use these methods? They should not be. In fact, you would expect anyone who read your theoretical framework to choose from the same set of reasonable, appropriate methods. If you plan to use methods for reasons other than those you find in your theoretical framework (perhaps because the framework is silent about this part of your study) or if you are using methods that are different from what would be expected, you probably need to either revise your framework (maybe to fill in some gaps or revise the arguments you make) or change your methods.

A second question to ask yourself after you have written a description of your methods is: “Can I imagine using these methods to generate data I could compare with my predictions?” Are the grain sizes similar? Can you plan how you will compare the data with the predictions? If you are unsure about this, you should consider changing your predictions (and your hypotheses and theoretical rationales) or changing your methods.

As described in Chap. 3 , your writing will serve two purposes. It will help you think through and reflect on your methods, trying them out in your head. And it will also constitute another part of your evolving research paper that you create while you are designing, conducting, and then documenting your research study. Writing is a powerful tool for thinking as well as the most common form of communicating your work to others. So, the writing you do here is not just scratch work that you will discard. It should be a draft for what will become your final research paper. Treat it seriously. That said, it is still just a draft; do not take it so seriously that you find yourself stuck and unable to put words to paper because you are not certain what you are writing is good enough.

The second way you can try out your methods is to solicit feedback and advice from other people. Scientific inquiry is not only an individual process but a social process as well (recall again the third descriptor of scientific inquiry in Chap. 1 ). Doing good scientific inquiry requires the assistance of others. It is impossible to see everything you will need to think about by yourself; you need to present your ideas and get feedback from others. Here are several things to try.

First, if you are a doctoral student, describe your planned methods to your advisor. That is probably already your go-to strategy. If you are a beginning professor, you can seek advice from former and current colleagues.

Second, try out your ideas by making a more formal presentation to an audience of friendly critics (e.g., colleagues). Perhaps you can invite colleagues to a special “seminar” in which you present your study (without the results). Ask for suggestions, maybe about specific issues you are struggling with and about any aspects of your study that could be clarified and even revised. You do not need to have the details of your methods worked out before showing your preliminary plans to your colleagues. If your research questions and initial predictions are clear, getting feedback on your preliminary plans (design, measures, and data analysis) can be very helpful and can prevent wasting time on things you will end up needing to change. We recommend getting feedback earlier rather than later and getting feedback in multiple settings multiple times.

Finally, regardless of your current professional situation, we encourage you to join, or create, a community of learners who interact regularly. Such communities are not only intellectually stimulating but socially supportive.

Exercise 4.3

Ask a few colleagues to spend 45–60 min with you. Present your study as you have imagined it to this point (20 min): Research questions, predictions about the answers, rationales for your predictions (i.e., your theoretical framework), and methods you will use to test your predictions (design, measures, data collection, and data analysis to check your predictions). Ask for their feedback (especially about the methods you will use, but also about any aspect of the planned study). Presenting all this information is challenging but is good practice for thinking about the most critical pieces of your plan and your reasons for them. Use the feedback to revise your plan.

Conduct Pilot Studies

The value of conducting small, repeated, pilot studies cannot be overstated. It is hugely undervalued in most discussions of crafting methods for research studies. Conducting pilot studies is well worth the time and effort. It is probably the best way to try out the methods you think will work.


Pilot studies can be quite small, both in terms of time spent and number of participants. You can keep pilot studies small by using a very small sample of participants or a small sample of your measures. The sample of participants can be participants who are easy to find. Just try to select a small sample that represents the larger sample you plan to use. Then, see if the data you collect are like those you expected and if these data will test your predictions in the way you hoped. If not, you are likely to find that your methods are not aligned well enough with your theoretical framework. Even one pilot study can be very useful and save you tons of time; several follow-up pilots are even better because you can check whether your revisions solved the problem. Do not think of pilot studies as speed bumps that slow your progress but rather as course corrections that help you stay aimed squarely at your goal and save you time in the long run.

Small pilot studies can be conducted for various purposes. Here are a few.

Help Specify Your Predictions

Pilot studies can help you specify your predictions. Sometimes it might be difficult to anticipate the answers to your research questions. Rather than conducting a complete study with little idea of what will happen, it is much more productive to do some preliminary work to help you formulate predictions. If you conduct your study without doing this, you are likely to realize too late that your study could have been much more informative if you used a different sample of participants, if you asked different or additional questions during your interviews, if you used different measures (or tasks) to gather the data, if your data looked different so you could have used different analyses, and so forth.

In our view, this is an especially important use of pilot studies because it is our response to the argument, rebutted earlier, that research can be productive even if researchers have no idea what to expect and cannot make testable predictions. Throughout this book, we have argued that scientific inquiry requires predictions and rationales, regardless of how weak or uncertain. We have claimed that, if the research is worth doing, it is possible and productive to make predictions. It is hard for us to imagine conducting research that builds on past work yet having no idea what to expect. If a researcher is charting new territory, then pilot studies are essential. Conducting one or more small pilot studies will provide some initial guesses and should trigger some ideas for why these guesses will be correct. As we noted earlier, however, we do not recommend that beginning researchers chart completely new territory.

Improve Your Predictions

Even if you have some predictions, conducting a pilot study or two will tell you whether you are close. The more accurate you are with your predictions for the main study, the more precisely you can revise your predictions after the study and formulate very good explanations for why these new predictions should be accurate.

Refine Your Measures

Pilot studies can be very useful for making sure your measures will produce the kinds of data you need. For example, if your study includes participants who are asked to complete tasks of various kinds, you need to make sure the tasks generate the information you need.

Suppose you ask whether second graders improve their understanding of place value after an instructional intervention. You need to use tasks that help you interpret how well they understand place value before and after the intervention. You might ask two second graders and two third graders to complete your tasks to see if they generate the expected variation in performance and whether this variation can be tied to inferred levels of understanding. Also, ask a few colleagues to interpret the responses and check if they match with your interpretations.

Suppose you want to know whether middle school teachers interact differently with boys and girls about the most challenging problems during math class. Find a lesson or two in the curriculum that includes challenging problems and sit in on these lessons in several teachers’ classrooms. Test whether your observation instrument captures the differences that you think you notice.

Test Your Analytic Procedures

You can use small pilot studies to check if your data analysis procedures will work. This can be extremely useful if your procedures are more than simple quantitative comparisons such as t tests. Suppose you will conduct interviews with teachers and code their responses for particular features or patterns. Conducting two or three interviews and coding them can tell you quickly whether your coding rubric will work. Even more important, coding the interviews will tell you whether the interview questions are the right ones or whether they need to be revised to produce the data you need.
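For instance, a quick way to try out a coding rubric is to have two people independently code the same handful of pilot responses and compare the labels they assign. The sketch below is only illustrative: the coder lists, the category labels, and the agreement_stats helper are hypothetical, not taken from this chapter. It computes simple percent agreement and Cohen's kappa (which corrects for agreement expected by chance) in Python.

```python
from collections import Counter

def agreement_stats(codes_a, codes_b):
    """Compare two coders' labels for the same pilot responses."""
    n = len(codes_a)
    # Observed agreement: fraction of responses given the same label.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n

    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum((freq_a[lab] / n) * (freq_b[lab] / n)
                   for lab in set(codes_a) | set(codes_b))

    # Cohen's kappa corrects observed agreement for chance agreement.
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return observed, kappa

# Hypothetical codes assigned by two coders to 10 pilot interview excerpts.
coder_1 = ["procedural", "conceptual", "procedural", "other", "conceptual",
           "procedural", "conceptual", "conceptual", "other", "procedural"]
coder_2 = ["procedural", "conceptual", "conceptual", "other", "conceptual",
           "procedural", "conceptual", "procedural", "other", "procedural"]

observed, kappa = agreement_stats(coder_1, coder_2)
print(f"Percent agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")
```

Low agreement on a pilot like this usually signals that the rubric's categories, or the interview questions themselves, need revision before the main study.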

Other Purposes of Pilot Studies

In addition to the purposes we identified above, pilot studies can tell you whether the sample you identified will give you the information you need, whether your measures can be administered in the time you allocated, and whether other details of your data collection and analysis plans work as you hope. In summary, pilot studies allow you to rehearse your methods so you can be sure they will provide a strong test of your predictions.

After you conduct a pilot study, make the revisions needed to the framework or to the methods to ensure you will gather more informative data. Be sure to update your evolving research paper to reflect these changes. Each draft of this paper should match your current reasoning and decisions about your study.


Part IV. Writing Your Evolving Research Paper and Revisiting Alignment

We continue here to elaborate our recommendation that you compose drafts of your evolving research paper as you make decisions along the way. It is worth describing several advantages of writing the paper and planning the study in parallel.

Advantages of Writing Your Research Paper While Planning Your Study

One of the major challenges researchers face as they plan and conduct research studies is aligning all parts of the study with a visible and tight logic tying all the parts together. You will find that as you make decisions about your study and write about these decisions, you are faced with this alignment challenge in both settings. Working out the alignment in one setting will help in the other. They reinforce each other. For example, as you write a record of your decisions while you plan your study, you might notice a gap in your logic. You can then fill in the gap, both in the paper and in the plans for the study.

As we have argued, writing is a useful tool for thinking. Writing out your questions and your predictions of the answers helps you decide if the questions are the ones you really want to ask and if your predictions are testable; writing out your rationales for your predictions helps you decide if you have sound reasons for your predictions, and if your theoretical framework is complete and convincing; writing out your theoretical rationales also helps you decide which methods will provide a strong test of your predictions.

Your evolving research paper will become the paper you will use to communicate your study to others. Writing drafts as you make decisions about how to conduct your study and why to conduct it as you did will prevent you from needing to reconstruct the logic you used as you planned each successive phase of your study. In addition, composing the paper as you go ensures that you consider the logic connecting each step to the next one. One of the major complaints reviewers are likely to have is that there is a lack of alignment. By following the processes we have described, you have no choice but to find, in the end, that all parts of the study are connected by an obvious logic.

We noted in Chap. 3 that writing your evolving research paper along with planning and conducting your study does not mean creating a chronology of all the decisions you made along the way. At each point in the process, you should step back and think about how to describe your work in the easiest-to-follow and clearest way for the reader. Usually, readers want to know only about your final decisions and, in many cases, your reasons for making these decisions.

Journal Reviewers’ Common Concerns

The concerns of reviewers provide useful guides for where you need to be especially careful to conduct a well-argued and well-designed study and to write a coherent paper reporting the study. As the editorial team for JRME, we found that one of the most frequent concerns raised by reviewers was that the research questions were not well connected to other parts of the paper. For nearly 30% of the manuscripts sent out for review, reviewers expressed concern that the paper was not coherent because parts of the paper were not connected back to the research questions. This could mean, for example, that reviewers were not clear why or how the methods crafted for the study were appropriate to test the hypotheses or to answer the questions. The lack of clear connections could be due to choices made either in planning and implementing the study or in writing the research paper. Sometimes the connections exist but have been left implicit in the research report or even in the conceptualization of the study. Conceptualizing a study and writing the research report require making all the connections explicit. As noted above, these disconnects are less likely if you are composing the evolving research paper simultaneously with planning and implementing the study.

A further concern raised by many reviewers speaks to alignment and coherence: One or more of the research questions were not answered fully by the study. Although we will deal with this concern further in the next chapter, we believe it is relevant for the choice of methods because if you do not ensure that the methods are appropriate to answer your research questions (i.e., to test your hypotheses), it is likely they will not generate the data you need to answer your questions. In contrast, if you have aligned all parts of your study, you are likely to collect the data you need to answer your questions (i.e., to test and revise your hypotheses).

In summary, there are many reasons to compose your evolving research paper along with planning and conducting your study. As we have noted several times, your paper will not be a chronology of all the back-and-forth cycles you used to refine aspects of your study as you moved to the next phase, but it will be a faithful description of the ultimate decisions you made and your reasons for making them. Consequently, your evolving research paper will gradually build as you describe the following parts and explain the logic connecting them: (1) the purpose of your study, (2) your theoretical framework (i.e., the rationales for your predictions woven into a coherent argument), (3) your research questions plus predictions of the answers (generated directly from your theoretical rationales), (4) the methods you used to test your predictions, (5) the presentation of results, and (6) your interpretation of results (i.e., comparison of predicted results with the results reported plus proposed revisions to hypotheses). We will continue the story by addressing parts 5 and 6 in Chap. 5 .



About this chapter

Hiebert, J., Cai, J., Hwang, S., Morris, A.K., Hohensee, C. (2023). Crafting the Methods to Test Hypotheses. In: Doing Research: A New Researcher’s Guide. Research in Mathematics Education. Springer, Cham. https://doi.org/10.1007/978-3-031-19078-0_4


Research Hypothesis In Psychology: Types, & Examples

Saul Mcleod, PhD, and Olivia Guy-Evans, MSc

A research hypothesis (plural: hypotheses) is a specific, testable prediction about the anticipated results of a study, established at its outset. It is a key component of the scientific method .

Hypotheses connect theory to data and guide the research process towards expanding scientific understanding.

Some key points about hypotheses:

  • A hypothesis expresses an expected pattern or relationship. It connects the variables under investigation.
  • It is stated in clear, precise terms before any data collection or analysis occurs. This makes the hypothesis testable.
  • A hypothesis must be falsifiable. It should be possible, even if unlikely in practice, to collect data that disconfirms rather than supports the hypothesis.
  • Hypotheses guide research. Scientists design studies to explicitly evaluate hypotheses about how nature works.
  • For a hypothesis to be valid, it must be testable against empirical evidence. The evidence can then confirm or disprove the testable predictions.
  • Hypotheses are informed by background knowledge and observation, but go beyond what is already known to propose an explanation of how or why something occurs.
Predictions typically arise from a thorough knowledge of the research literature, curiosity about real-world problems or implications, and integrating this to advance theory. They build on existing literature while providing new insight.

Types of Research Hypotheses

Alternative Hypothesis

The research hypothesis is often called the alternative or experimental hypothesis in experimental research.

It typically suggests a potential relationship between two key variables: the independent variable, which the researcher manipulates, and the dependent variable, which is measured based on those changes.

The alternative hypothesis states a relationship exists between the two variables being studied (one variable affects the other).


An experimental hypothesis predicts what change(s) will occur in the dependent variable when the independent variable is manipulated.

It states that the results are not due to chance and are significant in supporting the theory being investigated.

The alternative hypothesis can be directional, indicating a specific direction of the effect, or non-directional, suggesting a difference without specifying its nature. It’s what researchers aim to support or demonstrate through their study.

Null Hypothesis

The null hypothesis states no relationship exists between the two variables being studied (one variable does not affect the other). There will be no changes in the dependent variable due to manipulating the independent variable.

It states results are due to chance and are not significant in supporting the idea being investigated.

The null hypothesis, positing no effect or relationship, is a foundational contrast to the research hypothesis in scientific inquiry. It establishes a baseline for statistical testing, promoting objectivity by initiating research from a neutral stance.

Many statistical methods are tailored to test the null hypothesis, determining the likelihood of observed results if no true effect exists.

This dual-hypothesis approach provides clarity, ensuring that research intentions are explicit, and fosters consistency across scientific studies, enhancing the standardization and interpretability of research outcomes.

Nondirectional Hypothesis

A non-directional hypothesis, also known as a two-tailed hypothesis, predicts that there is a difference or relationship between two variables but does not specify the direction of this relationship.

It merely indicates that a change or effect will occur without predicting which group will have higher or lower values.

For example, “There is a difference in performance between Group A and Group B” is a non-directional hypothesis.

Directional Hypothesis

A directional (one-tailed) hypothesis predicts the nature of the effect of the independent variable on the dependent variable. It predicts in which direction the change will take place (e.g., greater, smaller, less, more).

It specifies whether one variable is greater, lesser, or different from another, rather than just indicating that there’s a difference without specifying its nature.

For example, “Exercise increases weight loss” is a directional hypothesis.
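To see how directional and non-directional hypotheses map onto statistical tests, the short sketch below contrasts the two-tailed p-value used for a non-directional hypothesis with the one-tailed p-value used for a directional one. The t statistic and degrees of freedom are hypothetical and SciPy is assumed to be available; this is an illustration, not part of the original article.

```python
from scipy import stats

# Hypothetical result: t = 2.1 with 30 degrees of freedom.
t_stat, df = 2.1, 30

# Non-directional (two-tailed): "there is a difference between groups."
p_two_tailed = 2 * stats.t.sf(abs(t_stat), df)

# Directional (one-tailed): "group A scores higher than group B."
p_one_tailed = stats.t.sf(t_stat, df)

print(f"two-tailed p = {p_two_tailed:.3f}, one-tailed p = {p_one_tailed:.3f}")
```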


Falsifiability

The Falsification Principle, proposed by Karl Popper , is a way of demarcating science from non-science. It suggests that for a theory or hypothesis to be considered scientific, it must be testable and refutable.

Falsifiability emphasizes that scientific claims shouldn’t just be confirmable but should also have the potential to be proven wrong.

It means that there should exist some potential evidence or experiment that could prove the proposition false.

No matter how many confirming instances exist for a theory, it only takes one contrary observation to falsify it. For example, the hypothesis that “all swans are white” can be falsified by observing a black swan.

For Popper, science should attempt to disprove a theory rather than attempt to continually provide evidence to support a research hypothesis.

Can a Hypothesis be Proven?

Hypotheses make probabilistic predictions. They state the expected outcome if a particular relationship exists. However, a study result supporting a hypothesis does not definitively prove it is true.

All studies have limitations. There may be unknown confounding factors or issues that limit the certainty of conclusions. Additional studies may yield different results.

In science, hypotheses can realistically only be supported with some degree of confidence, not proven. The process of science is to incrementally accumulate evidence for and against hypothesized relationships in an ongoing pursuit of better models and explanations that best fit the empirical data. But hypotheses remain open to revision and rejection if that is where the evidence leads.
  • Disproving a hypothesis is definitive. Solid disconfirmatory evidence will falsify a hypothesis and require altering or discarding it based on the evidence.
  • However, confirming evidence is always open to revision. Other explanations may account for the same results, and additional or contradictory evidence may emerge over time.

We can never 100% prove the alternative hypothesis. Instead, we see if we can disprove, or reject, the null hypothesis.

If we reject the null hypothesis, this doesn’t mean that our alternative hypothesis is correct, but it does lend support to the alternative/experimental hypothesis.

Upon analysis of the results, an alternative hypothesis can be rejected or supported, but it can never be proven to be correct. We must avoid any reference to results proving a theory as this implies 100% certainty, and there is always a chance that evidence may exist which could refute a theory.

How to Write a Hypothesis

  • Identify the variables. The researcher manipulates the independent variable, and the dependent variable is the measured outcome.
  • Operationalize the variables being investigated. Operationalization means making the variables physically measurable or testable; for example, if you are studying aggression, you might count the number of punches given by participants.
  • Decide on a direction for your prediction. If there is evidence in the literature to support a specific effect of the independent variable on the dependent variable, write a directional (one-tailed) hypothesis. If the findings in the literature are limited or ambiguous, write a non-directional (two-tailed) hypothesis.
  • Make it testable. Ensure your hypothesis can be tested through experimentation or observation. It should be possible to prove it false (the principle of falsifiability).
  • Use clear and concise language. A strong hypothesis is concise (typically one to two sentences long) and formulated in clear, straightforward language, ensuring it is easily understood and testable.

Consider a hypothesis many teachers might subscribe to: students work better on Monday morning than on Friday afternoon (IV = day of the week, DV = standard of work).

Now, if we decide to study this by giving the same group of students a lesson on a Monday morning and a Friday afternoon and then measuring their immediate recall of the material covered in each session, we would end up with the following:

  • The alternative hypothesis states that students will recall significantly more information on a Monday morning than on a Friday afternoon.
  • The null hypothesis states that there will be no significant difference in the amount recalled on a Monday morning compared to a Friday afternoon. Any difference will be due to chance or confounding factors.
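Purely as an illustration, here is how that Monday-versus-Friday prediction could be analyzed with a paired-samples t-test in Python; the recall scores below are invented for the sketch, not real data.

```python
from scipy import stats

# Hypothetical number of items recalled by the same 8 students in each session.
monday = [14, 16, 11, 15, 13, 17, 12, 15]
friday = [12, 15, 10, 13, 13, 14, 11, 13]

# Paired test because each student appears in both conditions.
t_stat, p_two_tailed = stats.ttest_rel(monday, friday)

# The alternative hypothesis is directional (Monday > Friday), so use a
# one-tailed p-value when the difference is in the predicted direction.
p_one_tailed = p_two_tailed / 2 if t_stat > 0 else 1 - p_two_tailed / 2

print(f"t = {t_stat:.2f}, one-tailed p = {p_one_tailed:.3f}")
```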

More Examples

  • Memory : Participants exposed to classical music during study sessions will recall more items from a list than those who studied in silence.
  • Social Psychology : Individuals who frequently engage in social media use will report higher levels of perceived social isolation compared to those who use it infrequently.
  • Developmental Psychology : Children who engage in regular imaginative play have better problem-solving skills than those who don’t.
  • Clinical Psychology : Cognitive-behavioral therapy will be more effective in reducing symptoms of anxiety over a 6-month period compared to traditional talk therapy.
  • Cognitive Psychology : Individuals who multitask between various electronic devices will have shorter attention spans on focused tasks than those who single-task.
  • Health Psychology : Patients who practice mindfulness meditation will experience lower levels of chronic pain compared to those who don’t meditate.
  • Organizational Psychology : Employees in open-plan offices will report higher levels of stress than those in private offices.
  • Behavioral Psychology : Rats rewarded with food after pressing a lever will press it more frequently than rats who receive no reward.


Statology

Statistics Made Easy

When Do You Reject the Null Hypothesis? (3 Examples)

A hypothesis test is a formal statistical test we use to reject or fail to reject a statistical hypothesis.

We always use the following steps to perform a hypothesis test:

Step 1: State the null and alternative hypotheses.

The null hypothesis, denoted as H₀, is the hypothesis that the sample data occurs purely from chance.

The alternative hypothesis, denoted as Hₐ, is the hypothesis that the sample data is influenced by some non-random cause.

Step 2: Determine a significance level to use.

Decide on a significance level. Common choices are 0.01, 0.05, and 0.10.

Step 3: Calculate the test statistic and p-value.

Use the sample data to calculate a test statistic and a corresponding p-value .

Step 4: Reject or fail to reject the null hypothesis.

If the p-value is less than the significance level, then you reject the null hypothesis.

If the p-value is not less than the significance level, then you fail to reject the null hypothesis.

You can use the following clever line to remember this rule:

“If the p is low, the null must go.”

In other words, if the p-value is low enough then we must reject the null hypothesis.
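In code, the decision rule is a single comparison. The small sketch below is just a restatement of the rule; the decide helper and the example p-values are placeholders.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when the p-value is below alpha."""
    if p_value < alpha:
        return "Reject the null hypothesis"
    return "Fail to reject the null hypothesis"

print(decide(0.003))              # Reject the null hypothesis
print(decide(0.24, alpha=0.10))   # Fail to reject the null hypothesis
```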

The following examples show when to reject (or fail to reject) the null hypothesis for the most common types of hypothesis tests.

Example 1: One Sample t-test

A  one sample t-test  is used to test whether or not the mean of a population is equal to some value.

For example, suppose we want to know whether or not the mean weight of a certain species of turtle is equal to 310 pounds.

We go out and collect a simple random sample of 40 turtles with the following information:

  • Sample size n = 40
  • Sample mean weight x̄ = 300
  • Sample standard deviation s = 18.5

We can use the following steps to perform a one sample t-test:

Step 1: State the Null and Alternative Hypotheses

We will perform the one sample t-test with the following hypotheses:

  • H₀: μ = 310 (population mean is equal to 310 pounds)
  • Hₐ: μ ≠ 310 (population mean is not equal to 310 pounds)

We will choose to use a significance level of 0.05 .

We can plug in the numbers for the sample size, sample mean, and sample standard deviation into this One Sample t-test Calculator to calculate the test statistic and p-value:

  • t test statistic: -3.4187
  • two-tailed p-value: 0.0015

Since the p-value (0.0015) is less than the significance level (0.05), we reject the null hypothesis.

We conclude that there is sufficient evidence to say that the mean weight of turtles in this population is not equal to 310 pounds.
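If you want to verify these numbers yourself, the following Python sketch reproduces the test statistic and two-tailed p-value from the summary statistics above. It assumes the usual one-sample t formula, t = (x̄ − μ₀) / (s / √n), and uses SciPy only for the t distribution.

```python
import math
from scipy import stats

# Summary statistics from the turtle-weight example.
n, x_bar, s, mu_0 = 40, 300, 18.5, 310

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))       # ≈ -3.4187
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)    # two-tailed, ≈ 0.0015

alpha = 0.05
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```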

Example 2: Two Sample t-test

A  two sample t-test is used to test whether or not two population means are equal.

For example, suppose we want to know whether or not the mean weight between two different species of turtles is equal.

We go out and collect a simple random sample from each population with the following information:

  • Sample size n₁ = 40
  • Sample mean weight x̄₁ = 300
  • Sample standard deviation s₁ = 18.5
  • Sample size n₂ = 38
  • Sample mean weight x̄₂ = 305
  • Sample standard deviation s₂ = 16.7

We can use the following steps to perform a two sample t-test:

We will perform the two sample t-test with the following hypotheses:

  • H₀: μ₁ = μ₂ (the two population means are equal)
  • Hₐ: μ₁ ≠ μ₂ (the two population means are not equal)

We will choose to use a significance level of 0.10 .

We can plug in the numbers for the sample sizes, sample means, and sample standard deviations into this Two Sample t-test Calculator to calculate the test statistic and p-value:

  • t test statistic: -1.2508
  • two-tailed p-value: 0.2149

Since the p-value (0.2149) is not less than the significance level (0.10), we fail to reject the null hypothesis.

We do not have sufficient evidence to say that the mean weight of turtles between these two populations is different.
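The same numbers can be reproduced directly from the summary statistics with SciPy's ttest_ind_from_stats. The sketch below assumes a pooled-variance (Student's) two sample t-test, which matches the statistic reported above; passing equal_var=False instead would give Welch's test.

```python
from scipy import stats

# Summary statistics for the two turtle samples.
result = stats.ttest_ind_from_stats(
    mean1=300, std1=18.5, nobs1=40,
    mean2=305, std2=16.7, nobs2=38,
    equal_var=True,  # pooled-variance t-test, as in this example
)

alpha = 0.10
print(f"t = {result.statistic:.4f}, p = {result.pvalue:.4f}")  # ≈ -1.2508, ≈ 0.2149
print("Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```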

Example 3: Paired Samples t-test

A paired samples t-test is used to compare the means of two samples when each observation in one sample can be paired with an observation in the other sample.

For example, suppose we want to know whether or not a certain training program is able to increase the max vertical jump of college basketball players.

To test this, we may recruit a simple random sample of 20 college basketball players and measure each of their max vertical jumps. Then, we may have each player use the training program for one month and then measure their max vertical jump again at the end of the month:

[Figure: before and after max vertical jump measurements for the 20 players]

We can use the following steps to perform a paired samples t-test:

We will perform the paired samples t-test with the following hypotheses:

  • H₀: μ_before = μ_after (the two population means are equal)
  • Hₐ: μ_before ≠ μ_after (the two population means are not equal)

We will choose to use a significance level of 0.01 .

We can plug in the raw data for each sample into this Paired Samples t-test Calculator to calculate the test statistic and p-value:

  • t test statistic: -3.226
  • two-tailed p-value: 0.0045

Since the p-value (0.0045) is less than the significance level (0.01), we reject the null hypothesis.

We have sufficient evidence to say that the mean vertical jump before and after participating in the training program is not equal.
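Because the raw before-and-after measurements are not reproduced here, the sketch below uses hypothetical stand-in values simply to show how the paired samples t-test would be run with SciPy.

```python
from scipy import stats

# Hypothetical before/after max vertical jumps (inches) for a few players;
# the actual data for the 20 players is not reproduced in this sketch.
before = [22.0, 24.5, 21.0, 26.0, 23.5, 25.0, 22.5, 24.0]
after  = [23.5, 25.0, 22.5, 27.5, 24.0, 26.5, 23.0, 25.5]

t_stat, p_value = stats.ttest_rel(before, after)

alpha = 0.01
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```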

Bonus: Decision Rule Calculator 

You can use this decision rule calculator to automatically determine whether you should reject or fail to reject a null hypothesis for a hypothesis test based on the value of the test statistic.


Definition of a Hypothesis

What it is and how it's used in sociology


A hypothesis is a prediction of what will be found at the outcome of a research project and is typically focused on the relationship between two different variables studied in the research. It is usually based on both theoretical expectations about how things work and already existing scientific evidence.

Within social science, a hypothesis can take two forms. It can predict that there is no relationship between two variables, in which case it is a null hypothesis . Or, it can predict the existence of a relationship between variables, which is known as an alternative hypothesis.

In either case, the variable that is thought to either affect or not affect the outcome is known as the independent variable, and the variable that is thought to either be affected or not is the dependent variable.

Researchers seek to determine whether or not their hypothesis, or hypotheses if they have more than one, will prove true. Sometimes they do, and sometimes they do not. Either way, the research is considered successful if one can conclude whether or not a hypothesis is true. 

Null Hypothesis

A researcher has a null hypothesis when she or he believes, based on theory and existing scientific evidence, that there will not be a relationship between two variables. For example, when examining what factors influence a person's highest level of education within the U.S., a researcher might expect that place of birth, number of siblings, and religion would not have an impact on the level of education. This would mean the researcher has stated three null hypotheses.

Alternative Hypothesis

Taking the same example, a researcher might expect that the economic class and educational attainment of one's parents, and the race of the person in question, are likely to have an effect on one's educational attainment. Existing evidence and social theories that recognize the connections between wealth and cultural resources, and how race affects access to rights and resources in the U.S., would suggest that both the economic class and the educational attainment of one's parents would have a positive effect on educational attainment. In this case, economic class and educational attainment of one's parents are independent variables, and one's educational attainment is the dependent variable; it is hypothesized to be dependent on the other two.

Conversely, an informed researcher would expect that being a race other than white in the U.S. is likely to have a negative impact on a person's educational attainment. This would be characterized as a negative relationship, wherein being a person of color has a negative effect on one's educational attainment. In reality, this hypothesis proves true, with the exception of Asian Americans, who go to college at a higher rate than whites do. However, Blacks, Hispanics, and Latinos are far less likely than whites and Asian Americans to go to college.

Formulating a Hypothesis

Formulating a hypothesis can take place at the very beginning of a research project, or after a bit of research has already been done. Sometimes a researcher knows right from the start which variables she is interested in studying, and she may already have a hunch about their relationships. Other times, a researcher may have an interest in a particular topic, trend, or phenomenon, but he may not know enough about it to identify variables or formulate a hypothesis.

Whenever a hypothesis is formulated, the most important thing is to be precise about what one's variables are, what the nature of the relationship between them might be, and how one can go about conducting a study of them.

Updated by Nicki Lisa Cole, Ph.D



The Scientific Method by Science Made Simple

Understanding and using the scientific method.

The Scientific Method is a process used to design and perform experiments. It helps minimize experimental errors and bias, and increases confidence in the accuracy of your results.


In the previous sections, we talked about how to pick a good topic and specific question to investigate. Now we will discuss how to carry out your investigation.

Steps of the Scientific Method

  • Observation/Research
  • Hypothesis
  • Prediction
  • Experimentation
  • Conclusion

Now that you have settled on the question you want to ask, it's time to use the Scientific Method to design an experiment to answer that question.

If your experiment isn't designed well, you may not get the correct answer. You may not even get any definitive answer at all!

The Scientific Method is a logical and rational order of steps by which scientists come to conclusions about the world around them. The Scientific Method helps to organize thoughts and procedures so that scientists can be confident in the answers they find.

OBSERVATION is the first step, so that you know how you want to go about your research.

HYPOTHESIS is the answer you think you'll find.

PREDICTION is your specific belief about the scientific idea: If my hypothesis is true, then I predict we will discover this.

EXPERIMENT is the tool that you invent to answer the question, and

CONCLUSION is the answer that the experiment gives.

Don't worry, it isn't that complicated. Let's take a closer look at each one of these steps. Then you can understand the tools scientists use for their science experiments, and use them for your own.

OBSERVATION


This step could also be called "research." It is the first stage in understanding the problem.

After you decide on a topic and narrow it down to a specific question, you will need to research everything that you can find about it. You can collect information from your own experiences, books, the internet, or even smaller "unofficial" experiments.

Let's continue the example of a science fair idea about tomatoes in the garden. You like to garden, and notice that some tomatoes are bigger than others and wonder why.

Because of this personal experience and an interest in the problem, you decide to learn more about what makes plants grow.

For this stage of the Scientific Method, it's important to use as many sources as you can find. The more information you have on your science fair topic, the better the design of your experiment is going to be, and the better your science fair project is going to be overall.

Also try to get information from your teachers or librarians, or professionals who know something about your science fair project. They can help to guide you to a solid experimental setup.

HYPOTHESIS

The next stage of the Scientific Method is known as the "hypothesis." This word basically means "a possible solution to a problem, based on knowledge and research."

The hypothesis is a simple statement that defines what you think the outcome of your experiment will be.

All of the first stage of the Scientific Method -- the observation, or research stage -- is designed to help you express a problem in a single question ("Does the amount of sunlight in a garden affect tomato size?") and propose an answer to the question based on what you know. The experiment that you will design is done to test the hypothesis.

Using the example of the tomato experiment, here is an example of a hypothesis:

TOPIC: "Does the amount of sunlight a tomato plant receives affect the size of the tomatoes?"

HYPOTHESIS: "I believe that the more sunlight a tomato plant receives, the larger the tomatoes will grow."

This hypothesis is based on:

(1) Tomato plants need sunshine to make food through photosynthesis, and logically, more sun means more food, and;

(2) Through informal, exploratory observations of plants in a garden, those with more sunlight appear to grow bigger.

PREDICTION

The hypothesis is your general statement of how you think the scientific phenomenon in question works.

Your prediction lets you get specific -- how will you demonstrate that your hypothesis is true? The experiment that you will design is done to test the prediction.

An important thing to remember during this stage of the scientific method is that once you develop a hypothesis and a prediction, you shouldn't change it, even if the results of your experiment show that you were wrong.

An incorrect prediction does NOT mean that you "failed." It just means that the experiment brought some new facts to light that maybe you hadn't thought about before.

Continuing our tomato plant example, a good prediction would be: Increasing the amount of sunlight tomato plants in my experiment receive will cause an increase in their size compared to identical plants that received the same care but less light.

EXPERIMENT

This is the part of the scientific method that tests your hypothesis. An experiment is a tool that you design to find out if your ideas about your topic are right or wrong.

It is absolutely necessary to design a science fair experiment that will accurately test your hypothesis. The experiment is the most important part of the scientific method. It's the logical process that lets scientists learn about the world.
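As a purely illustrative aside (not part of the original article), here is one way the data from such an experiment might eventually be analyzed: made-up average tomato masses for a full-sun group and a partial-sun group are compared with a two-sample t-test, using a one-tailed p-value because the prediction is directional.

```python
from scipy import stats

# Made-up average tomato mass (grams) per plant in each group.
full_sun = [182, 175, 190, 168, 184, 177]
partial_sun = [160, 171, 158, 165, 169, 162]

# Directional prediction: more sun -> bigger tomatoes, so use a one-tailed p-value.
t_stat, p_two_tailed = stats.ttest_ind(full_sun, partial_sun)
p_one_tailed = p_two_tailed / 2 if t_stat > 0 else 1 - p_two_tailed / 2

print(f"t = {t_stat:.2f}, one-tailed p = {p_one_tailed:.4f}")
```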

On the next page, we'll discuss the ways that you can go about designing a science fair experiment idea.

CONCLUSION

The final step in the scientific method is the conclusion. This is a summary of the experiment's results, and how those results match up to your hypothesis.

You have two options for your conclusions: based on your results, either:

(1) YOU CAN REJECT the hypothesis, or

(2) YOU CAN NOT REJECT the hypothesis.

This is an important point!

You can not PROVE the hypothesis with a single experiment, because there is a chance that you made an error somewhere along the way.

What you can say is that your results SUPPORT the original hypothesis.

If your original hypothesis didn't match up with the final results of your experiment, don't change the hypothesis.

Instead, try to explain what might have been wrong with your original hypothesis. What information were you missing when you made your prediction? What are the possible reasons the hypothesis and experimental results didn't match up?

Remember, a science fair experiment isn't a failure simply because it does not agree with your hypothesis. No one will take points off if your prediction wasn't accurate. Many important scientific discoveries were made as a result of experiments gone wrong!

A science fair experiment is only a failure if its design is flawed. A flawed experiment is one that (1) doesn't keep its variables under control, and (2) doesn't sufficiently answer the question that you asked of it.


Writing Program at New College

Revising the Research Question

Once you receive feedback from your classmates, teacher, and others, you will be in a place where you can revise your research question. Remember that the final decision is yours. You will have to decide which advice for revision is useful and to the point and which is not. To assist you in this process, we offer the following questions:

  • Is there agreement among your readers that your research question is too broad?
  • Have you been given advice for narrowing your research question, or can you think of a good way to focus the question?
  • Is there a consensus among readers that your research question is too narrow? What ideas do you have for making it more general?
  • Have you received feedback that suggests you have some misunderstanding of the issue at hand?
  • Have your readers provided you with specifics about what it is you misunderstand?
  • Did you receive comments about the clarity of your question? Were your readers specific about how to clarify your phrasing of the question?

After reviewing your initial research question, all of the responses initiated by this question, as well as your answers to the above questions, you are ready to move to a formal draft of your Research Question.



Outbreak Investigations

On This Page sidebar

Step 8: Refine Hypotheses and Carry Out Additional Studies If Necessary

Step 9: implement control and prevention measures, step 10: communicate the findings.

Learn More sidebar

All Modules

If analytical studies do not confirm any of the hypotheses generated by descriptive epidemiology, then you need to go back to the descriptive epidemiology and consider other sources and routes of transmission.

In addition, even if analytical studies establish the source, it may be necessary to pursue the investigation in order to refine your understanding of the source. For example, in the Salmonella outbreak described on page 7 it was clear that the manicotti dish was responsible, but what was the specific source? Was the manicotti prepared at home? Was it purchased? What ingredient was responsible for contaminating the manicotti? Was it the eggs used in preparation of the pasta? Was it the cheese?

Step 9: Implement Control and Prevention Measures

This step is listed toward the end, but you obviously want to initiate prevention measures as soon as possible once you have identified the source, even if you haven't worked out all of the details.

Step 10: Communicate the Findings

When the investigation is concluded, it is important to communicate your findings to the local health authorities and to those responsible for implementing control and prevention measures. The communications usually require both oral and written reports. The written report should follow standard scientific guidelines, and it should include an introduction, background, methods, results, discussion, and recommendations.

COMMENTS

  1. How to Revise Your Hypothesis Test in Data Analysis

    4 Re-run the test. The final step to revise your hypothesis test is to re-run the test with the revised data, assumptions, and parameters. You should calculate the test statistic and p-value ...

  2. Hypothesis Testing

Step 5: Present your findings. The results of hypothesis testing will be presented in the results and discussion sections of your research paper, dissertation or thesis. In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p-value).

  3. Hypothesis Testing Explained (How I Wish It Was Explained to Me)

    The curse of hypothesis testing is that we will never know if we are dealing with a True or a False Positive (Negative). All we can do is fill the confusion matrix with probabilities that are acceptable given our application. To be able to do that, we must start from a hypothesis. Step 1. Defining the hypothesis

  4. 7.1: Basics of Hypothesis Testing

    It doesn't mean you proved the null hypothesis; it just means you can't prove the alternative hypothesis. Here is an example to demonstrate this. Example \(\PageIndex{3}\) conclusion in hypothesis tests ... That does not mean that he is innocent of this crime. It means there was not enough evidence to prove he was guilty. Many people ...

  5. The scientific method (article)

    The scientific method. At the core of biology and other sciences lies a problem-solving approach called the scientific method. The scientific method has five basic steps, plus one feedback step: Make an observation. Ask a question. Form a hypothesis, or testable explanation. Make a prediction based on the hypothesis.

  6. What Is A Hypothesis

    Hypothesis Definition. In the context of a consulting interview, a hypothesis definition is "a testable statement that needs further data for verification". In other words, the meaning of a hypothesis is that it's an educated guess that you think could be the answer to your client's problem. A hypothesis is therefore not always true.

  7. How to Write a Strong Hypothesis

6. Write a null hypothesis. If your research involves statistical hypothesis testing, you will also have to write a null hypothesis. The null hypothesis is the default position that there is no association between the variables. The null hypothesis is written as H0, while the alternative hypothesis is H1 or Ha.

  8. Reviewing test results

    Evidence may lead to the revision of a hypothesis. For example, experiments and observations had long supported the idea that light consists of waves, but in 1905, Einstein showed that a well known (and previously unexplained) phenomenon — the photoelectric effect — made perfect sense if light consisted of discrete particles.

  9. Hypothesis Testing

Let's return finally to the question of whether we reject or fail to reject the null hypothesis. If our statistical analysis shows that the significance level is below the cut-off value we have set (e.g., either 0.05 or 0.01), we reject the null hypothesis and accept the alternative hypothesis. Alternatively, if the significance level is above ... (A minimal code sketch of this decision rule appears after this list.)

  10. How to Write a Strong Hypothesis

    Step 5: Phrase your hypothesis in three ways. To identify the variables, you can write a simple prediction in if … then form. The first part of the sentence states the independent variable and the second part states the dependent variable. If a first-year student starts attending more lectures, then their exam scores will improve.

  11. How Do You Formulate (Important) Hypotheses?

    Building on the ideas in Chap. 1, we describe formulating, testing, and revising hypotheses as a continuing cycle of clarifying what you want to study, making predictions about what you might find together with developing your reasons for these predictions, imagining tests of these predictions, revising your predictions and rationales, and so ...

  12. Crafting the Methods to Test Hypotheses

    At the outset, you need to describe what the case is a case of. The goal is to understand the case—how it works, what it means, why it looks like it does—within the context in which it functions. To describe conceptual teaching more fully, for example, researchers might investigate a case of one teacher teaching several lessons conceptually.

  13. Research Hypothesis In Psychology: Types, & Examples

    Examples. A research hypothesis, in its plural form "hypotheses," is a specific, testable prediction about the anticipated results of a study, established at its outset. It is a key component of the scientific method. Hypotheses connect theory to data and guide the research process towards expanding scientific understanding.

  14. When Do You Reject the Null Hypothesis? (3 Examples)

    A hypothesis test is a formal statistical test we use to reject or fail to reject a statistical hypothesis. We always use the following steps to perform a hypothesis test: Step 1: State the null and alternative hypotheses. The null hypothesis, denoted as H0, is the hypothesis that the sample data occurs purely from chance.

  15. Scientific hypothesis

scientific hypothesis, an idea that proposes a tentative explanation about a phenomenon or a narrow set of phenomena observed in the natural world. The two primary features of a scientific hypothesis are falsifiability and testability, which are reflected in an "If…then" statement summarizing the idea and in the ability to be supported or refuted through observation and experimentation.

  16. What a Hypothesis Is and How to Formulate One

    A hypothesis is a prediction of what will be found at the outcome of a research project and is typically focused on the relationship between two different variables studied in the research. It is usually based on both theoretical expectations about how things work and already existing scientific evidence. Within social science, a hypothesis can ...

  17. The Scientific Method

    HYPOTHESIS. The next stage of the Scientific Method is known as the "hypothesis." This word basically means "a possible solution to a problem, based on knowledge and research." The hypothesis is a simple statement that defines what you think the outcome of your experiment will be.

  18. Revising the Research Question

    Revising the Research Question. Once you receive feedback from your classmates, teacher, and others, you will be in a place where you can revise your research question. Remember that the final decision is yours. You will have to decide which advice for revision is useful and to the point and which is not. To assist you in this process, we offer ...

  19. What Is Peer Review?

    The most common types are: Single-blind review. Double-blind review. Triple-blind review. Collaborative review. Open review. Relatedly, peer assessment is a process where your peers provide you with feedback on something you've written, based on a set of criteria or benchmarks from an instructor.

  20. Step 8: Refine Hypotheses and Carry Out Additional Studies If Necessary

    Step 8: Refine Hypotheses and Carry Out Additional Studies If Necessary. If analytical studies do not confirm any of the hypotheses generated by descriptive epidemiology, then you need to go back to the descriptive epidemiology and consider other sources and routes of transmission. In addition, even if analytical studies establish the source ...

  21. Video Example: Revising Logical Fallacies Flashcards

What does it mean to revise an argument? Revising an argument means adding to, changing, clarifying, reorganizing, or removing content. If you choose to appeal to an expert as part of your evidence for your claim, what ... You should confirm the expert and his or her experience is relevant to your argument, accurate in its findings, and up to date.

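Several of the excerpts above describe the same basic procedure: state the null and alternative hypotheses, compute a test statistic and p-value, and reject the null hypothesis only when the p-value falls below a significance level chosen in advance. The sketch below illustrates that decision rule with a one-sample t-test; the sample values, the hypothesized mean of 50, and the 0.05 cut-off are hypothetical choices made for this example only.

```python
# Minimal sketch of the reject / fail-to-reject decision rule.
# The data, hypothesized mean, and significance level are hypothetical.
from scipy import stats

sample = [48.1, 47.6, 50.2, 46.9, 49.3, 47.8, 48.5]  # invented measurements
mu_0 = 50.0      # H0: population mean = 50;  Ha: population mean < 50
alpha = 0.05     # significance level, chosen before looking at the data

# One-sample t-test; alternative="less" gives the one-sided p-value for Ha.
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0, alternative="less")

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the data would be unusual if the true mean were 50.")
else:
    print("Fail to reject H0: not enough evidence against the hypothesized mean.")
```

Note that when the p-value is not below the cut-off, the correct conclusion is "fail to reject the null hypothesis", not that the null hypothesis has been proven true, as excerpt 4 above points out.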