Data and Statistical Sources: Empirical Articles: Finding Empirical Articles


Search strategies

This page primarily describes how to find empirical articles using the EBSCO databases that the library subscribes to. However, many databases at Cornell are not presented in the EBSCO format; find them listed by subject at this link. You can use similar strategies in those databases to find empirical articles.

You may also add specific statistical terms to your search, such as chi, t-test, p-value, or standard deviation. Try searching with terms used in the scientific method: method, results, discussion, or conclusion.
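If you want to run several such searches, a small script can assemble the Boolean strings for you. The following is a minimal Python sketch, not tied to any particular database or API; the example terms are placeholders you would replace with your own.

    # Minimal sketch: combine topic keywords with methodology/statistics terms
    # into one Boolean search string to paste into a database search box.
    topic_terms = ['"labor economics"', '"minimum wage"']   # placeholder topic keywords
    method_terms = ["chi", "t-test", "p-value", '"standard deviation"',
                    "method", "results", "discussion", "conclusion"]

    def boolean_query(topics, methods):
        """AND the topic block together with an OR'd block of methodology terms."""
        return f"({' OR '.join(topics)}) AND ({' OR '.join(methods)})"

    print(boolean_query(topic_terms, method_terms))

Pasting the printed string into a database's basic search box is equivalent to spreading the same terms across the advanced-search rows.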

Empirical Articles in EBSCO

Cornell subscribes to scores of databases that provide full text journal articles. Many of the databases are purchased from EBSCO and can be searched using its interface.

Here is a sample search in the Business Source Complete database. Other EBSCO databases with empirical articles and a similar search interface are listed in the box called "EBSCO Databases."

Note that "Economics - Statistical Methods" is a subject term . It is combined with the keyword  "labor economics." Instead of typing 'DE "ECONOMICS -- Statistical Methods'' in the search box, you can just type ECONOMICS -- Statistical Methods and select "Subject Term" from the drop down menu.

[Screenshot: sample search in the Business Source Complete database]

EBSCO Databases

  • Academic Search Premier This multi-disciplinary database provides full text for more than 8,500 journals, including full text for more than 4,600 peer-reviewed titles. PDF backfiles to 1975 or further are available for well over one hundred journals, and searchable cited references are provided for more than 1,000 titles.
  • Business Source Complete Business Source Complete provides full text for scholarly business journals and other sources, including full text for more than 1,800 peer-reviewed business publications. Coverage includes virtually all subject areas related to business. This database provides full text (PDF) for top scholarly journals, including the Harvard Business Review. It also includes industry and country reports from Euromonitor and company and industry reports from Datamonitor.
  • EconLit with Full Text Abstracts, indexing, and full-text articles in all fields of economics, including capital markets, country studies, econometrics, economic forecasting, environmental economics, government regulations, labor economics, monetary theory, and urban economics.
  • PsycINFO Contains citations and summaries of the international literature in psychology and related behavioral and social sciences, including psychiatry, sociology, anthropology, education, pharmacology, and linguistics. Includes applied psychology, communication systems, developmental psychology, educational psychology, experimental human and animal psychology, personality, physical and psychological disorders, physiological psychology and intervention, professional personnel and issues, psychometrics, social processes and issues, sports psychology and leisure, and treatment and prevention.
  • Sociology Source Ultimate An expanded version of SocINDEX, including greater coverage of peer-reviewed journals, international resources and open access titles. Provides citations and direct links to the texts of journal articles, book chapters and conference proceedings, some as far back as 1880. Comprehensive coverage encompassing sub-disciplines and related areas of the social sciences, including labor, crime, demography, economic sociology, immigration, ethnic, racial and gender studies, family, political sociology, religion, development, social psychology, social structure, social work, socio-cultural anthropology, social history, theory, methodology, and more.
  • MEDLINE Compiled by the U.S. National Library of Medicine (NLM), MEDLINE is the world's most comprehensive source of life sciences and biomedical bibliographic information. It contains nearly eleven million records from over 7,300 different publications from 1965 to present.
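MEDLINE (listed above) can also be searched programmatically through the National Library of Medicine's public E-utilities service. Below is a hedged Python sketch using the documented esearch endpoint; the query itself is an illustrative placeholder, and the [pt] tag restricts results by PubMed publication type, one way to surface empirical studies such as randomized controlled trials.

    # Sketch: query MEDLINE records via NLM's E-utilities esearch endpoint.
    # The search string is a placeholder; "[pt]" filters by publication type.
    import json
    import urllib.parse
    import urllib.request

    BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    query = '("generalized anxiety" AND treatment) AND randomized controlled trial[pt]'
    url = BASE + "?" + urllib.parse.urlencode(
        {"db": "pubmed", "term": query, "retmode": "json", "retmax": 20}
    )

    with urllib.request.urlopen(url) as resp:
        result = json.load(resp)

    hits = result["esearchresult"]
    print(hits["count"], "matching records")
    print(hits["idlist"])  # PubMed IDs you can follow up on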

Search Terms

Some keywords for research studies:

  • Empirical Studies
  • Observations
  • Methodology
  • Correlation
  • Standard Deviation
  • USC Libraries

Education: Identify Empirical Articles


How to Recognize Empirical Journal Articles

Definition of an empirical study:  An empirical research article reports the results of a study that uses data derived from actual observation or experimentation. Empirical research articles are examples of primary research.

Parts of a standard empirical research article (articles will not necessarily use the exact terms listed below):

  • Abstract ... A paragraph-length description of what the study includes.
  • Introduction ... Includes a statement of the hypotheses for the research and a review of other research on the topic.
  • Method ... Describes how the study was conducted: who the participants were, the design of the study, what the participants did, and what measures were used.
  • Results ...Describes the outcomes of the measures of the study.
  • Discussion ...Contains the interpretations and implications of the study.
  • References ...Contains citation information on the material cited in the report. (also called bibliography or works cited)

Characteristics of an Empirical Article:

  • Empirical articles will include charts, graphs, or statistical analysis.
  • Empirical research articles are usually substantial, often 8-30 pages long.
  • There is always a bibliography found at the end of the article.

Types of publications that publish empirical studies:

  • Empirical research articles are published in scholarly or academic journals
  • These journals are also called “peer-reviewed,” or “refereed” publications.

Examples of such publications include:

  • American Educational Research Journal
  • Computers & Education
  • Journal of Educational Psychology

Databases that contain empirical research:  (selected list only)

  • List of other useful databases by subject area

This page is adapted from Eric Karkhoff's Sociology Research Guide: Identify Empirical Articles page (Cal State Fullerton Pollak Library).

Sample Empirical Articles

Roschelle, J., Feng, M., Murphy, R. F., & Mason, C. A. (2016). Online Mathematics Homework Increases Student Achievement. AERA Open. (LINK TO ARTICLE)

Lester, J., Yamanaka, A., & Struthers, B. (2016). Gender microaggressions and learning environments: The role of physical space in teaching pedagogy and communication. Community College Journal of Research and Practice, 40(11), 909-926. (LINK TO ARTICLE)



Identifying Empirical Research Articles


Where to find empirical research articles


When searching for empirical research, it can be helpful to use terms that relate to the method used in empirical research in addition to keywords that describe your topic. For example: 

  • (generalized anxiety AND treatment*) AND (randomized clinical trial* OR clinical trial*)

You might also try using terms related to the type of instrument used:

  • (generalized anxiety AND intervention*) AND (survey OR questionnaire)

You can also narrow your results to peer-reviewed articles; most databases have a peer-review checkbox that you can select. To learn more about peer review, see our related guide:

  • Understand Peer Review

Searching by Methodology

Some databases give you the option to do an advanced search by methodology, where you can choose "empirical study" as a type. Here's an example from PsycInfo:

[Screenshot: PsycInfo advanced search page highlighting the Methodology filter]

Other filters include things like document type, age group, population, language, and target audience. You can use these to narrow your search and get more relevant results.

Databasics: How to Filter by Methodology in ProQuest's PsycInfo + PsycArticles

Part of our Databasics YouTube series, this short video shows you how to limit by methodology in ProQuest's PsycInfo + PsycArticles database.

Attribution

Information in this guide adapted from Boston College Libraries' guide to "Finding Empirical Research"; Brandeis Library's "Finding Empirical Studies"; and CSUSM's "How do I know if a research article is empirical?"




Finding Empirical Research Articles


The method for finding empirical research articles varies depending upon the database* being used. 

1. The PsycARTICLES and PsycInfo databases (both from the APA) include a Methodology filter that can be used to identify empirical studies. Look for the filter on the Advanced Search screen. To see a list and description of all of the methodology filter options in PsycARTICLES and PsycInfo, visit the APA Databases Methodology Field Values page.

[Screenshot: Methodology filter in the PsycARTICLES database]

2. When using databases that do not provide a methodology filter—including ProQuest Psychology Journals and Academic Search Complete—experiment with using keywords to retrieve articles on your topic that contain empirical research. For example:

  • empirical research
  • empirical study
  • quantitative study
  • qualitative study
  • longitudinal study
  • observation
  • questionnaire
  • methodology
  • participants

Qualitative research can be challenging to find as these methodologies are not always well-indexed in the databases. Here are some suggested keywords for retrieving articles that include qualitative research.

  • qualitative
  • ethnograph*
  • observation*
  • "case study”
  • "focus group"
  • "phenomenological research"
  • "conversation analysis"

*Recommended databases are listed on the Databases: Find Journal Articles page of this guide.
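The trailing asterisk in terms like ethnograph* is a truncation symbol that the database expands for you (ethnography, ethnographic, ethnographer, and so on). As a rough illustration of what truncation matches, here is a small Python sketch that converts a truncated term into a regular expression you could use to screen abstracts you have already exported; the helper is invented for this example, not a database feature.

    # Illustration only: mimic database-style truncation locally with a regex.
    import re

    def truncation_to_regex(term: str) -> re.Pattern:
        """Turn a truncated term like 'ethnograph*' into a word-matching regex."""
        stem = re.escape(term.rstrip("*"))
        return re.compile(rf"\b{stem}\w*", re.IGNORECASE)

    pattern = truncation_to_regex("ethnograph*")
    abstract = "We report an ethnographic study using participant observation."
    print(bool(pattern.search(abstract)))  # True: matches "ethnographic"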


Experimental (Empirical) Research Articles


How Can I Find Experimental (Empirical) Articles?

Many of the recommended databases in this research guide contain scholarly experimental articles (also known as empirical articles or research studies or primary research). Search in databases like: 

  • APA PsycInfo
  • ScienceDirect

Because those databases are rich in scholarly experimental articles, any well-structured search that you enter will retrieve experimental/empirical articles. These searches, for example, will retrieve many experimental/empirical articles:

  • caffeine AND "reaction time"
  • aging AND ("cognitive function" OR "cognitive ability")
  • "child development" AND play

Experimental (Empirical) Articles: How Will I Know One When I See One?

Scholarly experimental articles: to conduct and publish an experiment, an author or team of authors designs an experiment, gathers data, then analyzes the data and discusses the results of the experiment. A published experiment or research study will therefore look very different from other types of articles (newspaper stories, magazine articles, essays, etc.) found in our library databases.

In fact, newspapers, magazines, and websites written by journalists report on psychology research all the time, summarizing published experiments in non-technical language for the general public. Although that kind of article can be interesting to read (and can even lead you to look up the original experiment published by the researchers themselves), to write a research paper about a psychology topic you should generally use experimental articles written by the researchers. The following guidelines will help you recognize an experimental article, written by the researchers themselves and published in a scholarly journal.

Structure of an Experimental Article

Typically, an experimental article has the following sections:

  • Abstract: the author summarizes her article
  • Introduction: the author discusses the general background of her research topic; often, she will present a literature review, that is, summarize what other experts have written on this particular research topic
  • Method: the author describes the experiment she designed and conducted
  • Results: the author presents the data she gathered during her experiment
  • Discussion: the author offers ideas about the importance and implications of her research findings, and speculates on future directions that similar research might take
  • References: the author gives a list of sources she used in her paper

Look for articles structured in that way: they will be experimental/empirical articles.

Also, experimental/empirical articles are written in very formal, technical language (even the titles of the articles sound complicated!) and will usually contain numerical data presented in tables. 

As noted above, when you search in a database like APA PsycInfo, it's really easy to find experimental/empirical articles, once you know what you're looking for. Just in case, though, here is a shortcut that might help:

First, do your keyword search, for example:

[Screenshot: keyword search menu in APA PsycInfo]

In the results screen, on the left-hand side, scroll down until you see "Methodology." You can use that menu to refine your search by limiting the articles to empirical studies only:

[Screenshot: Methodology menu in APA PsycInfo]

You can learn more about advanced search techniques in APA PsycInfo here.


Empirical Articles/Studies: How to tell and find quality articles

How to tell what's an empirical article

Step 1:  What's an empirical article?

Learn to recognize an empirical article by watching the video.

Step 2:  Judge for yourself

Open the articles by clicking on the letters.  Decide which ones are empirical and the sort of thing you would want to cite.

Step 3:  Quiz yourself

Check your understanding with this one-question quiz.

Finding empirical studies

Watch the video to see how easy it is to find empirical articles using PsycINFO. 

  • PsycINFO at a Glance: Want a little advice on searching or the steps to get the stuff?

For databases that don't have a way to limit by methodology, some strategies:

1) Look through the results and scan the abstracts for clues that a study is empirical.

2) Try searching with words that describe types of empirical studies (list not exhaustive):

empirical OR qualitative OR quantitative OR "action research" OR "case study" OR "controlled trial" OR "focus group"

3) Enter other terms you'd expect to see in an abstract.  Some suggestions:

findings OR participant* OR investigat*


Purdue University


Research: Overview & Approaches


Introduction to Empirical Research



  • Introductory Video: This video covers what empirical research is, what kinds of questions and methods empirical researchers use, and some tips for finding empirical research articles in your discipline.

Video Tutorial

  • Guided Search: Finding Empirical Research Articles. This is a hands-on tutorial that will allow you to use your own search terms to find resources.

Examples of Empirical Research

  • Study on radiation transfer in human skin for cosmetics
  • Long-Term Mobile Phone Use and the Risk of Vestibular Schwannoma: A Danish Nationwide Cohort Study
  • Emissions Impacts and Benefits of Plug-In Hybrid Electric Vehicles and Vehicle-to-Grid Services
  • Review of design considerations and technological challenges for successful development and deployment of plug-in hybrid electric vehicles
  • Endocrine disrupters and human health: could oestrogenic chemicals in body care cosmetics adversely affect breast cancer incidence in women?



University of Memphis Libraries

Empirical Research: Defining, Identifying, & Finding


Finding the Characteristics of Empirical Research in an Article


Once you know the characteristics of empirical research , the next question is how to find those characteristics when reading a scholarly, peer-reviewed journal article. Knowing the basic structure of an article will help you identify those characteristics quickly. 

The IMRaD Layout

Many scholarly, peer-reviewed journal articles, especially empirical articles, are structured according to the IMRaD layout. IMRaD stands for "Introduction, Methods, Results, and Discussion." These are the major sections of the article, and each part has an important role: 

  • Introduction: explains the research project and why it is needed. 
  • Methods: details how the research was conducted. 
  • Results: provides the data from the research.
  • Discussion: explains the importance of the results. 

While an IMRaD article will have these sections, it may use different names for these sections or split them into subsections. 

An IMRaD layout alone is not enough to show that an article is empirical, but specific characteristics of empirical research are more likely to appear in certain sections, so knowing the layout will help you find those characteristics more quickly. Click the link for each section to learn what empirical research characteristics are in that section and common alternative names for those sections:

Use this video for a quick overview of the sections of an academic article: 

Journal articles will also have an abstract which summarizes the article. That summary often includes simplified information from different IMRaD sections, which can give you a good sense of whether the research is empirical. Most library databases and other academic search tools will show you the abstract in your search results, making it the first place you can look for evidence that an article is empirical. 

There are two types of abstracts: structured and unstructured. 

Structured Abstracts

Structured abstracts are organized and labeled in a way that replicates the IMRaD format. If you know what characteristics of empirical research are located in a particular IMRaD section, you can skim that section of the structured abstract to look for them.

[Screenshot: example of a structured abstract]

Unstructured Abstracts

Unstructured abstracts do not label the parts of the summary and are generally a single block paragraph. You will not be able to skim through an unstructured abstract for empirical research characteristics as easily, but some of those characteristics will still be there. Often the unstructured abstract will include some version of the research question and simplified descriptions of the design, methodology, and sample.

[Screenshot: example of an unstructured abstract]
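If you have exported a batch of abstracts, a crude keyword count can help you triage them before reading closely. The Python sketch below is a toy heuristic, not a validated classifier; the cue list and the threshold are assumptions based on the empirical-research characteristics this guide describes.

    # Toy heuristic: count empirical-research cues in an abstract to decide
    # whether it deserves a closer look. Cues and threshold are illustrative.
    EMPIRICAL_CUES = [
        "participants", "sample", "method", "survey", "questionnaire",
        "interview", "results", "findings", "data", "measure",
    ]

    def empirical_score(abstract: str) -> int:
        text = abstract.lower()
        return sum(cue in text for cue in EMPIRICAL_CUES)

    abstract = ("We surveyed 212 participants using a validated questionnaire; "
                "results indicate a significant effect of tutoring on grades.")
    score = empirical_score(abstract)
    print(score, "cues ->", "worth a closer look" if score >= 3 else "screen out")

A high score only means the abstract uses empirical-sounding vocabulary; you still need to confirm against the sections described above.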



University of La Verne

Identify Empirical Research Articles


What Are Empirical Articles?

As a student at the University of La Verne, you may be instructed by faculty to read and analyze empirical articles when writing a research paper, a senior or master's project, or a doctoral dissertation. How can you recognize an empirical article in an academic discipline? An empirical research article is an article which reports research based on actual observations or experiments. The research may use quantitative research methods, which generate numerical data and seek to establish causal relationships between two or more variables. (1) Empirical research articles may use qualitative research methods, which objectively and critically analyze behaviors, beliefs, feelings, or values with few or no numerical data available for analysis. (2)

How can I determine if I have found an empirical article?

When looking at an article or the abstract of an article, here are some guidelines to use to decide if an article is an empirical article.

  • Is the article published in an academic, scholarly, or professional journal? Popular magazines such as Business Week or Newsweek do not publish empirical research articles; academic journals such as Business Communication Quarterly or Journal of Psychology may publish empirical articles. Some professional journals, such as JAMA: Journal of the American Medical Association publish empirical research. Other professional journals, such as Coach & Athletic Director publish articles of professional interest, but they do not publish research articles.
  • Does the abstract of the article mention a study, an observation, an analysis or a number of participants or subjects? Was data collected, a survey or questionnaire administered, an assessment or measurement used, an interview conducted? All of these terms indicate possible methodologies used in empirical research.
  • Does the article contain sections such as the following?
  • Introduction - The introduction provides a very brief summary of the research.
  • Methodology - The method section describes how the research was conducted, including who the participants were, the design of the study, what the participants did, and what measures were used.
  • Results - The results section describes the outcomes of the measures of the study.
  • Discussion - The discussion section contains the interpretations and implications of the study.
  • Conclusion - Briefly summarizes the findings of the study.
  • References - A reference section contains information about the articles and books cited in the report and should be substantial.
  • How long is the article? An empirical article is usually substantial; it is normally seven or more pages long.

When in doubt if an article is an empirical research article, share the article citation and abstract with your professor or a librarian so that we can help you become better at recognizing the differences between empirical research and other types of scholarly articles.

How can I search for empirical research articles using the electronic databases available through Wilson Library?

  • A quick and somewhat superficial way to look for empirical research is to type your search terms into the database's search boxes, then type STUDY OR STUDIES in the final search box to look for studies on your topic area. Be certain to use the ability to limit your search to scholarly/professional journals if that is available on the database. Evaluate the results of your search using the guidelines above to determine if any of the articles are empirical research articles.
  • In EbscoHost databases, such as Education Source, on the Advanced Search page you should see a PUBLICATION TYPE field; highlight the appropriate entry. Empirical research may not be the term used; look for a term that may be a synonym for empirical research. ERIC uses REPORTS-RESEARCH. Also find the field for INTENDED AUDIENCE and highlight RESEARCHER. PsycARTICLES and PsycINFO include a field for METHODOLOGY where you can highlight EMPIRICAL STUDY. National Criminal Justice Reference Service Abstracts has a field for DOCUMENT TYPE; highlight STUDIES/RESEARCH REPORTS. Then evaluate the articles you find using the guidelines above to determine if an article is empirical.
  • In ProQuest databases, such as ProQuest Psychology Journals, on the Advanced Search page look under MORE SEARCH OPTIONS and click on the pull-down menu for DOCUMENT TYPE and highlight an appropriate type, such as REPORT or EVIDENCE BASED. Also look for the SOURCE TYPE field and highlight SCHOLARLY JOURNALS. Evaluate the search results using the guidelines to determine if an article is empirical.
  • PubMed Central, Sage Premier, Science Direct, Wiley Interscience, and Wiley Interscience Humanities and Social Sciences consist of scholarly and professional journals which publish primarily empirical articles. After conducting a subject search in these databases, evaluate the items you find by using the guidelines above for deciding if an article is empirical.
  • "Quantitative research" A Dictionary of Nursing. Oxford University Press, 2008. Oxford Reference Online. Oxford University Press. University of La Verne. 25 August 2009
  • "Qualitative analysis" A Dictionary of Public Health. Ed. John M. Last, Oxford University Press, 2007. Oxford Reference Online . Oxford University Press. University of La Verne. 25 August 2009

UMGC Library

Q. How do I find an empirical research article?

Answered By: Robert Miller


Empirical research articles are also known as experimental or primary research articles.

Empirical articles are written by scientists reporting on an experiment or similar research that they conducted.

You'll find empirical articles in scholarly journals (also known as academic or peer-reviewed journals) within the library databases.

An empirical article will almost always be written in technical, specialized language, intended for an audience of experts rather than the general public (the writing will "sound" scientific). Often, you'll see quantitative (numerical) data arranged in tables or charts. And an empirical article will almost always have a specific structure following (more or less) these headings within the article:

  • Abstract: the author summarizes her article
  • Introduction: the author discusses the general background of her research topic; often, she will present a literature review, that is, summarize what other experts have written on this particular research topic
  • Method: the author describes the study she designed and conducted
  • Results: the author presents the data she gathered during her experiment
  • Discussion: the author offers ideas about the importance and implications of her research findings, and speculates on future directions that similar research might take
  • References: the author gives a list of sources she used in her paper

A reasonable keyword search on almost any scientific, medical, or technical topic, for example:

  • trauma AND "therapy animals"
  • "climate change" AND "polar ice"

will bring up many empirical articles in the following databases, which, being science-oriented, contain almost exclusively empirical articles. So just review the characteristics of an empirical article above, and you should be able to find them in library databases such as:

  • Academic Search Ultimate
  • Science Direct
  • APA PsycArticles
  • APA PsycInfo

In other databases, such as Business Source Ultimate or Environment Complete, if you limit your search to "scholarly" only (sometimes labeled "academic" or "peer-reviewed" only), then many of your results will probably be empirical articles. Again, review an article to see if it matches the characteristics outlined above.

  • Finding Experimental (Empirical) Research Articles (psychology)
  • Primary Research Articles (general science)


Penn State University Libraries

Empirical Research in the Social Sciences and Education



Introduction: What is Empirical Research?

Empirical research is based on observed and measured phenomena and derives knowledge from actual experience rather than from theory or belief. 

How do you know if a study is empirical? Read the subheadings within the article, book, or report and look for a description of the research "methodology."  Ask yourself: Could I recreate this study and test these results?

Key characteristics to look for:

  • Specific research questions to be answered
  • Definition of the population, behavior, or phenomena being studied
  • Description of the process used to study this population or phenomena, including selection criteria, controls, and testing instruments (such as surveys)

Another hint: some scholarly journals use a specific layout, called the "IMRaD" format, to communicate empirical research findings. Such articles typically have 4 components:

  • Introduction: sometimes called "literature review" -- what is currently known about the topic -- usually includes a theoretical framework and/or discussion of previous studies
  • Methodology: sometimes called "research design" -- how to recreate the study -- usually describes the population, research process, and analytical tools used in the present study
  • Results: sometimes called "findings" -- what was learned through the study -- usually appears as statistical data or as substantial quotations from research participants
  • Discussion: sometimes called "conclusion" or "implications" -- why the study is important -- usually describes how the research results influence professional practices or future studies

Reading and Evaluating Scholarly Materials

Reading research can be a challenge. However, the tutorials and videos below can help. They explain what scholarly articles look like, how to read them, and how to evaluate them:

  • CRAAP Checklist: A frequently-used checklist that helps you examine the currency, relevance, authority, accuracy, and purpose of an information source.
  • IF I APPLY: A newer model of evaluating sources which encourages you to think about your own biases as a reader, as well as concerns about the item you are reading.
  • Credo Video: How to Read Scholarly Materials (4 min.)
  • Credo Tutorial: How to Read Scholarly Materials
  • Credo Tutorial: Evaluating Information
  • Credo Video: Evaluating Statistics (4 min.)


Brandeis Library: Find Empirical Studies

What is an Empirical Study?

An empirical study reports the findings from a study that uses data derived from an actual experiment or observation. Key components of an empirical study:

  • Abstract - Provides a brief overview of the research.
  • Introduction - The introduction contextualizes the research by providing a review of previous research on the topic. It also is the section where the hypothesis is stated. 
  • Method  - The methods area describes how the research was conducted, including the design of the study, who the participants were and what they did, and any measurements that were taken during the study.
  • Results  - The results section describes the outcome of the study. 
  • Discussion  - The discussion section addresses the researchers' interpretations of their study and any future implications from their findings.
  • References  - A list of works that were cited in the study.

Try searching in the following databases for empirical studies in education:

  • APA PsycArticles Covers general psychology and specialized, basic, applied, clinical and theoretical research in psychology. Contains all journal articles, letters to the editor and errata from each of 49 journals by the APA and 9 from allied organizations. Coverage 1988 to the present.
  • APA PsycInfo (EBSCO) Citations and summaries of journal articles, book chapters, books, dissertations and technical reports in psychology. Includes information about the psychological aspects of related disciplines such as medicine, psychiatry, nursing, sociology, education, pharmacology, physiology, linguistics, anthropology, business and law. Coverage 1887 to present, includes 1,700+ international sources in over 35 languages.
  • Education Research Complete (EBSCO) Scholarly journal articles, dissertations, professional development resources, and other materials on topics related to the field of education. Coverage includes early childhood through adult education and all education specialties.
  • ERIC (ProQuest) Database sponsored by the U.S. Department of Education that provides access to scholarly journals, curriculum and teaching guides, research reports, and other materials related to the field of education.

There are a few strategies you can use to limit your search results to empirical studies within education and related disciplines. 

Two of our databases, APA PsycINFO and APA PsycARTICLES, offer the option to limit search results to a specific methodology, including empirical studies. The methodology search facet is located near the bottom of the advanced search page for APA PsycINFO and APA PsycARTICLES:

[Screenshot: PsycINFO advanced search page with "Empirical Study" selected under the Methodology search facet]

ERIC (ProQuest) doesn't have an easy way to search for empirical studies. You can try adding "empirical" into your search terms or you can try limiting your search results to research reports by completing the following steps:

  • Use the "Advanced Search" page
  • Type in your search terms
  • Scroll down the page to "Document Type" and select "143: Reports- Research"
  • Click "Search"

[Screenshot: Document Type facet in ERIC with "143: Reports - Research" selected]

Another strategy for finding empirical studies is to add different combinations of the following search terms: 

  • methodology (or method)
  • action research
  • participant observation OR participants
  • qualitative or quantitative
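To try those combinations systematically rather than retyping them, a few lines of Python can generate the candidate search strings; the topic term is a placeholder.

    # Sketch: pair a topic with each suggested methodology term to produce
    # candidate database searches. The topic below is a placeholder.
    from itertools import product

    topics = ['"reading intervention"']
    method_terms = ["methodology", "method", '"action research"',
                    '"participant observation"', "participants",
                    "qualitative", "quantitative"]

    for topic, method in product(topics, method_terms):
        print(f"{topic} AND {method}")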

Kennesaw State University Library System

Q: How do I find empirical articles?

An empirical article is a research article that reports the results of a study that uses data derived from actual observation or experimentation. Empirical articles may also be called research reports, research studies, or empirical studies. They are most common in Education or Psychology.

Empirical research articles are published in scholarly or academic journals. These journals are also called "peer-reviewed" or "refereed" publications.

Empirical articles often contain these sections:

  • Introduction
  • Literature review
  • Methodology
  • Results
  • Discussion
  • References

The sections may vary in the articles; however, the information that would fall within these sections should be present in an empirical article.

You may use the following databases to locate empirical articles.

1. PsycINFO and PsycARTICLES

Select Advanced Search

[Screenshot: EBSCO search menu with "Advanced Search" highlighted]

  • Enter search terms in the search box at top of the screen
  • Scroll down to Methodology and select: Empirical Study. There are subsets below this category that you can choose based on your needs.

2. ERIC

  • Enter your search terms in the search box at top of the screen
  • Scroll down the screen, locate the Publication Type dropdown box, and select a research-report type (in ERIC, empirical studies are labeled "Reports - Research")

3. Academic Search Complete


  • In the first line of the search box, enter your search terms
  • In a different line of the search box, enter the following:

         "study OR methodology OR subjects OR data OR results OR findings OR discussion"

[Screenshot: EBSCO Advanced Search with "study OR methodology OR subjects OR data OR results OR findings OR discussion" entered into the second search field]

For information on how to find a database, please click here.

How to Find Articles Based on Experimental Research/FCS


What is Experimental Research?

" Experimental Research " is based on observed and measured phenomena and derives knowledge from actual experience rather than from theory or belief.  This type of research may be referred to as Empirical Research, Qualitative Research, or Quantitative Research.  How do you know if a study is experimental? Read the subheadings within the article, book, or report and look for a description of the research "methodology."  Ask yourself: Would it be possible to recreate this study and test these results?  

Key characteristics to look for:  

  • Specific research questions to be answered
  • Definition of the population, behavior, or phenomena being studied
  • Description of the process used to study this population or phenomena, including selection criteria, controls, and testing instruments (such as surveys)

Another hint: some scholarly journals use a specific layout, called the "IMRaD" format, to communicate empirical research findings. Such articles typically have 4 components:  

  • Introduction: sometimes called "literature review" -- what is currently known about the topic -- usually includes a theoretical framework and/or discussion of previous studies
  • Methodology: sometimes called "research design" -- how to recreate the study -- usually describes the population, research process, and analytical tools
  • Results: sometimes called "findings" -- what was learned through the study -- usually appears as statistical data or as substantial quotations from research participants
  • Discussion: sometimes called "conclusion" or "implications" -- why the study is important -- usually describes how the research results influence professional practices or future studies

For additional help in deciding if you have a research article, see:

  • How to Evaluate Your Article Search Results in 5 Minutes (Steve Brantley)
  • How to Read a Journal Article (Steve Brantley)
Eastern Illinois University


Collection 

Empirical modeling

Empirical modeling involves the development of models that explain, predict, or simulate a particular aspect of the world, rather than purely theoretical or abstract principles. Empirical modeling starts with real-world data and observations, and then builds frameworks that are calibrated and validated against datasets. Specifically, statistical analysis and simulation techniques are employed to extract patterns and inferences, and to test hypotheses about how different variables interact. Empirical models are applied in a wide range of fields, including economics, epidemiology, environmental science, social sciences, and engineering, providing valuable insights and support for decision-making in each. More importantly, empirical modeling is iterative and dynamic. As new data become available, models are refined and updated to improve their accuracy and relevance. This ongoing process of validation and recalibration is what makes empirical modeling particularly powerful in dealing with complex, evolving issues.
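As a toy illustration of the calibrate-and-validate loop described above, the Python sketch below fits a simple model to synthetic "observations" and checks it against data held out from fitting; the data-generating process, split, and model are all assumptions chosen for illustration.

    # Minimal empirical-modeling loop: observe, calibrate, validate.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 200)
    y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)  # noisy "observations"

    # Calibrate on one split of the data, validate on the held-out rest.
    train, test = np.arange(150), np.arange(150, 200)
    slope, intercept = np.polyfit(x[train], y[train], deg=1)

    pred = slope * x[test] + intercept
    rmse = float(np.sqrt(np.mean((pred - y[test]) ** 2)))
    print(f"fitted y ~ {slope:.2f}x + {intercept:.2f}, held-out RMSE = {rmse:.2f}")

As new observations arrive, the fit is repeated and the held-out error re-checked: the iterative recalibration the description above emphasizes.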

This Collection welcomes original research on developing more adaptable, interpretable, and predictive approaches via the integration of advanced statistical methods, machine learning algorithms, and data science principles.


Guest Editors: Lazaros Gallos, PhD (Rutgers University, USA); Song-Ju Kim, PhD (Tokyo University of Science, Japan)


Position: Why We Must Rethink Empirical Research in Machine Learning

We warn against a common but incomplete understanding of empirical research in machine learning that leads to non-replicable results, makes findings unreliable, and threatens to undermine progress in the field. To overcome this alarming situation, we call for more awareness of the plurality of ways of gaining knowledge experimentally but also of some epistemic limitations. In particular, we argue most current empirical machine learning research is fashioned as confirmatory research while it should rather be considered exploratory.

1 The Non-Replicable ML Research Enigma

In his Caltech commencement address “Cargo Cult Science” ∗ , 1 1 1 As our paper contains some jargon, we have included a glossary in the appendix; asterisks ( ∗ ) (*) ( ∗ ) in the text denote covered terms. Richard Feynman ( 1974 ) described how researchers employ practices that conflict with scientific principles to adhere to a certain way of doing things. This position paper warns against similar tendencies in empirical research in machine learning (ML) and calls for a mindset change to address methodological and epistemic challenges of experimentation. There is ML research that does not replicate.     From an empirical scientific perspective, non-replicable research is a fundamental problem. As Karl Popper (p. 66 1959/2002 ) phrased it: “non-reproducible single occurrences are of no significance to science.” 2 2 2 Reproducible here does not refer to exact computational reproducibility ∗  but generally to arriving at the same scientific conclusions, termed replicability ∗  in this paper. Consequently, ML research that does not replicate has far-reaching epistemic ∗   and practical consequences. From an epistemological ∗  point of view, it means that research results are unreliable and, to some extent, it calls into question progress in the field. In practice, it may jeopardize applied empirical researchers’ confidence in experimental results and discourage them from applying ML methods, even though these novel approaches might be beneficial. For example, ML is increasingly being used in the medical domain, and this is often promising in terms of patient benefit. However, there are also examples indicating that applied researchers (are starting to) have concerns about ML being used in this high-stakes area. Consider, for example, this quite drastic warning by Dhiman et al. ( 2022 , p. 2) : “Machine learning is often portrayed as offering many advantages […]. However, these advantages have not yet materialised into patient benefit […]. Given the increasing concern about the methodological quality and risk of bias of prediction model studies [emphasis added], caution is warranted and the lack of uptake of models in medical practice is not surprising.” That is, if the ML community does not improve rigor in empirical methodological research, we think there may be a risk of a backlash against the use of ML in practice. In general, there is a growing body of empirical evidence showing that conclusions drawn from experimental results in ML were overly optimistic at the time of publication and could not be replicated in subsequent studies. For example, Melis et al. ( 2018 , p. 1) “arrive at the somewhat surprising conclusion that standard LSTM architectures, when properly regularized, outperform more recent models”; Henderson et al. ( 2018 , p. 3213) found for deep reinforcement learning “that both intrinsic (e.g. random seeds, environment properties) and extrinsic sources (e.g. hyperparameters, codebases) of non-determinism can contribute to difficulties in reproducing baseline algorithms”; Christodoulou et al. ( 2019 , p. 12) found in a systematic review “no performance benefit of machine learning over logistic regression for clinical prediction models”; Elor & Averbuch-Elor ( 2022 , p. 1) found in their study on data balancing in classification “that balancing does not improve prediction performance for the strong” classifiers; see also Lucic et al. ( 2018 ) , Riquelme et al. ( 2018 ) , Raff ( 2019 ) , Herrmann et al. ( 2020 ) , Ferrari Dacrema et al. ( 2021 ) , Marie et al. 
( 2021 ) , Buchka et al. ( 2021 ) , Narang et al. ( 2021 ) , van den Goorbergh et al. ( 2022 ) , Mateus et al. ( 2023 ) , McElfresh et al. ( 2023 ) , or the surveys by Liao et al. ( 2021 ) and Kapoor & Narayanan ( 2023 ) for similar findings. In concrete terms, there is published ML research that is, as Popper would say, of no significance to science , but we do not know how much! We have been warned; don’t we listen?     We are by no means the first to raise these and related issues, and the very fact that we are not the first is a matter of even graver concern. We think that empirical research in ML finds itself in a situation where practicing questionable research practices, such as state-of-the-art-hacking (SOTA-hacking; Gencoglu et al., 2019 ; Hullman et al., 2022 ), has sometimes become more rewarding than following the long line of literature warning against it. Langley wrote an editorial “Machine Learning as an Experimental Science” as early as 1988, and Drummond and Hand pointed out problems with experimental method comparison in ML already in 2006. Apart from these specific examples, there is a range of literature over the last decades dealing with similar issues (e.g., Hooker, 1995 ; McGeoch, 2002 ; Johnson, 2002 ; Drummond, 2009 ; Drummond & Japkowicz, 2010 ; Mannarswamy & Roy, 2018 ; Sculley et al., 2018 ; Lipton & Steinhardt, 2018 ; Bouthillier et al., 2019 ; Liao et al., 2021 ; D’Amour et al., 2022 ; Raff & Farris, 2022 ; Lones, 2023 ; Trosten, 2023 ) . Specifically relevant is the paper by Nakkiran & Belkin ( 2022 , p. 2) , in which they note a “perceived lack of legitimacy and real lack of community for good experimental science” (still) exists. If we continue not to take these warnings seriously the amount of non-replicable research will only continue to increase, as the cited very recent empirical findings indicate. We do not believe that deliberate actions on the part of individuals have led to this situation but that there is a general unawareness of the fact that, while “follow[ing] all the apparent precepts and forms of scientific investigation [in ML],” one can be “missing something essential.” In particular, this includes that “if you’re doing an experiment, you should report everything that you think might make it invalid—not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you’ve eliminated by some other experiment, and how they worked” (quotes from Feynman, 1974 , p. 11) . Misaligned incentives and pressure to publish positive results contribute to this situation (e.g., Smaldino & McElreath, 2016 ) . One of a kind? At the intersection of formal and empirical sciences.     We believe that one of the main reasons for this is that ML stands, like few other disciplines, at the interface between formal sciences and real-world applications. Because ML has strong foundations in formal sciences such as mathematics, (theoretical) computer science (CS), and mathematical statistics, many ML researchers are accustomed to reasoning mathematically about abstract objects – ML methods – using formal proofs. On the other hand, ML can also very much be considered a (software) engineering science, to create practical systems that can learn and improve their performance by interacting with their environment. Lastly, and especially concerning experimentation in ML, there exists an applied statistical perspective with a focus on thorough inductive reasoning. 
With its tradition in data analysis and design of experiments, it emphasizes the empirical aspects of ML research. These different perspectives, with their specific objectives, methodology, and terminology, have their unique virtues, but they also have their blind spots. The formal science perspective aims at providing absolute certainty and deep insights through the definition of abstract concepts and mathematical proofs but is often not well suited to explain complex real-world phenomena, as these concepts and proofs very often have to be based on strongly simplifying assumptions. The engineering perspective brought us incredible application improvements, but at the same time, not all conducted experiments are optimally designed to generalize results beyond the specific application context (which is also often only implicitly defined), as the references provided at the beginning demonstrate. A statistical perspective , which we adopt here, is very sensitive to such empirical issues – explaining/analyzing real-world phenomena and generalizing beyond a specific context (inductive reasoning) – and thus particularly suited to explain 1) why ML is faced with non-replicable research, and 2) how a more complete and nuanced understanding of empirical research in ML can help to overcome this situation. With empirical ML we thus mean in a broad sense the systematic investigation of ML algorithms, techniques, and conceptual questions through simulations, experimentation, and observation. It deals with real objects: implementations of algorithms – which are usually more complex than their theoretical counterparts (e.g., Kriegel et al., 2017 ) – running on physical computers; data gathered and produced/simulated in the real world; and their interplay. Rather than focusing solely on theoretical analysis and proofs, empirical research emphasizes practical evaluations using real-world and/or synthetic data. Empirical ML, as understood here, requires a mindset very different from engineering and formal sciences and a different approach to methodology to allow for the full incorporation of the uncertainties inherent in dealing with real-world entities in experiments. In our view, the discussed literature, raising similar points, has two main shortcomings: 1) they address only specific aspects of the problem and do not provide a comprehensive picture; 2) there is a confusion of terminology. For example, Bouthillier et al. ( 2019 ) distinguish between exploratory and empirical research. Nakkiran & Belkin ( 2022 ) use the term good experimental research and contrast it in particular with improving applications . Sculley et al. ( 2018 ) talk about empirical advancements and empirical analysis that are not complemented by sufficient empirical rigor . And Drummond ( 2006 ) discusses ML as an experimental science hardly using the term empirical at all. To overcome these issues, we gather opinions and (empirical) evidence scattered across the literature and different domains and try to develop a comprehensive synthesis . For example, similar problems have been discussed in bioinformatics for some time (e.g., Yousefi et al., 2010 ; Boulesteix, 2010 ) . We also take into account literature from other, more distant fields facing related issues, such as psychology and medicine. 
We believe this comprehensive picture will allow for a broader and deeper understanding of the complexity of the situation, which may at first glance appear rather easy to solve, e.g., by more (rigorous) statistical machinery or more open research artifacts. It is our conviction that without this deeper understanding, a situation about which warnings have so long been voiced in vain cannot be overcome.

2 The Status Quo of Empirical ML

Recent advances.

It is important to emphasize that there have recently been encouraging first steps in empirical ML research. These include newly created publication formats such as the Transactions on Machine Learning Research (TMLR), the Journal of Data-centric Machine Learning Research (DMLR), and the NeurIPS Datasets and Benchmarks Track launched in 2021. These venues explicitly include in their scope, e.g., “reproducibility studies of previously published results or claims” (TMLR, n.d.), “systematic analyses of existing systems on novel datasets or benchmarks that yield important new insight” (DMLR, n.d.), and “systematic analyses of existing systems on novel datasets yielding important new insight” (NeurIPS, n.d.). Further examples are the I Can’t Believe It’s Not Better! (ICBINB) workshop series (e.g., Forde et al., 2020) and Repository of Unexpected Negative Results (ICBINB Initiative, n.d.), as well as efforts towards preregistration (e.g., Albanie et al., 2021) and reproducibility (e.g., Sinha et al., 2023). These developments, while very important, are not sufficient in our view to overcome the problems empirical ML faces. For example, while computational reproducibility ∗ may be a necessary condition for replicability, it is not a sufficient one (e.g., Bouthillier et al., 2019). Furthermore, while the topics in the above formats cover many important aspects of empirical ML, we feel that they do not emphasize enough the importance of true replication of research, which is paramount from an empirical perspective. Most importantly, a situation in which a long line of research warning us has been largely neglected will not be overcome by such practical changes alone. It also requires a change in awareness – of the importance of proper empirical ML, but maybe even more of its limitations, and of the fact that there are different, equally valid types of proper empirical inquiry. We see this lack of awareness evidenced by TMLR (n.d.) itself: “TMLR emphasizes technical correctness over subjective significance, to ensure that we facilitate scientific discourse on topics that may not yet be accepted in mainstream venues [emphasis added] but may be important in the future.” This is expressed in the talk introducing TMLR, too (TMLR - A New Open Journal For Machine Learning: https://youtu.be/Uc1r1LfJtds). Judging by the example of other empirical sciences, this general lack of awareness of proper empirical ML is certainly the most difficult thing to overcome. Below we discuss problems we identified as symptoms of this lack.

Problem 1: Lack of unbiased experiments and scrutiny.     Most method comparisons are carried out as part of a paper introducing a new method and are usually biased in favor of the new method (see Section 1 for examples). Sculley et al. (2018, p. 1) found that “[l]ooking over papers from the last year, there seems to be a clear trend of multiple groups finding that prior work in fast moving fields may have missed improvements or key insights due to things as simple as hyperparameter tuning studies [∗] or ablation studies.” Moreover, in a neutral method comparison study of survival prediction methods, it has been shown that method rankings can vary considerably depending on design and analysis choices made at the meta-level (e.g., the selected set of datasets, performance metric, aggregation method) and that any method – even a simple baseline – can achieve almost any rank (Nießl et al., 2022; see also Sonabend et al., 2022); the sketch below illustrates the aggregation aspect. We are convinced that it is not far-fetched to conclude that quite often results demonstrating the superiority of a newly proposed method are obtained by an experimental design favorable to that method. As in other disciplines (Munafò et al., 2017), there are structural issues (e.g., publication bias, pressure to publish, lack of replication studies) and questionable practices (e.g., hypothesizing after the results are known [HARKing; Kerr, 1998] and $p$-hacking [Simonsohn et al., 2014]) that contribute to this lack of unbiased experiments and scrutiny. At the individual level, in particular, there is a lack of awareness that method comparisons performed as part of a paper introducing a new method are not well suited to draw reliable conclusions about a method beyond the datasets considered, especially if 1) the number of datasets considered is small (Dehghani et al., 2021; Koch et al., 2021), 2) there is meta-level overfitting on a single benchmark design (Recht et al., 2019; Beyer et al., 2020), 3) the set of datasets selected for the experiments is biased in favor of the newly proposed method, and 4) the authors are much more familiar with the new method than with its competitors, as is frequently the case (Johnson, 2002; Boulesteix et al., 2013, 2017). Furthermore, it is very easy to artificially make a method appear superior (e.g., Jelizarow et al., 2010; Norel et al., 2011; Nießl et al., 2022; Ullmann et al., 2023; Pawel et al., 2024; Nießl et al., 2024), and publication bias towards positive results is a strong incentive to engage in SOTA-hacking and demonstrate the superiority of a newly proposed method (Sculley et al., 2018; Gencoglu et al., 2019). At the system level, there is publication bias and a lack of replication and neutral method comparison studies (e.g., Boulesteix et al., 2013, 2015b). Sculley et al. (2018, p. 1) “observe that the rate of empirical advancement [larger and more complex experiments] may not have been matched by consistent increase in the level of empirical rigor across the field as a whole.” In unsupervised learning, the problem is more pronounced than in supervised learning because “there is much less of a benchmarking tradition in the clustering area than in the field of supervised learning” (Van Mechelen et al., 2023, p. 2; see also Zimmermann, 2020).
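To make the meta-level point concrete, here is a deliberately constructed toy example (ours, not taken from the cited studies; all numbers are invented): the very same benchmark results rank two hypothetical methods differently depending on whether one aggregates by mean performance or by mean rank across datasets.

```python
# Toy illustration (invented numbers): the ranking of two hypothetical
# methods flips depending on the aggregation rule chosen at the meta-level.
import pandas as pd

acc = pd.DataFrame(
    {"method_A": [0.99, 0.70, 0.70, 0.70, 0.70],
     "method_B": [0.75, 0.72, 0.72, 0.72, 0.72]},
    index=[f"dataset_{i}" for i in range(1, 6)],
)

# Aggregation 1: mean accuracy across datasets -- method_A wins (0.758 vs 0.726),
# driven entirely by a single dataset on which it excels.
print(acc.mean())

# Aggregation 2: mean rank across datasets (1 = best) -- method_B wins (1.2 vs 1.8),
# because it is better on four of the five datasets.
print(acc.rank(axis=1, ascending=False).mean())
```

Neither aggregation is wrong per se; the point is that such choices must be prespecified and reported, because they can determine which method “wins.”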

Problem 2: Lack of legitimacy.     The second problem highlights a specific aspect of the lack of awareness of how different types of empirical research can contribute to ML. The problem was addressed by Nakkiran & Belkin (2022), and we completely agree with their description: “In mainstream ML venues, there is a perceived lack of legitimacy and a real lack of community for good experimental science – which neither proves a theorem nor improves an application. This effectively suppresses a mode of scientific inquiry which has historically been critical to scientific progress, and which has shown promise in both ML and in CS more generally” (Nakkiran & Belkin, 2022, p. 2). They identify a strong bias of the ML community towards mathematical proofs (formal science perspective) and application improvements (engineering perspective), while good experimental science that does not focus on either is neither incentivized nor encouraged. Nakkiran & Belkin (2022) see this evidenced by the lack of specific subject areas, the exclusion from recent calls for papers, the lack of explicit guidelines for reviewers, and the organization of separate workshops on experimental scientific investigation at major ML conferences. In particular, reviewers “often ask for application improvements” and “for ‘theoretical justification’ for purely experimental papers” (Nakkiran & Belkin, 2022, pp. 2–3). Together these factors point to a structural problem hindering the recognition and promotion of some sorts of experimental research in ML. We completely agree with this view but think it may not be immediately clear what distinguishes improving an application from good experimental science. (To avoid misunderstandings: we do consider mathematical proofs and application improvements very valuable research!) As we understand it, the focus on application improvement means that much empirical/experimental research in ML focuses on developing a new method and demonstrating that it is superior to existing methods by improving some (predictive) performance metric on specific real-world benchmark datasets. Good experimental science, on the other hand, is not about improving performance. It is about improving understanding and knowledge of a problem, a (class of) methods, or a phenomenon. Sculley et al. (2018, p. 2) emphasize that “[e]mpirical studies [in ML] have become challenges to be ‘won’, rather than a process for developing insight and understanding. Ideally, the benefit of working with real data is to tune and examine the behavior of an algorithm under various sampling distributions, to learn about the strengths and weaknesses of the algorithms, as one would do in controlled studies.” And Rendsburg et al. (2020, p. 9) argue, “it is particularly important that our community actively attempts to understand the inherent inductive biases, strengths, and also the weaknesses of algorithms. Finding examples where an algorithm works is important – but maybe even more important is to understand under which circumstances the algorithm produces misleading results.”

Problem 3: Lack of conceptual clarity and operationalization.     There is a perceived lack of clarity about some important abstract concepts that are the objects of ML research on the one side, and a lack of clear operationalization ∗ in empirical investigations on the other. Both aspects affect the validity of experiments in empirical ML. This problem is the most complex one and probably, for that reason, the most difficult to describe in precise terms (cf. Saitta & Neri, 1998). However, since we think that this problem affects the validity of empirical research in ML in a fundamental way, an account of empirical ML that does not attempt to make it tangible would be incomplete. We aim to narrow down the problem by explicating examples for supervised and unsupervised learning. In other sciences such as psychology and physics, validity ∗ – whether the experimental measurement process actually measures what it is intended to measure – is fundamental. It inevitably depends on a strict and thorough operationalization of how the abstract concepts to be measured relate to measurable entities in the real world. Note that “[o]perational analysis is an excellent diagnostic tool for revealing where our knowledge is weak, in order to guide our efforts to strengthening it. The Bridgmanian ideal [∗] is always to back up concepts with operational definitions, that is, to ensure that every concept is independently measurable in every circumstance under which it is used” (Chang, 2004, p. 147). It is puzzling that validity and other quality criteria of empirical research have gained little attention in ML so far (e.g., Myrtveit et al., 2005; Segebarth et al., 2020; Raji et al., 2021). Experimental validity in supervised learning.     For supervised learning, the problem can be exemplified by the question of inference from experimental results on real data in method comparison and evaluation studies. (Another example independently affecting validity is underspecification, which “is common in modern ML pipelines, such as those based on deep learning”; D’Amour et al., 2022, p. 2.) Typically, the goal is to generalize the observed performance difference between methods to datasets that were not included in a study, which would require specifying when datasets are from the same or a different domain. The problem is that it is not at all clear in what sense results obtained from one set of real datasets can be generalized to any other set of datasets, as this would require a clear understanding of the distribution of the data-generating processes by which each dataset is generated (e.g., Aha, 1992; Salzberg, 1997; Boulesteix et al., 2015a; Herrmann, 2022; Strobl & Leisch, 2024). Without a definition of the population of data-generating processes, i.e., (some) clarity about an abstract concept, it can be argued that it is not clear what a real data comparison study actually measures. In other words, the collection of datasets considered “will not be representative of real data sets in any formal sense” (Hand, 2006, p. 12). Dietterich (1998, p. 4) even went so far as to claim that how to perform benchmark experiments on real datasets properly is “perhaps the most fundamental and difficult question in machine learning.” Experimental validity in unsupervised learning.     Arguably, the situation is even more involved in unsupervised learning (e.g., Kleinberg, 2002; von Luxburg et al., 2012; Zimek & Filzmoser, 2018; Herrmann, 2022).
First of all, “there is no […] direct measure of success. It is difficult to ascertain the validity of inferences drawn from the output of most unsupervised learning algorithms” (Hastie et al., 2009, p. 487). This is aggravated by an ambiguity about the abstract concepts of interest. Consider, for example, cluster analysis (see also Herrmann et al., 2023b, for outlier detection). Usually, clusters are conceptualized as the modes of a mixture of (normal) distributions. However, there is a different perspective that considers cluster analysis from a topological point of view and conceptualizes clusters as the connected components of a dataset (Niyogi et al., 2011). It is not clear whether these different notions 1) conceptualize clusters equally well, 2) can be related to the same real-world entities, and 3) yield clustering methods that are equally suitable for all clustering problems. There is some evidence suggesting this is not the case (Herrmann et al., 2023a); the small sketch below illustrates how the two notions can come apart in practice.
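As a minimal, self-contained illustration (our construction, not taken from the cited works), consider two interleaved half-moons: a centroid/mode-oriented method (k-means) and a connectivity-oriented method (single-linkage agglomerative clustering) operationalize “cluster” differently and consequently disagree on the very same data.

```python
# Sketch: two notions of "cluster" disagree on the same dataset.
# Assumes scikit-learn; the half-moons are a standard synthetic example.
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=500, noise=0.05, random_state=0)

# Mode/centroid-based notion: k-means cuts straight through the moons.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Connectivity-based notion: single linkage recovers the two components.
sl = AgglomerativeClustering(n_clusters=2, linkage="single").fit_predict(X)

print("k-means ARI:       ", adjusted_rand_score(y, km))  # typically well below 1
print("single-linkage ARI:", adjusted_rand_score(y, sl))  # typically close to 1
```

Neither method is “broken” here; each faithfully measures its own concept of a cluster, which is precisely the operationalization problem described above.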

Problem summary.     We argue that much empirical ML research is prone to overly optimistic, unreliable, and difficult-to-refute judgments and conclusions. Many experiments in empirical ML research are based on insufficiently operationalized experimental setups, partially due to ambiguous and inconclusive conceptualizations underlying the experiments. To draw more reliable conclusions, we need more explicit, context-specific operationalizations and clearer delineations of the abstract concepts that are to be investigated. Recall that “[o]perational analysis is an excellent diagnostic tool for revealing where our knowledge is weak, in order to guide our efforts in strengthening it” (Chang, 2004, p. 147). That good experimental research is sometimes not encouraged enough in ML (see Problem 2) and that biased experiments still occur more often than desirable (see Problem 1) exacerbate the situation considerably: the former is an excellent approach for improving insight and understanding in the sense outlined above, while biased experiments tend to make this more difficult. These aspects are becoming especially important in deep learning, where the sheer complexity of today’s models, especially of foundation models, makes mathematical analysis extremely difficult. Instead, the analysis often needs to be largely experimental and thus requires thorough experimentation at the highest possible level.

3 Improving the Status Quo: More Richness in Empirical Methodological Research

A unifying view: we need exploratory and confirmatory research.

Confirmatory research ∗, also known as hypothesis-testing research, aims to test preexisting hypotheses to confirm or refute existing theories. Researchers design specific studies to evaluate hypotheses derived from existing knowledge experimentally. Typically, this involves a structured and predefined research design, a priori hypotheses, and often statistical analyses to draw conclusive inferences. In contrast, exploratory research is an open-ended approach that aims to gain insight and understanding in a new or unexplored area. It is often conducted when little is known about the phenomenon under study. It involves gathering information, identifying patterns, and formulating specific hypotheses for further investigation. One of our main points is that, to improve empirical ML towards more thorough, reliable, and insightful methodological research, both exploratory and confirmatory research are needed in ML (cf. Tukey, 1980). In general, the problems described can be placed in this broader epistemic context. We argue that most empirical research in ML is perceived as confirmatory research, when it should rather be considered exploratory from an epistemic perspective (see also Bouthillier et al., 2019). At the same time, purely exploratory methodological research focusing on improving insight and understanding experimentally (cf. Dietterich, 1990) and research like neutral method comparison and replication studies, which can be considered more rigorous in the confirmatory sense, are not seen as equally important contributions to the field. For the time being, it is worth making this distinction; in Section 4, however, we discuss why it is an oversimplification from an epistemic perspective – even more so because we distinguish two types of exploratory empirical methodological research in the following (it is not our intention to establish a precise terminology, but we think this structure will assist the reader): insight-oriented exploratory research ∗ in contrast to method-developing exploratory research ∗. We think insight-oriented exploratory research is what Nakkiran & Belkin (2022) mean by good experimental research, and what they mean by application improvements is a conflation of both method-developing exploratory and (supposedly) confirmatory research.

More insight-oriented exploratory research.     In principle, the good thing about moving towards more insight-oriented exploratory methodological research in ML is that there are no epistemological obstacles to overcome. The neighboring field of data mining and knowledge discovery clearly has an exploratory nature and is very much in the spirit of Tukey’s exploratory data analysis. There are also already some examples of influential ML research that can be considered insight-oriented and exploratory, e.g., Frankle & Carbin (2019), Belkin et al. (2019), Recht et al. (2019), Rendsburg et al. (2020), Zhang et al. (2021), or Power et al. (2021). So, rather than epistemic aspects, it is the incentives and attitudes in scientific practice towards this type of research that are an obstacle to its successful dissemination. In particular, an alleged lack of novelty and originality is often invoked, leading to rejections. Yet, without the esteem expressed by acceptance for publication, in particular in major ML venues, there is simply little incentive to engage in exploratory ML research. More importantly, it reinforces the impression among students and young scientists that exploratory research is not an integral part of science. It is therefore necessary to stimulate, encourage, and provide opportunities to make such research visible. Nakkiran & Belkin (2022, pp. 4–5) propose to establish a special subject area within ML conferences for “Experimental Science of Machine Learning,” focusing on “experimental investigation into the nature of learning and learning systems.” The types of papers outlined include those with “surprising experiments,” “empirical conjectures,” “refining existing phenomena,” “formalizing intuition,” and presentation of “new measurement tool[s],” all aiming to improve the understanding of ML empirically. They also provide guidelines specifically tailored to the review of this type of research.

More (actual) confirmatory research.     As outlined, we believe most current empirical ML research (i.e., application improvements) is a mixture of method-developing exploratory research and (supposedly) confirmatory research. (In a sense, this limits the potential of the former and renders the latter largely useless, with biased experiments as a result.) For this reason, we add a focus on well-designed, neutral method comparison and replication studies. The scrutiny and rigor these examples of (actual) confirmatory empirical research provide are sorely needed if we are to work toward more reliable and replicable research. Neutral method comparison studies provide experiments that are less biased in favor of newly proposed methods (Boulesteix et al., 2013; Lim et al., 2000; Ali & Smith, 2006; Fernández-Delgado et al., 2014). First, this includes prespecified, strictly adhered-to designs of the experimental setup, including in particular a clearly specified set of datasets and tasks. Ideally, neutral comparison studies focus on the comparison of already existing methods and are carried out by a group of authors approximately equally familiar with all the methods under consideration (Boulesteix et al., 2013). Such studies ensure more neutrality and are less prone to overly optimistic conclusions than studies proposing a method, since there is much less of an incentive to promote a particular method. Second, proper uncertainty quantification is required when analyzing empirical results in ML, especially w.r.t. the different stages of inference (e.g., model fitting, model selection, pipeline construction, and performance estimation) (see Nadeau & Bengio, 2003; Bengio & Grandvalet, 2004; Hothorn et al., 2005; Bates et al., 2023). Moreover, if statistical significance testing is to be conducted to test for statistically significant performance differences across different real-world datasets, as described, e.g., by Demšar (2006), Eugster et al. (2012), Boulesteix et al. (2015a), or Eisinga et al. (2017), the methodological rigor established in other empirical domains should be applied (Munafò et al., 2017); in particular, efforts towards prior sample size calculations are important (Boulesteix et al., 2017). Moreover, we need more replication studies and meta-studies. These types of research face similar reservations as insight-oriented exploratory experimental research. However, replication studies are indispensable to assess the amount of non-replicable research and to prevent it from increasing further. Such studies attempt to reach the same scientific conclusions as previous studies, to provide additional empirical evidence for observed phenomena. Meta-studies analyze and summarize the evidence accumulated in this way on a specific phenomenon. This process is the default way of reaching conclusions in other sciences and is important because single studies can be false and/or contradict each other. In ML, this can range from studies that attempt to replicate an experiment exactly (e.g., Lohmann et al., 2022) or slightly modify an experiment’s design (e.g., by using a different set of data in the replication of a neutral comparison study) to more comprehensive tuning and ablation studies of experiments conducted in method-developing research (e.g., Rendsburg et al., 2020; Kobak & Linderman, 2021). The latter certainly overlaps with insight-oriented exploratory research.
It is important to emphasize that it is in the nature of things that a replication is not an original or novel scientific contribution in the conventional sense, and that important new insights cannot necessarily be gained beyond the replication of previously observed results. Rather, it is an explicit attempt to arrive at the same results and conclusions as a previous study. The scientific relevance, which is well acknowledged in other empirical sciences such as physics or medicine, lies in gathering additional empirical evidence for a hypothesis through a successful replication. Moreover, a replication study may, but does not necessarily, also raise epistemic questions, point to experimental improvements, or provide refined concepts, especially in the case of failed replications.
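To give one concrete, hedged sketch of the kind of statistical machinery mentioned above: a Demšar-style omnibus comparison applies a Friedman test to per-dataset scores of several methods. All numbers below are invented for illustration; in a real neutral comparison study, the datasets, metric, and analysis would be prespecified.

```python
# Sketch of a Demšar-style omnibus test (invented scores, one per dataset).
from scipy.stats import friedmanchisquare

method_a = [0.81, 0.76, 0.90, 0.74, 0.88, 0.79]
method_b = [0.79, 0.77, 0.86, 0.73, 0.85, 0.78]
method_c = [0.70, 0.69, 0.81, 0.66, 0.80, 0.71]

stat, p_value = friedmanchisquare(method_a, method_b, method_c)
print(f"Friedman chi-squared = {stat:.2f}, p = {p_value:.4f}")
# A small p-value alone says nothing about practical relevance (see Sec. 4);
# Demšar recommends post hoc tests (e.g., Nemenyi) to localize differences,
# and effect sizes and uncertainty estimates should accompany any such test.
```

Note that such a test presupposes the very operationalization discussed under Problem 3: it treats the datasets as a sample from some population of data-generating processes, which is rarely made explicit.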

More infrastructure.     To achieve this, practical limitations also need to be overcome. We require more dedicated infrastructure to make the proposed forms of research more easily realizable. In particular, there is a need for more and better open databases of well-curated and well-understood datasets, such as OpenML (Vanschoren et al., 2013) or the OpenML Benchmarking Suites (Bischl et al., 2021). Moreover, well-maintained open-source software for systematic benchmark experiments, such as the AutoML Benchmark (Gijsbers et al., 2024), HPOBench (Eggensperger et al., 2021), NAS-Bench-Suite (Mehta et al., 2022), or AlgoPerf (Dahl et al., 2023), is needed. Platforms for public leaderboards and model sharing (e.g., Hugging Face) are another important aspect, although some of these platforms are geared towards horse racing based on predictive performance and therefore do not necessarily also provide scientific insights or interpretability. Yet, the standards and automatic nature of such platforms have the advantage that they offer concrete reference points for criticism and debate. Finally, reviewer guidelines implementing our suggestions and dedicated venues for currently hard-to-publish empirical work will allow the full potential of empirical ML to be realized (Sculley et al., 2018; Nakkiran & Belkin, 2022). Moreover, without more education, none of this will be possible. Given the different perspectives – formal science, engineering, statistical – from which ML can be viewed, it is very difficult to include each in the appropriate depth in a single study program. While a recent survey of 101 undergraduate data science programs in the U.S. showed that all included an introductory course in statistics (Bile Hassan et al., 2021), statistics has only recently (2023) been included as a core topic in the curriculum recommendations for CS ∗ (Joint Task Force on Computing Curricula, 2023). It is also questionable whether introductory courses are sufficient to avoid crucial gaps that can lead to the adoption of questionable research practices (cf. Gigerenzer, 2018). Furthermore, hardly any study program contains a dedicated course on the design and analysis of (computer) experiments (Santner et al., 2003; Box et al., 2005; Dean et al., 2017), which we deem especially relevant for our context here. In general, we agree with De Veaux et al. (2017, pp. 16–17) that many “courses traditionally found in computer science, statistics, and mathematics offerings should be redesigned for the data science [or ML] major in the interest of efficiency and the potential synergy that integrated courses would offer.”
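As a small illustration of how such infrastructure lowers the barrier to prespecified experimental designs, the sketch below retrieves a curated benchmarking suite via the openml Python package (an assumption on our part: package availability and exact call signatures may differ across versions, so consult the current OpenML documentation).

```python
# Sketch: retrieving a prespecified, curated set of tasks from OpenML,
# assuming the `openml` package and the OpenML-CC18 classification suite.
import openml

suite = openml.study.get_suite("OpenML-CC18")  # curated benchmark suite
for task_id in suite.tasks[:3]:                # inspect the first few tasks
    task = openml.tasks.get_task(task_id)
    print(task_id, task.get_dataset().name)
```

Committing to such a suite before running any experiments is one practical way to implement the “clearly specified set of datasets and tasks” called for above.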

Finally, we would like to offer concrete and practicable advice to specific target groups, in addition to the general recommendations above.

Advice for junior researchers. (1) Read the positive examples of insight-oriented exploratory research in ML (listed above), about the design of experiments, the critical discussion on statistical testing, and the basics of philosophy of science. (2) Educate yourself in Open Science practices (e.g., see The Turing Way Community, 2023). (3) Engage with researchers from other disciplines, as the data on which ML models are trained can only really be understood if one understands how it was generated. (4) Consider making empirical research in ML a (partial) research focus.

Advice for senior researchers. (1) Allow your junior researchers to write (great) papers on empirical aspects of ML, even if those may be relatively difficult to publish in major venues for now. Our personal experience is that these papers can still be highly cited and become very influential. (2) Learn from other fields; what we are experiencing in terms of non-replicable research is not a new phenomenon. (3) Please do not perceive this paper as an attack on ML but rather as an honest attempt to improve it and, more importantly, to improve its impact.

Advice for venue organizers and editors (see also Nakkiran & Belkin, 2022). (1) Encourage all forms of proper empirical ML to be submitted (in particular, this includes insight-oriented exploratory research), e.g., by creating special tracks or adding keywords but also by allowing such work on main tracks. The idea is to create special measures for the topic to increase awareness, but not to isolate or ban all such papers to special (workshop) tracks with (potentially) lower perceived value. (2) Consider giving out awards for positive examples of these types of research. (3) Consider establishing positions like reproducibility and replicability editors for venues and journals. (4) Give concrete advice on best practices, so authors and reviewers have clear guidelines to follow. Note that this should not be confined to asking “Were the empirical results subjected to statistical tests?” (without further information); this is close to the opposite of what we think is needed. (One anonymous reviewer also suggested that venues start collecting metadata on reasons for rejection. Such data could serve as a basis to evaluate whether certain types of ML research face a systematic bias.)

4 Beyond the Status Quo: Rethinking Empirical ML as a Maturing Science

The exploratory-confirmatory research continuum.

With ML’s strong foundation in the formal sciences, where absolute certainty can be achieved by formal proofs, the clear distinction between exploratory and confirmatory research that has been invoked so far may seem natural. Yet, from an empirical perspective, i.e., whenever one deals with entities in the real world, it is itself an oversimplifying dichotomy, and empirical research is better thought of as a continuum from exploratory to confirmatory, with an ideal of purely exploratory research at one end and of strictly confirmatory research at the other (e.g., Wagenmakers et al., 2012; Oberauer & Lewandowsky, 2019; Szollosi & Donkin, 2021; Scheel et al., 2021; Devezer et al., 2021; Rubin & Donkin, 2022; Höfler et al., 2022; Fife & Rodgers, 2022). Based on that notion, Fife & Rodgers (2022) argue that “psychology may not be mature enough to justify confirmatory research” (p. 453) and that “[t]he maturity of any science puts a cap on the exploratory/confirmatory continuum” (p. 462). Given the similarities between research in psychology and ML as described by Hullman et al. (2022), we think the same holds for ML, and we suggest that ML should be considered a maturing (empirical) science as well. (There are also differences between ML and psychology that considerably simplify our lives: we usually experiment not on humans but on algorithms running on computers, and we have more control over experiments, larger sample sizes, and lower experimental costs.) Hullman et al. (2022, p. 355) “identify common themes in reform discussions, like overreliance on asymptotic theory and non-credible beliefs about real-world data-generating processes.” That said, confirmatory research in ML as advocated in Section 3 is still very different from strict confirmatory research in other disciplines. Rather, it can be seen as rough confirmatory research (Fife & Rodgers, 2022; Tukey, 1973) that follows the same principles, but – as outlined – it is unclear how results can be generalized (e.g., using statistical tests), which is a cornerstone of strict confirmatory research. But this should not be taken as a caveat: rough confirmatory research allows for flexibility that strict confirmatory research does not (Fife & Rodgers, 2022). The framework proposed by Heinze et al. (2024, p. 1) can be seen as a way of mapping this rather abstract idea into more concrete guidelines for scientific practice. In the context of biostatistics, they propose to consider four phases of methodological research, analogous to clinical research in drug development: “(I) proposing a new methodological idea while providing, for example, logical reasoning or proofs, (II) providing empirical evidence, first in a narrow target setting, then (III) in an extended range of settings and for various outcomes, accompanied by appropriate application examples, and (IV) investigations that establish a method as sufficiently well-understood to know when it is preferred over others and when it is not; that is, its pitfalls.”

Statistical significance tests: Words of caution, revisited! The notion of empirical research as a continuum is epistemologically more involved and cannot be discussed in full detail here. An important aspect that needs to be discussed is its relation to the misguided use of statistical testing. This point has been made before, and in more detail, by Drummond (2006). We revisit it here, enriching it with more recent literature on the issue. In particular, routinely adding statistical machinery to an (already underspecified and/or biased) experimental design to test for statistically significant differences in performance – as is frequently done and/or explicitly asked for (e.g., Henderson et al., 2018; Marie et al., 2021) – does not improve the epistemic relevance of the results by much, nor does it add much insight over other data aggregations. In fact, “[s]tatistical significance was never meant to imply scientific importance,” and you should not “conclude anything about scientific or practical importance based on statistical significance (or lack thereof)” (Wasserstein et al., 2019, pp. 2, 1). On the contrary, misguided beliefs in and use of statistical rituals (Gigerenzer, 2018) are largely responsible for the replication crisis in other empirical disciplines. The reasons are complex. First of all, the modern theory of statistical hypothesis testing (SHT) is a conflation of two historically distinct types of testing theory ∗. Important epistemological questions about when statistical tests are appropriate are obscured by this mixed theory (e.g., Schneider, 2015; Gigerenzer & Marewski, 2015; Rubin, 2020). More importantly, and specifically relevant for experiments in ML, both theories were developed for experimental designs based on samples randomly drawn from a population of interest (Schneider, 2015). In general, the assumptions underlying the theory of statistical testing as an inferential tool are not met in many applications (Greenland, 2023). In fact, the editors of The American Statistician special issue “Statistical Inference in the 21st Century: A World Beyond $p < 0.05$” went so far as to conclude, “based on [their] review of the articles in this special issue and the broader literature, that it is time to stop using the term ‘statistically significant’ entirely” (Wasserstein et al., 2019, p. 2). Note that we want to warn against an overemphasis on, as well as an uncritical use of, statistical tests; we do not argue against statistical testing in general. Quite the contrary, we argue for a more diverse set of analysis tools (applied with care and critical reflection), including but not limited to statistical testing. We also want to stress that statistical testing cannot remedy more fundamental problems such as poor experimental design. To summarize the main points, we emphasize:

Valid statistical testing inevitably depends on a thorough and well-designed experimental setup.

Statistical testing should not be applied routinely and requires thought and careful preparation to be valid and insightful.

Improper statistical testing and/or its uneducated interpretation are – as is widely acknowledged – a main driver of non-replicable results in other empirical sciences.

The discussion about these issues has been going on for decades and has resulted in a large body of literature, some of which is condensed in the mentioned special issue of The American Statistician .

So, while we argue for more experiments in a confirmatory spirit to improve the status quo of empirical ML (see Section 3 ), especially using neutral method comparison and replication studies, we also emphasize that it is important to keep in mind their current epistemic limitations. In particular, we warn against common misconceptions about and inappropriate use of SHT. The problem is that the underlying “misunderstandings stem from a set of interrelated cognitive biases that reflect innate human compulsions which even the most advanced mathematical training seems to do nothing to staunch, and may even aggravate: Dichotomania, the tendency to reduce quantitative scales to dichotomies; nullism, the tendency to believe or at least act as if an unrefuted null hypothesis is true; and statistical reification, the tendency to forget that mathematical arguments say nothing about reality except to the extent the assumptions they make (which are often implicit) can be mapped into reality in a way that makes them all correct simultaneously” (Greenland, 2023 , p. 911) .
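To make “thought and careful preparation” concrete for a common ML setting, consider comparing two methods via repeated train/test splits: the naive paired t-test over splits is known to be too liberal because training sets overlap. Below is a hedged sketch of the corrected resampled t-test of Nadeau & Bengio (2003), which inflates the variance estimate accordingly (the numbers are invented and the function name is ours).

```python
# Sketch of the Nadeau & Bengio (2003) corrected resampled t-test.
import numpy as np
from scipy import stats

def corrected_resampled_ttest(diffs, n_train, n_test):
    """diffs: per-split performance differences between two methods."""
    diffs = np.asarray(diffs, dtype=float)
    j = len(diffs)                       # number of random train/test splits
    var = diffs.var(ddof=1)              # naive sample variance of the diffs
    # Correction: add n_test/n_train to 1/j to account for the dependence
    # induced by overlapping training sets across splits.
    t = diffs.mean() / np.sqrt((1.0 / j + n_test / n_train) * var)
    p = 2 * stats.t.sf(abs(t), df=j - 1)
    return t, p

# Invented differences from 10 random 80/20 splits of 1,000 observations.
diffs = [0.021, 0.015, 0.030, 0.008, 0.025, 0.012, 0.018, 0.022, 0.010, 0.027]
print(corrected_resampled_ttest(diffs, n_train=800, n_test=200))
```

Even such a corrected test remains subject to all the caveats above: it quantifies uncertainty for the data at hand and says nothing about generalization to other data-generating processes.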

Most current empirical ML research should rather be viewed as exploratory.     As outlined, confirmatory research aims to test preexisting hypotheses, while exploratory research involves gathering information, identifying patterns, and formulating specific hypotheses for further investigation. We think that, currently, most empirical research in ML is conducted as part of a paper introducing a new method and is fashioned as confirmatory research even though it is exploratory in nature. In our view, this is reflected especially in the routine use of statistical tests to aggregate benchmark results: the exploratory phase of method development (e.g., trying out different method variants) largely invalidates post hoc statistical tests, as the simulation below demonstrates. As Strobl & Leisch (2024, p. 2) put it: “In methodological research, comparison studies are often published either with the explicit or implicit aim to promote a new method by means of showing that it outperforms existing methods.” In other words, the conducted experiments are set up to confirm the (implicit) hypothesis that the proposed method constitutes an improvement. Systemic pressures and conventions, as well as ML’s strong roots in the formal sciences and its focus on improving applications, encourage this mindset and the practice of invoking confirmatory arguments. This is expressed in statements such as that “[i]t is well-known that reviewers ask for application improvements” and “for ‘theoretical justification’ for purely experimental papers, even when the experiments alone constitute a valid scientific contribution” (Nakkiran & Belkin, 2022, pp. 2–3). The problem with not emphasizing the exploratory nature is that “exploratory findings have a slippery way of ‘transforming’ into planned findings as the research process progresses” (Calin-Jageman & Cumming, 2019, p. 275) and “[a]t the bottom of that slippery slope one often finds results that don’t reproduce” (Wasserstein et al., 2019, p. 3). Shifting the focus to an exploratory notion of method development is an opportunity to fully allow “to understand under which circumstances the algorithm produces misleading results” (Rendsburg et al., 2020, p. 9) and to “learn about [its] strengths and weaknesses” (Sculley et al., 2018, p. 2) and to report them clearly.
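How exploratory selection invalidates post hoc tests can be shown with a short simulation (ours, with all quantities invented): if one tries many method variants that are in truth all equivalent to a baseline and then tests only the best-looking one, apparently significant “improvements” occur far more often than the nominal error rate suggests.

```python
# Simulation sketch: post hoc testing after exploratory variant selection.
# All variants and the baseline have identical true performance, so any
# "significant win" of the selected variant is spurious by construction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_datasets, n_variants, reps = 20, 50, 2000

spurious_wins = 0
for _ in range(reps):
    variants = rng.normal(size=(n_variants, n_datasets))  # scores per dataset
    baseline = rng.normal(size=n_datasets)
    best = variants[variants.mean(axis=1).argmax()]       # exploratory pick
    _, p = stats.ttest_rel(best, baseline)                # post hoc paired test
    spurious_wins += int(p < 0.05 and best.mean() > baseline.mean())

# Far above the ~2.5% expected without the selection step.
print("spurious 'significant win' rate:", spurious_wins / reps)
```

The remedy is not a different test but honesty about the exploratory phase: selection and confirmation must be separated, e.g., via fresh data or preregistered designs.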

5 Conclusion

This work offers perspectives on ML that outline how it should move from a field largely driven by mathematical proofs and application improvements to one that is also a full-fledged empirical field driven by multiple types of experimental research. By providing concrete practical guidance while at the same time moderating expectations of what empirical research can achieve, we wish to contribute to greater overall reliability and trustworthiness. For every don’t, there is a do.     However, we are aware that our explanations may initially leave the reader unsatisfied when it comes to translating the conclusions into scientific practice. For example, those who were hoping for guidelines on the correct use of statistical tests may well be at a complete loss. However, we do not believe that this is actually the case. If you are inclined to perform statistical tests as described by Demšar (2006), do so, but also be aware of the Do-lists described by Wasserstein et al. (2019, Ch. 3, 7). In this regard, we consider the following comment by Wasserstein et al. (ib., p. 6) very noteworthy: “Researchers of any ilk may rarely advertise their personal modesty. Yet, the most successful ones cultivate a practice of being modest throughout their research, by understanding and clearly expressing the limitations of their work.” Furthermore, do not rely only on real data; use simulated data as well. Simulations are an excellent tool for operationalization, i.e., for mapping abstract concepts to measurable entities. Yet, the most important point is that we should be open to different ways of doing experimental research and should not penalize research just because it does not follow certain established conventions. As Nakkiran & Belkin (2022, p. 6) put it: “Each paper must be evaluated on an individual basis”; this is challenging, but they suggest guidelines. The community should take them up to address this issue. Embracing inconclusiveness.     Summarizing the perspectives on empirical ML covered here, and returning to the idea of mature sciences, we believe that for ML to mature as an (empirical) science, a greater awareness of some epistemic limitations, but also of the plurality of ways to gain insights, might be all it needs. We believe that if empirical research is one thing, it is not conclusive: no single empirical study can prove anything with absolute certainty. It must be scrutinized, repeated, and reassessed in a sense of epistemic iteration ∗ (Chang, 2004). That said, we conclude by quoting Chang’s thoughts (ib., p. 243) on science in general: “If something is actually uncertain, our knowledge is superior if it is accompanied by an appropriate degree of doubt rather than blind faith. If the reasons we have for a certain belief are inconclusive, being aware of the inconclusiveness prepares us better for the possibility that other reasons may emerge to overturn our belief. With a critical awareness of uncertainty and inconclusiveness, our knowledge reaches a higher level of flexibility and sophistication.”

Impact Statement

This position paper aims to advance machine learning by addressing practical challenges and epistemic constraints of empirical research that are often overlooked. We believe this has implications for machine learning research in general, as it can help to improve the reliability and credibility of research results. We also believe that our contribution can have a broader positive social and ethical impact by preventing misdirected efforts and resources.

Acknowledgments

We thank the four anonymous reviewers for their valuable comments and suggestions. Katharina Eggensperger is a member of the Machine Learning Cluster of Excellence, EXC number 2064/1 – Project number 390727645. Anne-Laure Boulesteix was partly funded by DFG grant BO3139/9-1.

  • Aha (1992) Aha, D. W. Generalizing from case studies: A case study. In Sleeman, D. and Edwards, P. (eds.), Machine Learning Proceedings 1992 , pp.  1–10, San Francisco, CA, United States, 1992. Morgan Kaufmann. doi: 10.1016/B978-1-55860-247-2.50006-1 .
  • Albanie et al. (2021) Albanie, S., Henriques, J., Bertinetto, L., Hernandez-Garcia, A., Doughty, H., and Varol, G. The pre-registration workshop: An alternative publication model for machine learning research [Workshop]. Thirty-Fifth Conference on Neural Information Processing Systems , Online, 2021. https://neurips.cc/Conferences/2021/Schedule?showEvent=21885 .
  • Ali & Smith (2006) Ali, S. and Smith, K. A. On learning algorithm selection for classification. Applied Soft Computing , 6(2):119–138, 2006. doi: 10.1016/j.asoc.2004.12.002 .
  • Barba (2018) Barba, L. A. Terminologies for reproducible research. arXiv:1802.03311 [cs.DL] , 2018. doi: 10.48550/arXiv.1802.03311 .
  • Bates et al. (2023) Bates, S., Hastie, T., and Tibshirani, R. Cross-validation: What does it estimate and how well does it do it? Journal of the American Statistical Association , pp.  1–12, 2023. doi: 10.1080/01621459.2023.2197686 .
  • Belkin et al. (2019) Belkin, M., Hsu, D., Ma, S., and Mandal, S. Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences , 116(32):15849–15854, 2019. doi: 10.1073/pnas.1903070116 .
  • Bengio & Grandvalet (2004) Bengio, Y. and Grandvalet, Y. No unbiased estimator of the variance of K-fold cross-validation. Journal of Machine Learning Research , 5:1089–1105, 2004. https://www.jmlr.org/papers/v5/grandvalet04a.html .
  • Bergstra et al. (2011) Bergstra, J., Bardenet, R., Bengio, Y., and Kégl, B. Algorithms for hyper-parameter optimization. In Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., and Weinberger, K. (eds.), Advances in Neural Information Processing Systems , volume 24, Granada, Spain, 2011. Curran Associates, Inc. https://papers.neurips.cc/paper_files/paper/2011/hash/86e8f7ab32cfd12577bc2619bc635690-Abstract.html .
  • Bergstra et al. (2013) Bergstra, J., Yamins, D., and Cox, D. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In Dasgupta, S. and McAllester, D. (eds.), Proceedings of the 30th International Conference on Machine Learning , pp.  115–123, Atlanta, GA, United States, 2013. PMLR. https://proceedings.mlr.press/v28/bergstra13.html .
  • Beyer et al. (2020) Beyer, L., Hénaff, O. J., Kolesnikov, A., Zhai, X., and van den Oord, A. Are we done with ImageNet? arXiv:2006.07159 [cs.CV] , 2020. doi: 10.48550/arXiv.2006.07159 .
  • Bile Hassan et al. (2021) Bile Hassan, I., Ghanem, T., Jacobson, D., Jin, S., Johnson, K., Sulieman, D., and Wei, W. Data science curriculum design: A case study. In Proceedings of the 52nd ACM Technical Symposium on Computer Science Education , pp.  529–534, Online, 2021. Association for Computing Machinery. doi: 10.1145/3408877.3432443 .
  • Bischl et al. (2021) Bischl, B., Casalicchio, G., Feurer, M., Gijsbers, P., Hutter, F., Lang, M., Gomes Mantovani, R., van Rijn, J., and Vanschoren, J. OpenML benchmarking suites. In Vanschoren, J. and Yeung, S. (eds.), Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks , volume 1, Online, 2021. Curran Associates, Inc. https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/hash/c7e1249ffc03eb9ded908c236bd1996d-Abstract-round2.html .
  • Bischl et al. (2023) Bischl, B., Binder, M., Lang, M., Pielok, T., Richter, J., Coors, S., Thomas, J., Ullmann, T., Becker, M., Boulesteix, A.-L., Deng, D., and Lindauer, M. Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery , 13(2):e1484, 2023. doi: 10.1002/widm.1484 .
  • Boulesteix (2010) Boulesteix, A.-L. Over-optimism in bioinformatics research. Bioinformatics , 26(3):437–439, 2010. doi: 10.1093/bioinformatics/btp648 .
  • Boulesteix et al. (2013) Boulesteix, A.-L., Lauer, S., and Eugster, M. J. A. A plea for neutral comparison studies in computational sciences. PLoS ONE , 8(4):e61562, 2013. doi: 10.1371/journal.pone.0061562 .
  • Boulesteix et al. (2015a) Boulesteix, A.-L., Hable, R., Lauer, S., and Eugster, M. J. A. A statistical framework for hypothesis testing in real data comparison studies. The American Statistician , 69(3):201–212, 2015a. doi: 10.1080/00031305.2015.1005128 .
  • Boulesteix et al. (2015b) Boulesteix, A.-L., Stierle, V., and Hapfelmeier, A. Publication bias in methodological computational research. Cancer Informatics , 14(S5):11–19, 2015b. doi: 10.4137/CIN.S30747 .
  • Boulesteix et al. (2017) Boulesteix, A.-L., Wilson, R., and Hapfelmeier, A. Towards evidence-based computational statistics: Lessons from clinical research on the role and design of real-data benchmark studies. BMC Medical Research Methodology , 17(1):138, 2017. doi: 10.1186/s12874-017-0417-2 .
  • Bouthillier et al. (2019) Bouthillier, X., Laurent, C., and Vincent, P. Unreproducible research is reproducible. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning , pp.  725–734, Long Beach, CA, United States, 2019. PMLR. https://proceedings.mlr.press/v97/bouthillier19a.html .
  • Bouthillier et al. (2021) Bouthillier, X., Delaunay, P., Bronzi, M., Trofimov, A., Nichyporuk, B., Szeto, J., Mohammadi Sepahvand, N., Raff, E., Madan, K., Voleti, V., Ebrahimi Kahou, S., Michalski, V., Arbel, T., Pal, C., Varoquaux, G., and Vincent, P. Accounting for variance in machine learning benchmarks. In Smola, A., Dimakis, A., and Stoica, I. (eds.), Proceedings of Machine Learning and Systems , volume 3, pp.  747–769, Online, 2021. https://proceedings.mlsys.org/paper_files/paper/2021/hash/0184b0cd3cfb185989f858a1d9f5c1eb-Abstract.html .
  • Box et al. (2005) Box, G. E. P., Hunter, J. S., and Hunter, W. G. Statistics for Experimenters: Design, Innovation, and Discovery . Wiley Series in Probability and Statistics. John Wiley and Sons, 2nd edition, 2005.
  • Bridgman (1927) Bridgman, P. W. The Logic of Modern Physics . Macmillan, 1927.
  • Buchka et al. (2021) Buchka, S., Hapfelmeier, A., Gardner, P. P., Wilson, R., and Boulesteix, A.-L. On the optimistic performance evaluation of newly introduced bioinformatic methods. Genome Biology , 22(1):152, 2021. doi: 10.1186/s13059-021-02365-4 .
  • Calin-Jageman & Cumming (2019) Calin-Jageman, R. J. and Cumming, G. The new statistics for better science: Ask how much, how uncertain, and what else is known. The American Statistician , 73(sup1):271–280, 2019. doi: 10.1080/00031305.2018.1518266 .
  • Campbell (1957) Campbell, D. T. Factors relevant to the validity of experiments in social settings. Psychological Bulletin , 54(4):297–312, 1957. doi: 10.1037/h0040950 .
  • Chang (2004) Chang, H. Inventing temperature: Measurement and scientific progress . Oxford University Press, 2004.
  • Chang (2021) Chang, H. Operationalism. In Zalta, E. N. (ed.), The Stanford Encyclopedia of Philosophy . Metaphysics Research Lab, Stanford University, Fall 2021 edition, 2021. https://plato.stanford.edu/archives/fall2021/entries/operationalism/ .
  • Christodoulou et al. (2019) Christodoulou, E., Ma, J., Collins, G. S., Steyerberg, E. W., Verbakel, J. Y., and Van Calster, B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. Journal of Clinical Epidemiology , 110:12–22, 2019. doi: 10.1016/j.jclinepi.2019.02.004 .
  • Conference on Neural Information Processing Systems (n.d.) Conference on Neural Information Processing Systems. NeurIPS 2023 Datasets and Benchmarks Track, n.d. Retrieved January 31, 2024, from https://neurips.cc/Conferences/2023/CallForDatasetsBenchmarks .
  • Dahl et al. (2023) Dahl, G. E., Schneider, F., Nado, Z., Agarwal, N., Sastry, C. S., Hennig, P., Medapati, S., Eschenhagen, R., Kasimbeg, P., Suo, D., Bae, J., Gilmer, J., Peirson, A. L., Khan, B., Anil, R., Rabbat, M., Krishnan, S., Snider, D., Amid, E., Chen, K., Maddison, C. J., Vasudev, R., Badura, M., Garg, A., and Mattson, P. Benchmarking neural network training algorithms. arXiv:2306.07179 [cs.LG] , 2023. doi: 10.48550/arXiv.2306.07179 .
  • D’Amour et al. (2022) D’Amour, A., Heller, K., Moldovan, D., Adlam, B., Alipanahi, B., Beutel, A., Chen, C., Deaton, J., Eisenstein, J., Hoffman, M. D., Hormozdiari, F., Houlsby, N., Hou, S., Jerfel, G., Karthikesalingam, A., Lucic, M., Ma, Y., McLean, C., Mincu, D., Mitani, A., Montanari, A., Nado, Z., Natarajan, V., Nielson, C., Osborne, T. F., Raman, R., Ramasamy, K., Sayres, R., Schrouff, J., Seneviratne, M., Sequeira, S., Suresh, H., Veitch, V., Vladymyrov, M., Wang, X., Webster, K., Yadlowsky, S., Yun, T., Zhai, X., and Sculley, D. Underspecification presents challenges for credibility in modern machine learning. Journal of Machine Learning Research , 23:1–61, 2022. https://www.jmlr.org/papers/v23/20-1335.html .
  • De Veaux et al. (2017) De Veaux, R. D., Agarwal, M., Averett, M., Baumer, B. S., Bray, A., Bressoud, T. C., Bryant, L., Cheng, L. Z., Francis, A., Gould, R., Kim, A. Y., Kretchmar, M., Lu, Q., Moskol, A., Nolan, D., Pelayo, R., Raleigh, S., Sethi, R. J., Sondjaja, M., Tiruviluamala, N., Uhlig, P. X., Washington, T. M., Wesley, C. L., White, D., and Ye, P. Curriculum guidelines for undergraduate programs in data science. Annual Review of Statistics and Its Application , 4:15–30, 2017. doi: 10.1146/annurev-statistics-060116-053930 .
  • Dean et al. (2017) Dean, A., Voss, D., and Draguljić, D. Design and Analysis of Experiments . Springer Texts in Statistics. Springer, 2nd edition, 2017. doi: 10.1007/978-3-319-52250-0 .
  • Dehghani et al. (2021) Dehghani, M., Tay, Y., Gritsenko, A. A., Zhao, Z., Houlsby, N., Diaz, F., Metzler, D., and Vinyals, O. The benchmark lottery. arXiv:2107.07002 [cs.LG] , 2021. doi: 10.48550/arXiv.2107.07002 .
  • Demšar (2006) Demšar, J. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research , 7:1–30, 2006. https://jmlr.org/papers/v7/demsar06a.html .
  • Devezer et al. (2021) Devezer, B., Navarro, D. J., Vandekerckhove, J., and Buzbas, E. O. The case for formal methodology in scientific reform. Royal Society Open Science , 8(3):200805, 2021. doi: 10.1098/rsos.200805 .
  • Dhiman et al. (2022) Dhiman, P., Ma, J., Andaur Navarro, C. L., Speich, B., Bullock, G., Damen, J. A. A., Hooft, L., Kirtley, S., Riley, R. D., Van Calster, B., Moons, K. G. M., and Collins, G. S. Risk of bias of prognostic models developed using machine learning: A systematic review in oncology. Diagnostic and Prognostic Research , 6:13, 2022. doi: 10.1186/s41512-022-00126-w .
  • Dietterich (1990) Dietterich, T. G. Exploratory research in machine learning. Machine Learning , 5(1):5–9, 1990. doi: 10.1007/BF00115892 .
  • Dietterich (1998) Dietterich, T. G. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation , 10(7):1895–1923, 1998. doi: 10.1162/089976698300017197 .
  • Drummond (2006) Drummond, C. Machine learning as an experimental science (revisited). In AAAI Workshop on Evaluation Methods for Machine Learning , Boston, MA, United States, 2006. https://aaai.org/papers/ws06-06-002-machine-learning-as-an-experimental-science-revisited/ .
  • Drummond (2009) Drummond, C. Replicability is not reproducibility: Nor is it good science. In Proceedings of the Evaluation Methods for Machine Learning Workshop at the 26th Annual International Conference on Machine Learning , Montreal, Canada, 2009. https://www.site.uottawa.ca/~cdrummon/pubs/ICMLws09.pdf .
  • Drummond & Japkowicz (2010) Drummond, C. and Japkowicz, N. Warning: Statistical benchmarking is addictive. Kicking the habit in machine learning. Journal of Experimental & Theoretical Artificial Intelligence , 22(1):67–80, 2010. doi: 10.1080/09528130903010295 .
  • Eggensperger et al. (2021) Eggensperger, K., Müller, P., Mallik, N., Feurer, M., Sass, R., Klein, A., Awad, N., Lindauer, M., and Hutter, F. HPOBench: A collection of reproducible multi-fidelity benchmark problems for HPO. In Vanschoren, J. and Yeung, S. (eds.), Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks , volume 1, Online, 2021. Curran Associates, Inc. https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/hash/93db85ed909c13838ff95ccfa94cebd9-Abstract-round2.html .
  • Eisinga et al. (2017) Eisinga, R., Heskes, T., Pelzer, B., and Te Grotenhuis, M. Exact p -values for pairwise comparison of Friedman rank sums, with application to comparing classifiers. BMC Bioinformatics , 18(1):68, 2017. doi: 10.1186/s12859-017-1486-2 .
  • Elor & Averbuch-Elor (2022) Elor, Y. and Averbuch-Elor, H. To SMOTE, or not to SMOTE? arXiv:2201.08528 [cs.LG] , 2022. doi: 10.48550/arXiv.2201.08528 .
  • Eugster et al. (2012) Eugster, M. J. A., Hothorn, T., and Leisch, F. Domain-based benchmark experiments: Exploratory and inferential analysis. Austrian Journal of Statistics , 41(1):5–26, 2012. doi: 10.17713/ajs.v41i1.185 .
  • Fernández-Delgado et al. (2014) Fernández-Delgado, M., Cernadas, E., Barro, S., and Amorim, D. Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research , 15:3133–3181, 2014. https://jmlr.org/papers/v15/delgado14a.html .
  • Ferrari Dacrema et al. (2021) Ferrari Dacrema, M., Boglio, S., Cremonesi, P., and Jannach, D. A troubling analysis of reproducibility and progress in recommender systems research. ACM Transactions on Information Systems , 39(2):1–49, 2021. doi: 10.1145/3434185 .
  • Feurer & Hutter (2019) Feurer, M. and Hutter, F. Hyperparameter optimization. In Hutter, F., Kotthoff, L., and Vanschoren, J. (eds.), Automated Machine Learning: Methods, Systems, Challenges , The Springer Series on Challenges in Machine Learning, pp.  3–33. Springer, 2019. doi: 10.1007/978-3-030-05318-5_1 .
  • Feynman (1974) Feynman, R. P. Cargo cult science. Engineering and Science , 37(7):10–13, 1974. Transcript of commencement address given at the California Institute of Technology. Available at http://calteches.library.caltech.edu/51/2/CargoCult.htm .
  • Fife & Rodgers (2022) Fife, D. A. and Rodgers, J. L. Understanding the exploratory/confirmatory data analysis continuum: Moving beyond the “replication crisis”. American Psychologist , 77(3):453–466, 2022. doi: 10.1037/amp0000886 .
  • Forde et al. (2020) Forde, J. Z., Ruiz, F., Pradier, M. F., and Schein, A. I can’t believe it’s not better! Bridging the gap between theory and empiricism in probabilistic machine learning [Workshop]. Thirty-Fourth Conference on Neural Information Processing Systems , Online, 2020. https://neurips.cc/virtual/2020/protected/workshop_16124.html .
  • Foster (2024) Foster, C. Methodological pragmatism in educational research: From qualitative-quantitative to exploratory-confirmatory distinctions. International Journal of Research & Method in Education , 47(1):4–19, 2024. doi: 10.1080/1743727X.2023.2210063 .
  • Frankle & Carbin (2019) Frankle, J. and Carbin, M. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In 7th International Conference on Learning Representations , New Orleans, LA, United States, 2019. https://openreview.net/forum?id=rJl-b3RcF7 .
  • Franklin & Perovic (2023) Franklin, A. and Perovic, S. Experiment in physics. In Zalta, E. N. and Nodelman, U. (eds.), The Stanford Encyclopedia of Philosophy . Metaphysics Research Lab, Stanford University, Fall 2023 edition, 2023. https://plato.stanford.edu/archives/fall2023/entries/physics-experiment/ .
  • Gencoglu et al. (2019) Gencoglu, O., van Gils, M., Guldogan, E., Morikawa, C., Süzen, M., Gruber, M., Leinonen, J., and Huttunen, H. HARK side of deep learning – From grad student descent to automated machine learning. arXiv:1904.07633 [cs.LG] , 2019. doi: 10.48550/arXiv.1904.07633 .
  • Gigerenzer (2018) Gigerenzer, G. Statistical rituals: The replication delusion and how we got there. Advances in Methods and Practices in Psychological Science , 1(2):198–218, 2018. doi: 10.1177/2515245918771329 .
  • Gigerenzer & Marewski (2015) Gigerenzer, G. and Marewski, J. N. Surrogate science: The idol of a universal method for scientific inference. Journal of Management , 41(2):421–440, 2015. doi: 10.1177/0149206314547522 .
  • Gijsbers et al. (2024) Gijsbers, P., Bueno, M. L. P., Coors, S., LeDell, E., Poirier, S., Thomas, J., Bischl, B., and Vanschoren, J. AMLB: An AutoML benchmark. Journal of Machine Learning Research , 25:1–65, 2024. https://www.jmlr.org/papers/v25/22-0493.html .
  • Greenland (2023) Greenland, S. Connecting simple and precise P-values to complex and ambiguous realities (includes rejoinder to comments on “Divergence vs. decision P-values”). Scandinavian Journal of Statistics, 50(3):899–914, 2023. doi: 10.1111/sjos.12645.
  • Gundersen (2021) Gundersen, O. E. The fundamental principles of reproducibility. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences , 379(2197):20200210, 2021. doi: 10.1098/rsta.2020.0210 .
  • Hand (2006) Hand, D. J. Classifier technology and the illusion of progress. Statistical Science , 21(1):1–14, 2006. doi: 10.1214/088342306000000060 .
  • Hastie et al. (2009) Hastie, T., Tibshirani, R., and Friedman, J. H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction . Springer Series in Statistics. Springer, 2nd edition, 2009. doi: 10.1007/978-0-387-84858-7 .
  • Heinze et al. (2024) Heinze, G., Boulesteix, A.-L., Kammer, M., Morris, T. P., White, I. R., and Simulation Panel of the STRATOS initiative. Phases of methodological research in biostatistics—building the evidence base for new methods. Biometrical Journal , 66(1):2200222, 2024. doi: 10.1002/bimj.202200222 .
  • Henderson et al. (2018) Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup, D., and Meger, D. Deep reinforcement learning that matters. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence , pp.  3207–3214, New Orleans, LA, United States, 2018. AAAI Press. doi: 10.1609/aaai.v32i1.11694 .
  • Herrmann (2022) Herrmann, M. Towards more reliable machine learning: Conceptual insights and practical approaches for unsupervised manifold learning and supervised benchmark studies . PhD thesis, Ludwig-Maximilians-Universität München, Munich, Germany, 2022. doi: 10.5282/edoc.30789 .
  • Herrmann et al. (2020) Herrmann, M., Probst, P., Hornung, R., Jurinovic, V., and Boulesteix, A.-L. Large-scale benchmark study of survival prediction methods using multi-omics data. Briefings in Bioinformatics , 22(3):bbaa167, 2020. doi: 10.1093/bib/bbaa167 .
  • Herrmann et al. (2023a) Herrmann, M., Kazempour, D., Scheipl, F., and Kröger, P. Enhancing cluster analysis via topological manifold learning. Data Mining and Knowledge Discovery , pp.  1–48, 2023a. doi: 10.1007/s10618-023-00980-2 .
  • Herrmann et al. (2023b) Herrmann, M., Pfisterer, F., and Scheipl, F. A geometric framework for outlier detection in high-dimensional data. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery , 13(3):e1491, 2023b. doi: 10.1002/widm.1491 .
  • Höfler et al. (2022) Höfler, M., Scherbaum, S., Kanske, P., McDonald, B., and Miller, R. Means to valuable exploration: I. The blending of confirmation and exploration and how to resolve it. Meta-Psychology , 6, 2022. doi: 10.15626/MP.2021.2837 .
  • Hooker (1995) Hooker, J. N. Testing heuristics: We have it all wrong. Journal of Heuristics , 1:33–42, 1995. doi: 10.1007/BF02430364 .
  • Hothorn et al. (2005) Hothorn, T., Leisch, F., Zeileis, A., and Hornik, K. The design and analysis of benchmark experiments. Journal of Computational and Graphical Statistics , 14(3):675–699, 2005. doi: 10.1198/106186005X59630 .
  • Hullman et al. (2022) Hullman, J., Kapoor, S., Nanayakkara, P., Gelman, A., and Narayanan, A. The worst of both worlds: A comparative analysis of errors in learning from data in psychology and machine learning. In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society , pp.  335–348, Oxford, United Kingdom, 2022. Association for Computing Machinery. doi: 10.1145/3514094.3534196 .
  • ICBINB Initiative (n.d.) ICBINB Initiative. ICBINB Repository of Unexpected Negative Results, n.d. Retrieved January 31, 2024, from http://icbinb.cc/icbinb-repository-of-unexpected-negative-results/ .
  • Jaeger & Halliday (1998) Jaeger, R. G. and Halliday, T. R. On confirmatory versus exploratory research. Herpetologica , 54:S64–S66, 1998. https://www.jstor.org/stable/3893289 .
  • Jelizarow et al. (2010) Jelizarow, M., Guillemot, V., Tenenhaus, A., Strimmer, K., and Boulesteix, A.-L. Over-optimism in bioinformatics: An illustration. Bioinformatics , 26(16):1990–1998, 2010. doi: 10.1093/bioinformatics/btq323 .
  • Johnson (2002) Johnson, D. S. A theoretician’s guide to the experimental analysis of algorithms. In Goldwasser, M. H., Johnson, D. S., and McGeoch, C. C. (eds.), Data Structures, Near Neighbor Searches, and Methodology: Fifth and Sixth DIMACS Implementation Challenges , volume 59 of Series in Discrete Mathematics & Theoretical Computer Science , pp.  215–250. American Mathematical Society, 2002.
  • Joint Task Force on Computing Curricula (2013) Joint Task Force on Computing Curricula. Computer Science Curricula 2013: Curriculum Guidelines for Undergraduate Degree Programs in Computer Science . Association for Computing Machinery and IEEE Computer Society, 2013. doi: 10.1145/2534860 .
  • Joint Task Force on Computing Curricula (2023) Joint Task Force on Computing Curricula. Computer Science Curricula 2023 – The Final Report . Association for Computing Machinery, IEEE Computer Society, and Association for the Advancement of Artificial Intelligence, 2023. The report is not yet listed on the ACM curricula recommendation website. The final version is available at https://csed.acm.org/final-report/ .
  • Journal of Data-centric Machine Learning Research (n.d.) Journal of Data-centric Machine Learning Research. Submission guidelines for authors, n.d. Retrieved January 31, 2024, from https://data.mlr.press/submissions.html .
  • Kapoor & Narayanan (2023) Kapoor, S. and Narayanan, A. Leakage and the reproducibility crisis in machine-learning-based science. Patterns , 4(9):100804, 2023. doi: 10.1016/j.patter.2023.100804 .
  • Kerr (1998) Kerr, N. L. HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review , 2(3):196–217, 1998. doi: 10.1207/s15327957pspr0203_4 .
  • Kimmelman et al. (2014) Kimmelman, J., Mogil, J. S., and Dirnagl, U. Distinguishing between exploratory and confirmatory preclinical research will improve translation. PLoS Biology , 12(5):e1001863, 2014. doi: 10.1371/journal.pbio.1001863 .
  • Kleinberg (2002) Kleinberg, J. An impossibility theorem for clustering. In Becker, S., Thrun, S., and Obermayer, K. (eds.), Advances in Neural Information Processing Systems , volume 15, Vancouver, Canada, 2002. MIT Press. https://papers.neurips.cc/paper_files/paper/2002/hash/43e4e6a6f341e00671e123714de019a8-Abstract.html .
  • Kobak & Linderman (2021) Kobak, D. and Linderman, G. C. Initialization is critical for preserving global data structure in both t-SNE and UMAP. Nature Biotechnology, 39(2):156–157, 2021. doi: 10.1038/s41587-020-00809-z.
  • Koch et al. (2021) Koch, B., Denton, E., Hanna, A., and Foster, J. G. Reduced, reused and recycled: The life of a dataset in machine learning research. In Vanschoren, J. and Yeung, S. (eds.), Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks , volume 1, Online, 2021. Curran Associates, Inc. https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/hash/3b8a614226a953a8cd9526fca6fe9ba5-Abstract-round2.html .
  • Kriegel et al. (2017) Kriegel, H.-P., Schubert, E., and Zimek, A. The (black) art of runtime evaluation: Are we comparing algorithms or implementations? Knowledge and Information Systems , 52(2):341–378, 2017. doi: 10.1007/s10115-016-1004-2 .
  • Langley (1988) Langley, P. Machine learning as an experimental science. Machine Learning , 3(1):5–8, 1988. doi: 10.1023/A:1022623814640 .
  • Liao et al. (2021) Liao, T., Taori, R., Raji, I. D., and Schmidt, L. Are we learning yet? A meta review of evaluation failures across machine learning. In Vanschoren, J. and Yeung, S. (eds.), Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks , volume 1, Online, 2021. Curran Associates, Inc. https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/hash/757b505cfd34c64c85ca5b5690ee5293-Abstract-round2.html .
  • Lim et al. (2000) Lim, T.-S., Loh, W.-Y., and Shih, Y.-S. A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning , 40(3):203–228, 2000. doi: 10.1023/A:1007608224229 .
  • Lindstrom (2023) Lindstrom, L. Cargo cults. In The Open Encyclopedia of Anthropology . Facsimile of the first edition in The Cambridge Encyclopedia of Anthropology , 2023. doi: 10.29164/18cargo .
  • Lipton & Steinhardt (2018) Lipton, Z. C. and Steinhardt, J. Troubling trends in machine learning scholarship. arXiv:1807.03341 [stat.ML] , 2018. doi: 10.48550/arXiv.1807.03341 .
  • Lohmann et al. (2022) Lohmann, A., Astivia, O. L. O., Morris, T. P., and Groenwold, R. H. H. It’s time! Ten reasons to start replicating simulation studies. Frontiers in Epidemiology , 2:973470, 2022. doi: 10.3389/fepid.2022.973470 .
  • Lones (2023) Lones, M. A. How to avoid machine learning pitfalls: A guide for academic researchers. arXiv:2108.02497 [cs] , 2023. doi: 10.48550/arXiv.2108.02497 .
  • Lucic et al. (2018) Lucic, M., Kurach, K., Michalski, M., Gelly, S., and Bousquet, O. Are GANs created equal? A large-scale study. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.), Advances in Neural Information Processing Systems , volume 31, Montréal, Canada, 2018. Curran Associates Inc. https://papers.neurips.cc/paper_files/paper/2018/hash/e46de7e1bcaaced9a54f1e9d0d2f800d-Abstract.html .
  • Mannarswamy & Roy (2018) Mannarswamy, S. and Roy, S. Evolving AI from research to real life – Some challenges and suggestions. In Lang, J. (ed.), Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence , pp.  5172–5179, Stockholm, Sweden, 2018. AAAI Press. doi: 10.24963/ijcai.2018/717 .
  • Marie et al. (2021) Marie, B., Fujita, A., and Rubino, R. Scientific credibility of machine translation research: A meta-evaluation of 769 papers. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) , pp.  7297–7306, Online, 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.acl-long.566 .
  • Mateus et al. (2023) Mateus, P., Volmer, L., Wee, L., Aerts, H. J. W. L., Hoebers, F., Dekker, A., and Bermejo, I. Image based prognosis in head and neck cancer using convolutional neural networks: A case study in reproducibility and optimization. Scientific Reports , 13:18176, 2023. doi: 10.1038/s41598-023-45486-5 .
  • McElfresh et al. (2023) McElfresh, D., Khandagale, S., Valverde, J., C, V. P., Feuer, B., Hegde, C., Ramakrishnan, G., Goldblum, M., and White, C. When do neural nets outperform boosted trees on tabular data? In Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S. (eds.), Advances in Neural Information Processing Systems , volume 36, New Orleans, LA, United States, 2023. Curran Associates Inc. https://papers.neurips.cc/paper_files/paper/2023/hash/f06d5ebd4ff40b40dd97e30cee632123-Abstract-Datasets_and_Benchmarks.html .
  • McGeoch (2002) McGeoch, C. C. Experimental analysis of algorithms. In Pardalos, P. M. and Romeijn, H. E. (eds.), Handbook of Global Optimization: Volume 2 , pp.  489–513. Springer, 2002. doi: 10.1007/978-1-4757-5362-2_14 .
  • Mehta et al. (2022) Mehta, Y., White, C., Zela, A., Krishnakumar, A., Zabergja, G., Moradian, S., Safari, M., Yu, K., and Hutter, F. NAS-Bench-Suite: NAS evaluation is (now) surprisingly easy. In 10th International Conference on Learning Representations , Online, 2022. https://openreview.net/forum?id=0DLwqQLmqV .
  • Melis et al. (2018) Melis, G., Dyer, C., and Blunsom, P. On the state of the art of evaluation in neural language models. In 6th International Conference on Learning Representations , Vancouver, Canada, 2018. https://openreview.net/forum?id=ByJHuTgA- .
  • Merriam-Webster (n.d.) Merriam-Webster. Epistemic. In Merriam-Webster.com dictionary , n.d. Retrieved May 1, 2024, from https://www.merriam-webster.com/dictionary/epistemic .
  • Munafò et al. (2017) Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., du Sert, N. P., Simonsohn, U., Wagenmakers, E.-J., Ware, J. J., and Ioannidis, J. P. A. A manifesto for reproducible science. Nature Human Behaviour , 1(1):1–9, 2017. doi: 10.1038/s41562-016-0021 .
  • Myrtveit et al. (2005) Myrtveit, I., Stensrud, E., and Shepperd, M. Reliability and validity in comparative studies of software prediction models. IEEE Transactions on Software Engineering , 31(5):380–391, 2005. doi: 10.1109/TSE.2005.58 .
  • Nadeau & Bengio (2003) Nadeau, C. and Bengio, Y. Inference for the generalization error. Machine Learning , 52(3):239–281, 2003. doi: 10.1023/A:1024068626366 .
  • Nakkiran & Belkin (2022) Nakkiran, P. and Belkin, M. Incentivizing empirical science in machine learning. In ML Evaluation Standards Workshop at ICLR 2022 , Online, 2022. https://ml-eval.github.io/assets/pdf/science_ml_proposal_2am.pdf .
  • Narang et al. (2021) Narang, S., Chung, H. W., Tay, Y., Fedus, W., Fevry, T., Matena, M., Malkan, K., Fiedel, N., Shazeer, N., Lan, Z., Zhou, Y., Li, W., Ding, N., Marcus, J., Roberts, A., and Raffel, C. Do transformer modifications transfer across implementations and applications? In Moens, M.-F., Huang, X., Specia, L., and Yih, S. W.-t. (eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , pp.  5758–5773, Online and Punta Cana, Dominican Republic, 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.465 .
  • National Academies of Sciences, Engineering, and Medicine (2019) National Academies of Sciences, Engineering, and Medicine. Reproducibility and Replicability in Science . National Academies Press, 2019. doi: 10.17226/25303 .
  • Nießl et al. (2022) Nießl, C., Herrmann, M., Wiedemann, C., Casalicchio, G., and Boulesteix, A.-L. Over-optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery , 12(2):e1441, 2022. doi: 10.1002/widm.1441 .
  • Nießl et al. (2024) Nießl, C., Hoffmann, S., Ullmann, T., and Boulesteix, A.-L. Explaining the optimistic performance evaluation of newly proposed methods: A cross-design validation experiment. Biometrical Journal , 66(1):2200238, 2024. doi: 10.1002/bimj.202200238 .
  • Nilsen et al. (2020) Nilsen, E. B., Bowler, D. E., and Linnell, J. D. C. Exploratory and confirmatory research in the open science era. Journal of Applied Ecology , 57(4):842–847, 2020. doi: 10.1111/1365-2664.13571 .
  • Niyogi et al. (2011) Niyogi, P., Smale, S., and Weinberger, S. A topological view of unsupervised learning from noisy data. SIAM Journal on Computing , 40(3):646–663, 2011. doi: 10.1137/090762932 .
  • Norel et al. (2011) Norel, R., Rice, J. J., and Stolovitzky, G. The self‐assessment trap: Can we all be better than average? Molecular Systems Biology , 7(1):537, 2011. doi: 10.1038/msb.2011.70 .
  • Nosek et al. (2018) Nosek, B. A., Ebersole, C. R., DeHaven, A. C., and Mellor, D. T. The preregistration revolution. Proceedings of the National Academy of Sciences , 115(11):2600–2606, 2018. doi: 10.1073/pnas.1708274114 .
  • Oberauer & Lewandowsky (2019) Oberauer, K. and Lewandowsky, S. Addressing the theory crisis in psychology. Psychonomic Bulletin & Review , 26(5):1596–1618, 2019. doi: 10.3758/s13423-019-01645-2 .
  • Pawel et al. (2024) Pawel, S., Kook, L., and Reeve, K. Pitfalls and potentials in simulation studies: Questionable research practices in comparative simulation studies allow for spurious claims of superiority of any method. Biometrical Journal , 66(1):2200091, 2024. doi: 10.1002/bimj.202200091 .
  • Pineau et al. (2021) Pineau, J., Vincent-Lamarre, P., Sinha, K., Larivière, V., Beygelzimer, A., d’Alché Buc, F., Fox, E., and Larochelle, H. Improving reproducibility in machine learning research (a report from the NeurIPS 2019 reproducibility program). Journal of Machine Learning Research , 22:1–20, 2021. https://jmlr.org/papers/v22/20-303.html .
  • Plesser (2018) Plesser, H. E. Reproducibility vs. replicability: A brief history of a confused terminology. Frontiers in Neuroinformatics , 11:76, 2018. doi: 10.3389/fninf.2017.00076 .
  • Popper (2002) Popper, K. R. The Logic of Scientific Discovery . Routledge, 2002. The work was originally published in 1935 in German. The first English edition was published in 1959.
  • Power et al. (2021) Power, A., Burda, Y., Edwards, H., Babuschkin, I., and Misra, V. Grokking: Generalization beyond overfitting on small algorithmic datasets. In Mathematical Reasoning in General Artificial Intelligence Workshop at ICLR 2021 , Online, 2021. https://mathai-iclr.github.io/papers/papers/MATHAI_29_paper.pdf .
  • Raff (2019) Raff, E. A step toward quantifying independently reproducible machine learning research. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Advances in Neural Information Processing Systems , volume 32, Vancouver, Canada, 2019. Curran Associates, Inc. https://papers.neurips.cc/paper_files/paper/2019/hash/c429429bf1f2af051f2021dc92a8ebea-Abstract.html .
  • Raff & Farris (2022) Raff, E. and Farris, A. L. A siren song of open source reproducibility. In ML Evaluation Standards Workshop at ICLR 2022 , Online, 2022. doi: 10.48550/arXiv.2204.04372 .
  • Raji et al. (2021) Raji, D., Denton, E., Bender, E. M., Hanna, A., and Paullada, A. AI and the everything in the whole wide world benchmark. In Vanschoren, J. and Yeung, S. (eds.), Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks , volume 1, Online, 2021. Curran Associates, Inc. https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/084b6fbb10729ed4da8c3d3f5a3ae7c9-Abstract-round2.html .
  • Recht et al. (2019) Recht, B., Roelofs, R., Schmidt, L., and Shankar, V. Do ImageNet classifiers generalize to ImageNet? In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning , pp.  5389–5400, Long Beach, CA, United States, 2019. PMLR. https://proceedings.mlr.press/v97/recht19a.html .
  • Rendsburg et al. (2020) Rendsburg, L., Heidrich, H., and von Luxburg, U. NetGAN without GAN: From random walks to low-rank approximations. In Daumé III, H. and Singh, A. (eds.), Proceedings of the 37th International Conference on Machine Learning , pp.  8073–8082, Online, 2020. PMLR. https://proceedings.mlr.press/v119/rendsburg20a.html .
  • Riquelme et al. (2018) Riquelme, C., Tucker, G., and Snoek, J. Deep Bayesian bandits showdown: An empirical comparison of Bayesian deep networks for Thompson sampling. In 6th International Conference on Learning Representations , Vancouver, Canada, 2018. https://openreview.net/forum?id=SyYe6k-CW .
  • Roettger (2021) Roettger, T. B. Preregistration in experimental linguistics: Applications, challenges, and limitations. Linguistics , 59(5):1227–1249, 2021. doi: 10.1515/ling-2019-0048 .
  • Rubin (2020) Rubin, M. “Repeated sampling from the same population?” A critique of Neyman and Pearson’s responses to Fisher. European Journal for Philosophy of Science , 10:42, 2020. doi: 10.1007/s13194-020-00309-6 .
  • Rubin & Donkin (2022) Rubin, M. and Donkin, C. Exploratory hypothesis tests can be more compelling than confirmatory hypothesis tests. Philosophical Psychology , pp.  1–29, 2022. doi: 10.1080/09515089.2022.2113771 .
  • Saitta & Neri (1998) Saitta, L. and Neri, F. Learning in the “real world”. Machine Learning , 30(2–3):133–163, 1998. doi: 10.1023/A:1007448122119 .
  • Salzberg (1997) Salzberg, S. L. On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Mining and Knowledge Discovery , 1:317–328, 1997. doi: 10.1023/A:1009752403260 .
  • Santner et al. (2003) Santner, T. J., Williams, B. J., and Notz, W. I. The Design and Analysis of Computer Experiments . Springer Series in Statistics. Springer, 2003. doi: 10.1007/978-1-4757-3799-8 .
  • Scheel et al. (2021) Scheel, A. M., Tiokhin, L., Isager, P. M., and Lakens, D. Why hypothesis testers should spend less time testing hypotheses. Perspectives on Psychological Science , 16(4):744–755, 2021. doi: 10.1177/1745691620966795 .
  • Schneider (2015) Schneider, J. W. Null hypothesis significance tests. A mix-up of two different theories: The basis for widespread confusion and numerous misinterpretations. Scientometrics , 102(1):411–432, 2015. doi: 10.1007/s11192-014-1251-5 .
  • Schwab & Held (2020) Schwab, S. and Held, L. Different worlds confirmatory versus exploratory research. Significance , 17(2):8–9, 2020. doi: 10.1111/1740-9713.01369 .
  • Sculley et al. (2018) Sculley, D., Snoek, J., Wiltschko, A., and Rahimi, A. Winner’s curse? On pace, progress, and empirical rigor. In 6th International Conference on Learning Representations – Workshop , Vancouver, Canada, 2018. https://openreview.net/forum?id=rJWF0Fywf .
  • Segebarth et al. (2020) Segebarth, D., Griebel, M., Stein, N., von Collenberg, C. R., Martin, C., Fiedler, D., Comeras, L. B., Sah, A., Schoeffler, V., Lüffe, T., Dürr, A., Gupta, R., Sasi, M., Lillesaar, C., Lange, M. D., Tasan, R. O., Singewald, N., Pape, H.-C., Flath, C. M., and Blum, R. On the objectivity, reliability, and validity of deep learning enabled bioimage analyses. eLife , 9:e59780, 2020. doi: 10.7554/eLife.59780 .
  • Simonsohn et al. (2014) Simonsohn, U., Nelson, L. D., and Simmons, J. P. P -curve: A key to the file-drawer. Journal of Experimental Psychology: General , 143(2):534–547, 2014. doi: 10.1037/a0033242 .
  • Sinha et al. (2023, October 18) Sinha, K., Forde, J. Z., Samiei, M., Ghosh, A., Sutawika, L., and Panigrahi, S. S. Announcing MLRC 2023. ML Reproducibility Challenge, 2023, October 18. Retrieved January 31, 2024, from https://reproml.org/blog/announcing_mlrc2023/ .
  • Smaldino & McElreath (2016) Smaldino, P. E. and McElreath, R. The natural selection of bad science. Royal Society Open Science , 3(9):160384, 2016. doi: 10.1098/rsos.160384 .
  • Sonabend et al. (2022) Sonabend, R., Bender, A., and Vollmer, S. Avoiding c-hacking when evaluating survival distribution predictions with discrimination measures. Bioinformatics , 38(17):4178–4184, 2022. doi: 10.1093/bioinformatics/btac451 .
  • Steup (2006) Steup, M. Epistemology. In Zalta, E. N. (ed.), The Stanford Encyclopedia of Philosophy . Metaphysics Research Lab, Stanford University, Spring 2006 edition, 2006. https://plato.stanford.edu/archives/spr2006/entries/epistemology/ .
  • Steup & Neta (2020) Steup, M. and Neta, R. Epistemology. In Zalta, E. N. (ed.), The Stanford Encyclopedia of Philosophy . Metaphysics Research Lab, Stanford University, Fall 2020 edition, 2020. https://plato.stanford.edu/archives/fall2020/entries/epistemology/ .
  • Strobl & Leisch (2024) Strobl, C. and Leisch, F. Against the “one method fits all data sets” philosophy for comparison studies in methodological research. Biometrical Journal , 66(1):2200104, 2024. doi: 10.1002/bimj.202200104 .
  • Szollosi & Donkin (2021) Szollosi, A. and Donkin, C. Arrested theory development: The misguided distinction between exploratory and confirmatory research. Perspectives on Psychological Science , 16(4):717–724, 2021. doi: 10.1177/1745691620966796 .
  • Tatman et al. (2018) Tatman, R., VanderPlas, J., and Dane, S. A practical taxonomy of reproducibility for machine learning research. In Reproducibility in Machine Learning Workshop at ICML 2018 , Stockholm, Sweden, 2018. https://openreview.net/forum?id=B1eYYK5QgX .
  • The Turing Way Community (2023) The Turing Way Community. The Turing Way: A handbook for reproducible, ethical and collaborative research . Zenodo, 2023. doi: 10.5281/zenodo.7625728 .
  • Transactions on Machine Learning Research (n.d.) Transactions on Machine Learning Research. Transactions on Machine Learning Research, n.d. Retrieved January 31, 2024, from https://jmlr.org/tmlr/index.html .
  • Transactions on Machine Learning Research (n.d.) Transactions on Machine Learning Research. Submission guidelines and editorial policies, n.d. Retrieved January 31, 2024, from https://jmlr.org/tmlr/editorial-policies.html .
  • Trosten (2023) Trosten, D. J. Questionable practices in methodological deep learning research. In Proceedings of the Northern Lights Deep Learning Workshop , volume 4, 2023. doi: 10.7557/18.6804 .
  • Tukey (1973) Tukey, J. W. Exploratory data analysis as part of a larger whole. In Proceedings of the Eighteenth Conference on the Design of Experiments in Army Research, Development and Testing , pp.  1–10, Aberdeen, MD, United States, 1973. U.S. Army Research Office. https://apps.dtic.mil/sti/citations/AD0776910 .
  • Tukey (1980) Tukey, J. W. We need both exploratory and confirmatory. The American Statistician , 34(1):23–25, 1980. doi: 10.2307/2682991 .
  • Ullmann et al. (2023) Ullmann, T., Beer, A., Hünemörder, M., Seidl, T., and Boulesteix, A.-L. Over-optimistic evaluation and reporting of novel cluster algorithms: An illustrative study. Advances in Data Analysis and Classification , 17(1):211–238, 2023. doi: 10.1007/s11634-022-00496-5 .
  • van den Goorbergh et al. (2022) van den Goorbergh, R., van Smeden, M., Timmerman, D., and Van Calster, B. The harm of class imbalance corrections for risk prediction models: Illustration and simulation using logistic regression. Journal of the American Medical Informatics Association , 29(9):1525–1534, 2022. doi: 10.1093/jamia/ocac093 .
  • Van Mechelen et al. (2023) Van Mechelen, I., Boulesteix, A.-L., Dangl, R., Dean, N., Hennig, C., Leisch, F., Steinley, D., and Warrens, M. J. A white paper on good research practices in benchmarking: The case of cluster analysis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery , 13(6):e1511, 2023. doi: 10.1002/widm.1511 .
  • Vanschoren et al. (2013) Vanschoren, J., van Rijn, J. N., Bischl, B., and Torgo, L. OpenML: Networked science in machine learning. SIGKDD Explorations , 15(2):49–60, 2013. doi: 10.1145/2641190.2641198 .
  • von Luxburg et al. (2012) von Luxburg, U., Williamson, R. C., and Guyon, I. Clustering: Science or art? In Guyon, I., Dror, G., Lemaire, V., Taylor, G., and Silver, D. (eds.), Proceedings of ICML Workshop on Unsupervised and Transfer Learning , pp.  65–79, Bellevue, WA, United States, 2012. PMLR. https://proceedings.mlr.press/v27/luxburg12a.html .
  • Wagenmakers et al. (2012) Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L. J., and Kievit, R. A. An agenda for purely confirmatory research. Perspectives on Psychological Science , 7(6):632–638, 2012. doi: 10.1177/1745691612463078 .
  • Wasserstein et al. (2019) Wasserstein, R. L., Schirm, A. L., and Lazar, N. A. Moving to a world beyond “p < 0.05”. The American Statistician, 73(sup1):1–19, 2019. doi: 10.1080/00031305.2019.1583913.
  • Yousefi et al. (2010) Yousefi, M. R., Hua, J., Sima, C., and Dougherty, E. R. Reporting bias when using real data sets to analyze classification performance. Bioinformatics , 26(1):68–76, 2010. doi: 10.1093/bioinformatics/btp605 .
  • Zhang et al. (2021) Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. Understanding deep learning (still) requires rethinking generalization. Communications of the ACM , 64(3):107–115, 2021. doi: 10.1145/3446776 .
  • Zimek & Filzmoser (2018) Zimek, A. and Filzmoser, P. There and back again: Outlier detection between statistical reasoning and data mining algorithms. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery , 8(6):e1280, 2018. doi: 10.1002/widm.1280 .
  • Zimmermann (2020) Zimmermann, A. Method evaluation, parameterization, and result validation in unsupervised data mining: A critical survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery , 10(2):e1330, 2020. doi: 10.1002/widm.1330 .

Bridgmanian ideal. Used by Chang (2004) to describe a specific notion of operationalization. Refers to Percy Williams Bridgman (1882–1961), Nobel laureate in physics for his work on high-pressure physics, who also made contributions to the philosophy of science. Operational analysis is the topic of his book The Logic of Modern Physics, in which he argues in particular that “[i]n general, we mean by any concept nothing more than a set of operations; the concept is synonymous with the corresponding set of operations” (Bridgman, 1927, p. 5). This strict perspective on operationalization (also referred to as operationalism) has attracted much criticism; see “Operationalism” in The Stanford Encyclopedia of Philosophy (Chang, 2021). In particular, Chang (2004, p. 148) points out that it builds on “an overly restrictive notion of meaning, which comes down to reduction of meaning to measurement, which [Chang] refer[s] to as Bridgman’s reductive doctrine of meaning.”

Cargo Cult Science. The term cargo cult refers to social movements that originated in Melanesia: “The modal cargo cult was an agitation or organised social movement of Melanesian villagers in pursuit of ‘cargo’ by means of renewed or invented ritual action that they hoped would induce ancestral spirits or other powerful beings to provide” (Lindstrom, 2023, p. 1). Richard Phillips Feynman (1918–1988), theoretical physicist and Nobel laureate, adapted the term to describe ritualized scientific practices which “follow all the apparent precepts and forms of scientific investigation, but [which are] missing something essential” (Feynman, 1974, p. 11).

Confirmatory research. Also known as hypothesis-testing research, confirmatory research aims to test preexisting hypotheses in order to confirm or refute existing theories. Researchers design specific studies to experimentally evaluate hypotheses derived from existing knowledge. Typically, this involves a structured and predefined research design, a priori hypotheses, and often statistical analyses to draw conclusive inferences. It is a well-established term in many fields other than ML. General references include Schwab & Held (2020), Nosek et al. (2018), and Munafò et al. (2017). Field-specific references include Jaeger & Halliday (1998) and Nilsen et al. (2020) for biology, Wagenmakers et al. (2012) for psychology, Kimmelman et al. (2014) for preclinical research, Roettger (2021) for linguistics, and Foster (2024) for educational research. The term confirmatory might appear to conflict with the principle of falsification established by Popper (1959/2002): according to Popper, scientific theories cannot be conclusively confirmed, only falsified. It is important to emphasize that confirmatory research has a narrower scope rooted in Neyman-Pearson statistical testing theory (see the glossary entry on Two historically distinct types of testing theory). This theory provides a framework for a statistically justified decision between a null hypothesis and an alternative hypothesis based on the available data. The hypothesis to be established (e.g., there is an effect) is usually stated as the alternative hypothesis, and confirmation means rejecting the null hypothesis (e.g., there is no effect) in favor of the alternative.

Curricula recommendations for CS. The report Computer Science Curricula 2013 lists “Intelligent Systems” (including basics in ML) as a Core (Tier2) topic but “still believe[s] it is not necessary for all CS programs to require a full course in probability theory [or statistics]” (Joint Task Force on Computing Curricula, 2013, p. 50). This has changed with the latest (2023) version insofar as statistics is now considered a CS Core topic in “Mathematical and Statistical Foundations”, one of several knowledge areas (Joint Task Force on Computing Curricula, 2023).

Epistemic, epistemological. Both terms come from the Greek word for knowledge or understanding; they are sometimes used synonymously and sometimes with distinct, more precise meanings. When the distinction is made, epistemic relates to knowledge itself, while epistemological relates to “the study of the nature and grounds of knowledge” (Merriam-Webster, n.d.), i.e., epistemology. For epistemology, an early edition of The Stanford Encyclopedia of Philosophy gives the following definition: “Defined narrowly, epistemology is the study of knowledge and justified belief. […] Understood more broadly, epistemology is about issues having to do with the creation and dissemination of knowledge in particular areas of inquiry” (Steup, 2006). The most recent edition states in more abstract terms that “[m]uch recent work in formal epistemology is an attempt to understand how our degrees of confidence are rationally constrained by our evidence […]” and that “epistemology seeks to understand one or another kind of cognitive success […]” (Steup & Neta, 2020).

Epistemic iteration. Chang (2004) introduced the concept and defined it in his glossary as a “process in which successive stages of knowledge, each building on the preceding one, are created in order to enhance the achievement of certain epistemic goals. It differs crucially from mathematical iteration in that the latter is used to approach a correct answer that is known, or at least in principle knowable, by other means” (p. 253). For thorough discussions, see Chapters 1 (pp. 46–48) and 5.

Exploratory research. As specified in the main body of the paper, exploratory research refers to an open-ended approach that aims to gain insight and understanding in a new or unexplored area (in contrast to confirmatory research). It is often conducted when little is known about the phenomenon under study, and it involves gathering information, identifying patterns, and formulating specific hypotheses for further investigation.

Hyperparameter tuning studies. Aim to find the best-performing configuration for an ML model class, including baselines (Feurer & Hutter, 2019; Bischl et al., 2023). Tuned models can then be compared more objectively and fairly. Hyperparameter tuning (or the lack of it) is an important source of variation in benchmark studies (Bouthillier et al., 2021) and has been shown to strongly affect benchmark outcomes (see, for example, the references in Bouthillier et al., 2021, or our introduction). Treating hyperparameter optimization as part of the problem of quantifying an algorithm’s performance was suggested by Bergstra et al. (2011; 2013).
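To make the idea concrete, here is a minimal sketch of nested resampling, in which the tuning loop is treated as part of the learning procedure being evaluated. It assumes scikit-learn; the model class, parameter grid, and data are illustrative placeholders, not a prescription from the works cited above.

```python
# Minimal sketch: evaluate the *tuned* procedure, not a fixed configuration.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Inner loop: hyperparameter search. Outer loop: performance estimate of
# the whole "tune, then predict" pipeline, so tuning is scored as part of
# the algorithm rather than tuned and evaluated on the same splits.
inner_search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1.0, 10.0]}, cv=3)
outer_scores = cross_val_score(inner_search, X, y, cv=5)
print(f"accuracy of tuned procedure: {outer_scores.mean():.3f} ± {outer_scores.std():.3f}")
```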

Insight-oriented exploratory research. Refers to experimental research in ML that aims to gain insight rather than to invent or develop a new method. It does not necessarily involve a very specific hypothesis to be pursued; rather, it aims to improve the understanding and knowledge of a problem, a (class of) existing methods, or a phenomenon.

Method-developing exploratory research. Refers to experimental research in ML carried out in the process of developing a new ML method. This can include method comparison experiments, but in particular, it refers to exploration that takes place during the development process. This may include, for example, trying different method variants or specifying hyperparameter configurations and implementation details.

Operationalization. Chang (2004, p. 256) provides the following definition in his glossary: “The process of giving operational meaning to a concept where there was none before. Operationalization may or may not involve the specification of explicit measurement methods.” Operational meaning refers to “the meaning of a concept that is embodied in the physical operations whose description involves the concept.” For a thorough discussion, see Chapter 4 (pp. 197–219).

Replicability (vs. reproducibility). There is no consistent use of these terms in the broader literature (for discussions, see, e.g., Barba, 2018; Plesser, 2018; Gundersen, 2021; Pineau et al., 2021). We use the term reproducibility in a narrow technical sense (see the glossary entry on computational reproducibility). In contrast, replicability here means arriving at the same scientific conclusions in a broad sense. This terminology is in line with the National Academies of Sciences, Engineering, and Medicine (2019). For the reliability of results, replicability is thus more important than reproducibility. Note that Drummond (2009), for example, uses the terms the other way around.

Reproducibility (computational). Means that the provided code technically achieves the same result on the provided data. It does not mean that the code, experimental design, or analysis are error-free, nor that we can qualitatively reach the same conclusions for the same general question under slightly different technical conditions. It is thus not a sufficient condition for replicability. Note that Tatman et al. (2018) differentiate three levels of reproducibility.
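As an illustration of how narrow this notion is, the following sketch pins random seeds and records the software versions that produced a result. This is one common ingredient of computational reproducibility, not a guarantee of replicability; the libraries and numbers are assumptions for illustration, not taken from any of the works cited.

```python
# Minimal sketch of one ingredient of computational reproducibility:
# fix random seeds and record the environment that produced a result.
# This makes neither the analysis error-free nor the conclusions replicable.
import random
import sys

import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

result = np.random.rand(3)  # stands in for an arbitrary computation

# Log exactly what ran, so the same numbers can be re-obtained later.
print("python:", sys.version.split()[0], "| numpy:", np.__version__)
print("result:", result)
```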

Two historically distinct types of testing theory. This refers to two approaches to statistical testing developed by Ronald Aylmer Fisher (1890–1962) on the one side and Jerzy Neyman (1894–1981) and Egon Sharpe Pearson (1895–1980) on the other. Only the former includes p-values and a single (null) hypothesis. The latter includes two hypotheses and hinges on statistical power and Type I and II errors (Schneider, 2015, p. 413). More generally, Fisher’s approach is “[b]ased on the concept of a ‘hypothetical infinite population’,” has “[r]oots in inductive philosophy,” and “[a]pplies to any single experiment (short run),” while Neyman-Pearson’s approach is “[b]ased on a clearly defined population,” has “[r]oots in deductive philosophy,” and “[a]pplies only to ongoing, identical repetitions of an experiment, not to any single experiment (long run)” (Schneider, 2015, p. 415, Table 1).
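The contrast can be made concrete with a small simulated example, assuming SciPy and statsmodels are available; the effect size, alpha, power, and data below are arbitrary illustrative choices.

```python
# Illustrative contrast of the two testing theories on simulated data.
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=50)
b = rng.normal(loc=0.3, scale=1.0, size=50)

# Fisher: a single null hypothesis; the p-value is reported as graded
# evidence against it for this one experiment (short run).
_, p_value = stats.ttest_ind(a, b)
print(f"Fisherian p-value: {p_value:.3f}")

# Neyman-Pearson: fix alpha and power in advance, derive the required
# sample size, then make a binary reject/retain decision (long run).
n_per_group = TTestIndPower().solve_power(effect_size=0.3, alpha=0.05, power=0.8)
print(f"required n per group: {int(np.ceil(n_per_group))}")
print("decision at alpha=0.05:", "reject H0" if p_value < 0.05 else "retain H0")
```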

Validity. Note that there is no concise definition of the term. In psychology, internal and external validity in particular are differentiated. According to Campbell (1957, p. 297), internal validity asks: “did in fact the experimental stimulus make some significant difference in this specific instance?” External validity, on the other hand, asks “to what populations, settings, and variables can this effect be generalized?” The former appears closely related to in-distribution generalization performance in ML, the latter to out-of-distribution generalization. In contrast, The Stanford Encyclopedia of Philosophy states for experiments in physics (Franklin & Perovic, 2023): “Physics, and natural science in general, is a reasonable enterprise based on valid [emphasis added] experimental evidence, criticism, and rational discussion.” Several strategies that may be used to validate observations are specified. These include the following: 1) “Experimental checks and calibration, in which the experimental apparatus reproduces known phenomena”; 2) “Reproducing artifacts that are known in advance to be present”; 3) “Elimination of plausible sources of error and alternative explanations of the result”; 4) “Using the results themselves to argue for their validity”; 5) “Using an independently well-corroborated theory of the phenomena to explain the results”; 6) “Using an apparatus based on a well-corroborated theory”; 7) “Using statistical arguments.” However, it is emphasized that “[t]here are many experiments in which these strategies are applied, but whose results are later shown to be incorrect […]. Experiment is fallible. Neither are these strategies exclusive or exhaustive. No single one of them, or fixed combination of them, guarantees the validity of an experimental result” (Franklin & Perovic, 2023).


Open Access | Peer-reviewed | Research Article

Does repetition equal more of the same? Tie strength and thematic orientation in R&D networks

Authors:

  • Dima Yankova (Roles: Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft). Affiliations: INGENIO (CSIC-UPV), Universitat Politècnica de València, Valencia, Spain; ANETI Lab, Corvinus Institute for Advanced Studies (CIAS), Corvinus University, Budapest, Hungary. * E-mail: [email protected]
  • Pablo D’Este (Roles: Conceptualization, Formal analysis, Supervision, Writing – review & editing). Affiliation: INGENIO (CSIC-UPV), Universitat Politècnica de València, Valencia, Spain
  • Mónica García-Melón (Roles: Conceptualization, Supervision, Writing – review & editing)

  • Published: May 23, 2024
  • https://doi.org/10.1371/journal.pone.0303912


Abstract

Despite organizations’ documented tendency to repeat research collaborations with prior partners, scholarly understanding of the implications of recurring interactions for the content of the collaboration has been fairly limited. This paper investigates whether and under what conditions organizations use repeated research partnerships to explore new topics, as opposed to deepening their expertise in a single one (exploitation). The empirical analysis is based on the Spanish region of Valencia and its publicly funded R&D network. Employing lexical similarity to compare the topic and content of project abstracts, we find that strong ties are not always associated with the exploitation of the same topic. Rather, exploration is more likely when at least one of the partners mobilizes a network of distinct contacts and can access novel knowledge.

Citation: Yankova D, D’Este P, García-Melón M (2024) Does repetition equal more of the same? Tie strength and thematic orientation in R&D networks. PLoS ONE 19(5): e0303912. https://doi.org/10.1371/journal.pone.0303912

Editor: Bruno Miguel Pinto Damásio, Universidade Nova de Lisboa, PORTUGAL

Received: September 22, 2023; Accepted: May 2, 2024; Published: May 23, 2024

Copyright: © 2024 Yankova et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data underlying the results presented in the study are available from the Zenodo repository (DOI: 10.5281/zenodo.10899714 ).

Funding: DY, as part of the POLISS Innovative Training Network, has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 860887. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

The literature on inter-organizational networks has demonstrated organizations’ proclivity to repeat research interactions with prior partners, resulting in stronger ties and a reinforcement of existing network structures [1, 2]. Empirical studies have documented this type of organizational inertia in partner selection within both national and international R&D networks [3–7]. Yet, the implications of repeated engagements for the content of the collaboration remain contested. Do organizations leverage repeated R&D ties to deepen their expertise in a particular thematic domain, or could strong links also be associated with the exploration of different topics? The answers to these questions can shed light on the value of repeated ties for individual and collective performance.

Innovation scholars have argued that strong bonds between partners are subject to declining marginal benefits [8, 9]. At first, organizations accumulate gains from solidifying existing relationships, as transaction costs decrease [10] and the exchange of complex and tacit knowledge becomes easier; but beyond a certain threshold, the learning potential for both parties may be exhausted [10–13]. Social embeddedness can begin to act as a filter for the entry of new knowledge and ideas, causing cognitive isolation and suboptimal innovative performance [14–16].

So far, scholarly understanding of the role and functionality of repeated interactions has been constructed largely independently of the nature of ties [17, 18]. Few studies have looked explicitly at interaction processes and the strategic decisions partners make when repeating research collaborations [19]. The goal of this paper is to shed light precisely on this issue by comparing the topics of recurring collaborations. It investigates whether and under what conditions organizations use repeated engagements to explore new topics, as opposed to deepening their expertise in a single one.

This question is important for several reasons. If two organizations systematically exploit the same topic and tap into the same knowledge domain, their interactions will likely yield benefits at first, but hinder long-term innovative performance. If, however, subsequent collaborations begin to explore different topics, either because a priori the organizations involved possess a diverse internal repository of competencies and skills, or because they are capable of continuously sourcing novel knowledge through additional partnerships, the prospects of decreasing marginal benefits may weaken. Hence, the basic premise of this paper is that the relationship between strong ties and performance will be at least partially contingent on the nature of the exchange between partners, and their strategic orientation (exploration vs exploitation) in instances of repeated engagement. For the rest of the paper, we will use the term exploitation to denote persistent focus on the same topic in recurrent collaborations, while exploration refers to a shift in focus towards new topics, which differ from those addressed in the first instance of engagement between partners.

Disentangling the connection between the strength of ties and their thematic orientation also merits scholarly attention as a departure from the structuralist perspective, which dominates knowledge network studies and treats inter-organizational links as virtually homogeneous [19–21]. By qualifying strong ties based on their content, we can learn about the specific functions that seemingly identical types of relationships exercise in the context of inter-organizational networks [18, 22, 23].

To conduct the empirical analysis, we concentrate on the Spanish region of Valencia. We collected information on all R&D partnerships formed between 2016 and 2022 that received a public subsidy from one of the top two regional sources of innovation-related funding. The final dataset of 194 collaborative projects was used to map the local inter-organizational network and to assess to what extent repeated engagements between partners over the 7 years of observation were associated with either topic exploration or exploitation. We also test how partners’ access to diverse knowledge and resources influences the likelihood of adopting one strategic approach over the other in subsequent collaborations. Given the rich literature on the benefits of degree centrality for learning, knowledge recombination, and sustained innovative performance [24–27], well-connected actors may be more likely to explore new topics when re-engaging with the same partner.

This paper adds to a growing stream of literature that recognizes the importance of strong ties as a frequent phenomenon in interpersonal and inter-organizational networks [28–30]. It aims to illuminate the interplay between the strength of a collaboration tie and its nature or thematic orientation (exploitative vs explorative). The contribution is thus twofold. First, we develop a theoretical argument to suggest that the relationship between repeated engagement and thematic orientation is fundamental for disentangling the effect of social cohesion on individual and collective performance. Though previous studies have demonstrated a clear link between network structural properties and actors’ performance [24, 31–33], there is still relatively little understanding of the precise mechanisms that underlie this relationship [17, 19, 21]. Second, from a methodological perspective, our study applies recent advancements in machine learning and natural language processing (NLP) to build a measure of thematic orientation based on the lexical similarity between project abstracts. Instances of NLP usage in the innovation and management literature are increasing [34–36], but they have concentrated primarily on patents’ textual data, whereas our goal is to showcase the potential of such methods to advance scholarly understanding of R&D networks and the value of inter-organizational linkages.
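For intuition only, the following sketch shows one way such a lexical-similarity measure could be computed with off-the-shelf tools (TF-IDF vectors and cosine similarity, assuming scikit-learn). It is a simplified stand-in, not the authors’ actual pipeline described in section 4, and the two abstracts are invented.

```python
# Illustrative sketch: lexical similarity between two project abstracts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

abstract_first = "Biodegradable polymer coatings to extend shelf life in citrus packaging."
abstract_repeat = "Machine vision for automated defect detection on citrus sorting lines."

vectors = TfidfVectorizer(stop_words="english").fit_transform(
    [abstract_first, abstract_repeat]
)
similarity = cosine_similarity(vectors[0], vectors[1])[0, 0]

# Low similarity between consecutive joint projects would point to topic
# exploration; high similarity to exploitation of the same topic.
print(f"cosine similarity: {similarity:.2f}")
```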

From a policy standpoint, our analysis is also highly relevant. In regional R&D networks, topic exploitation may be a desirable outcome if efforts are directed toward building competitive advantage in nascent or underexplored economic domains. Conversely, it can be highly undesirable if the network is stagnating, and policymakers are looking to branch out of existing development paths. Therefore, understanding how and under what conditions repeated engagement between regional partners is associated with either topic exploitation or exploration could help policymakers steer more effective network interventions.

The paper is structured as follows: section 2 lays the theoretical foundation of the study and relates it to the literature on inter-organizational networks and social capital. Section 3 introduces the characteristics of the dataset, and section 4 outlines our approach to operationalizing thematic orientation by building a measure of abstract similarity. Section 5 details the results of the analysis, while section 6 discusses their implications for theory and policy.

2. Theoretical background

2.1. The connection between the strength of ties and their thematic orientation (explorative vs exploitative)

A growing body of literature points to the importance of social embeddedness in driving the structural evolution of inter-organizational networks [37]. The formation of new partnerships between organizations is perceived in the context of their existing social structure and their history of prior ties [1, 2]. Past engagement seems to impact the course of future cooperation in a path-dependent fashion, as former ties repeat themselves. This form of organizational inertia in partner selection has been observed across different types of networks. For instance, when analyzing the evolution of an industrial cluster in Italy, Lazzeretti & Capone found that the collaborative work experience developed between two actors is particularly influential in shaping tie formation during the cluster emergence phase, although less so in the development stage [38]. The presence of a previous relationship was also shown to influence SMEs’ partner selection in the process of consolidating a regional innovation network [4], while additional evidence suggests this effect intensifies during periods of crisis [39]. Similarly, in a study on university–industry research networks in the UK, D’Este & Iammarino noted the strong role played by prior joint experience, which the authors choose to conceptualize as a form of organizational rather than social proximity [5]. Balland et al. also observed a consistently stable effect of social embeddedness on inter-firm relations when analyzing the global video game industry [40]. Finally, several studies on EU-FP network collaboration patterns have highlighted the propensity of organizations, be they firms or public research bodies, to select familiar partners with whom they share a history of prior engagement [3, 6, 7].

The implications of strong ties for performance have been the subject of many empirical studies. Some highlight the benefits of strong bonds for fine-grained knowledge sharing, in line with Coleman’s theory of social capital [41]. Repeated engagements tend to engender “relational” trust between participating entities [42–44]. This can in turn reduce actors’ perception of expected opportunistic behavior, decrease transaction costs, and ease the transfer of both complex and tacit knowledge [10–13, 17, 45, 46]. On the other hand, strong ties between partners may also reinforce retention mechanisms and prevent the inflow of nonredundant information [15, 16, 47]. When organizational partners become narrowly focused on a particular type of activity, transitioning toward new developments becomes difficult, leading companies to display inferior economic performance [14].

Taking both perspectives on board, scholars have characterized the relationship between strong ties and performance as an inverted U-shape. Organizations benefit from consolidating strong relationships up to a certain level, beyond which social embeddedness can act as a filter for the entry of new knowledge and perspectives, causing cognitive isolation and suboptimal innovative performance [8, 9]. This situation is also known as “the proximity paradox”, since the same factors that drive actors to connect and exchange knowledge may also lead them to innovate less in the long run [48, 49].

In this paper, we argue that the consequences of repeated collaborations for individual and collective performance cannot be fully disentangled without examining the content of ties and acknowledging that organizations may leverage repeated interactions for different purposes. In his seminal work on the strength of weak ties, Granovetter noted that “treating only the strength of ties ignores […] all the important issues involving their content”, and stressed that the relationship between the strength and the degree of specialization of ties deserves further analysis [50]. In addition, as pointed out by Reagans & McEvily, many studies tend to infer knowledge transfer from the association between network structure–including tie strength–and performance, without directly examining the essence of the exchange [17, 19, 51].

In order to unpack this association, we employ the so-called "connectionist" perspective [20], which looks beyond the structural or topological properties of the network and treats ties as conduits of knowledge and resource flows [18, 22, 23]. This approach acknowledges that seemingly identical types of links can perform distinct functions and transmit varying kinds of resources. Furthermore, it recognizes that organizations in alliance networks are not simply "helpless targets of structural influence", but active agents who make conscious decisions about the way they leverage strong bonds [19].

Take for instance the following scenario: if two organizations systematically tackle the same topic in multiple R&D collaborations, their interactions may follow the inverted U-shape scholars describe, whereby topic exploitation would at first yield positive outcomes, but if continued for too long would hamper innovative performance. In the process of exploiting the same topic multiple times, organizations are expected to tap into the same knowledge pool, building expertise at the start but eventually exhausting the recombination potential. If, however, subsequent collaborations begin to tackle different topics, either because the partners already possess diverse internal repositories of competencies, or because they are capable of sourcing those through third-party links, the graph of decreasing marginal benefits may take a different shape. At the very least, we can expect the threshold of redundancy, at which the two partners have little learning space left, to be higher. Hence, the relationship between strong ties and performance is contingent on the content of the exchange between partners, and on whether they adopt an explorative or an exploitative approach in repetitive engagements. This is not to suggest that one is inherently preferable to the other. Rather, our goal is to illustrate that structurally equivalent relations, in the form of strong bonds, can have very different consequences for knowledge exchange and learning depending on the content of the collaboration itself. To examine the heterogeneity of organizational approaches to repeated collaborations, we pose the following research question:

  • Q1: To what extent do organizations leverage repeated collaborations to exploit the same topic multiple times or to explore new ones?

2.2. Factors that moderate the relationship between strong ties and thematic orientation

The extent to which organizations use repeated collaborations to exploit prior topics or explore new ones may be influenced by their access to complementary knowledge and resources from third parties. Assuming that the knowledge repository of an entity is not static, forming alliances with a wide range of partners creates new pipelines for fresh ideas, perspectives, and information to flow [ 22 , 23 ]. This may in turn inspire greater diversification in the topics and content of repeated collaborations.

So far, multiple empirical studies have demonstrated that the size of a firm's (ego) network, defined in terms of both direct and indirect contacts (alters), is positively associated with innovative output [24, 27, 31, 52]. The theoretical framework underlying these findings assumes that well-connected organizations have more timely access to larger volumes of information through their established relationships. Yet, in line with the resource-based perspective [53], some researchers have argued that it is not the sheer number of connections that matters so much as the diversity of knowledge which can be sourced through direct relationships [31, 54]. In other words, a focus on the composition of the ego's first-order network, and more specifically the number of distinct partners, may be more appropriate. Assuming that each organization holds a unique set of assets and capabilities, direct relationships to multiple organizations may provide the best access to non-redundant knowledge and resources. We therefore posit that the number of new first-order connections partners build before re-engaging with each other may influence their propensity to explore new topics in a repeated exchange. To examine this issue further, we propose a second research question:

  • Q2: To what extent does partners' range of new connections to other organizations inspire new topic exploration in their repeated collaborations?

3. Data collection and context

To investigate the relationship between repeated engagement and thematic orientation, we focus on an existing knowledge network formed by collaborative, publicly funded R&D projects in the Spanish region of Valencia. The choice of a regional-level network is appropriate given the widespread consensus that knowledge sharing tends to be highly concentrated and mostly takes place in dense, local networks with rich social capital [55-57]. We extracted data on all awarded projects from the official grant resolution records of two regional organizations: the Valencian Institute for Business Competitiveness (IVACE) and the Valencian Innovation Agency (AVI). IVACE was established in 1984 and its mission is geared toward assisting regional SMEs in increasing their competitiveness and overall innovative capacity. AVI was created more recently, in 2018, specifically for the purpose of managing the innovation strategy of Valencia and improving the regional productive model. Together, these two organizations manage approximately 75% of the 1.6 billion euros that the regional government has designated for the implementation of the local innovation strategy [58]. Thus, we can be fairly confident that by concentrating on AVI and IVACE, we are capturing a substantial share of the publicly subsidized R&D collaboration network in the region.

According to public records, in the period 2016–2022 the two organizations funded a total of 220 collaborative R&D projects under three lines of action: "R&D in cooperation" (IVACE), "Strategic projects in cooperation" (AVI) and "Consolidation of the business value chain" (AVI). While the programs managed by each organization differ somewhat in eligibility criteria, they share a common purpose: to enhance downstream R&D cooperation between regional actors and to support the creation of new products, processes, or services through one of two project types, (1) industrial research and design or (2) experimental development. All three programs open calls on an annual basis, and none places restrictions with regard to research themes. Average project duration is between one and two years. The calls by IVACE are open to private companies only, while those managed by AVI allow all types of regional actors, including universities, research centers, technological institutes, and even non-profits, to participate. The mean subsidy per project is EUR 87,000 for IVACE and EUR 475,000 for AVI. Hence, we are de facto accounting for projects of different sizes and membership structures (firm-firm, firm-university, etc.).

Once the list of all 220 projects and their team members was compiled, a separate search was performed to collect textual descriptions for each project via several channels: (a) the official website of an organization involved in the project, (b) newspaper articles, or (c) the website of the funding entity. When no information about the collaboration was available online, we requested a brief description of activities from the principal investigator of the leading organization. Our final sample thus consists of 194 R&D projects with a description longer than 50 words. This represents 88% of the entire list of funded projects in the period of study. For the remaining 12%, we were either unable to obtain a textual description or the one we had was too short to carry out a meaningful textual analysis. Most project descriptions mention the objective of the partnership, planned activities, and expected results. As shown in Fig 1, most abstract lengths fall within the range of 50 to 450 words, with only a few outliers. Since the mean (234 words) and the median (221 words) are close, we can assert that the abstracts in our sample are of comparable length.

[Fig 1. Distribution of abstract lengths. https://doi.org/10.1371/journal.pone.0303912.g001]

The resulting R&D network consists of 362 individual organizations. 78% of them are private for-profit firms, and about a third of all entities (nodes) participated in more than one project. The total number of realized links (edges) is 779, and roughly 5% of these were repeated at least once over the 7-year period. Table 1 provides descriptive statistics of the final sample on which the analysis was performed.
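For illustration, the sketch below shows one way such a co-participation network can be built in R with the igraph package: every pair of organizations appearing in the same project team is linked. The `teams` object is hypothetical, and this is a sketch of the general technique rather than the authors' exact procedure.

```r
library(igraph)

# `teams` is a hypothetical list of character vectors, one per project,
# each holding the identifiers of the participating organizations.
edges <- do.call(rbind, lapply(teams, function(t) t(combn(t, 2))))

# simplify() collapses duplicate edges, so ecount() returns distinct links
g <- simplify(graph_from_edgelist(edges, directed = FALSE))

vcount(g)  # number of organizations (362 in our sample)
ecount(g)  # number of distinct realized links
```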

[Table 1. Descriptive statistics of the final sample. https://doi.org/10.1371/journal.pone.0303912.t001]

The “Technological institutes” of Valencia can be considered a unique element of the local innovation ecosystem, and as such merit further contextualization. Established with the support of regional business associations and the government between the 1970s and the 1990s, the institutes operate as private, non-profit research entities whose primary goal is to support regional SMEs in advancing their capacities and innovative activity. Each institute is housed in a single geographic location (i.e., no dispersion of research activity) and is generally dedicated to a specific field, such as energy, textiles, or biomechanics. The region also has several independent research centers which do not belong to a university structure. Some of these work exclusively on health-related topics; they are the so-called “Health research institutes” (see Table 1).

Finally, with respect to university-type beneficiaries, we have disaggregated the larger organizations into specific departments and teams. This means that every time a grant resolution referred to a particular university, a manual search was performed to identify the exact entity within the university structure that engaged in the collaboration. This allows us to build a fine-grained image of the regional R&D network and, more importantly, facilitates the operationalization of repeated engagements. Take for instance the following scenario: a company x completing two projects with university y can hardly be considered a case of repeated engagement unless we can confirm that both instances involved the same department or research team within the university (the disaggregate level). More details on the operationalization of all variables are provided in the following section.

4. Variables and methods

To answer the main research questions, we adopt a two-step approach in which our unit of analysis is the pair of R&D projects. First, we construct all possible combinations of project pairs. Since our sample consists of n = 194 projects, all pairs amount to N = n*(n-1)/2 = 18721. We then compare pairs that share a common dyad of partners (what we consider instances of repeated collaboration) to those that do not. We use descriptive analysis to shed light on the first research question. In the second stage, we isolate only project pairs which represent instances of repeated collaboration, meaning they share at least one partner dyad in common (75 pairs), in order to test how the two organizations' access to diverse sources of knowledge and resources influences the observed thematic orientation of their repeated engagements (explorative vs. exploitative). In this second stage, we run a beta regression model, which is particularly suitable when the variable of interest is continuous and restricted to the interval (0,1) [59]. Fig 2 summarizes the two-step methodological approach graphically. Below we elaborate on the operationalization of our dependent and independent variables.
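As a minimal sketch of the first step, the pair enumeration can be done directly in R (the language used for the analysis, per the Supporting Information):

```r
# Enumerate all unordered pairs of the n = 194 projects
n <- 194
pairs <- t(combn(n, 2))  # one row per project pair (i, j), with i < j
nrow(pairs)              # 18721, i.e. n * (n - 1) / 2
```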

[Fig 2. Graphical summary of the two-step methodological approach. https://doi.org/10.1371/journal.pone.0303912.g002]

4.1. Constructing a measure of ties’ thematic orientation

Since we are interested in analyzing whether organizations leverage repeated R&D collaborations to further exploit a particular topic or to explore new ones, our primary dependent variable compares the thematic similarity between pairs of projects. Let us first review the logic of this approach before diving into the empirical calculation.

Following the "connectionist" view of inter-organizational ties as pipelines that transmit tangible and intangible resources [20, 23], one arguably reliable way to infer the content of those flows is by tracing the descriptions provided by the actors themselves. Joint project abstracts, or other types of descriptive project documentation, tend to provide sufficient information on, among other things, the specific area of intervention on which partner organizations focus their collaboration. Analyzing large volumes of text, however, is both challenging and burdensome. Fortunately, recent advancements in machine learning and natural language processing (NLP) have opened up new possibilities for the systematic interpretation of textual documents, including thematic classification and comparison.

Applications of NLP in the innovation and management literature have proliferated, but so far they focus primarily on textual data from patents [60]. Balsmeier et al., for example, introduce a measure of patent novelty based on the first occurrence of a word in the patent corpus [34]. Kaplan & Vakili use topic modelling, an unsupervised machine learning technique, to uncover the emergence of new topics in patent data and interpret those as cognitive breakthroughs [35]. Also relying on textual analysis, Kelly et al. construct a measure of lexical similarity to quantify commonality in the topical content of patents in order to identify significant ones, that is, patents whose content is distinct from prior patents (more novel) but similar to future ones (more impactful) [36].

Here we propose to leverage some of these advancements to measure the lexical similarity between pairs of R&D project abstracts, so as to discern whether repeated collaborations deal with the same topic. Lexical similarity is determined by the degree of lexical overlap, that is, how many terms from document i also appear in document j. It is a corpus-based method which takes into account the co-occurrence of words across the entire collection of documents (the corpus) [61, 62]. We assume that projects whose descriptions show high levels of lexical similarity represent collaborative work in thematically proximate fields. When such projects were carried out by the same teams, we can interpret their repeated engagements as a continuation of previous work (thematic exploitation). Alternatively, lower similarity between project descriptions suggests that partners likely explored a completely different topic in their subsequent R&D collaboration.

The following section details all steps in the calculation of the Abstract Similarity Score.

4.1.1. Measuring abstract similarity.

Based on the sample of 194 regional collaborative R&D projects, we calculate a lexical similarity score for all possible pairs of project abstracts (18721 pairs). We begin by constructing a document-term matrix (DTM), whereby each row represents a unique document (project abstract) and each column represents one term (word). The value of each matrix cell ij reflects the number of times term j appears in document i. Before creating the DTM, the corpus of abstracts is pre-processed. This includes translating project abstracts from Spanish to English, and removing punctuation, numbers and stopwords, such as pronouns, articles, specific verbs, and other common speech elements which carry little useful information. In addition, terms are trimmed to their meaningful linguistic base or root form, called a lemma. This yields a total of 3846 unique terms. A full account of the pre-processing steps is available in the Supporting Information section (see S1 Fig).
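A minimal sketch of this pipeline, using the tm package mentioned in the Supporting Information, might look as follows. Here `abstracts` is a hypothetical character vector of the 194 descriptions, assumed already translated into English, and stemming is shown as a simple stand-in for the lemmatization step described above:

```r
library(tm)         # text mining framework
library(SnowballC)  # stemming backend for stemDocument()

corpus <- VCorpus(VectorSource(abstracts))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument)   # stand-in for lemmatization
corpus <- tm_map(corpus, stripWhitespace)

dtm <- DocumentTermMatrix(corpus)  # rows: 194 abstracts; columns: unique terms
dim(dtm)
```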

As highlighted by Kelly et al., a key consideration in building any similarity metric for a pair of text documents is to appropriately weigh the words by their importance [ 36 ]. This is particularly crucial for our sample, since project descriptions follow a common structure and certain words (“objective”, “results”, “activities”) will be registered with greater frequency across the majority of text pairs, making them appear more similar than they really are. To account for that, we employ the “term-frequency-inverse-document-frequency” (TF-IDF) transformation method.

Under the standard TF-IDF formulation, the weight of term j in document i is given by

w_ij = tf_ij × log(N / df_j)

where tf_ij is the frequency of term j in document i, df_j is the number of documents in which term j appears, and N is the total number of documents in the corpus (here, 194) [63]. Terms that appear across most abstracts thus receive low weights, while distinctive terms are emphasized.

The final DTM (dimensions: 194 x 3846) is quite sparse, and most term-frequency vectors contain many zero values. To estimate how textually close two project abstracts are, we use cosine similarity, which is measured as the cosine of the angle between a pair of term-frequency vectors and determines whether they point in roughly the same direction. It is one of the earliest and most widely used distributional measures [64]. The advantage of cosine similarity is that it ignores zero-matches, essentially safeguarding against false positives: two term-frequency vectors may have many zero values in common, but this does not make the corresponding documents similar, since they share few words. Cosine similarity focuses on the words two vectors have in common and the respective weights of these words [65]. It is a continuous metric ranging from 0 to 1. A high similarity score implies that two abstracts use the same set of words in similar proportions, while a low similarity value indicates little overlap between the texts.
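Continuing the sketch above (the `dtm` object carries over from the previous snippet), the TF-IDF weighting and the full matrix of pairwise cosine similarities can be computed as follows:

```r
tfidf <- as.matrix(weightTfIdf(dtm))  # tm's TF-IDF transformation

# Cosine similarity: cos(x, y) = (x . y) / (||x|| * ||y||).
# After normalizing each row to unit length, the full similarity
# matrix reduces to a single matrix product.
row_norms <- sqrt(rowSums(tfidf^2))
unit <- tfidf / row_norms
sim <- unit %*% t(unit)  # 194 x 194 matrix of Abstract Similarity Scores

sim[1, 2]  # similarity between the abstracts of projects 1 and 2
```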

4.1.2. Illustrating the method with practical examples.

Next, we discuss the meaning of the Abstract Similarity Score in practice. We first examine a pair of projects with one of the highest similarity scores (0.44) in our sample. Given that the abstracts exceed 300 words, we include only selected excerpts which succinctly highlight what each project is about.

Project a description

“[…] Detection and control of sulphate-reducing bacteria in drinking water infrastructures is presented in order to detect the critical points of the drinking water distribution network and implement the necessary improvements to reduce the risk of leaks and prevent water from losing its quality. The main objective of the project is to control and eliminate the development of sulphate-reducing bacteria in drinking water infrastructures through the development of new techniques for the detection of microorganisms and the functionalization of surfaces, reducing the risk of breaks and leaks and increasing the resilience of the drinking water distribution system.”

Project b description

“[…] Optimization of the hydraulic performance of the drinking water network by means of optical fibre with the aim of detecting possible leaks generated in the supply network, as well as locating them throughout the system. The main objective of the project is the development of a system for detecting leaks and structural failures in drinking water pipes, accurate and economical, operating continuously, based on photonic technologies, and more specifically on "Distributed Acoustic Sensing" (DAS), which can be implemented in pipes in service and that its installation serves as a primary structural element for the implementation of future fiber optic sensors, specifically of water quality, without the need for new wiring.”

From the excerpts we can see that the two projects rely on different technologies, but the area of intervention is clearly similar: improving the resilience of the drinking water distribution system. Next, we compare a second pair of projects with a low similarity score (0.02), selected at random.

“The main objective of this project is the research and development of an intelligent tool for dermatological exploration that assists in the detection and delimitation of the main types of skin cancer and does so in real time without the need for biopsy and through an automated and contactless technique […]”
“The aim of this project is […] to improve the management of artificial wetlands for wastewater treatment, to naturalize their effluents, to minimize the impact on the receiving aquatic environment and to contribute to the mitigation of climate change […]”

Clearly, the two projects deal with very distinct topics, and the algorithm accurately assigns a low similarity score to this particular pair. Note that the cosine similarity method itself does not tell us whether project partners employ similar technologies, as such information would be difficult to extract and would require more detailed textual data. Nevertheless, we consider the project abstracts we work with sufficiently descriptive to allow for meaningful comparisons of topical content.

4.2. Independent variables

In the first stage of the analysis, where we want to check the similarity of projects in cases of repeated collaboration, we introduce a dummy variable called SharedDyad, which is equal to 1 if the two project teams have at least 2 organizations in common. In other words, for a pair of project abstracts i1-i2, we compare the team of partners Ti1 to the team of partners Ti2. Assume that project i1 was carried out by Ti1 = [A, B, C], while project i2 was executed by Ti2 = [A, B, D, E], where A, B, C, D and E are five unique organizations. Since the tie between A and B has persisted in both projects, the variable SharedDyad assumes the value of 1 even though the two teams are not completely identical and contain additional partners [C, D and E]. SharedDyad is equal to 0 when the two teams have only 1 or no partners in common. This approach is consistent with other studies on team recurrence, which also set a minimum threshold of one repetition to qualify strong ties [66, 67]. Fig 3 provides an illustrated example to further clarify the operationalization of SharedDyad, and a minimal code sketch follows below. We opted for a dichotomous variable, rather than a categorical one, because instances of 3 shared partners were extremely rare in our sample. We therefore cannot make a strong distinction between repeated dyads and repeated triads, although we believe such a comparison may yield interesting insights in larger samples.

[Fig 3. Illustration of the operationalization of SharedDyad. https://doi.org/10.1371/journal.pone.0303912.g003]
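A minimal sketch of this rule, mirroring the example above (the function name is ours):

```r
# SharedDyad: 1 if two project teams have at least two organizations in common
shared_dyad <- function(team_i1, team_i2) {
  as.integer(length(intersect(team_i1, team_i2)) >= 2)
}

shared_dyad(c("A", "B", "C"), c("A", "B", "D", "E"))  # 1: the A-B dyad persists
shared_dyad(c("A", "B", "C"), c("A", "D", "E"))       # 0: only one common partner
```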

In the second stage of the analysis, where we concentrate exclusively on repeated collaborations with SharedDyad = 1, we want to test how recurring partners' joint access to diverse knowledge and resources influences the observed thematic orientation (exploitative vs. explorative) of their repeated engagements.

Our primary explanatory variable is thus the social capital of both partners, accrued in the time between the first and the second collaboration and reflected in the measure NewAlters. NewAlters is a continuous variable which, for each pair of organizations A-B, counts how many new distinct entities A and B connected to since their first engagement, excluding any of the partners in the projects where A and B jointly participate. Note that we only consider first-order direct connections. While indirect links may also benefit the recipient's knowledge production, it is direct relationships that collect and process the indirect information and deliver it to the focal node, or, in our case, the pair of nodes [24]. The assumption here is that the total number of unique pipelines A and B can draw upon for external knowledge will influence the extent to which the pair may explore new topics when re-engaging.
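Under these definitions, NewAlters can be sketched as follows. The helper and its inputs are hypothetical, with `alters_t0` and `alters_t1` denoting the combined first-order contacts of A and B at the first and second collaboration, respectively:

```r
# NewAlters: distinct first-order contacts gained by the dyad A-B between
# its first and second joint project, excluding partners of those projects
new_alters <- function(alters_t0, alters_t1, joint_partners) {
  gained <- setdiff(alters_t1, alters_t0)  # contacts formed since t = 0
  length(setdiff(gained, joint_partners))
}
```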

We also introduce several controls. First, we construct a dummy variable ExtraPartners, which takes the value of 1 when at least one of the repeated collaborations involved additional team members. This means that unless both projects i1 and i2 were carried out exclusively by the same pair of organizations, ExtraPartners will be equal to 1. Assuming that extra partners can bring a unique set of knowledge to the collaboration, their presence in the consortium can reasonably influence the degree of thematic exploration in repeated engagements.

Fig 4 illustrates the operationalization of NewAlters and ExtraPartners using concrete examples. In the case of NewAlters, we can see that at the time of the second collaboration between A and B (at t = 1), the pair is connected to 2 new organizations [D, E], to which neither A nor B had a connection at t = 0. Therefore, in the example provided, NewAlters is equal to 2. In the case of ExtraPartners in Fig 4, since one of the collaborations between A and B involves an additional partner C, the dummy variable takes the value of 1.

[Fig 4. Illustration of the operationalization of NewAlters and ExtraPartners. https://doi.org/10.1371/journal.pone.0303912.g004]
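The ExtraPartners dummy from Fig 4 reduces to a simple team-size check, since the shared dyad itself accounts for two organizations (a sketch; the helper is hypothetical):

```r
# ExtraPartners: 1 unless both projects were carried out by the dyad alone
extra_partners <- function(team_i1, team_i2) {
  as.integer(length(team_i1) > 2 || length(team_i2) > 2)
}

extra_partners(c("A", "B", "C"), c("A", "B"))  # 1: one project included partner C
```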

We further control for the institutional characteristics of the two partners in the shared dyad. If the pair involves one public research organization (PRO) (a technological institute, or a university-affiliated or independent research center) and a firm, the dummy variable PRO-Firm takes the value of 1, and 0 otherwise. If both members of the shared dyad are PROs, the dummy variable PRO-PRO takes the value of 1, and 0 otherwise. This allows us to distinguish the behavior of a firm re-engaging with a PRO from that of two PROs collaborating again. Finally, we also control for the time lag between the first and second collaboration. Since the public calls are launched on an annual basis, the variable TimeLag is a simple count of the number of years between the first and second engagement. The model to be estimated is given by:

Abstract Similarity Score = NewAlters + ExtraPartners + PRO-Firm + PRO-PRO + TimeLag
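In R, this specification can be estimated with the betareg package (a sketch; `d` is a hypothetical data frame with one row per repeated-collaboration pair, and the variable names are adapted to valid R identifiers):

```r
library(betareg)  # beta regression for outcomes in the open interval (0, 1)

# Scores of exactly 0 or 1 would first need the usual compression
# (y * (n - 1) + 0.5) / n to fit inside the open interval.
m <- betareg(AbstractSimilarity ~ NewAlters + ExtraPartners +
               PRO_Firm + PRO_PRO + TimeLag, data = d)
summary(m)
```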

5. Results and discussion

In this section we present the results of the two-stage analysis.

5.1. Repeated engagement and thematic orientation: Descriptive results

We begin by exploring the distribution of the Abstract Similarity Score and SharedDyad (Table 2). One immediate observation is that the majority of abstract pairs show no significant overlap in textual content. The distribution is highly skewed, with only a small fraction of pairs being very closely related. On average, abstract pairs have a lexical similarity of 0.029. This is not surprising, since the open calls we considered target R&D collaborations from a range of sectors and are thematically very diverse. As for team pairs, we can see that only a small fraction of project pairs contains a recurrent dyad of partners (75 out of 18721, or 0.4%).

[Table 2. Distribution of the Abstract Similarity Score and SharedDyad. https://doi.org/10.1371/journal.pone.0303912.t002]

The two variables are positively correlated. Since one of them is dichotomous, we apply a special case of the Pearson correlation coefficient, namely the point-biserial correlation coefficient. Its calculated value is 0.26, significant at the 1% level. We also explore the distribution of the Abstract Similarity Score for pairs of projects with no shared dyad vs. those with at least one dyad in common. Fig 5 shows the resulting plot.
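Because the point-biserial coefficient is simply the Pearson correlation computed with one dichotomous variable, the correlation reported above can be obtained in base R (a sketch with hypothetical column names):

```r
# Point-biserial correlation between the similarity score and the 0/1 dummy
cor.test(pair_data$AbstractSimilarity, pair_data$SharedDyad)
```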

[Fig 5. Distribution of the Abstract Similarity Score for project pairs with and without a shared dyad. https://doi.org/10.1371/journal.pone.0303912.g005]

Fig 5 shows that, on the whole, project pairs with at least one repeating pair of partners (a shared dyad) display higher median scores of thematic similarity than projects which share only one or no partners at all. However, this generally positive relationship between repeated engagement and abstract similarity is far from straightforward. In fact, we observe a great degree of variation across project pairs with a shared dyad. The interquartile range, which accounts for the middle 50% of scores, runs from 0.05 to 0.2, while the maximum abstract similarity score (excluding outliers) is as high as 0.4. At the same time, the bar on the left-hand side contains multiple outliers: pairs of projects which exhibit relatively high similarity but were carried out by completely different teams. This may be attributed to unobserved geographical proximity (i.e., organizations belong to the same cluster and therefore work on similar topics) or to the presence of a common third party which links two distinct consortia (triadic closure) [68].

Hence, in response to our first research question, we find that repeated collaborations are not univocally associated with topic exploitation, and other factors seem to influence the thematic orientation of strong ties. In some instances, collaborating pairs use subsequent R&D partnerships to extend prior work along the same topic, adopting an exploitative approach, while others do not. This provides original support for the argument put forward in the theoretical section, namely that inter-organizational network links are far from homogeneous and that only some, but not all, instances of repeated coupling between actors are associated with the exploitation of the same research topic. It implies that "getting caught up" in one type of activity after several collaborations may not be a product of the structural setting and the existence of strong coupling alone, as much as a product of organizations' strategic choices about how they use their strong ties.

Fig 6 shows an aspatial map of the R&D network and highlights the structural location of all repeated ties.

[Fig 6. Aspatial map of the R&D network. Repeated ties are marked in red. Nodes legend: grey circle (firm), dark blue square (technological institute), light blue square (independent or university-affiliated research center), white circle (other). https://doi.org/10.1371/journal.pone.0303912.g006]

The resulting network appears centralized around the main technological institutes and several other PROs. This is consistent with studies of regional and extra-regional networks, where PROs were found to serve as intermediaries, and thus appear as frequent partners in regional collaborations [ 69 , 70 ].

Although instances of repeated ties are relatively scarce, they appear both in the core and in the periphery. Given the positive effect of strong bonds on inter-organizational trust, this "balanced" distribution is beneficial for the flow of tacit, complex knowledge across the network architecture. Fig 6 also showcases the institutional heterogeneity of actors involved in recurrent collaborations, which further motivates the second part of our analysis, where we explore how a dyad's access to new distinct alters may moderate the thematic orientation (explorative vs. exploitative) of repeated ties.

5.2. The role of partners’ social capital

In this section, we concentrate exclusively on pairs of projects which have at least one repeated dyad (75 pairs in total). We begin by presenting descriptive statistics (Table 3), followed by a point-biserial correlation matrix (Table 4), chosen for its applicability to data that include both dichotomous and continuous variables.

[Table 3. Descriptive statistics for project pairs with a shared dyad. https://doi.org/10.1371/journal.pone.0303912.t003]

[Table 4. Point-biserial correlation matrix. https://doi.org/10.1371/journal.pone.0303912.t004]

The high mean value of ExtraPartners implies that for most project pairs, the two repeating partners were not the only members of the consortium. The values of NewAlters vary between 0 and 64. This means that in some cases the repeating partners built an extensive network of direct relationships after the first collaboration: by the time of executing the second one, the dyad had collectively accumulated as many as 64 new unique alters in their ego network. The average time lag between the two collaborations is 1.6 years.

Examining the correlation matrix, we note that the number of new alters a pair of partners gains is negatively correlated with the observed Abstract Similarity Score, implying that greater social capital is associated with the exploration of distinct research topics. Furthermore, most project pairs where the repeated dyad is embedded in a rich network of alters also include extra partners in the consortium. The connectivity of the repeated dyad also correlates positively with certain institutional characteristics of the organizations. This can be expected, since PROs tend to have a disproportionately high degree centrality: they establish numerous links with other nodes and have sufficient human, administrative and financial capacity to maintain them [71].

Table 5 shows the results of the beta regression. We first run a base Model 0, which includes only the controls, followed by Model 1, which includes only the primary explanatory variable, and Model 2, in which all relevant variables are featured.

[Table 5. Beta regression results. https://doi.org/10.1371/journal.pone.0303912.t005]

Both control variables reflecting the presence of one (PRO-Firm) or two (PRO-PRO) public research organizations in the dyad have a negative coefficient, which is significant in Model 0 but not in Model 2, suggesting that PROs are generally associated with a more exploratory strategy in repeated ties. Similarly, the coefficient for NewAlters is negative and remains significant when controlling for the institutional heterogeneity of organizations (Model 2). This means that organizations which have built an extensive network of connections are more likely to explore a different topic when re-engaging with a previous partner. This behavior can be attributed to their enhanced access to new knowledge and ideas, which is likely to inspire greater creativity and the exploration of diverse research avenues. This rationale is grounded in the principles of social network theory: connectivity facilitates the flow of information, which benefits the knowledge base of the organization but also equips it with the capacity to identify new research trajectories. Conversely, organizations which lack sufficient network connectivity may have limited access to novel knowledge and ideas, and will therefore be more inclined to adopt an exploitation strategy when engaging repeatedly with the same partner. Our additional control variables, ExtraPartners and TimeLag, do not seem to exert a significant influence on the thematic orientation of strong ties.

The results of the regression analysis suggest that the range of two partners' combined ego network favors the exploration of new research trajectories in recurrent collaborations. The social capital available to the pair appears to promote access to diverse sources of knowledge and to facilitate the shift from old research trajectories to new ones. In other words, when organizations with a high number of new alters build strong ties, these ties exhibit more topic diversity than strong links between isolated nodes with few connections to the rest of the network. Because of the relatively high correlation between NewAlters and PRO-PRO, and the fact that most central actors tend to be PROs, we cannot unequivocally attribute the "diversifying" effect to one factor alone.

On a theoretical level, the results show that social capital embedded in a particular linkage cannot be treated as a static asset. If one or both of the participating organizations in the dyad have a rich network of external contacts and are capable of renewing their knowledge base over time, the value of the established contact may persist longer, and cycles of topic exploitation can be followed by the exploration of new research avenues. In other words, the value of strong ties may not necessarily "wear off" along an inverted U-shape the way conventional theory suggests. Moreover, these findings have implications for the framing of the proximity paradox, which seems to consider dyadic relations in isolation from the surrounding environment. When two organizations build strong ties, they do not automatically detach themselves from third parties. The conceptualization of the proximity paradox can therefore benefit from adopting a triadic approach, which would allow researchers to better understand the depreciating value of strong bonds over time. Of course, further research is needed to establish when exactly third-party links enrich the knowledge base of a particular organization and how this influences the value of a node's persistent ties.

6. Conclusion

This study investigates empirically the relationship between the strength of collaborative inter-organizational ties and their thematic orientation (explorative vs. exploitative) in the context of Valencia's policy-induced R&D network. Moreover, it examines how this relationship plays out for partners with different levels of connectivity. The paper thus responds to recent calls for greater focus on networks' relational aspects and interaction processes [19, 21] by examining specifically how organizations approach repeated collaborations. The study delivers several important insights.

First, it demonstrates that recurring partnerships between organizations in an R&D network are not always associated with the exploitation of the same topic. Building strong bonds may also involve the exploration of new topics and the mobilization of new knowledge domains. In the case of Valencia's R&D network, the latter scenario appears more likely when the partners involved connect to a larger network of diverse contacts between the first and second instance of collaboration and can thereby access novel knowledge and ideas. Hence, this study offers original evidence on the heterogeneity of network ties and on the importance of considering the function of strong bonds between partners, given that organizations evidently approach repeated collaborations in distinct ways. Empirically, the paper introduces a novel approach to measuring the thematic orientation of R&D collaborations based on the lexical similarity of project abstracts. With regard to policymaking, the analysis is also highly relevant, especially in relation to the smart specialization paradigm, which has become a cornerstone of EU Cohesion Policy [72]. When efforts are directed toward accumulating competitive advantage in prioritized areas, building strong ties between firms should be stimulated. Conversely, if the network is experiencing stagnation, exploring new themes and mobilizing novel knowledge becomes far more critical. Such a scenario calls for investment in partnerships that enroll a broader range of organizations (including PROs) with rich networks of contacts, both local and extra-regional, in order to diversify the thematic focus of R&D collaborations and avoid deepening the focus on declining industries.

Finally, this study is not without limitations. The most significant one concerns the operationalization of our dependent variable, which builds on the lexical similarity of project abstracts. Since it is plausible that two abstracts describe the same area of research using different terminology, it would be beneficial to repeat the analysis with an advanced semantic similarity method capable of interpreting the meaning of textual information. In addition, when constructing our primary independent variable for network connectivity, we consider only links to other nodes within the same network. In reality, actors may be able to access external knowledge through complementary linkages in parallel, unobserved formal or informal networks. Both of these limitations offer promising avenues for further research. More importantly, we believe that doubling down on efforts to examine the nature and content of inter-organizational ties can be particularly beneficial for fleshing out the big questions surrounding the co-evolution of network structures and knowledge flows.

Supporting information

S1 Fig. A flowchart of the pre-processing sequence leading to the final document-term matrix.

The analysis was performed in R using the tm (text mining) package (https://tm.r-forge.r-project.org/). The full list of English stopwords can be accessed through the tm reference manual (https://cran.r-project.org/web/packages/tm/tm.pdf). In calculating the DTM, no minimum term frequency was set, meaning that all terms were included in the matrix.

https://doi.org/10.1371/journal.pone.0303912.s001

Acknowledgments

The authors acknowledge Ferenc Kruzslicz and Martina Iori for their guidance on various text similarity metrics; Fabrizio Fusillo for his thoughtful feedback on an earlier version of this paper presented at the WICK#10 workshop in Turin (2022); Davide Consoli, Adrián A. Díaz-Faes and Milad Abbasiharofteh, whose input helped enhance the quality of this research. Furthermore, the authors thank Àngels Dasí, Fabio Busicchia and the attendees of Session No. 64 at the 2023 DRUID conference for their insightful comments and suggestions, as well as everyone at the POLISS summer school in Valencia, where the final version of this paper was presented. The shortcomings of the paper are the authors’ responsibility alone.

  • 19. Madhavan R, Prescott J. Chapter 20: The network perspective of alliances: taking stock and looking ahead. Cheltenham, UK: Edward Elgar Publishing; 2017. https://doi.org/10.4337/9781783479580.00033
  • 47. Gargiulo M, Benassi M. The dark side of social capital. In: Leenders RTAJ, Gabbay SM, editors. Corporate Social Capital and Liability. Boston, MA: Springer US; 1999. p. 298–322. https://doi.org/10.1007/978-1-4615-5027-3_17
  • 48. Boschma R, Frenken K. Technological relatedness and regional branching. In: Bathelt H, Feldman M, Kogler DF, editors. Beyond Territory: Dynamic Geographies of Knowledge Creation, Diffusion and Innovation. 1st ed. Routledge; 2010. p. 64–82.
  • 58. Generalitat Valenciana. Evaluación intermedia de la estrategia de especialización inteligente para la investigación e innovación en la Comunitat Valenciana 2014–2020, Volumen II; 2019.
  • 63. Sammut C, Webb GI, editors. TF–IDF. In: Encyclopedia of Machine Learning and Data Mining. Boston, MA: Springer US; 2017. p. 1274. https://doi.org/10.1007/978-1-4899-7687-1_832
  • 65. Han J, Kamber M, Pei J. Getting to know your data. In: Data Mining: Concepts and Techniques. 3rd ed. Boston: Morgan Kaufmann; 2012. p. 39–82. https://doi.org/10.1016/B978-0-12-381479-1.00002-2
  • 70. Roediger-Schluga T, Barber MJ. The structure of R&D collaboration networks in the European Framework Programmes. United Nations University–Maastricht Economic and Social Research Institute on Innovation and Technology (MERIT); 2006.

Online Social Networking and Addiction—A Review of the Psychological Literature

Social Networking Sites (SNSs) are virtual communities where users can create individual public profiles, interact with real-life friends, and meet other people based on shared interests. They are seen as a ‘global consumer phenomenon’ and have experienced an exponential rise in usage within the last few years. Anecdotal case study evidence suggests that ‘addiction’ to social networks on the Internet may be a potential mental health problem for some users. However, the contemporary scientific literature addressing the addictive qualities of social networks on the Internet is scarce. Therefore, this literature review is intended to provide empirical and conceptual insight into the emerging phenomenon of addiction to SNSs by: (1) outlining SNS usage patterns, (2) examining motivations for SNS usage, (3) examining personalities of SNS users, (4) examining negative consequences of SNS usage, (5) exploring potential SNS addiction, and (6) exploring SNS addiction specificity and comorbidity. The findings indicate that SNSs are predominantly used for social purposes, mostly related to the maintenance of established offline networks. Moreover, extraverts appear to use social networking sites for social enhancement, whereas introverts use them for social compensation, each of which appears to be related to greater usage, as do low conscientiousness and high narcissism. Negative correlates of SNS usage include a decrease in real-life social community participation and academic achievement, as well as relationship problems, each of which may be indicative of potential addiction.

1. Introduction

“I’m an addict. I just get lost in Facebook” replies a young mother when asked why she does not see herself able to help her daughter with her homework. Instead of supporting her child, she spends her time chatting and browsing the social networking site [ 1 ]. This case, while extreme, is suggestive of a potential new mental health problem that emerges as Internet social networks proliferate. Newspaper stories have also reported similar cases, suggesting that the popular press was early to discern the potentially addictive qualities of social networking sites (SNS; i.e. , [ 2 , 3 ]). Such media coverage has alleged that women are at greater risk than men for developing addictions to SNSs [ 4 ].

The mass appeal of social networks on the Internet could potentially be a cause for concern, particularly when attending to the gradually increasing amounts of time people spend online [5]. On the Internet, people engage in a variety of activities, some of which may be potentially addictive. Rather than becoming addicted to the medium per se, some users may develop an addiction to specific activities they carry out online [6]. Specifically, Young [7] argues that there are five different types of Internet addiction, namely computer addiction (i.e., computer game addiction), information overload (i.e., web surfing addiction), net compulsions (i.e., online gambling or online shopping addiction), cybersexual addiction (i.e., online pornography or online sex addiction), and cyber-relationship addiction (i.e., an addiction to online relationships). SNS addiction appears to fall into the last category, since the purpose and main motivation for using SNSs is to establish and maintain both on- and offline relationships (for a more detailed discussion, please refer to the section on motivations for SNS usage). From a clinical psychologist's perspective, it may be plausible to speak specifically of ‘Facebook Addiction Disorder’ (or more generally ‘SNS Addiction Disorder’) because addiction criteria, such as neglect of personal life, mental preoccupation, escapism, mood-modifying experiences, tolerance, and concealing the addictive behavior, appear to be present in some people who use SNSs excessively [8].

Social Networking Sites are virtual communities where users can create individual public profiles, interact with real-life friends, and meet other people based on shared interests. SNSs are “web-based services that allow individuals to: (1) construct a public or semi-public profile within a bounded system, (2) articulate a list of other users with whom they share a connection, and (3) view and traverse their list of connections and those made by others within the system” [9]. The focus is placed on established networks, rather than on networking, which would imply the construction of new networks. Within the framework of their respective structural characteristics, SNSs offer individuals the possibilities of networking and sharing media content, thereby embracing the main Web 2.0 attributes [10].

In terms of SNS history, the first social networking site (SixDegrees) was launched in 1997, based on the idea that everybody is linked with everybody else via six degrees of separation [9], initially referred to as the “small world problem” [11]. In 2004, the most successful current SNS, Facebook, was established as a closed virtual community for Harvard students. The site expanded very quickly, and Facebook currently has more than 500 million users, of whom fifty percent log on to it every day. Furthermore, the overall time spent on Facebook increased by 566% from 2007 to 2008 [12]. This statistic alone indicates the exponential appeal of SNSs and also suggests a reason for a rise in potential SNS addiction. Hypothetically, the appeal of SNSs may be traced back to their reflection of today's individualist culture. Unlike traditional virtual communities that emerged during the 1990s based on the shared interests of their members [13], social networking sites are egocentric sites: it is the individual, rather than the community, that is the focus of attention [9].

Egocentrism has been linked to Internet addiction [14]. Supposedly, the egocentric construction of SNSs may facilitate engagement in addictive behaviors and may thus serve as a factor that attracts people to using them in a potentially excessive way. This hypothesis is in line with the PACE Framework for the etiology of addiction specificity [15]. Attraction is one of the four key components that may predispose individuals to becoming addicted to specific behaviors or substances rather than others. Accordingly, due to their egocentric construction, SNSs allow individuals to present themselves positively, which may “raise their spirits” (i.e., enhance their mood state) because it is experienced as pleasurable. This may lead to positive experiences that can potentially cultivate and facilitate the learning experiences that drive the development of SNS addiction.

A behavioral addiction such as SNS addiction may thus be seen from a biopsychosocial perspective [16]. Just like substance-related addictions, SNS addiction incorporates the experience of the ‘classic’ addiction symptoms, namely mood modification (i.e., engagement in SNSs leads to a favourable change in emotional states), salience (i.e., behavioral, cognitive, and emotional preoccupation with SNS usage), tolerance (i.e., ever-increasing use of SNSs over time), withdrawal symptoms (i.e., experiencing unpleasant physical and emotional symptoms when SNS use is restricted or stopped), conflict (i.e., interpersonal and intrapsychic problems ensue because of SNS usage), and relapse (i.e., addicts quickly revert to their excessive SNS usage after an abstinence period).

Moreover, scholars have suggested that a combination of biological, psychological and social factors contributes to the etiology of addictions [16, 17], which may also hold true for SNS addiction. From this it follows that SNS addiction shares a common underlying etiological framework with other substance-related and behavioral addictions. However, because engagement in SNSs differs in terms of the actual expression of (Internet) addiction (i.e., pathological use of social networking sites rather than of other Internet applications), the phenomenon appears worthy of individual consideration, particularly given the potentially detrimental effects of both substance-related and behavioral addictions on individuals, who experience a variety of negative consequences because of their addiction [18].

To date, the scientific literature addressing the addictive qualities of social networks on the Internet is scarce. Therefore, this literature review is intended to provide empirical insight into the emerging phenomenon of Internet social network usage and potential addiction by (1) outlining SNS usage patterns, (2) examining motivations for SNS usage, (3) examining personalities of SNS users, (4) examining negative consequences of SNSs, (5) exploring potential SNS addiction, and (6) exploring SNS addiction specificity and comorbidity.

An extensive literature search was conducted using the academic database Web of Knowledge as well as Google Scholar. The following search terms, as well as their derivatives, were entered: social network, online network, addiction, compulsive, excessive, use, abuse, motivation, personality, and comorbidity. Studies were included if they (i) contained empirical data, and (ii) made reference to usage patterns, (iii) motivations for usage, (iv) personality traits of users, (v) negative consequences of use, (vi) addiction, and/or (vii) comorbidity and specificity. A total of 43 empirical studies were identified from the literature, five of which specifically assessed SNS addiction.

3.1. Usage Patterns

Social networking sites are seen as a ‘global consumer phenomenon’ and, as already noted, have experienced an exponential rise in usage within the last few years [12]. Of all Internet users, approximately one-third participate in SNSs, and ten percent of the total time spent online is spent on SNSs [12]. In terms of usage, the results of the Parents and Teens 2006 Survey, with a random sample of 935 American participants, revealed that 55% of youths used SNSs in that year [19]. The main reasons reported for this usage were staying in touch with friends (endorsed by 91%) and making new friends (49%). This was more common among boys than girls. Girls preferred to use these sites to maintain contact with actual friends rather than to make new ones. Furthermore, half of the teenagers in this sample visited their SNS at least once a day, which indicates that keeping an attractive profile requires frequent visits, a factor that facilitates potential excessive use [19]. Moreover, based on the results of consumer research, the overall usage of SNSs increased by two hours per month to 5.5 hours, and active participation increased by 30% from 2009 to 2010 [5].

The findings of an online survey of 131 psychology students in the US [20] indicated that 78% used SNSs, and that 82% of males and 75% of females had SNS profiles. Of those, 57% used their SNS on a daily basis. The activities most often engaged in on SNSs were reading/responding to comments on their SNS page and/or posts to one's wall (endorsed by 60%; the “wall” is a special profile feature in Facebook where people can post comments, pictures, and links that can be responded to), sending/responding to messages/invites (14%), and browsing friends' profiles/walls/pages (13%) [20]. These results correspond with findings from a different study including another university student sample [21].

Empirical research has also suggested gender differences in SNS usage patterns. Some studies claim that men tend to have more friends on SNSs than women [ 22 ], whereas others have found the opposite [ 23 ]. In addition, men were found to take more risks with regards to disclosure of personal information [ 24 , 25 ]. Furthermore, one study reported that slightly more females used MySpace specifically ( i.e. , 55% compared to 45% of males) [ 26 ].

Usage of SNSs has also been found to differ by age group. A study comparing 50 teenagers (13–19 years) with the same number of older MySpace users (60 years and above) revealed that teenagers' friends' networks were larger and that their friends were more similar to themselves in age [23]. Furthermore, older users' networks were smaller and more dispersed age-wise. Additionally, teenagers made more use of MySpace's Web 2.0 features (i.e., sharing video and music, and blogging) relative to older people [23].

With regard to how people react to using SNSs, a recent study [27] using psychophysiological measures (skin conductance and facial electromyography) found that social searching (i.e., extracting information from friends' profiles) was more pleasurable than social browsing (i.e., passively reading newsfeeds) [27]. This finding indicates that the goal-directed activity of social searching may activate the appetitive system, which is related to pleasurable experience, rather than the aversive system [28]. On a neuroanatomical level, the appetitive system has been found to be activated in Internet game overusers and addicts [29, 30], which may be linked back to a genetic deficiency in the addicts' neurochemical reward system [31]. Therefore, the activation of the appetitive system in social network users who engage in social searching concurs with the activation of that system in people found to suffer from behavioral addictions. In order to establish this link for SNSs specifically, further neurobiological research is required.

In reviewing SNS usage patterns, the findings of both consumer research and empirical research indicate that overall, regular SNS use has increased substantially over the last few years. This supports the availability hypothesis that where there is increased access and opportunity to engage in an activity (in this case SNSs), there is an increase in the numbers of people who engage in the activity [ 32 ]. Moreover, it indicates that individuals become progressively aware of this available supply and become more sophisticated with regards to their usage skills. These factors are associated with the pragmatics factor of addiction specificity etiology [ 15 ]. Pragmatics is one of the four key components of the addiction specificity model and it emphasizes access and habituation variables in the development of specific addictions. Therefore, the pragmatics of SNS usage appears to be a factor related to potential SNS addiction.

In addition to this, the findings of the presented studies indicate that, compared to the general population, teenagers and students make the most use of SNSs, utilizing their inherent Web 2.0 features. Additionally, there appear to be gender differences in usage, the specifics of which are only vaguely defined and thus require further empirical investigation. In addition, SNSs tend to be used mostly for social purposes, of which extracting further information from friends’ pages appears particularly pleasurable. This, in turn, may be linked to the activation of the appetitive system, which indicates that engaging in this particular activity may stimulate the neurological pathways known to be related to addiction experience.

3.2. Motivations

Studies suggest that SNS usage in general, and Facebook in particular, differs as a function of motivation (e.g., [ 33 ]). According to uses and gratifications theory, media are used in a goal-directed way for the purpose of gratification and need satisfaction [ 34 ], a dynamic that has similarities with addiction. Therefore, it is essential to understand the motivations that underlie SNS usage. Persons with higher social identity ( i.e. , solidarity to and conformity with their own social group), higher altruism (related to both kin and reciprocal altruism) and higher telepresence ( i.e. , feeling present in the virtual environment) tend to use SNSs because they perceive encouragement for participation from the social network [ 35 ]. Similarly, the results of a survey comprising 170 US university students indicated that social factors were more important motivations for SNS usage than individual factors [ 36 ]. More specifically, these participants’ interdependent self-construal ( i.e. , the endorsement of collectivist cultural values) led to SNS usage that in turn resulted in higher levels of satisfaction, relative to independent self-construal, which refers to the adoption of individualist values. The latter was not related to motivations for using SNSs [ 36 ].

Another study by Barker [ 37 ] presented similar results, finding that collective self-esteem and group identification positively correlated with peer group communication via SNSs. Cheung, Chiu and Lee [ 38 ] assessed social presence ( i.e. , the recognition that other persons share the same virtual realm), the endorsement of group norms, maintaining interpersonal interconnectivity, and social enhancement with regards to SNS usage motivations. More specifically, they investigated the We-intention to use Facebook ( i.e. , the decision to continue using a SNS together in the future). The results of their study indicated that We-intention positively correlated with the other variables [ 38 ].

Similarly, social reasons appeared to be the most important motives for using SNSs in another study [ 20 ]. The following motivations were endorsed by the participating university student sample: keeping in touch with friends they do not see often (81%), using SNSs because all their friends had accounts (61%), keeping in touch with relatives and family (48%), and making plans with friends they see often (35%). A further study found that a large majority of students used SNSs for the maintenance of offline relationships, whereas some preferred to use this type of Internet application for communication rather than face-to-face interaction [ 39 ].

The particular forms of virtual communication in SNSs include both asynchronous ( i.e. , personal messages sent within the SNS) and synchronous modes ( i.e. , embedded chat functions within the SNS) [ 40 ]. On the part of users, these communication modes require learning differential vocabularies, namely Internet language [ 41 , 42 ]. The idiosyncratic form of communication via SNSs is another factor that may fuel potential SNS addiction because communication has been identified as a component of the addiction specificity etiology framework [ 15 ]. Therefore, it can be hypothesized that users who prefer communication via SNSs (as compared to face-to-face communication) are more likely to develop an addiction to using SNSs. However, further empirical research is needed to confirm this speculation.

Moreover, research suggests that SNSs are used for the formation and maintenance of different forms of social capital [ 43 ]. Social capital is broadly defined as “the sum of the resources, actual or virtual, that accrue to an individual or a group by virtue of possessing a durable network of more or less institutionalized relationships of mutual acquaintance and recognition” [ 44 ]. Putnam [ 45 ] differentiates bridging and bonding social capital from one another. Bridging social capital refers to weak connections between people that are based on information-sharing rather than emotional support. These ties are beneficial in that they offer a wide range of opportunities and access to broad knowledge because of the heterogeneity of the respective network’s members [ 46 ]. Bonding social capital, by contrast, denotes strong ties, usually between family members and close friends [ 45 ].

SNSs are thought to increase the size of potential networks because of the large number of possible weak social ties among members, which is enabled via the structural characteristics of digital technology [ 47 ]. Therefore, SNSs do not function as communities in the traditional sense: they do not involve membership, shared influence, or an equal allocation of power. Instead, they can be conceptualized as networked individualism, allowing the establishment of numerous self-perpetuating connections that appear advantageous for users [ 48 ]. This is supported by research that was carried out on a sample of undergraduate students [ 43 ]. More specifically, this study found that maintaining bridging social capital via participation in SNSs appeared to be beneficial for students with regards to potential employment opportunities, in addition to sustaining ties with old friends. Overall, the bridging social capital formed via participation in SNSs appeared to be particularly advantageous for individuals with low self-esteem [ 49 ]. However, the ease of establishing and maintaining bridging social capital may become one of the reasons why people with low self-esteem are drawn to using SNSs in a potentially excessive manner. Lower self-esteem, in turn, has been linked to Internet addiction [ 50 , 51 ].

Furthermore, SNS usage has been found to differ between people and cultures. A recent study [ 52 ] including samples from the US, Korea and China demonstrated that the usage of different Facebook functions was associated with the creation and maintenance of either bridging or bonding social capital. People in the US used the ‘Communication’ function ( i.e. , conversation and opinion sharing) in order to bond with their peers, whereas Koreans and Chinese used ‘Expert Search’ ( i.e. , searching for associated professionals online) and ‘Connection’ ( i.e. , maintaining offline relationships) to form and sustain both bonding and bridging social capital [ 52 ]. Given these cultural differences in usage patterns, it appears necessary to investigate SNS addiction across different cultures in order to discern both similarities and differences.

Additionally, the results of an online survey of a student convenience sample of 387 participants [ 53 ] indicated that several factors significantly predicted the intention to use SNSs as well as actual usage. The identified predictive factors were (i) playfulness ( i.e. , enjoyment and pleasure), (ii) the critical mass of users who endorsed the technology, (iii) trust in the site, (iv) perceived ease of use, and (v) perceived usefulness. Moreover, normative pressure ( i.e. , the expectations of other people with regards to one’s behavior) had a negative relationship with SNS usage. These results suggest that it is particularly the enjoyment associated with SNS use in a hedonic context (which has some similarities to addictions), as well as the recognition that a critical mass of people uses SNSs, that motivates people to make use of SNSs themselves [ 53 ].

Another study [ 54 ] used a qualitative methodology to investigate why teenagers use SNSs. Interviews were conducted with 16 adolescents aged 13 to 16 years. The results indicated that the sample used SNSs in order to express and actualize their identities, either via self-display of personal information (which was true for the younger participants) or via connections (which was true for the older participants). Each of these motivations was found to necessitate a trade-off between potential opportunities for self-expression and the risk of compromising the teenagers’ privacy [ 54 ].

A study by Barker [ 37 ] also suggested there may be differences in motivations for SNS use between men and women. Females used SNSs for communication with peer group members, entertainment and passing time, whereas men used them in an instrumental way for social compensation, learning, and social identity gratifications ( i.e. , the possibility to identify with group members who share similar characteristics). Seeking friends, social support, information, and entertainment were found to be the most significant motivations for SNS usage in a sample of 589 undergraduate students [ 55 ]. In addition to this, endorsement of these motivations was found to differ across cultures. Kim et al. [ 55 ] found that Korean college students sought social support from already established relationships via SNSs, whereas American college students looked for entertainment. Moreover, Americans had significantly more online friends than Koreans, suggesting that the development and maintenance of social relationships on SNSs is influenced by cultural artefacts [ 55 ]. Furthermore, technology-relevant motivations were related to SNS use. Competence in using computer-mediated communication ( i.e. , the motivation to, knowledge of, and efficacy in using electronic forms of communication) was found to be significantly associated with spending more time on Facebook and checking one’s wall significantly more often [ 33 ].

Overall, the results of these studies indicate that SNSs are predominantly used for social rather than individual purposes, mostly related to the maintenance of established offline networks. In line with this, people may feel compelled to maintain their social networks on the Internet, which may lead to using SNSs excessively. The maintenance of already established offline networks itself can therefore be seen as an attraction factor, which according to Sussman et al. [ 15 ] is related to the etiology of specific addictions. Furthermore, viewed from a cultural perspective, it appears that motivations for usage differ between members of Asian and Western countries, as well as between genders and age groups. However, in general, the results of the reported studies suggest that the manifold ties pursued online are indicative, for the most part, of bridging rather than bonding social capital. This appears to show that SNSs are primarily used as a tool for staying connected.

Staying connected is beneficial to users because it offers them a variety of potential academic and professional opportunities, as well as access to a large knowledge base. As users’ expectations of connectivity are met through their SNS usage, the potential for developing SNS addiction may increase as a consequence. This is in accordance with the expectation factor that drives the etiology of addiction to a specific behavior [ 15 ]. Accordingly, the supposed expectations and benefits of SNS use may go awry particularly for people with low self-esteem. They may feel encouraged to spend excessive amounts of time on SNSs because they perceive it as advantageous. This, in turn, may potentially develop into an addiction to using SNSs. Clearly, future research is necessary in order to establish this link empirically.

Moreover, there appear to be certain limitations to the studies presented. Many included small convenience samples of teenagers or university students, thereby severely limiting the generalizability of findings. Researchers are therefore advised to take this into consideration and amend their sampling frameworks by using more representative samples, thus improving the external validity of the research.

3.3. Personality

A number of personality traits appear to be associated with the extent of SNS use. The findings of some studies (e.g., [ 33 , 56 ]) indicate that people with large offline social networks, who are more extraverted, and who have higher self-esteem, use Facebook for social enhancement, supporting the principle of ‘the rich get richer’. Correspondingly, the size of people’s online social networks correlates positively with life satisfaction and well-being [ 57 ], but has no effect on either the size of the offline network or emotional closeness to people in real-life networks [ 58 ].

However, people with only a few offline contacts compensate for their introversion, low self-esteem, and low life-satisfaction by using Facebook for online popularity, thus corroborating the principle of ‘the poor get richer’ ( i.e. , the social compensation hypothesis) [ 37 , 43 , 56 , 59 ]. Likewise, people higher in narcissistic personality traits tend to be more active on Facebook and other SNSs in order to present themselves favourably online, because the virtual environment empowers them to construct their ideal selves [ 59 – 62 ]. The relationship between narcissism and Facebook activity may be related to the fact that narcissists have an imbalanced sense of self, fluctuating between grandiosity with regards to explicit agency and low self-esteem concerning implicit communion and vulnerability [ 63 , 64 ]. Narcissistic personality, in turn, has been found to be associated with addiction [ 65 ]. This finding will be discussed in more detail in the section on addiction.

Moreover, it appears that people with different personality traits differ in their usage of SNSs [ 66 ] and prefer to use distinct functions of Facebook [ 33 ]. People high in extraversion and openness to experience use SNSs more frequently, with the former being true for mature and the latter for young people [ 66 ]. Furthermore, extraverts and people open to experiences are members of significantly more groups on Facebook, use socializing functions more [ 33 ], and have more Facebook friends than introverts [ 67 ], which reflects the former’s higher sociability in general [ 68 ]. Introverts, on the other hand, disclose more personal information on their pages [ 67 ]. Additionally, it appears that particularly shy people spend large amounts of time on Facebook and have large numbers of friends on this SNS [ 69 ]. Therefore, SNSs may appear beneficial for those whose real-life networks are limited because of the possibility of easy access to peers without the demands of real-life proximity and intimacy. This ease of access entails a higher time commitment for this group, which may possibly result in excessive and/or potentially addictive use.

Likewise, men with neurotic traits use SNSs more frequently than women with neurotic traits [ 66 ]. Furthermore, neurotics (in general) tend to use Facebook’s wall function, where they can receive and post comments, whereas people with low neuroticism scores prefer posting photos [ 33 ]. This may be due to the neurotic individual’s greater control over emotional content with regards to text-based posts rather than visual displays [ 33 ]. However, another study [ 67 ] found the opposite, namely that people scoring high on neuroticism were more inclined to post their photographs on their page. In general, the findings for neuroticism imply that those scoring high on this trait disclose information because they seek self-assurance online, whereas those scoring low are emotionally secure and thus share information in order to express themselves [ 67 ]. High self-disclosure on SNSs, in turn, was found to positively correlate with measures of subjective well-being [ 57 ]. It remains questionable whether this implies that low self-disclosure on SNSs may be related to higher risk for potential addiction. By disclosing more personal information on their pages, users put themselves at risk for negative feedback, which has been linked to lower well-being [ 70 ]. Therefore, the association between self-disclosure on SNSs and addiction needs to be addressed empirically in future studies.

With regards to agreeableness, it was found that females scoring high on this trait upload significantly more pictures than females scoring low, with the opposite being true for males [ 67 ]. In addition to this, people high in conscientiousness were found to have significantly more friends and to upload significantly fewer pictures than those scoring low on this personality trait [ 67 ]. An explanation for this finding may be that conscientious people tend to cultivate their online and offline contacts more actively, without needing to share too much personal information publicly.

Overall, the results of these studies suggest that extraverts use SNSs for social enhancement, whereas introverts use them for social compensation, each of which appears to be related to greater SNS usage. With regards to addiction, both groups could potentially develop addictive tendencies for different reasons, namely social enhancement and social compensation. In addition, the dissimilar findings with regards to the number of friends introverts have online deserve closer scrutiny in future research. The same applies to the results with regards to neuroticism. On the one hand, neurotics use SNSs frequently; on the other hand, studies indicate different usage preferences for people who score high on neuroticism, which calls for further investigation. Furthermore, the structural characteristics of these Internet applications ( i.e. , their egocentric construction) appear to allow favourable self-disclosure, which draws narcissists to use them. Finally, agreeableness and conscientiousness appear to be related to the extent of SNS usage. Higher usage associated with narcissistic, neurotic, extravert and introvert personality characteristics may imply that each of these groups is particularly at risk for developing an addiction to using SNSs.

3.4. Negative Correlates

Some studies have highlighted a number of potential negative correlates of extensive SNS usage. For instance, the results of an online survey of 184 Internet users indicated that people who spend more time using SNSs were perceived to be less involved with their real-life communities [ 71 ]. This is similar to the finding that people who do not feel secure about their real-life connections to peers, and thus have a negative social identity, tend to use SNSs more in order to compensate for this [ 37 ]. Moreover, it seems that the nature of the feedback from peers that is received on a person’s SNS profile determines the effects of SNS usage on well-being and self-esteem.

More specifically, Dutch adolescents aged 10 to 19 years who received predominantly negative feedback had low self-esteem, which in turn led to low well-being [ 70 ]. Given that people tend to be disinhibited when they are online [ 72 ], giving and receiving negative feedback may be more common on the Internet than in real life. This may entail negative consequences particularly for people with low self-esteem who tend to use SNSs as compensation for the paucity of their real-life social networks, because they are dependent upon the feedback they receive via these sites [ 43 ]. Therefore, people with lower self-esteem are potentially a population at risk for developing an addiction to using SNSs.

According to a more recent study assessing the relationships between Facebook usage and academic performance in a sample of 219 university students [ 73 ], Facebook users had lower Grade Point Averages and spent less time studying than students who did not use this SNS. Of the 26% of students reporting an impact of their usage on their lives, three-quarters (74%) claimed that it had a negative impact, namely procrastination, distraction, and poor time-management. A potential explanation for this may be that students who used the Internet to study may have been distracted by simultaneous engagement in SNSs, implying that this form of multitasking is detrimental to academic achievement [ 73 ].

In addition to this, it appears that the usage of Facebook may in some circumstances have negative consequences for romantic relationships. The disclosure of rich private information on one’s Facebook page, including status updates, comments, pictures, and new friends, can invite cyberstalking [ 74 ], including interpersonal electronic surveillance (IES; [ 75 ]) by one’s partner. This was reported to lead to jealousy [ 76 , 77 ] and, in the most extreme cases, divorce and associated legal action [ 78 ].

These few existing studies highlight that, in some circumstances, SNS usage can lead to a variety of negative consequences, including decreased involvement in real-life communities, worse academic performance, and relationship problems. Reducing and jeopardizing academic, social and recreational activities are considered criteria for substance dependence [ 18 ] and may thus be considered valid criteria for behavioral addictions [ 79 ], such as SNS addiction. In light of this, endorsing these criteria appears to put people at risk for developing an addiction, and the scientific research base outlined in the preceding paragraphs supports the potentially addictive quality of SNSs.

Notwithstanding these findings, due to the lack of longitudinal designs in the presented studies, no causal inferences can be drawn as to whether excessive use of SNSs is the causal factor behind the reported negative consequences. Moreover, potential confounders need to be taken into consideration. For instance, the aspect of university students’ multi-tasking when studying appears to be an important factor related to poor academic achievement. Moreover, pre-existing relationship difficulties between romantic partners may potentially be exacerbated by SNS use, without SNS use necessarily being the primary driving force behind the ensuing problems. Nevertheless, the findings support the idea that SNSs are used by some people in order to cope with negative life events. Coping, in turn, has been found to be associated with both substance dependence and behavioral addictions [ 80 ]. Therefore, it appears valid to claim that there is a link between dysfunctional coping ( i.e. , escapism and avoidance) and excessive SNS use/addiction. In order to substantiate this conjecture and to more fully investigate the potential negative correlates associated with SNS usage, further research is needed.

3.5. Addiction

Researchers have suggested that the excessive use of new technologies (and especially online social networking) may be particularly addictive to young people [ 81 ]. In accordance with the biopsychosocial framework for the etiology of addictions [ 16 ] and the syndrome model of addiction [ 17 ], it is claimed that people addicted to using SNSs experience symptoms similar to those experienced by people who suffer from addictions to substances or other behaviors [ 81 ]. This has significant implications for clinical practice because, unlike other addictions, the goal of SNS addiction treatment cannot be total abstinence from using the Internet per se, since the latter is an integral element of today’s professional and leisure culture. Instead, the ultimate therapy aim is controlled use of the Internet and its respective functions, particularly social networking applications, and relapse prevention using strategies developed within cognitive-behavioral therapies [ 81 ].

In addition to this, scholars have hypothesized that young vulnerable people with narcissistic tendencies are particularly prone to engaging with SNSs in an addictive way [ 65 ]. To date, only three empirical studies that specifically assess the addictive potential of SNSs have been published in peer-reviewed journals [ 82 – 84 ]. In addition, two publicly available Master’s theses have analyzed SNS addiction and will be presented subsequently for the purpose of inclusiveness, given the relative lack of data on the topic [ 85 , 86 ]. In the first study [ 83 ], 233 undergraduate university students (64% females, mean age = 19 years, SD = 2 years) were surveyed using a prospective design in order to predict high-level use intentions and actual high-level usage of SNSs via an extended model of the theory of planned behavior (TPB; [ 87 ]). High-level usage was defined as using SNSs at least four times per day. TPB variables included measures of intention for usage, attitude, subjective norm, and perceived behavioral control (PBC). Furthermore, self-identity (adapted from [ 88 ]), belongingness [ 89 ], as well as past and potential future usage of SNSs were investigated. Finally, addictive tendencies were assessed using eight questions scored on Likert scales (based on [ 90 ]).

One week after completion of the first questionnaire, participants were asked to indicate on how many days during the last week they had visited SNSs at least four times a day. The results of this study indicated that past behavior, subjective norm, attitude, and self-identity significantly predicted both behavioral intention as well as actual behavior. Additionally, addictive tendencies with regards to SNS use were significantly predicted by self-identity and belongingness [ 83 ]. Therefore, those who identified themselves as SNS users and those who looked for a sense of belongingness on SNSs appeared to be at risk for developing an addiction to SNSs.

In the second study [ 82 ], an Australian university student sample of 201 participants (76% female, mean age = 19, SD = 2) was drawn upon in order to assess personality factors via the short version of the NEO Personality Inventory (NEO-FFI; [ 91 ]), the Self-Esteem Inventory (SEI; [ 92 ]), time spent using SNSs, and an Addictive Tendencies Scale (based on [ 90 , 93 ]). The Addictive Tendencies Scale included three items measuring salience, loss of control, and withdrawal. The results of a multiple regression analysis indicated that high extraversion and low conscientiousness scores significantly predicted both addictive tendencies and the time spent using an SNS. The researchers suggested that the relationship between extraversion and addictive tendencies could be explained by the fact that using SNSs satisfies the extraverts’ need to socialize [ 82 ]. The findings with regards to lack of conscientiousness appear to be in line with previous research on the frequency of general Internet use in that people who score low on conscientiousness tend to use the Internet more frequently than those who score high on this personality trait [ 94 ].

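To make the kind of analysis behind these findings concrete, the following is a minimal sketch, assuming simulated data, of a multiple regression in the style of [ 82 ]: personality subscale scores predicting Addictive Tendencies Scale scores. The variable names, scaling, and coefficients are illustrative assumptions, not the study’s actual dataset or code.

```python
# Hedged sketch of a multiple regression in the style of [82]:
# NEO-FFI extraversion and conscientiousness scores predicting
# Addictive Tendencies Scale scores. All data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=42)
n = 201  # sample size reported in the study

# Hypothetical predictor scores (arbitrary scaling)
extraversion = rng.normal(30, 6, n)
conscientiousness = rng.normal(32, 6, n)

# Hypothetical outcome mirroring the reported direction of effects:
# higher extraversion and lower conscientiousness -> higher scores
addictive_tendencies = (0.4 * extraversion
                        - 0.3 * conscientiousness
                        + rng.normal(0, 3, n))

# Fit ordinary least squares with an intercept term
X = sm.add_constant(np.column_stack([extraversion, conscientiousness]))
model = sm.OLS(addictive_tendencies, X).fit()
print(model.summary())  # coefficients, t-values, R-squared
```

In such a model, a significant positive coefficient for extraversion together with a significant negative coefficient for conscientiousness would correspond to the pattern the authors describe.
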
In the third study, Karaiskos et al. [ 84 ] report the case of a 24-year old female who used SNSs to such an extent that her behavior significantly interfered with her professional and private life. As a consequence, she was referred to a psychiatric clinic. She used Facebook excessively for at least five hours a day and was dismissed from her job because she continuously checked her SNS instead of working. Even during the clinical interview, she used her mobile phone to access Facebook . In addition to excessive use that led to significant impairment in a variety of areas of her life, she developed anxiety symptoms as well as insomnia, which points to the clinical relevance of SNS addiction. Such extreme cases have led some researchers to conceptualize SNS addiction as an Internet spectrum addiction disorder [ 84 ]. This indicates, first, that SNS addiction can be classified within the larger framework of Internet addictions, and second, that it is a specific Internet addiction, alongside other addictive Internet applications such as Internet gaming addiction [ 95 ], Internet gambling addiction [ 96 ], and Internet sex addiction [ 97 ].

In the fourth study [ 85 ], SNS game addiction was assessed via the Internet Addiction Test (IAT; [ 98 ]), administered to 342 Chinese college students aged 18 to 22 years. In this study, SNS game addiction referred specifically to being addicted to the SNS game Happy Farm . Students were classified as addicted to using this SNS game when they endorsed a minimum of five out of the eight IAT items. Using this cut-off, 24% of the sample were identified as addicted [ 85 ].

Moreover, the author investigated gratifications of SNS game use, loneliness [ 99 ], leisure boredom [ 100 ], and self-esteem [ 101 ]. The findings indicated a weak positive correlation between loneliness and SNS game addiction, and a moderate positive correlation between leisure boredom and SNS game addiction. Moreover, the gratifications “inclusion” (in a social group) and “achievement” (in game), leisure boredom, and male gender significantly predicted SNS game addiction [ 85 ].

In the fifth study [ 86 ], SNS addiction was assessed in a sample of 335 Chinese college students aged 19 to 28 years using Young’s Internet Addiction Test [ 98 ], modified to specifically assess addiction to a popular Chinese SNS, namely Xiaonei.com . Users were classified as addicted when they endorsed five or more of the eight addiction items specified in the IAT. Moreover, the author assessed loneliness [ 99 ], user gratifications (based on the results of a previous focus group interview), and usage attributes and patterns of SNS website use [ 86 ].

The results indicated that 34% of the total sample were classified as addicted. Moreover, loneliness correlated significantly and positively with the frequency and session length of Xiaonei.com use, as well as with SNS addiction. Likewise, social activities and relationship building were found to predict SNS addiction [ 86 ].

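Both theses classified respondents with the same simple counting rule: endorsing at least five of the eight addiction items. As a minimal illustration, assuming simulated yes/no item responses rather than either study’s actual data, the prevalence estimate follows directly from that cut-off:

```python
# Illustrative sketch of the >= 5-of-8 item cut-off used in [85,86].
# The response matrix is simulated; endorsement rates are arbitrary.
import numpy as np

rng = np.random.default_rng(seed=7)
n_respondents, n_items = 342, 8          # sizes taken from the fourth study
responses = rng.random((n_respondents, n_items)) < 0.45  # True = item endorsed

endorsed_counts = responses.sum(axis=1)  # items endorsed per respondent
addicted = endorsed_counts >= 5          # apply the cut-off
print(f"Classified as addicted: {addicted.mean():.0%}")
```

As the sketch makes clear, the resulting prevalence depends entirely on where the cut-off is placed and how readily items are endorsed, which is one reason such figures warrant the caution expressed below.
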
Unfortunately, when viewed from a critical perspective, the quantitative studies reviewed here suffer from a variety of limitations. First, the mere assessment of addictive tendencies does not suffice to demarcate real pathology. In addition, the samples were small, specific, and skewed towards females. This may have led to the very high addiction prevalence rates (up to 34%) reported [ 86 ]. Clearly, future studies need to ensure that they assess addiction specifically, rather than excessive use and/or preoccupation.

Wilson et al. ’s study [ 82 ] suffered from assessing only three potential addiction criteria, which is not sufficient for establishing addiction status clinically. Similarly, the significant impairment and negative consequences that discriminate addiction from mere abuse [ 18 ] were not assessed in this study at all. Thus, future studies have great potential to address the emergent phenomenon of addiction to using social networks on the Internet by applying better methodological designs, including more representative samples, and by using more reliable and valid addiction scales, so that current gaps in empirical knowledge can be filled.

Furthermore, research must address the presence of specific addiction symptoms beyond negative consequences. These might be adapted from the DSM-IV TR criteria for substance dependence [ 18 ] and the ICD-10 criteria for a dependence syndrome [ 102 ], including (i) tolerance, (ii) withdrawal, (iii) increased use, (iv) loss of control, (v) extended recovery periods, (vi) sacrificing social, occupational and recreational activities, and (vii) continued use despite negative consequences. These have been found to be adequate criteria for diagnosing behavioral addictions [ 79 ] and thus appear suitable for application to SNS addiction. In order to be diagnosed with SNS addiction, at least three (but preferably more) of the above-mentioned criteria should be met in the same 12-month period, and they must cause significant impairment to the individual [ 18 ].

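As a minimal sketch of how such a decision rule might be operationalized in a screening context, the following assumes hypothetical criterion observations with onset dates; the function name and data structures are illustrative, not part of any validated instrument:

```python
# Hedged sketch of the proposed rule: at least three of the seven
# criteria met within the same 12-month period, plus significant
# impairment. Criterion names and observations are hypothetical.
from datetime import date

def meets_proposed_threshold(criteria_dates, significant_impairment):
    """Return True if >= 3 criteria fall within one 12-month window
    and the individual also shows significant impairment."""
    if not significant_impairment:
        return False
    # Slide a 12-month window starting at each observed onset date
    for start in criteria_dates.values():
        in_window = [c for c, d in criteria_dates.items()
                     if 0 <= (d - start).days <= 365]
        if len(in_window) >= 3:
            return True
    return False

# Example: three criteria observed within the same year, with impairment
observed = {"tolerance": date(2011, 1, 10),
            "withdrawal": date(2011, 5, 2),
            "loss_of_control": date(2011, 9, 30)}
print(meets_proposed_threshold(observed, significant_impairment=True))  # True
```
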
In light of the qualitative case study described above, it appears that from a clinical perspective, SNS addiction is a mental health problem that may require professional treatment. Unlike the quantitative studies, the case study emphasizes the significant impairment experienced by individuals, spanning a variety of life domains, including their professional lives as well as their psychosomatic condition. Future researchers are therefore advised not only to investigate SNS addiction quantitatively, but to further our understanding of this new mental health problem by analyzing cases of individuals who suffer from excessive SNS usage.

3.6. Specificity and Comorbidity

It appears essential to pay adequate attention to (i) the specificity of SNS addiction and (ii) potential comorbidity. Hall et al. [ 103 ] outline three reasons why it is necessary to address comorbidity between mental disorders, such as addictions. First, a large number of mental disorders feature additional (sub)clinical problems/disorders. Second, comorbid conditions must be addressed in clinical practice in order to improve treatment outcomes. Third, specific prevention programs may be developed which incorporate different dimensions and treatment modalities that particularly target associated mental health problems. From this it follows that assessing the specificity and potential comorbidities of SNS addiction is important. However, to date, research addressing this topic is virtually non-existent. There has been almost no research on the co-occurrence of SNS addiction with other types of addictive behavior, mainly because there have been so few studies examining SNS addiction, as highlighted in the previous section. However, based on the small empirical base, a number of speculative assumptions can be made about comorbidity in relation to SNS addiction.

Firstly, for some individuals, SNS addiction takes up such a large amount of available time that it is highly unlikely to co-occur with other behavioral addictions, unless the other behavioral addiction(s) can find an outlet via social networking sites (e.g., gambling addiction, gaming addiction). Put simply, there would be little face validity in the same individual being, for example, both a workaholic and a social networking addict, or an exercise addict and a social networking addict, mainly because there would rarely be enough daily time available to engage in two behavioral addictions simultaneously. Still, it is necessary to pinpoint the respective addictive behaviors because some of them may in fact co-occur. In one study that included a clinical sample diagnosed with substance dependencies, Malat and colleagues [ 104 ] found that 61% pursued at least one, and 31% engaged in two or more, problematic behaviors, such as overeating, unhealthy relationships and excessive Internet use. Therefore, although a simultaneous addiction to behaviors such as working and using SNSs is relatively unlikely, SNS addiction may potentially co-occur with overeating and other excessive sedentary behaviors.

Secondly, it is theoretically possible for a social networking addict to have an additional drug addiction, as it is perfectly feasible to engage in both a behavioral and a chemical addiction simultaneously [ 16 ]. This may also make sense from a motivational perspective. For instance, if one of the primary reasons social network addicts engage in the behavior is their low self-esteem, it makes intuitive sense that some chemical addictions may serve the same purpose. Accordingly, studies suggest that engagement in addictive behaviors is relatively common among persons who suffer from substance dependence. In one study, Black et al. [ 105 ] found that 38% of problematic computer users in their sample had a substance use disorder in addition to their behavioral problems/addiction. Indeed, research indicates that some persons who suffer from Internet addiction experience other addictions at the same time.

Of a patient sample of 1,826 individuals treated for substance addictions (mainly cannabis addiction), 4.1% were found to suffer from Internet addiction [ 106 ]. Moreover, the findings of further research [ 107 ] indicated that Internet addiction and substance use experience in adolescents share common family factors, namely higher parent-adolescent conflict, habitual alcohol use of siblings, perceived positive parental attitudes to adolescent substance use, and lower family functioning. Moreover, Lam et al. [ 108 ] assessed Internet addiction and associated factors in a sample of 1,392 adolescents aged 13–18 years. In terms of potential comorbidity, they found that drinking behavior was a risk factor for being diagnosed with Internet addiction using the Internet Addiction Test [ 109 ]. This implies that alcohol abuse/dependence can potentially be associated with SNS addiction. Support for this comes from Kuntsche et al. [ 110 ], who found that in Swiss adolescents, the expectancy of social approval was associated with problem drinking. Since SNSs are inherently social platforms that are used by people for social purposes, it appears reasonable to deduce that there may indeed be people who suffer from comorbid addictions, namely SNS addiction and alcohol dependence.

Thirdly, it appears that there may be a relationship between SNS addiction specificity and personality traits. Ko et al. [ 111 ] found that Internet addiction (IA) was predicted by high novelty seeking (NS), high harm avoidance (HA), and low reward dependence (RD) in adolescents. Those adolescents who were addicted to the Internet and who also had experience of substance use scored significantly higher on NS and lower on HA than the IA-only group. Therefore, it appears that HA particularly impacts Internet addiction specificity, because high HA discriminates Internet addicts from individuals who are not only addicted to the Internet but also use substances. It therefore seems plausible to hypothesize that persons with low harm avoidance are in danger of developing comorbid addictions to SNSs and substances. Accordingly, research needs to address this difference specifically for those who are addicted to using SNSs in order to demarcate this potential disorder from comorbid conditions.

In addition to this, it seems reasonable to specifically address the respective activities people can engage in on their SNSs. A number of researchers have already begun to examine the possible relationship between social networking and gambling [ 112 – 116 ], and between social networking and gaming [ 113 , 116 , 117 ]. All of these writings have noted how the social networking medium can be used for gambling and/or gaming. For instance, online poker applications and online poker groups are among the most popular on social networking sites [ 115 ], and others have noted the press reports surrounding addiction to social networking games such as Farmville [ 117 ]. Although there have been no empirical studies to date examining addiction to gambling or gaming via social networking, there is no reason to suspect that those playing in the social networking medium are any less likely to become addicted to gambling and/or gaming than those playing via other online or offline media.

In summary, addressing the specificity of SNS addiction and its comorbidities with other addictions is necessary for (i) comprehending this disorder as a distinct mental health problem while (ii) paying attention to associated conditions, which will in turn aid (iii) treatment and (iv) prevention efforts. From the reported studies, it appears that the individual’s upbringing and psychosocial context are influential factors with regards to potential comorbidity between Internet addiction and substance dependence, which is supported by scientific models of addictions and their etiology [ 16 , 17 ]. Moreover, alcohol and cannabis dependence were outlined as potential co-occurring problems. Nonetheless, apart from this, the presented studies do not specifically address the discrete relationships between particular substance dependencies and individual addictive behaviors, such as addiction to using SNSs. Therefore, future empirical research is needed in order to shed more light upon SNS addiction specificity and comorbidity.

4. Discussion and Conclusions

The aim of this literature review was to present an overview of the emergent empirical research relating to usage of and addiction to social networks on the Internet. Initially, SNSs were defined as virtual communities offering their members the possibility to make use of their inherent Web 2.0 features, namely networking and sharing media content. The history of SNSs dates back to the late 1990s, suggesting that they are not as new as they might at first appear. With the emergence of SNSs such as Facebook , overall SNS usage has accelerated in such a way that they are considered a global consumer phenomenon. Today, more than 500 million users are active participants in the Facebook community alone, and studies suggest that between 55% and 82% of teenagers and young adults use SNSs on a regular basis. Extracting information from peers’ SNS pages is an activity that is experienced as especially enjoyable, and it has been linked with the activation of the appetitive system, which in turn is related to addiction experience.

In terms of sociodemographics, the studies presented indicate that overall, SNS usage patterns differ. Females appear to use SNSs in order to communicate with members of their peer group, whereas males appear to use them for the purposes of social compensation, learning, and social identity gratifications [ 37 ]. Furthermore, men tend to disclose more personal information on SNSs relative to women [ 25 , 118 ]. Also, more women than men were found to use MySpace specifically [ 26 ]. Moreover, usage patterns were found to differ between genders as a function of personality. Unlike women with neurotic traits, men with neurotic traits were found to be more frequent SNS users [ 66 ]. In addition to this, it was found that males were more likely than females to be addicted to SNS games specifically [ 85 ]. This is in line with the finding that males in general are a population at risk for developing an addiction to playing online games [ 95 ].

The only study that assessed age differences in usage [ 23 ] indicated that usage does in fact vary as a function of age. Specifically, “silver surfers” ( i.e. , those over the age of 60 years) have a smaller and more age-diverse circle of online friends relative to younger SNS users. Based on the current empirical knowledge, which has predominantly been derived from young teenage and student samples, it remains unclear whether older people use SNSs excessively and whether they potentially become addicted to using them. Therefore, future research must aim at filling this gap in knowledge.

Next, the motivations for using SNSs were reviewed on the basis of uses and gratifications theory. In general, research suggests that SNSs are used for social purposes. Overall, the maintenance of connections to offline network members was emphasized rather than the establishment of new ties. With regards to this, SNS users sustain bridging social capital through a variety of heterogeneous connections to other SNS users. This appeared to be beneficial for them with regards to sharing knowledge and potential opportunities related to employment and related areas. In effect, the knowledge that is available to individuals via their social network can be thought of as “collective intelligence” [ 119 ].

Collective intelligence extends the mere idea of shared knowledge because it is not restricted to knowledge shared by all members of a particular community. Instead, it denotes the aggregation of each individual member’s knowledge that can be accessed by other members of the respective community. In this regard, the pursuit of weak ties on SNSs is of great benefit and thus coincides with the satisfaction of the members’ needs. At the same time, it is experienced as gratifying. Therefore, rather than seeking emotional support, individuals make use of SNSs in order to communicate and stay in touch not only with family and friends, but also with more distant acquaintances, therefore sustaining weak ties with potentially advantageous environments. The benefits of large online social networks may potentially lead people to engage in using them excessively, which, in turn, may foster addictive behaviors.

As regards personality psychology, certain personality traits were found to be associated with higher usage frequency, which in turn may be linked to potential abuse and/or addiction. Of those, extraversion and introversion stand out because each of these is related to more habitual participation in social networks on the Internet. However, the motivations of extraverts and introverts differ in that extraverts enhance their social networks, whereas introverts compensate for the lack of real-life social networks. Presumably, the motivations for higher SNS usage of people who are agreeable and conscientious may be related to those shared by extraverts, indicating a need for staying connected and socializing with their communities. Nevertheless, of these traits, high extraversion, along with low conscientiousness, was found to be related to potential addiction to using SNSs [ 82 ].

The dissimilar motivations for usage found for members scoring high on the respective personality traits can inform future research into potential addiction to SNSs. Hypothetically, people who compensate for scarce ties with their real-life communities may be at greater risk of developing an addiction. In effect, in one study, addictive SNS usage was predicted by looking for a sense of belongingness in this community [ 83 ], which supports this conjecture. Presumably, the same may hold true for people who score high on neuroticism and narcissism, assuming that members of both groups tend to have low self-esteem. This supposition is informed by research indicating that people use the Internet excessively in order to cope with everyday stressors [ 120 , 121 ]. This may serve as a preliminary explanation for the negative correlates that were found to be associated with more frequent SNS usage.

Overall, the engagement in particular activities on SNSs, such as social searching, and the personality traits that were found to be associated with greater extents of SNS usage may serve as an anchor point for future studies in terms of defining populations at risk for developing an addiction to using social networks on the Internet. Furthermore, it is recommended that researchers assess factors that are specific to SNS addiction, including the pragmatics, attraction, communication and expectations of SNS use, because these may predict the etiology of SNS addiction based on the addiction specificity etiology framework [ 15 ]. Due to the scarcity of research in this domain with a specific focus on SNS addiction specificity and comorbidity, further empirical research is necessary. Moreover, researchers are encouraged to pay close attention to the different motivations of introverts and extraverts, because each of those appears to be related to higher usage frequency. What is more, investigating the relationship of potential addiction with narcissism seems to be a fruitful area for empirical research. In addition to this, motivations for usage as well as a wider variety of negative correlates related to excessive SNS use need to be addressed.

In addition to the above-mentioned implications and suggestions for future research, specific attention needs to be paid to selecting larger samples which are representative of a broader population in order to increase the respective study’s external validity. The generalizability of results is essential in order to demarcate populations at risk for developing addiction to SNSs. Similarly, it appears necessary to conduct further psychophysiological studies in order to assess the phenomenon from a biological perspective. Furthermore, clear-cut and validated addiction criteria need to be assessed; it is insufficient to limit studies of addiction to assessing just a few criteria. The demarcation of pathology from high-frequency and problematic usage necessitates adopting frameworks that have been established by the international classification manuals [ 18 , 102 ]. Moreover, in light of clinical evidence and practice, it appears essential to pay attention to the significant impairment that SNS addicts experience in a variety of life domains as a consequence of their abusive and/or addictive behaviors.

Similarly, data based on self-reports are not sufficient for diagnosis, because research suggests that such reports may be inaccurate [ 122 ]. Conceivably, self-reports may be supplemented with structured clinical interviews [ 123 ], further case study evidence, and supplementary reports from the users’ significant others. In conclusion, social networks on the Internet are iridescent Web 2.0 phenomena that offer the potential to become part of, and make use of, collective intelligence. However, the latent mental health consequences of excessive and addictive use are yet to be explored using the most rigorous scientific methods.

  • All Solutions
  • Audience measurement
  • Media planning
  • Marketing optimization
  • Content metadata
  • Nielsen One
  • All Insights
  • Case Studies
  • Perspectives
  • Data Center
  • The Gauge TM – U.S.
  • Top 10 – U.S.
  • Top Trends – Denmark
  • Top Trends – Germany
  • Women’s World Cup
  • Men’s World Cup
  • Big Data + Panel
  • News Center

Client Login

2024 Annual Marketing Report

2024 Annual Marketing Report

Discover how global marketers are allocating budgets, maximizing ROI and what these trends mean for your own impact…

Need to Know: The basic of TV media buying

Are you investing in performance marketing for the right reasons.

A look at how CTV reach and viewership trends shift across generations

A look at how CTV reach and viewership trends shift across generations

See how different CTV trends vary across Gen Z, millennials and baby boomer generations.

Influencer marketing: The obvious approach

Influencer marketing: The obvious approach

Working with Nielsen’s Brand Impact solution has been a valuable partnership for Obviously,” says Heather at…

‘Data driven’ is no longer enough for your ROI strategy

ROI strategies hinge on capturing the right data at every stage of the customer journey. Just because data is easy to…

Reaching Asian American Audiences 2024

Understanding the media preferences of Asian American, Native Hawaiian and Pacific Islanders is critical to resonating in…

Reaching Asian American Audiences 2024

Featured reports

Metadata matters: Powering future FAST channel success

Metadata matters: Powering future FAST channel success

This guide will help FAST channels prepare for the future, when search and discovery features within individual services…

Explore all insights

where can i find empirical research articles

Find the right solution for your business

In an ever-changing world, we’re here to help you stay ahead of what’s to come with the tools to measure, connect with, and engage your audiences.

How can we help?

IMAGES

  1. Empirical Research Articles

    where can i find empirical research articles

  2. Definition, Types and Examples of Empirical Research

    where can i find empirical research articles

  3. 15 Empirical Evidence Examples (2024)

    where can i find empirical research articles

  4. How do I know if a research article is empirical?

    where can i find empirical research articles

  5. Empirical Research

    where can i find empirical research articles

  6. Empirical Research

    where can i find empirical research articles

VIDEO

  1. How to Calculate Empirical Formula|Super Trick|#Shorts

  2. How to Quickly Find Scholarly Articles for your RESEARCH PAPER/ESSAY

  3. Searching GALILEO's EBSCOhost databases for empirical research articles

  4. How to Find Empirical Articles in the Library Databases

  5. Searching GALILEO's ProQuest Databases for Empirical Research Articles

  6. First Year Doctoral Students

COMMENTS

  1. Google Scholar

    Find articles. with all of the words. with the exact phrase. with at least one of the words. without the words. where my words occur. anywhere in the article. in the title of the article. Return articles authored by. e.g., "PJ Hayes" or McCarthy. Return articles published in. e.g., J Biol Chem or Nature.

  2. Finding Empirical Articles

    You can use similar strateg to find empirical articles. You may also add specific statistical terms to your search, such as chi, t-test, p-value, or standard deviation. Try searching with terms used in the scientific method: method, results, discussion, or conclusion.

  3. Identifying Empirical Articles

    Identifying Empirical Research Articles. Look for the IMRaD layout in the article to help identify empirical research.Sometimes the sections will be labeled differently, but the content will be similar. Introduction: why the article was written, research question or questions, hypothesis, literature review; Methods: the overall research design and implementation, description of sample ...

  4. Empirical Research in the Social Sciences and Education

    Empirical research is published in books and in scholarly, peer-reviewed journals. However, most library databases do not offer straightforward ways to locate empirical research. ... There are 2 ways to find empirical articles in PubMed (NIH version): One technique is to limit your search results after you perform a search: Type in your ...

  5. Identify Empirical Articles

    Empirical articles will include charts, graphs, or statistical analysis. Empirical research articles are usually substantial, maybe from 8-30 pages long. There is always a bibliography found at the end of the article. Type of publications that publish empirical studies: Empirical research articles are published in scholarly or academic journals.

  6. Searching for Empirical Research

    Finding Peer-Reviewed Articles. You can find peer-reviewed articles in a general web search along with a lot of other types of sources. However, these specialized tools are more likely to find peer-reviewed articles: ... Note: empirical research articles will have a literature review section as part of the Introduction, but in an empirical ...

  7. Searching for Empirical Research Articles

    Finding Empirical Research. When searching for empirical research, it can be helpful to use terms that relate to the method used in empirical research in addition to keywords that describe your topic. For example: (generalized anxiety AND treatment*) AND (randomized clinical trial* OR clinical trial*)

  8. LibGuides: Psychology: Find Empirical Research Articles

    The method for finding empirical research articles varies depending upon the database* being used. 1. The PsycARTICLES and PsycInfo databases (both from the APA) includes a Methodology filter that can be used to identify empirical studies. Look for the filter on the Advanced Search screen. To see a list and description of all of the of ...

  9. Experimental (Empirical) Research Articles

    Many of the recommended databases in this research guide contain scholarly experimental articles (also known as empirical articles or research studies or primary research). Search in databases like: ... it's really easy to find experimental/empirical articles, once you know what you're looking for. Just in case, though, here is a shortcut that ...

  10. How to tell and find quality articles

    1) Look through the results, scan the abstract for clues to recognize it's empirical. 2) Try searching with words that describe types of empirical studies (list not exhaustive): empirical OR qualitative OR quantitative OR "action research" OR "case study" OR "controlled trial" OR "focus group" 3) Enter other terms you'd expect to see in an ...

  11. Empirical Research

    This book introduces readers to methods and strategies for research and provides them with enough knowledge to become discerning, confident consumers of research in writing. Topics covered include: library research, empirical methodology, quantitative research, experimental research, surveys, focus groups, ethnographies, and much more.

  12. Empirical Research: Defining, Identifying, & Finding

    Once you know the characteristics of empirical research, the next question is how to find those characteristics when reading a scholarly, peer-reviewed journal article. Knowing the basic structure of an article will help you identify those characteristics quickly. The IMRaD Layout. Many scholarly, peer-reviewed journal articles, especially empirical articles, are structured according to the ...

  13. LibGuides: Empirical Research: Search for Empirical Articles

    • Limit search by source type ("Academic Journals" or "Scholarly Journals")
    • Limit search by document type (such as "study", "comparative study", or "case study")
    • Some databases have specific content codes assigned to empirical research articles that can be searched - for example, cc(9130) in ProQuest Sociology.

  14. Subject Guides: Identify Empirical Research Articles: Home

    An empirical research article is an article that reports research based on actual observations or experiments. The research may use quantitative research methods, which generate numerical data and seek to establish causal relationships between two or more variables. (1) Empirical research articles may use qualitative research methods, which ...

  15. APA PsycArticles

    The citation footprint of APA's journals is more than double our article output, demonstrating our commitment to editorial excellence. Research published in APA PsycArticles provides global, diverse perspectives on the field of psychology. The database is updated biweekly, ensuring your patrons are connected to articles revealing the latest psychological findings.

  16. How do I find an empirical research article?

    Empirical research articles are also known as experimental or primary research articles. Empirical articles are written by scientists reporting on an experiment or similar research that they conducted. You'll find empirical articles in scholarly journals (also known as academic or peer-reviewed journals) within the library databases.

  17. How do I know if a research article is empirical?

    Empirical research articles are considered original, primary research. In these types of articles, readers will generally find the sections organized in IMRaD format (Introduction, Method, Results, and Discussion). (I)ntroduction: includes the research hypotheses and the literature review (current research on or related to the topic).
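
    The IMRaD check is mechanical enough that it can be roughly automated. Below is a minimal heuristic sketch; the heading list, the "own line" assumption, and the threshold of three are all assumptions, and real journal layouts vary widely.

        # Minimal heuristic sketch: flag an article as likely empirical if
        # most IMRaD-style headings appear as short standalone lines.
        # The heading list and threshold are assumptions; labels vary.
        IMRAD_HEADINGS = ("introduction", "method", "results", "discussion")

        def looks_empirical(article_text: str, required: int = 3) -> bool:
            # Normalize lines and keep short ones as candidate headings.
            lines = [ln.strip().lower().rstrip(":") for ln in article_text.splitlines()]
            headings = [ln for ln in lines if ln and len(ln) < 60]
            hits = sum(any(h in ln for ln in headings) for h in IMRAD_HEADINGS)
            return hits >= required

        sample = "Title\nIntroduction\n...\nMethods\n...\nResults\n...\nDiscussion\n..."
        print(looks_empirical(sample))  # True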

  18. Empirical Research in the Social Sciences and Education

    Another hint: some scholarly journals use a specific layout, called the "IMRaD" format, to communicate empirical research findings. Such articles typically have four components: Introduction: sometimes called "literature review" -- what is currently known about the topic -- usually includes a theoretical framework and/or discussion of previous ...

  19. Research Guides: *Education: Find Empirical Studies

    An empirical study reports findings based on data derived from an actual experiment or observation. Key components of an empirical study:
    • Abstract - provides a brief overview of the research.
    • Introduction - contextualizes the research by providing a review of previous research on the topic; it is also the section where the hypothesis is stated.

  20. How do I find empirical articles?

    ERIC: enter your search terms in the search box at the top of the screen, then scroll down, locate the Publication Types dropdown box, and select Numerical/Quantitative Data or Reports-Research. Academic Search Complete: select Advanced Search and enter your search terms in the first line of the search box.

  21. Free APA Journal Articles

    Recently published articles from subdisciplines of psychology covered by more than 90 APA Journals™ publications. For additional free resources (such as article summaries, podcasts, and more), please visit the Highlights in Psychological Research page.
