A Comprehensive Study of Dijkstra's Algorithm

10 Pages Posted: 25 Sep 2023

Muhammad Ahsan Khan

Independent

Date Written: April 16, 2020

Dijkstra's algorithm, named after E. W. Dijkstra, computes the shortest path from a starting vertex in a graph (the source) to a destination vertex. Because it can determine the shortest paths from one source to all other vertices in the graph in a single run, the task it solves is known as the single-source shortest-path problem. This article explains the fundamentals of Dijkstra's algorithm using simple examples and illustrations.

Keywords: Dijkstra's Algorithm, Shortest Path, Graph Theory, Network Optimization, Pathfinding Algorithms
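
As a quick illustration of the idea described in the abstract, here is a minimal single-source Dijkstra sketch in Python; the graph, its weights, and the adjacency-list encoding are invented for the example:

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths. graph: {node: [(neighbor, weight), ...]}."""
    dist = {source: 0}
    pq = [(0, source)]  # (tentative distance, node) min-heap
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# Hypothetical weighted graph
g = {"A": [("B", 4), ("C", 1)], "C": [("B", 2), ("D", 5)], "B": [("D", 1)], "D": []}
d = dijkstra(g, "A")  # shortest distances from A to every vertex
```

On this sample graph the algorithm finds distance 4 to D via the path A → C → B → D rather than taking the heavier direct edges.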


Dijkstra's Algorithm: Recently Published Documents


Implementation of Dijkstra Algorithm with React Native to Determine Covid-19 Distribution

Since Covid-19 spread throughout the world and was declared a global pandemic, every available effort, including information technology, has been made to help prevent and contain its transmission. Information technology that determines the shortest distance to Covid-19 cases around us therefore needs to be developed. This research implements Dijkstra's Algorithm in React Native to build a Covid-19 tracking application. The system can display the closest case within a radius of at least one meter; in testing it mapped case radii ranging from 41 to 147 meters. The system is built with React Native for compatibility with both Android and iOS.

The Simulation of Traffic Signal Preemption using GPS and Dijkstra Algorithm for Emergency Fire Handling at Makassar City Fire Service

The Makassar City Fire Department often faces obstacles when responding to fires, such as congestion at intersections and panicking residents. The result of this research is a system that helps firefighters reach the location of a fire more quickly. Dijkstra's algorithm is used to find the shortest path to the fire location and estimate the travel time, while a traffic signal preemption simulation changes the lights as the GPS-tracked vehicle approaches each intersection on the chosen path. The simulation results show that traffic signal preemption combined with Dijkstra's algorithm and GPS can improve the department's performance, especially for fires that require a fast response.

The Mathematical Model for searching the Shortest Route for TB Patients with the help of Dijkstra’s Algorithm

In this research paper, we study TB (tuberculosis) patients who travel along different traffic routes to seek medical help and treatment in Karachi, Sindh, Pakistan, focusing on their transportation problems. Using Dijkstra's Algorithm, we find the minimum-distance paths these patients can travel. People hope for better treatment opportunities and financial medical relief in the government and private hospitals of Karachi. The city has many private hospitals, but unfortunately their treatments are expensive; as a consequence, people from the poor and middle classes turn to government hospitals. Among these, the Nazimabad Chest Hospital for TB patients (under the supervision of Dow University of Health Sciences) provides better facilities than other hospitals treating similar conditions and is renowned for its high-quality treatment of TB. The hospital is located inside Government Hospital Nazimabad (under the control of Dow University), Karachi, and has the latest equipment and competent, qualified staff to treat TB patients. Patients have to visit the hospital weekly from their homes, using several combinations of traffic routes, as they live in different areas such as Malir Cantt, Safari Park, Hassan Square, North Nazimabad, North Karachi, and Gulshan-e-Iqbal. A route that takes the least amount of time, and consequently reduces transportation costs, is therefore required. In this paper, a mathematical model based on Dijkstra's algorithm has been developed to locate the shortest route for the convenience of these TB patients.

Optimal Solution for Finding the Fastest Route Using Dijkstra's Algorithm to Locate Cafés in Bumiayu

The purposes of this study are (1) to represent the routes to café locations in Bumiayu as a graph, (2) to apply Dijkstra's algorithm to find café locations in Bumiayu, and (3) to find the recommended fastest routes. The methods used in this research are literature study, data collection, problem solving, and drawing conclusions. The results show that (1) the routes to café locations in Bumiayu can be represented as a graph, (2) Dijkstra's algorithm yields the fastest route to each café location in Bumiayu, and (3) recommended fastest routes were obtained from the starting point (v32) to 14 café locations (v1, v2, v3, v5, v7, v8, v9, v10, v12, v16, v19, v23, v27, v30). Thirteen café locations match the recommended fastest route computed by Dijkstra's algorithm. The route from the starting point (v32) to the due café (v9) is one example showing that Dijkstra's algorithm does not always choose the smallest weight on each edge but chooses the fastest route based on total distance traveled. There is a discrepancy in the recommended fastest route from the starting point (v32) to ratawit (v19). Keywords: Dijkstra, fastest route, optimal solution

Implementation of Dijkstra's Algorithm in Searching for the Nearest Veterinary Clinic

Constantly developing technology brings benefits to almost every aspect of life, including the search for veterinary clinics. This study aims to help animal owners who have difficulty finding the clinics and pet-supply shops closest to their location. The research uses the Scrum method, which consists of Product Backlog, Sprint Backlog, Sprint, and Increment. The output of Dijkstra's algorithm can display the location of, and route to, the veterinary clinic closest to the application user, wherever they are, in an average of 4 minutes 29 seconds.

Algorithm for Preventing the Spread of COVID-19 in Airports and Air Routes by Applying Fuzzy Logic and a Markov Chain

Since the start of COVID-19 and its growth into an uncontrollable pandemic, the spread of diseases through airports has become a serious health problem around the world. This study presents an algorithm to determine the risk of spread in airports and along air routes. Graphs are used to model the air transport network, and Dijkstra's algorithm is used to generate routes. Fuzzy logic evaluates multiple demographic, health, and transport variables to identify the level of spread at each airport. The algorithm applies a Markov chain to determine the probability that a passenger infected with COVID-19 arrives at an airport in any country in the world. The results show the strong performance of the proposed algorithm. In addition, data are presented that support health and mobility policy actions to prevent the spread of infectious diseases.
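
The Markov-chain step the abstract mentions can be sketched as repeated multiplication of a state distribution by a transition matrix. The 3-airport matrix and starting distribution below are hypothetical, not data from the paper:

```python
def step(dist, P):
    """One Markov-chain step: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical 3-airport transition matrix; row i gives flight probabilities from airport i
P = [[0.7, 0.2, 0.1],
     [0.3, 0.4, 0.3],
     [0.5, 0.0, 0.5]]

dist = [1.0, 0.0, 0.0]  # infected passenger known to start at airport 0
for _ in range(2):      # distribution over airports after two flights
    dist = step(dist, P)
```

After each step, `dist[j]` is the probability that the infected passenger is at airport `j`; the entries always sum to 1.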

Optimizing Heavy Lift Plans for Industrial Construction Sites Using Dijkstra’s Algorithm

Optimization of Shortest Path Problem Using Dijkstra's Algorithm in an Imprecise Environment

Dijkstra's algorithm is widely used to find the shortest path between two specified nodes in a network. In this paper, a generalized fuzzy Dijkstra algorithm is proposed that finds the shortest path using a new parameterized defuzzification method. We also address an important issue: the decision maker's choice. A numerical example illustrates the efficiency of the proposed algorithm.
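
The paper's parameterized defuzzification method is not reproduced here; as a stand-in, the sketch below defuzzifies triangular fuzzy edge costs with a simple centroid rule and then runs ordinary Dijkstra over the resulting crisp weights. The graph and fuzzy numbers are illustrative:

```python
import heapq

def centroid(tfn):
    """Centroid defuzzification of a triangular fuzzy number (a, b, c)."""
    a, b, c = tfn
    return (a + b + c) / 3.0

def fuzzy_dijkstra(graph, source):
    """Dijkstra over crisp weights obtained by defuzzifying fuzzy edge costs."""
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, tfn in graph.get(u, []):
            nd = d + centroid(tfn)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# Edges carry triangular fuzzy costs (low, mode, high)
g = {"s": [("a", (1, 2, 3)), ("b", (4, 5, 6))], "a": [("b", (1, 1, 1))], "b": []}
```

Here the two-hop route s → a → b (crisp cost 3.0) beats the direct edge (crisp cost 5.0); a different defuzzification rule could change that ranking, which is exactly the decision-maker's-choice issue the paper raises.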

An Improved Dijkstra Algorithm to Find Minimum Time Paths for Bus Users in Hanoi

In Hanoi, many roads are congested during rush hour, and vehicles move through them very slowly. As a result, traveling over a short but congested road may take more time than traveling over a longer, uncongested road. In this paper, we therefore study the problem of finding optimal bus routes that take less time, accounting for traffic jams. We extend Dijkstra's algorithm to compute waiting time at bus stations and the traveling time of buses. The experimental results show that our algorithm is suitable.
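
A simplified version of this idea can be sketched by folding the waiting time at each stop into the edge relaxation. Real timetable-derived waiting times are more involved; here each edge just carries a fixed (wait, travel) pair, and the example network is invented:

```python
import heapq

def min_time_path(graph, source):
    """Dijkstra where each edge cost = waiting time at the stop + travel time.
    graph: {stop: [(next_stop, wait, travel), ...]} -- a simplified model."""
    best = {source: 0}
    pq = [(0, source)]
    while pq:
        t, u = heapq.heappop(pq)
        if t > best.get(u, float("inf")):
            continue
        for v, wait, travel in graph.get(u, []):
            nt = t + wait + travel
            if nt < best.get(v, float("inf")):
                best[v] = nt
                heapq.heappush(pq, (nt, v))
    return best

# Direct link with a long wait (10 min) vs a longer route with short waits
g = {"A": [("B", 10, 5), ("C", 2, 6)], "C": [("B", 1, 3)], "B": []}
best = min_time_path(g, "A")
```

On this toy network, the detour A → C → B (12 minutes) beats the direct A → B link (15 minutes), mirroring the paper's point that short roads are not always fastest.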

Graph Signal Denoising Method Using the K-Nearest Neighbors Found by Dijkstra's Algorithm


Dijkstra Algorithm Application: Shortest Distance between Buildings

Priyank Kumar

International Journal of Engineering & Technology

The shortest path algorithm is one of the classic applications of data structures. The shortest path (SP) problem is that of finding a path between two vertices or nodes in a graph such that the sum of the weights of its component edges is minimal. Among the many approaches to this problem, Dijkstra's algorithm (DA) is one of the most widely used, including in many engineering calculations. There are two variants of DA: the basic one and an optimized one. This paper focuses on the basic variant, which provides the shortest route between a source node and a destination node. The main focus has been on keeping the work simple and easy to understand with some basic concepts; improvements to storage space and operational efficiency have also been attempted.
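
A basic Dijkstra between a source and a destination, of the kind the paper describes, can stop as soon as the destination is settled. The campus-style graph below is hypothetical:

```python
import heapq

def shortest_route(graph, source, target):
    """Basic Dijkstra that stops as soon as the target is settled,
    returning (distance, path)."""
    dist = {source: 0}
    prev = {}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == target:               # target settled: its distance is final
            path = [u]
            while u != source:
                u = prev[u]
                path.append(u)
            return d, path[::-1]
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    return float("inf"), []

# Hypothetical campus graph: buildings as nodes, walkway lengths as weights
g = {"Library": [("Lab", 3), ("Gym", 7)], "Lab": [("Gym", 2)], "Gym": []}
```

Stopping at the target avoids settling the rest of the graph, a small operational-efficiency gain of the kind the abstract alludes to.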

Related Papers

Mark Karwan

INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH & DEVELOPMENT

Edward Obeng Amoako

It is becoming difficult for emergency services, especially in Kumasi, to find the best route to a destination in time to save lives. This study deals with the problem of finding shortest paths between locations within the Kumasi Metropolis in the Ashanti Region of Ghana. Dijkstra's Algorithm was selected to determine the shortest distances from any location to any destination within the metropolis. The objective of the thesis is to use Dijkstra's algorithm to construct the minimum spanning tree, considering the dual carriageways in the road network of the Kumasi metropolis, within the shortest possible time for emergency services. The distances between 51 locations in the towns along the major roads were measured, and a legend and a matrix were formulated. A Visual Basic program implementing Dijkstra's algorithm was prepared, and the measured distances were used to prepare an input deck for it. The methodology included a review of relevant literature on the types of Dijkstra's algorithm and methods for solving it, and the development of computer solutions in ArcGIS and VB.net for faster computation. The results show a remarkable reduction in actual distance compared with ordinary routing, clearly indicating the importance of this type of algorithm in the optimization of network flows. Hence the shortest distance from any area in the Kumasi metropolis to another can easily be calculated using this thesis, helping to minimize the loss of lives in emergencies.

IJTES Journal

International Journal of Advance Research in Computer Science and Management Studies [IJARCSMS] ijarcsms.com

Finding the shortest path to a destination is one of the main transportation problems in modern cities. The current study therefore implements two algorithms, Dijkstra and Bellman-Ford, to search for the best route to different destinations. The case study involves finding the shortest path between two terminals in Bandung. The calculations were performed using a graph with edge weights obtained from the Haversine formula, which fed both the Dijkstra and the Bellman-Ford computations. The comparison showed that Dijkstra's algorithm produced a shorter path than the Bellman-Ford algorithm.
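
The Haversine formula referred to above computes great-circle distance from latitude/longitude pairs, which can then serve as edge weights. A standard sketch (the radius value is the conventional mean Earth radius):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    R = 6371.0  # mean Earth radius, km
    p1, p2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * R * asin(sqrt(a))

d_eq = haversine_km(0.0, 0.0, 0.0, 1.0)  # one degree of longitude at the equator
```

One degree of longitude at the equator comes out to roughly 111 km, a handy sanity check before using the values as graph weights.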

International Transactions in Operational Research

Leo Liberti

Malaysian Journal of Computer Science

Dr Tehseen Zia

IOSR Journals

To minimize the cost of routing in computer networks, the shortest path problem is widely used to find the minimum-distance path from source to destination; it is sometimes called the min-delay path problem. The main objective of this paper is to evaluate and discuss the general aspects of Dijkstra's Algorithm in solving the shortest path problem. The algorithm is explained and evaluated in graphical form to exhibit its functionality.

Dr.G. Suseendran

Kaveh Shahabi

In the face of a natural or man-made disaster, evacuation planning refers to the process of reallocating the endangered population to a set of safe places, often in a hurry. Such a task needs proper preparation, execution, and most definitely a post-disaster response. We have contributed a new taxonomy of the evacuation planning problem and categorized available solutions. Evacuation routing is part of the bigger problem that finds the best routes to relocate the endangered population to safety. Given the circumstances, even the tiniest improvement in evacuation routing during execution can save many lives. Therefore, different research communities are looking at this particular problem from their own viewpoints, hoping to design a better practical solution. We propose a new method to perform evacuation routing efficiently under capacity constraints. Traditionally, simulation software or shortest path routing combined with zonal scheduling have been used to solve routing problems. Our method utilizes a state-of-the-art algorithm to connect each source node to its nearest destination. It also intelligently takes into account transportation network capacity and traffic flow to minimize congestion and system-wide transportation times. We have compared our method with previous routing algorithms and a common simulation method in a static environment. We show that our algorithm generates reliable and realistic routes and decreases transportation time by at least an order of magnitude, without any loss of performance. We also define the dynamic evacuation routing problem and propose a solution. The dynamic solution is capable of updating routes if the network topology is changed during the evacuation process. Effectively, it can solve the evacuation problem for a moving disaster.
We argue that an ideal evacuation routing algorithm should be able to generate realistic and efficient routes in a dynamic environment because changes to the road network are likely to happen after natural disasters. For example if a main road is blocked during a flood, the evacuation routing algorithm updates the plan based on this change in the road network and pushes the changed routes to the corresponding evacuees. In this dissertation we discuss evacuation routing and how it is connected to different aspects of the evacuation planning problem. Major works in this field have been studied and a better algorithm has been developed. The new algorithm’s performance and running time is iteratively improved and reported along with a comparison with previous works. The algorithm is extended to also solve the problem in a dynamic environment. Together these new developments pave the path for future researchers to study the evacuation problem and to integrate it into urban transportation services. Hopefully one day we can save more lives than before when future disasters occur.

RELATED PAPERS

2011 Nirma University International Conference on Engineering

Prof Usha Mehta

International Journal of Engineering Research and Technology (IJERT)

IJERT Journal

The Ninth DIMACS …

Douglas Gregor , Nick Edmonds

Engineering, Technology & Applied …

Dr. Biswajit R Bhowmik

Náyade Sharon

Fuzzy Optimization and Decision Making

Anthony Finn

Funny Stuff

International Journal of Applied Information Technology

Syaiful Ahdan

Applied Mathematical Sciences

Airin Abu Samah

International Journal of Recent Research Aspects ISSN 2349-7688

Computer Networks

Mohammad Aminian , Dong Yao

Sudip Sahana

International Journal of Latest Technology in Engineering, Management & Applied Science -IJLTEMAS (www.ijltemas.in)

Pattern Analysis and Applications

ACM Journal of Experimental Algorithms

Umberto Nanni

Engineering Research Publication and IJEAS

Journal of Experimental …

Proceedings 11th International Parallel Processing Symposium

Christos Zaroliagis

Dimitris Kalles

Pesquisa Operacional

Tommaso Pastore

Lecture Notes in Computer Science

Catherine McGeoch

Youri Tamitegama

Hahsmat Noori

2014 International Conference on Advanced Logistics and Transport (ICALT)

Azedine Boulmakoul

IAEME Publication

Edwin Mit , Syahrul N Junaini

Muhammad Maqbool

Mohammad Abdel Rahim

Research on Quadrotor UAV control and path planning based on PID controller and Dijkstra algorithm

Wangsheng Xushi; Research on Quadrotor UAV control and path planning based on PID controller and Dijkstra algorithm. AIP Conf. Proc. 26 June 2024; 3144 (1): 030015. https://doi.org/10.1063/5.0214314

Unmanned Aerial Vehicles (UAVs) have evolved from their military origins to a wide range of civilian and commercial uses, though commercial uses still dominate, which means civilian applications have significant untapped potential. The small number of civilian UAVs largely reflects their expense and the scarcity of comparable alternatives, yet they have the advantage of being fast and unaffected by urban traffic. A UAV equipped with path-planning modules and flight controllers makes it possible to transport goods within cities, so this paper introduces a possible application to the logistics industry. Urban areas usually place high demands on logistics, and the security and efficiency of delivery are the most significant concerns for people there. Because the traditional method of controlling UAVs is time-consuming and laborious, a new combined control method is proposed in this paper. The functions and simulation presented here demonstrate the feasibility of the combined method: in simulation, a UAV flies steadily from the start point to the destination while avoiding all obstacles on the map, which means a UAV could deliver packages with this technology. This study opens new potential for civilian UAV use and improves on existing delivery methods.
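
The PID side of the proposed combination can be sketched as a standard discrete PID loop; the gains, time step, and toy first-order plant below are illustrative choices, not values from the paper:

```python
class PID:
    """Discrete PID controller: u = Kp*e + Ki*integral(e) + Kd*de/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def update(self, setpoint, measurement):
        err = setpoint - measurement
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# Illustrative altitude hold: drive a toy integrator plant toward setpoint 10.0
pid = PID(kp=1.2, ki=0.1, kd=0.05, dt=0.1)
alt = 0.0
for _ in range(200):
    alt += pid.update(10.0, alt) * 0.1  # plant: altitude rate proportional to command
```

In a full quadrotor stack a loop like this would regulate attitude or altitude, while a planner such as Dijkstra's algorithm supplies the waypoints it tracks.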

  • Online ISSN 1551-7616
  • Print ISSN 0094-243X

Computer Science > Data Structures and Algorithms

Title: A Comparison of Dijkstra's Algorithm Using Fibonacci Heaps, Binary Heaps, and Self-Balancing Binary Trees

Abstract: This paper describes the shortest path problem in weighted graphs and examines the differences in efficiency that occur when using Dijkstra's algorithm with a Fibonacci heap, binary heap, and self-balancing binary tree. Using C++ implementations of these algorithm variants, we find that the fastest method is not always the one that has the lowest asymptotic complexity. Reasons for this are discussed and backed with empirical evidence.
Comments: 15 pages, 5 figures, 3 algorithms, 2 tables, source code listing
Subjects: Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
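
To make the comparison concrete, here are two interchangeable Dijkstra variants, one using Python's binary heap and one using an O(V^2) linear scan; on small or dense graphs the asymptotically worse scan can be competitive, echoing the paper's observation that the lowest asymptotic complexity is not always fastest. The test graph is invented:

```python
import heapq

def dijkstra_heap(adj, s):
    """Binary-heap Dijkstra: O((V + E) log V). adj[u] = [(v, w), ...]."""
    n = len(adj)
    dist = [float("inf")] * n
    dist[s] = 0
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def dijkstra_scan(adj, s):
    """Linear-scan Dijkstra: O(V^2), often competitive on small dense graphs."""
    n = len(adj)
    dist = [float("inf")] * n
    dist[s] = 0
    done = [False] * n
    for _ in range(n):
        u = min((i for i in range(n) if not done[i]),
                key=lambda i: dist[i], default=None)
        if u is None or dist[u] == float("inf"):
            break
        done[u] = True
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist

adj = [[(1, 2), (2, 5)], [(2, 1), (3, 4)], [(3, 1)], []]
```

Both variants return the same distance array on the sample graph; only the cost of extracting the minimum differs, which is exactly the axis the paper benchmarks across Fibonacci heaps, binary heaps, and self-balancing trees.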

[al-jabr means "restoring", referring to the process of moving a subtracted quantity to the other side of an equation; al-muqabala is "comparing" and refers to subtracting equal quantities from both sides of an equation.]

  • Introduction
  • Mathematics for Algorithmic
  • Functions and Relations
  • Vectors and Matrices
  • Linear Inequalities and Linear Equations
  • 0-1 Knapsack
  • Fractional Knapsack
  • Activity Selection Problem
  • Huffman's Codes
  • Minimum Spanning Tree
  • Kruskal's Algorithm
  • Prim's Algorithm
  • Dijkstra's Algorithm
  • Divide & Conquer Algorithms
  • Matrix-chain Multiplication
  • Knapsack Problem DP Solution
  • Activity Selection Problem DP Solution
  • Aggregate Method
  • Accounting Method
  • Potential Method
  • Dynamic Table
  • Binary Search Tree
  • Breadth First Search (BFS)
  • Depth First Search (DFS)
  • Topological Sort
  • Strongly Connected Components
  • Generic Minimum Spanning Tree
  • Bellman-Ford Algorithm
  • Naïve String Matching
  • Knuth-Morris-Pratt Algorithm
  • Boyer-Moore Algorithm
  • Bubble Sort
  • Insertion Sort
  • Selection Sort
  • Counting Sort
  • Bucket Sort
  • Computational Geometry
  • Information-Theoretic Argument
  • Adversary Argument
  • NP-Completeness And Reduction
  • Vertex Cover
  • The Traveling Salesman Problem
  • Linear Programming
  • Algorithm Design - Foundations, Analysis & Internet Examples by Michael T. Goodrich and Roberto Tamassia
  • Data Structures and Algorithms in Java (E-Version) by Michael T. Goodrich and Roberto Tamassia
  • Data Structures and Algorithms in C++ by Michael T. Goodrich , Roberto Tamassia and David M. Mount
  • Online Learning Center   -- CLR chapters overview and PowerPoint slides
  • Algorithmica: Home | Electronic Access via OhioLink (Dec.98 -Present) | Print Issues at Math lib.
  • Journal of Algorithms: Home | Electronic Access via OhioLink (Jan.93 - Present)
  • Journal of Graph Algorithms & Applications -- An electronic journal available via WWW. All papers freely available in PostScript and PDF.
  • Algorithmist, The -- dedicated to anything algorithms - from the practical realm, to the theoretical realm.
  • Algorithms Course Material on the Net
  • Algorithms in the Real World   course by Guy E. Blelloch
  • Algorithmic Solutions (formerly LEDA Library) -- a library of the data types and algorithms ( number types and linear algebra, basic data types, dictionaries, graphs, geometry, graphics).
  • Analysis of Algorithms Lectures at Princeton -- Applets & Demos based on CLR.
  • Collected Algorithms(CALG) of the ACM
  • Complete Collection of Algorithm Animations (CCAA)
  • Data Structures And Number Systems  -- by Brian Brown.
  • Function Calculator by Xiao Gang
  • FAQ - Com.graphics.algorithms -- maintained by Joseph O'Rourke
  • Game Theory Net
  • Grail Project  -- A symbolic computation environment for finite-state machines, regular expressions, and finite languages.
  • Java Applets Center by R.Mukundan
  • Lecture Notes by Diane Cook
  • Lecture Notes   for Graduate Algorithms by   Samir Khuller
  • Maze classification and algorithms  -- A short description of mazes and how to create them. Definition of different mazetypes and their algorithms.
  • Priority Queues  -- Electronic bibliography on priority queues (heaps). Links to downloadable reports, researchers' home pages, and software.
  • Softpanorama Virtual Library / Algorithms
  • Ternary Search Trees  -- Algorithm for search. PDF file and examples in C.
  • Traveling Salesman -- bibliography and software links.

Computability

  • Algorithms and Complexity  -- A downloadable textbook by Herbert S. Wilf.
  • Blackbox - a SAT Technology Planning System  -- Blackbox is a planning system that works by converting problems specified in STRIPS notation into Boolean satisfiability problems, and then solving the problems with a variety of state-of-the-art satisfiability engines.
  • Bibliographic Database for Computability Theory  -- Extensive bibliography on computability and recursion theory, maintained by Peter Cholak.
  • Compendium of NP Optimization Problems  -- This is a preliminary version of the catalog of NP optimization problems.
  • Computability and Complexity  -- An online course on complexity.
  • Computational Complexity and Statistical Physics  -- Santa Fe, New Mexico, USA; 4--6 September 2001.
  • Complexity International -- journal for scientific papers dealing with any area of complex systems research.
  • Computability Theory  -- Directory of researchers working in computability theory, and list of open problems.
  • ECCC - Electronic Colloquium on Computational Complexity  -- The Electronic Colloquium on Computational Complexity is a new forum for the rapid and widespread interchange of ideas, techniques, and research in computational complexity. The Electronic Colloquium on Computational Complexity (ECCC) welcomes papers, short notes and surveys with relevance to the theory of computation.
  • Hypercomputation Research Network  -- The study of computation beyond that defined by the Turing machine, also known as super-Turing, non-standard or non-recursive computation. Links to people, resources and discussions.
  • IEEE Conference on Computational Complexity  -- This conference started as "Structure in Complexity Theory" in 1986. It recently acquired the new name "Conference on Computational Complexity", which was used for the first time in 1996. CTI, DePaul University, Chicago IL; 18--21 June 2001.
  • SAT Live!  -- A collection of up-to-date links about the satisfiability problem (solvers, benchmarks, articles). A discussion forum is available as well.
  • Roberto Bayardo's Resources  -- Includes the relsat SAT solver and related papers.
  • Problem Solving Environments Home Page  -- This site contains information about Problem Solving Environments (PSEs), research, publications, and information on topics related to PSEs.
  • SATLIB - The Satisfiability Library  -- A collection of benchmark problems, solvers, and tools. One strong motivation for creating SATLIB is to provide a uniform test-bed for SAT solvers as well as a site for collecting SAT problem instances, algorithms, and empirical characterisations of the algorithms' performance.
  • Stas Busygin's NP-Completeness Page  -- A proposal for solving NP-hard problems.

Quantum Computing

  • Centre for Quantum Computation  -- Based at Oxford University. Well designed site, with a large amount of information available.
  • D-Wave Systems, Inc.  -- D-Wave Systems (dwavesys.com) is a portal to the state of the art in the design of quantum computers, operating systems, algorithms, hardware, superconductors, and quantum physics.
  • id Quantique  -- Site Of id Quantique, Inc. Products include a quantum random number generator,and a quantum cryptography system.
  • MagicQ Technologies Inc.  -- The home site of the first start up company devoted entirely to quantum computing. No patents or products to date, but of interest by virtue of being first off the block.
  • Quantum Architecture Research Center  -- The home page of a team formed by Frederic Chong, Isaac Chuang, and John Kubiatowicz, the three top experimentalists in quantum computing.
  • Quantum Computation Archive This site contains both technical papers and links to QC reports in the media.
  • Quantum Computer Emulator (QCE)  -- A Windows based simulator of quantum computer hardware. Provides an environment to execute quantum algorithms under realistic experimental conditions.
  • Quantum Computer Physics Laboratory of IPT Russian Academy of Sciences  -- "Quantum Computer" seminar program. Staff, contact info, research papers. 
  • Quantum Computing At The Max Plank Institute  -- Provides an overview of quantum computer related research taking place at the Max Plank Institute. The primary focus is ion trap based computing. Selected reprints are available.
  • Quantum Computing with Electron Spins in Quantum Dots  -- A detailed study of using electron spins for quantum computation. Several possible implementations are discussed.
  • Quantum Informatics at the University of Aarhus  -- Performs research on quantum computing with an emphasis on quantum cryptography.
  • Graham's scan (Convex Hull Algorithm) (Applet)
  • Line Sweeping Algorithm   (Applet)
  • Max Flow   (Applet)
  • SkipList   (Applet)
  • Stable Marriage (Applet)

Graph Algorithms

Societies and Organizations

  • Numerical Algorithms Group (NAG)
  • Line Sweeping Algorithm
  • Graham Scan and Gift Wrapping
  • Graham's Scan (Convex Hull Algorithm)
  • Qhull -- The QuickHull Algorithm.

A Low-Cost Indoor Navigation and Tracking System Based on Wi-Fi-RSSI

  • Published: 27 June 2024


  • Nina Siti Aminah   ORCID: orcid.org/0000-0003-4725-0130 1 ,
  • Arsharizka Syahadati Ichwanda 1 ,
  • Daryanda Dwiammardi Djamal 1 ,
  • Yohanes Baptista Wijaya Budiharto 1 &
  • Maman Budiman 1  

In recent years, the number of smartphone users has increased dramatically. Smartphones support a variety of services, including indoor navigation and tracking that use the Received Signal Strength Indicator (RSSI) values of Wi-Fi (Wireless Fidelity) routers to estimate user position. In this research, we developed a navigation and tracking system using a fingerprint map and the k-Nearest Neighbor (k-NN) algorithm. The system then guides the user along the shortest path to the destination using Dijkstra's algorithm. These features are presented as an RSSI-based navigation application on an Android smartphone. At the same time, the user's estimated position is sent to a server and viewed in a real-time web application. The system assists visitors in finding their way through a complex building and, at the same time, allows building owners to record and analyze visitor movement. One key benefit of the system is its low initial cost, since it uses only the existing Wi-Fi infrastructure. Experimental results show that the system reaches an accuracy of up to 78% with distance errors of less than 3 m.
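The routing step described above is standard single-source shortest paths. A minimal sketch in Python, assuming a dict-of-dicts waypoint graph; the node names and corridor lengths below are invented for illustration (the paper's actual floor graph is not given):

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths over a dict-of-dicts adjacency map."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, already found a shorter path to u
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Toy floor plan: nodes are waypoints, weights are corridor lengths in metres.
floor = {
    "lobby": {"hall": 4.0, "stairs": 9.0},
    "hall": {"stairs": 3.0, "room101": 6.0},
    "stairs": {"room101": 2.0},
    "room101": {},
}
print(dijkstra(floor, "lobby"))
# {'lobby': 0.0, 'hall': 4.0, 'stairs': 7.0, 'room101': 9.0}
```

Dijkstra's greedy invariant holds here because corridor lengths are non-negative: once a waypoint is popped from the heap its distance is final.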


Data availability

The data used to support the findings of this study are included within the article.



This research is fully funded by the Indonesian Ministry of Research and Technology/National Agency for Research and Innovation and the Indonesian Ministry of Education and Culture, under the World Class University Program managed by Institut Teknologi Bandung. The authors have no relevant financial or non-financial interests to disclose.

Author information

Authors and Affiliations

Internet of Things Laboratory, Physics Program Study, Institut Teknologi Bandung, Jl. Ganesha 10, Bandung, 40132, Indonesia

Nina Siti Aminah, Arsharizka Syahadati Ichwanda, Daryanda Dwiammardi Djamal, Yohanes Baptista Wijaya Budiharto & Maman Budiman


Contributions

A.S. Ichwanda, D.D. Djamal, and Y.B.W. Budiharto contributed to the interpretation of results and the preparation of figures; N.S. Aminah wrote the main manuscript; M. Budiman reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Nina Siti Aminah .

Ethics declarations

Conflict of Interest

The authors declare that they have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Consent for Publication

The authors certify that this material or similar material has not been and will not be submitted to or published in any other publication before. Furthermore, the authors certify that they have participated sufficiently in the work to take public responsibility for the content, including participation in the concept, design, analysis, writing, or revision of the manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Aminah, N.S., Ichwanda, A.S., Djamal, D.D. et al. A Low-Cost Indoor Navigation and Tracking System Based on Wi-Fi-RSSI. Wireless Pers Commun (2024). https://doi.org/10.1007/s11277-024-11361-3

Download citation

Accepted : 13 June 2024

Published : 27 June 2024

DOI : https://doi.org/10.1007/s11277-024-11361-3


  • k-NN algorithm


A Lagrangian relaxation algorithm for stochastic fixed interval scheduling problem with non-identical machines and job classes


Recommendations

Lagrangian relaxation for parallel machine batch scheduling with deteriorating jobs

We investigate the problem of scheduling jobs with batch production on the identical parallel machines with respect to time-deteriorating processing time in which the processing time of a job is a piecewise linear deterioration function of its starting ...

A new Lagrangian relaxation algorithm for hybrid flowshop scheduling to minimize total weighted completion time

We investigate the problem of scheduling n jobs in s-stage hybrid flowshops with parallel identical machines at each stage. The objective is to find a schedule that minimizes the sum of weighted completion times of the jobs. This problem has been proven ...

Lagrangian Relaxation Algorithm for a Single Machine Scheduling with Release Dates

The paper considers a single machine scheduling problem with release dates to minimize the total completion time of jobs. The problem is NP-hard. We present a mixed-integer linear programming formulation based on the slot idea. The ...

Information

Published in

Elsevier Science Ltd., United Kingdom

Publication History

Author Tags

  • Fixed interval scheduling
  • Uncertain delays
  • Non-identical machines
  • Job classes
  • Lagrangian relaxation
  • Research-article


  • Open access
  • Published: 19 June 2024

Detecting hallucinations in large language models using semantic entropy

  • Sebastian Farquhar   ORCID: orcid.org/0000-0002-9185-6415 1   na1 ,
  • Jannik Kossen 1   na1 ,
  • Lorenz Kuhn 1   na1 &
  • Yarin Gal   ORCID: orcid.org/0000-0002-2733-2078 1  

Nature volume 630, pages 625–630 (2024)


  • Computer science
  • Information technology

Large language model (LLM) systems, such as ChatGPT 1 or Gemini 2 , can show impressive reasoning and question-answering capabilities but often ‘hallucinate’ false outputs and unsubstantiated answers 3 , 4 . Answering unreliably or without the necessary information prevents adoption in diverse fields, with problems including fabrication of legal precedents 5 or untrue facts in news articles 6 and even posing a risk to human life in medical domains such as radiology 7 . Encouraging truthfulness through supervision or reinforcement has been only partially successful 8 . Researchers need a general method for detecting hallucinations in LLMs that works even with new and unseen questions to which humans might not know the answer. Here we develop new methods grounded in statistics, proposing entropy-based uncertainty estimators for LLMs to detect a subset of hallucinations—confabulations—which are arbitrary and incorrect generations. Our method addresses the fact that one idea can be expressed in many ways by computing uncertainty at the level of meaning rather than specific sequences of words. Our method works across datasets and tasks without a priori knowledge of the task, requires no task-specific data and robustly generalizes to new tasks not seen before. By detecting when a prompt is likely to produce a confabulation, our method helps users understand when they must take extra care with LLMs and opens up new possibilities for using LLMs that are otherwise prevented by their unreliability.


‘Hallucinations’ are a critical problem 9 for natural language generation systems using large language models (LLMs), such as ChatGPT 1 or Gemini 2 , because users cannot trust that any given output is correct.

Hallucinations are often defined as LLMs generating “content that is nonsensical or unfaithful to the provided source content” 9 , 10 , 11 but they have come to include a vast array of failures of faithfulness and factuality. We focus on a subset of hallucinations which we call ‘confabulations’ 12 for which LLMs fluently make claims that are both wrong and arbitrary—by which we mean that the answer is sensitive to irrelevant details such as random seed. For example, when asked a medical question “What is the target of Sotorasib?” an LLM confabulates by sometimes answering KRASG12 ‘C’ (correct) and other times KRASG12 ‘D’ (incorrect) despite identical instructions. We distinguish this from cases in which a similar ‘symptom’ is caused by the following different mechanisms: when LLMs are consistently wrong as a result of being trained on erroneous data such as common misconceptions 13 ; when the LLM ‘lies’ in pursuit of a reward 14 ; or systematic failures of reasoning or generalization. We believe that combining these distinct mechanisms in the broad category hallucination is unhelpful. Our method makes progress on a portion of the problem of providing scalable oversight 15 by detecting confabulations that people might otherwise find plausible. However, it does not guarantee factuality because it does not help when LLM outputs are systematically bad. Nevertheless, we significantly improve question-answering accuracy for state-of-the-art LLMs, revealing that confabulations are a great source of error at present.

We show how to detect confabulations by developing a quantitative measure of when an input is likely to cause an LLM to generate arbitrary and ungrounded answers. Detecting confabulations allows systems built on LLMs to avoid answering questions likely to cause confabulations, to make users aware of the unreliability of answers to a question or to supplement the LLM with more grounded search or retrieval. This is essential for the critical emerging field of free-form generation in which naive approaches, suited to closed vocabulary and multiple choice, fail. Past work on uncertainty for LLMs has focused on simpler settings, such as classifiers 16 , 17 and regressors 18 , 19 , whereas the most exciting applications of LLMs relate to free-form generations.

The term hallucination in the context of machine learning originally comes from filling in ungrounded details, either as a deliberate strategy 20 or as a reliability problem 4 . The appropriateness of the metaphor has been questioned as promoting undue anthropomorphism 21 . Although we agree that metaphor must be used carefully with LLMs 22 , the widespread adoption of the term hallucination reflects the fact that it points to an important phenomenon. This work represents a step towards making that phenomenon more precise.

To detect confabulations, we use probabilistic tools to define and then measure the ‘semantic’ entropy of the generations of an LLM—an entropy that is computed over meanings of sentences. High entropy corresponds to high uncertainty 23 , 24 , 25 —so semantic entropy is one way to estimate semantic uncertainties. Semantic uncertainty, the broader category of measures we introduce, could be operationalized with other measures of uncertainty, such as mutual information, instead. Entropy in free-form generation is normally hard to measure because answers might mean the same thing (be semantically equivalent) despite being expressed differently (being syntactically or lexically distinct). This causes naive estimates of entropy or other lexical variation scores 26 to be misleadingly high when the same correct answer might be written in many ways without changing its meaning.

By contrast, our semantic entropy moves towards estimating the entropy of the distribution of meanings of free-form answers to questions, insofar as that is possible, rather than the distribution over the ‘tokens’ (words or word-pieces) which LLMs natively represent. This can be seen as a kind of semantic consistency check 27 for random seed variation. An overview of our approach is provided in Fig. 1 and a worked example in Supplementary Table 1 .

Figure 1

a , Naive entropy-based uncertainty measures variation in the exact answers, treating ‘Paris’, ‘It’s Paris’ and ‘France’s capital Paris’ as different. But this is unsuitable for language tasks for which sometimes different answers mean the same things. Our semantic entropy clusters answers which share meanings before computing the entropy. A low semantic entropy shows that the LLM is confident about the meaning. b , Semantic entropy can also detect confabulations in longer passages. We automatically decompose a long generated answer into factoids. For each factoid, an LLM generates questions to which that factoid might have been the answer. The original LLM then samples  M possible answers to these questions. Finally, we compute the semantic entropy over the answers to each specific question, including the original factoid. Confabulations are indicated by high average semantic entropy for questions associated with that factoid. Here, semantic entropy classifies Fact 1 as probably not a confabulation because generations often mean the same thing, despite very different wordings, which a naive entropy would have missed.

Intuitively, our method works by sampling several possible answers to each question and clustering them algorithmically into answers that have similar meanings, which we determine on the basis of whether answers in the same cluster entail each other bidirectionally 28 . That is, if sentence A entails that sentence B is true and vice versa, then we consider them to be in the same semantic cluster. We measure entailment using both general-purpose LLMs and natural language inference (NLI) tools developed specifically for detecting entailment for which we show direct evaluations in Supplementary Tables 2 and 3 and Supplementary Fig. 1 . Textual entailment has previously been shown to correlate with faithfulness 10 in the context of factual consistency 29 as well as being used to measure factuality in abstractive summarization 30 , especially when applied at the right granularity 31 .
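The clustering step above can be sketched as follows. This is not the authors' implementation: the `bidirectional_entailment` function below is a crude string-normalisation stand-in for the NLI models and LLM judges the paper uses, and the entropy shown is the simple frequency-based (discrete) variant, with invented example answers:

```python
import math

def bidirectional_entailment(a, b):
    # Stand-in for a real NLI check run in both directions; here, a crude
    # normalisation so that trivially rephrased answers compare equal.
    norm = lambda s: s.lower().rstrip(".").replace("it's ", "").strip()
    return norm(a) == norm(b)

def semantic_clusters(answers, equiv=bidirectional_entailment):
    """Greedily group sampled answers into meaning clusters."""
    clusters = []  # each cluster is a list of answers sharing one meaning
    for ans in answers:
        for c in clusters:
            if equiv(ans, c[0]):
                c.append(ans)
                break
        else:
            clusters.append([ans])
    return clusters

def discrete_semantic_entropy(answers):
    """Entropy over cluster frequencies rather than over exact strings."""
    clusters = semantic_clusters(answers)
    n = len(answers)
    probs = [len(c) / n for c in clusters]
    return -sum(p * math.log(p) for p in probs)

samples = ["Paris", "It's Paris.", "paris", "Lyon"]
print(semantic_clusters(samples))       # two clusters: the 'Paris' meaning and 'Lyon'
print(discrete_semantic_entropy(samples))  # low-ish entropy: one dominant meaning
```

A real deployment would replace the equivalence check with a bidirectional entailment test (A entails B and B entails A) from an NLI model, as the text describes.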

Semantic entropy detects confabulations in free-form text generation across a range of language models and domains, without previous domain knowledge. Our evaluations cover question answering in trivia knowledge (TriviaQA 32 ), general knowledge (SQuAD 1.1; ref. 33 ), life sciences (BioASQ 34 ) and open-domain natural questions (NQ-Open 35 ) derived from actual queries to Google Search 36 . In addition, semantic entropy detects confabulations in mathematical word problems (SVAMP 37 ) and in a biography-generation dataset, FactualBio, accompanying this paper.

Our results for TriviaQA, SQuAD, BioASQ, NQ-Open and SVAMP are all evaluated context-free and involve sentence-length answers (96 ± 70 characters, mean ± s.d.) and use LLaMA 2 Chat (7B, 13B and 70B parameters) 38 , Falcon Instruct (7B and 40B) 39 and Mistral Instruct (7B) 40 . In the Supplementary Information , we further consider short-phrase-length answers. Results for FactualBio (442 ± 122 characters) use GPT-4 (ref. 1 ). At the time of writing, GPT-4 (ref. 1 ) did not expose output probabilities 41 or hidden states, although it does now. As a result, we propose a discrete approximation of our estimator for semantic entropy which allows us to run experiments without access to output probabilities, which we use for all GPT-4 results in this paper and which performs similarly well.

Our confabulation detection with semantic entropy is more robust to user inputs from previously unseen domains than methods which aim to ‘learn’ how to detect confabulations from a set of example demonstrations. Our method is unsupervised, meaning that we do not need labelled examples of confabulations. By contrast, supervised methods detect confabulations by learning patterns behind examples of confabulations, assuming that future questions preserve these patterns. But this assumption is often untrue in new situations or with confabulations that human overseers are unable to identify (compare Fig. 17 of ref. 24 ). As a strong supervised baseline, we compare to an embedding regression method inspired by ref. 24 which trains a logistic regression classifier to predict whether the model correctly answered a question on the basis of the final ‘embedding’ (hidden state) of the LLM. We also use the P (True) method 24 which looks at the probability with which an LLM predicts that the next token is ‘True’ when few-shot prompted to compare a main answer with ‘brainstormed’ alternatives.
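The embedding regression baseline described above amounts to a logistic classifier on hidden states. A self-contained sketch on synthetic features; the Gaussian "embeddings", dimensions, learning rate and iteration count are all invented for illustration (the real baseline trains on the LLM's final hidden state per question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: two Gaussian clouds play the role of hidden states
# for correctly vs incorrectly answered questions.
d, n = 16, 200
X = np.vstack([rng.normal(0.5, 1.0, (n, d)), rng.normal(-0.5, 1.0, (n, d))])
y = np.r_[np.ones(n), np.zeros(n)]

# Plain logistic regression fitted by gradient descent (no ML library needed).
w, b = np.zeros(d), 0.0
for _ in range(500):
    z = np.clip(X @ w + b, -30, 30)      # clip logits for numerical safety
    p = 1.0 / (1.0 + np.exp(-z))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

p = 1.0 / (1.0 + np.exp(-np.clip(X @ w + b, -30, 30)))
accuracy = np.mean((p > 0.5) == y)
print(accuracy)  # high on this well-separated synthetic data
```

The point of the comparison in the text is that such a classifier is supervised: it learns the pattern of its training embeddings, so its reliability degrades under distribution shift, unlike the unsupervised entropy estimators.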

Confabulations contribute substantially to incorrect answers given by language models. We show that semantic entropy can be used to predict many incorrect model answers and to improve question-answering accuracy by refusing to answer those questions the model is uncertain about. Corresponding to these two uses, we evaluate two main metrics. First, the widely used area under the receiver operating characteristic (AUROC) curve for the binary event that a given answer is incorrect. This measure captures both precision and recall and ranges from 0 to 1, with 1 representing a perfect classifier and 0.5 representing an un-informative classifier. We also show a new measure, the area under the ‘rejection accuracy’ curve (AURAC). This studies the case in which the confabulation detection score is used to refuse to answer the questions judged most likely to cause confabulations. Rejection accuracy is the accuracy of the answers of the model on the remaining questions and the area under this curve is a summary statistic over many thresholds (representative threshold accuracies are provided in Supplementary Material ). The AURAC captures the accuracy improvement which users would experience if semantic entropy was used to filter out questions causing the highest entropy.
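One simple way to operationalise the rejection-accuracy idea is sketched below: rank answers by the uncertainty score, keep the k most-confident answers at each threshold, and take the mean of the resulting accuracies as the area. This is a conceptual sketch under those assumptions, not the paper's exact estimator, and the scores and labels are invented:

```python
import numpy as np

def rejection_accuracy_curve(uncertainty, is_correct):
    """Accuracy on the questions kept after refusing the most-uncertain ones."""
    order = np.argsort(uncertainty)                  # most confident first
    kept = np.asarray(is_correct, dtype=float)[order]
    n = len(kept)
    # Entry k-1 is the accuracy when answering only the k most-confident questions.
    return np.cumsum(kept) / np.arange(1, n + 1)

def aurac(uncertainty, is_correct):
    """Area under the rejection-accuracy curve, via a simple mean over thresholds."""
    return float(np.mean(rejection_accuracy_curve(uncertainty, is_correct)))

# Invented example: every wrong answer is more uncertain than every right one,
# so refusing high-entropy questions raises accuracy on what remains.
u = [0.9, 0.1, 0.8, 0.2, 0.3]
c = [0, 1, 0, 1, 1]
print(rejection_accuracy_curve(u, c))  # [1.   1.   1.   0.75 0.6 ]
print(aurac(u, c))
```

An uninformative uncertainty score would give an AURAC near the base accuracy; a useful one, as here, concentrates errors among the refused questions.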

Detecting confabulations in QA and math

In Fig. 2 , we show that both semantic entropy and its discrete approximation outperform our best baselines for sentence-length generations. These results are averaged across datasets and provide the actual scores on the held-out evaluation dataset. We report the raw average score across held-out evaluation datasets without standard error because the distributional characteristics are more a property of the models and datasets selected than the method. Consistency of relative results across different datasets is a stronger indicator of variation in this case.

Figure 2

Semantic entropy outperforms leading baselines and naive entropy. AUROC (scored on the y -axes) measures how well methods predict LLM mistakes, which correlate with confabulations. AURAC (likewise scored on the y -axes) measures the performance improvement of a system that refuses to answer questions which are judged likely to cause confabulations. Results are an average over five datasets, with individual metrics provided in the Supplementary Information .

Semantic entropy greatly outperforms the naive estimation of uncertainty using entropy: computing the entropy of the length-normalized joint probability of the token sequences. Naive entropy estimation ignores the fact that token probabilities also express the uncertainty of the model over phrasings that do not change the meaning of an output.
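The naive baseline can be sketched directly from this description: negate the average of the length-normalised sequence log-probabilities over the sampled generations. This is a hedged reading of "length-normalized joint probability", and the token log-probabilities below are made-up numbers for illustration:

```python
import numpy as np

def naive_entropy(token_logprobs_per_sample):
    """Monte Carlo estimate: mean negative length-normalised sequence log-prob."""
    seq_lp = [np.sum(lp) / len(lp) for lp in token_logprobs_per_sample]
    return -float(np.mean(seq_lp))

# Two phrasings of the same answer still count as distinct token sequences
# here, which is why naive entropy over-estimates uncertainty for free-form
# text: it charges the model for variation in wording, not just in meaning.
samples = [
    [-0.1, -0.2, -0.1],        # e.g. tokens of "Paris"
    [-0.3, -0.2, -0.4, -0.1],  # e.g. tokens of "It's Paris"
]
print(naive_entropy(samples))
```

Semantic entropy avoids this failure mode by pooling the probability mass of such rephrasings into a single meaning cluster before the entropy is computed.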

Our methods also outperform the supervised embedding regression method both in- and out-of-distribution. In pale-yellow bars we show that embedding regression performance deteriorates when its training data do not match the deployment distribution—which mirrors the common real-world case in which there is a distribution shift between training and deployment 42 —the plotted value is the average metric for embedding regression trained on one of the four ‘off-distribution’ datasets for that evaluation. This is critical because reliable uncertainty is most important when the data distribution shifts. Semantic entropy also outperforms P (True) which is supervised ‘in-context’; that is, it is adapted to the deployment task with a few training examples provided in the LLM prompt itself. The discrete variant of semantic entropy performs similarly to our standard estimator, despite not requiring exact output probabilities.

Averaged across the 30 combinations of tasks and models we study, semantic entropy achieves the best AUROC value of 0.790 whereas naive entropy (0.691), P (True) (0.698) and the embedding regression baseline (0.687) lag behind it. Semantic entropy performs well consistently, with stable performance (between 0.78 and 0.81 AUROC) across the different model families (LLaMA, Falcon and Mistral) and scales (from 7B to 70B parameters) which we study (we report summary statistics for each dataset and model as before). Although semantic entropy outperforms the baselines across all model sizes, P (True) seems to improve with model size, suggesting that it might become more competitive for very capable honest models in settings that the model understands well (which are, however, not the most important cases to have good uncertainty). We use ten generations to compute entropy, selected using analysis in Supplementary Fig. 2 . Further results for short-phrase generations are described in Supplementary Figs. 7 – 10 .

The results in Fig. 2 offer a lower bound on the effectiveness of semantic entropy at detecting confabulations. These evaluations determine whether semantic entropy and baseline methods can detect when the answers of the model are incorrect (which we validate against human correctness evaluations in Supplementary Table 4 ). In addition to errors from confabulations (arbitrary incorrectness), this also includes other types of mistakes for which semantic entropy is not suited, such as consistent errors learned from the training data. The fact that methods such as embedding regression are able to spot other kinds of errors, not just confabulations, but still are outperformed by semantic entropy, suggests that confabulations are a principal category of errors for actual generations.

Examples of questions and answers from TriviaQA, SQuAD and BioASQ, for LLaMA 2 Chat 70B, are shown in Table 1 . These illustrate how only semantic entropy detects when the meaning is constant but the form varies (the first row of the table) whereas semantic entropy and naive entropy both correctly predict the presence of confabulations when the form and meaning vary together (second row) and predict the absence of confabulations when the form and meaning are both constant across several resampled generations (third row). In the final row, we give an example in which semantic entropy is erroneously high as a result of overly sensitive semantic clustering relative to the reference answer. Our clustering method distinguishes the answers which provide a precise date from those which only provide a year. For some contexts that would have been correct but in this context the distinction between the specific day and the year is probably irrelevant. This highlights the importance of context and judgement in clustering, especially in subtle cases, as well as the shortcomings of evaluating against fixed reference answers which do not capture the open-ended flexibility of conversational deployments of LLMs.

Detecting confabulations in biographies

Semantic entropy is most natural for sentences that express a single proposition but the idea of semantic equivalence is trickier to apply to longer passages which express many propositions which might only agree partially 43 . Nevertheless, we can use semantic entropy to detect confabulations in longer generations, such as entire paragraphs of text. To show this, we develop a dataset of biographical generations from GPT-4 (v.0613) for 21 individuals notable enough to have their own Wikipedia page but without extensive online biographies. From each biography generated by GPT-4, we automatically extract propositional factual claims about the individual (150 factual claims in total), which we manually label as true or false.

Applying semantic entropy to this problem is challenging. Naively, one might simply regenerate each sentence (conditioned on the text so far) and then compute semantic entropy over these regenerations. However, the resampled sentences often target different aspects of the biography: for example, one time describing family and the next time profession. This is analogous to the original problem semantic entropy was designed to resolve: the model is uncertain about the right ordering of facts, not about the facts themselves. To address this, we break down the entire paragraph into factual claims and reconstruct questions which might have been answered by those claims. Only then do we apply semantic entropy (Fig. 1 ) by generating three new answers to each question (selected with analysis in Supplementary Figs. 3 and 4 ) and computing the semantic entropy over those generations plus the original factual claim. We aggregate these by averaging the semantic entropy over all the questions to get an uncertainty score for each proposition, which we use to detect confabulations. Unaggregated results are shown in Supplementary Figs. 5 and 6 .

As GPT-4 did not allow access to the probability of the generation at the time of writing, we use a discrete variant of semantic entropy which makes the further approximation that we can infer a discrete empirical distribution over semantic meaning clusters from only the generations ( Methods ). This allows us to compute semantic entropy using only the black-box outputs of an LLM. However, we were unable to compute the naive entropy baseline, the standard semantic entropy estimator or the embedding regression baseline for GPT-4 without output probabilities and embeddings.

In Fig. 3 we show that the discrete variant of semantic entropy effectively detects confabulations on this dataset. Its AUROC and AURAC are higher than either a simple ‘self-check’ baseline—which just asks the LLM whether the factoid is likely to be true—or a variant of P (True) which has been adapted to work for the paragraph-length setting. Discrete semantic entropy has better rejection accuracy until 20% of the questions have been rejected, at which point P (True) has a narrow edge. This indicates that the questions predicted to cause confabulations are indeed more likely to be wrong.

Figure 3

The discrete variant of our semantic entropy estimator outperforms baselines both when measured by AUROC and AURAC metrics (scored on the y -axis). The AUROC and AURAC are substantially higher than for both baselines. At above 80% of questions being answered, semantic entropy has the highest accuracy. Only when the top 20% of answers judged most likely to be confabulations are rejected does the answer accuracy on the remainder for the P (True) baseline exceed semantic entropy.

Our probabilistic approach, accounting for semantic equivalence, detects an important class of hallucinations: those that are caused by a lack of LLM knowledge. These are a substantial portion of the failures at present and will continue even as models grow in capabilities because situations and cases that humans cannot reliably supervise will persist. Confabulations are a particularly noteworthy failure mode for question answering but appear in other domains too. Semantic entropy needs no previous domain knowledge and we expect that algorithmic adaptations to other problems will allow similar advances in, for example, abstractive summarization. In addition, extensions to alternative input variations such as rephrasing or counterfactual scenarios would allow a similar method to act as a form of cross-examination 44 for scalable oversight through debate 45 .

The success of semantic entropy at detecting errors suggests that LLMs are even better at “knowing what they don’t know” than was argued by ref. 24 —they just don’t know they know what they don’t know. Our method explicitly does not directly address situations in which LLMs are confidently wrong because they have been trained with objectives that systematically produce dangerous behaviour, cause systematic reasoning errors or are systematically misleading the user. We believe that these represent different underlying mechanisms—despite similar ‘symptoms’—and need to be handled separately.

One exciting aspect of our approach is the way it makes use of classical probabilistic machine learning methods and adapts them to the unique properties of modern LLMs and free-form language generation. We hope to inspire a fruitful exchange of well-studied methods and emerging new problems by highlighting the importance of meaning when addressing language-based machine learning problems.

Semantic entropy as a strategy for overcoming confabulation builds on probabilistic tools for uncertainty estimation. It can be applied directly to any LLM or similar foundation model without requiring any modifications to the architecture. Our ‘discrete’ variant of semantic uncertainty can be applied even when the predicted probabilities for the generations are not available, for example, because access to the internals of the model is limited.

In this section we introduce background on probabilistic methods and uncertainty in machine learning, discuss how it applies to language models and then discuss our contribution, semantic entropy, in detail.

Uncertainty and machine learning

We aim to detect confabulations in LLMs, using the principle that the model will be uncertain about generations for which its output is going to be arbitrary.

One measure of uncertainty is the predictive entropy of the output distribution, which measures the information one has about the output given the input 25 . The predictive entropy (PE) for an input sentence x is the conditional entropy ( H ) of the output random variable Y with realization y given x ,

\(PE({\boldsymbol{x}})=H(Y| {\boldsymbol{x}})=-{\sum }_{y}P(\,y| {\boldsymbol{x}})\log P(\,y| {\boldsymbol{x}}).\)

A low predictive entropy indicates an output distribution which is heavily concentrated whereas a high predictive entropy indicates that many possible outputs are similarly likely.

Aleatoric and epistemic uncertainty

We do not distinguish between aleatoric and epistemic uncertainty in our analysis. Researchers sometimes separate aleatoric uncertainty (uncertainty in the underlying data distribution) from epistemic uncertainty (caused by having only limited information) 46 . Further advances in uncertainty estimation which separate these kinds of uncertainty would enhance the potential for our semantic uncertainty approach by allowing extensions beyond entropy.

Joint probabilities of sequences of tokens

Generative LLMs produce strings of text by selecting tokens in sequence. Each token is a wordpiece that often represents three or four characters (though especially common sequences and important words such as numbers typically get their own token). To compute entropies, we need access to the probabilities the LLM assigns to the generated sequence of tokens. The probability of the entire sequence, s , conditioned on the context, x , is the product of the conditional probabilities of new tokens given past tokens, whose resulting log-probability is \(\log P({\bf{s}}| {\boldsymbol{x}})={\sum }_{i}\log P({s}_{i}| {{\bf{s}}}_{ < i},{\boldsymbol{x}})\) , where s i is the i th output token and s < i denotes the set of previous tokens.

Length normalization

When comparing the log-probabilities of generated sequences, we use ‘length normalization’, that is, we use an arithmetic mean log-probability, \(\frac{1}{N}{\sum }_{i}^{N}\log P({s}_{i}| {{\bf{s}}}_{ < i},{\boldsymbol{x}})\) , instead of the sum. In expectation, longer sequences have lower joint likelihoods because of the conditional independence of the token probabilities 47 . The joint likelihood of a sequence of length N shrinks exponentially in N . Its negative log-probability therefore grows linearly in N , so longer sentences tend to contribute more to entropy. We therefore interpret length-normalizing the log-probabilities when estimating the entropy as asserting that the expected uncertainty of generations is independent of sentence length. Length normalization has some empirical success 48 , including in our own preliminary experiments, but little theoretical justification in the literature.
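The effect of length normalization can be sketched in a few lines; the per-token probabilities below are toy values rather than real model outputs.

```python
import math

def joint_log_prob(token_probs):
    # Log-probability of a whole sequence: the sum of per-token conditional
    # log-probabilities, log P(s|x) = sum_i log P(s_i | s_<i, x).
    return sum(math.log(p) for p in token_probs)

def length_normalized_log_prob(token_probs):
    # Arithmetic mean of the per-token log-probabilities, which removes
    # the linear-in-N growth of the negative log-probability.
    return joint_log_prob(token_probs) / len(token_probs)

# Two toy answers with identical per-token confidence but different lengths:
short_answer = [0.8, 0.8]
long_answer = [0.8] * 6
# The joint log-probability of the longer answer is lower purely because of
# its length, whereas the length-normalized scores are identical.
```

Under this normalization, the longer answer no longer contributes more to the entropy estimate simply by virtue of its length.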

Principles of semantic uncertainty

If we naively calculate the predictive entropy directly from the probabilities of the generated sequence of tokens, we conflate the uncertainty of the model over the meaning of its answer with the uncertainty over the exact tokens used to express that meaning. For example, even if the model is confident in the meaning of a generation, there are still usually many different ways of phrasing that generation without changing its meaning. For the purposes of detecting confabulations, the uncertainty of the LLM over meanings is more important than the uncertainty over the exact tokens used to express those meanings.

Our semantic uncertainty method therefore seeks to estimate only the uncertainty the LLM has over the meaning of its generation, not the choice of words. To do this, we introduce an algorithm that clusters model generations by meaning and subsequently calculates semantic uncertainty. At a high level this involves three steps:

Generation: sample output sequences of tokens from the predictive distribution of an LLM given a context x .

Clustering: cluster sequences by their meaning using our clustering algorithm based on bidirectional entailment.

Entropy estimation: estimate semantic entropy by summing probabilities of sequences that share a meaning following equation ( 2 ) and compute their entropy.

Generating a set of answers from the model

Given some context x as input to the LLM, we sample M sequences, { s (1) , …,  s ( M ) } and record their token probabilities, { P ( s (1) ∣ x ), …,  P ( s ( M ) ∣ x )}. We sample all our generations from a single model, varying only the random seed used for sampling from the token probabilities. We do not observe the method to be particularly sensitive to details of the sampling scheme. In our implementation, we sample at temperature 1 using nucleus sampling ( P  = 0.9) (ref. 49 ) and top- K sampling ( K  = 50) (ref. 50 ). We also sample a single generation at low temperature (0.1) as an estimate of the ‘best generation’ of the model to the context, which we use to assess the accuracy of the model. (A lower sampling temperature increases the probability of sampling the most likely tokens).
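In practice the sampling is performed by the model's own decoding loop; purely as an illustration of the scheme just described, here is a minimal from-scratch sketch of temperature scaling with top- K and nucleus filtering over a toy next-token distribution (the probabilities are invented).

```python
import random

def sample_token(probs, top_k=50, top_p=0.9, temperature=1.0, rng=random):
    # Temperature scaling on probabilities: p**(1/T), renormalized, is
    # equivalent to dividing the logits by T before the softmax.
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    scaled = [p / total for p in scaled]
    # Top-K: keep only the K most likely tokens.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Nucleus (top-p): keep the smallest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += scaled[i]
        if mass >= top_p:
            break
    weights = [scaled[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

At a low temperature such as 0.1, the filtered distribution collapses onto the most likely token, matching the low-temperature ‘best generation’ sample described above.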

Clustering by semantic equivalence

To estimate semantic entropy we need to cluster generated outputs from the model into groups of outputs that mean the same thing as each other.

This can be described using ‘semantic equivalence’ which is the relation that holds between two sentences when they mean the same thing. We can formalize semantic equivalence mathematically. Let the space of tokens in a language be \({\mathcal{T}}\) . The space of all possible sequences of tokens of length N is then \({{\mathcal{S}}}_{N}\equiv {{\mathcal{T}}}^{N}\) . Note that N can be made arbitrarily large to accommodate whatever size of sentence one can imagine and one of the tokens can be a ‘padding’ token which occurs with certainty for each token after the end-of-sequence token. For some sentence \({\bf{s}}\in {{\mathcal{S}}}_{N}\) , composed of a sequence of tokens, \({s}_{i}\in {\mathcal{T}}\) , there is an associated meaning. Theories of meaning are contested 51 . However, for specific models and deployment contexts many considerations can be set aside. Care should be taken when comparing very different models and contexts.

Let us introduce a semantic equivalence relation, E (  ⋅  ,  ⋅  ), which holds for any two sentences that mean the same thing—we will operationalize this presently. Recall that an equivalence relation is any reflexive, symmetric and transitive relation and that any equivalence relation on a set corresponds to a set of equivalence classes. Each semantic equivalence class captures outputs that can be considered to express the same meaning. That is, for the space of semantic equivalence classes \({\mathcal{C}}\) the sentences in the set \(c\in {\mathcal{C}}\) can be regarded in many settings as expressing a similar meaning such that \(\forall {\bf{s}},{{\bf{s}}}^{{\prime} }\in c:E({\bf{s}},{{\bf{s}}}^{{\prime} })\) . So we can build up these classes of semantically equivalent sentences by checking if new sentences share a meaning with any sentences we have already clustered and, if so, adding them into that class.

We operationalize E (  ⋅  ,  ⋅  ) using the idea of bidirectional entailment, which has a long history in linguistics 52 and natural language processing 28 , 53 , 54 . A sequence, s , means the same thing as a second sequence, s ′, only if the sequences entail (that is, logically imply) each other. For example, ‘The capital of France is Paris’ entails ‘Paris is the capital of France’ and vice versa because they mean the same thing. (See later for a discussion of soft equivalence and cases in which bidirectional entailment does not guarantee equivalent meanings).

Importantly, we require that the sequences mean the same thing with respect to the context—key meaning is sometimes contained in the context. For example, ‘Paris’ does not entail ‘The capital of France is Paris’ because ‘Paris’ is not a declarative sentence without context. But in the context of the question ‘What is the capital of France?’, the one-word answer does entail the longer answer.

Detecting entailment has been the object of a great deal of research in natural language inference (NLI) 55 . We rely on language models to predict entailment, such as DeBERTa-Large-MNLI 56 , which has been trained to predict entailment, or general-purpose LLMs such as GPT-3.5 (ref. 57 ), which can predict entailment given suitable prompts.

We then cluster sentences according to whether they bidirectionally entail each other using the algorithm presented in Extended Data Fig. 1 . Note that, to check if a sequence should be added to an existing cluster, it is sufficient to check if the sequence bidirectionally entails any of the existing sequences in that cluster (we arbitrarily pick the first one), given the transitivity of semantic equivalence. If a sequence does not share meaning with any existing cluster, we assign it its own cluster.
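The clustering procedure can be sketched as follows; `entails(a, b)` is a stand-in for the NLI model (for example, DeBERTa or a prompted LLM) and the `toy_entails` placeholder below is purely illustrative.

```python
def cluster_by_bidirectional_entailment(answers, entails):
    # Each cluster is a list of answers considered to share a meaning.
    clusters = []
    for answer in answers:
        for cluster in clusters:
            # By (assumed) transitivity of semantic equivalence, comparing
            # against one member suffices; we arbitrarily pick the first.
            representative = cluster[0]
            if entails(answer, representative) and entails(representative, answer):
                cluster.append(answer)
                break
        else:
            # No shared meaning with any existing cluster: start a new one.
            clusters.append([answer])
    return clusters

# Illustrative stand-in only; a real system would query an entailment model.
def toy_entails(a, b):
    return a.strip(".").lower() == b.strip(".").lower()
```

For example, `cluster_by_bidirectional_entailment(["Paris", "paris.", "Rome"], toy_entails)` groups the first two answers together and leaves "Rome" in its own cluster.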

Computing the semantic entropy

Having determined the classes of generated sequences that mean the same thing, we can estimate the likelihood that a sequence generated by the LLM belongs to a given class by computing the sum of the probabilities of all the possible sequences of tokens which can be considered to express the same meaning,

\(P(c| {\boldsymbol{x}})={\sum }_{{\bf{s}}\in c}P({\bf{s}}| {\boldsymbol{x}}).\) (2)

Formally, this treats the output as a random variable whose event-space is the space of all possible meaning-classes, C , a sub- σ -algebra of the standard event-space S . We can then estimate the semantic entropy (SE) as the entropy over the meaning-distribution,

\(SE({\boldsymbol{x}})=-{\sum }_{c}P(c| {\boldsymbol{x}})\log P(c| {\boldsymbol{x}}).\) (3)

There is a complication which prevents direct computation: we do not have access to every possible meaning-class c . Instead, we can only sample c from the sequence-generating distribution induced by the model. To handle this, we estimate the expectation in equation ( 3 ) using a Rao–Blackwellized Monte Carlo integration over the semantic equivalence classes C ,

\(SE({\boldsymbol{x}})\approx -{\sum }_{i=1}^{| C| }P({C}_{i}| {\boldsymbol{x}})\log P({C}_{i}| {\boldsymbol{x}}),\) (5)

where \(P({C}_{i}| {\boldsymbol{x}})=\frac{P({c}_{i}| {\boldsymbol{x}})}{{\sum }_{c}P(c| {\boldsymbol{x}})}\) estimates a categorical distribution over the cluster meanings, that is, ∑ i P ( C i ∣ x ) = 1. Without this normalization step cluster ‘probabilities’ could exceed one because of length normalization, resulting in degeneracies. Equation ( 5 ) is the estimator giving our main method that we refer to as semantic entropy throughout the text.

For scenarios in which the sequence probabilities are not available, we propose a variant of semantic entropy which we call ‘discrete’ semantic entropy. Discrete semantic entropy approximates P ( C i ∣ x ) directly from the number of generations in each cluster, disregarding the token probabilities. That is, we approximate P ( C i ∣ x ) as \(\frac{1}{M}{\sum }_{m=1}^{M}{I}_{{c}^{(m)}={C}_{i}}\) , the proportion of all the sampled answers which belong to that cluster, where c ( m ) is the cluster of the m th sampled answer. Effectively, this just assumes that each output that was actually generated was equally probable—estimating the underlying distribution as the categorical empirical distribution. In the limit of large M , the estimator converges to equation ( 5 ) by the law of large numbers. We find that discrete semantic entropy results in similar performance empirically.
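Given the clusters, both estimators are short to write down. A minimal sketch, assuming the cluster masses (for the standard estimator) or cluster sizes (for the discrete variant) have already been computed from the M sampled generations:

```python
import math

def semantic_entropy(cluster_log_probs):
    # cluster_log_probs[i]: log of the (length-normalized) probability mass
    # assigned to cluster i. Normalize so the cluster masses sum to one,
    # then take the entropy of the resulting categorical distribution.
    masses = [math.exp(lp) for lp in cluster_log_probs]
    total = sum(masses)
    probs = [m / total for m in masses]
    return -sum(p * math.log(p) for p in probs if p > 0)

def discrete_semantic_entropy(cluster_sizes):
    # Discrete variant: estimate P(C_i | x) as the fraction of the M sampled
    # generations landing in cluster i, ignoring token probabilities.
    m = sum(cluster_sizes)
    probs = [n / m for n in cluster_sizes]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Two equally weighted clusters give entropy log 2, whereas a single cluster, the confident case, gives zero.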

We provide a worked example of the computation of semantic entropy in Supplementary Note  1 .

Semantic entropy is designed to detect confabulations, that is, model outputs with arbitrary meaning. In our experiments, we use semantic uncertainty to predict model accuracy, demonstrating that confabulations make up a notable fraction of model mistakes. We further show that semantic uncertainty can be used to improve model accuracy by refusing to answer questions when semantic uncertainty is high. Last, semantic uncertainty can be used to give users a way to know when model generations are probably unreliable.

We use the datasets BioASQ 34 , SQuAD 33 , TriviaQA 32 , SVAMP 37 and NQ-Open 35 . BioASQ is a life-sciences question-answering dataset based on the annual challenge of the same name. The specific dataset we use is based on the QA dataset from Task B of the 2023 BioASQ challenge (11B). SQuAD is a reading comprehension dataset whose context passages are drawn from Wikipedia and for which the answers to questions can be found in these passages. We use SQuAD 1.1 which excludes the unanswerable questions added in v.2.0 that are deliberately constructed to induce mistakes so they do not in practice cause confabulations to occur. TriviaQA is a trivia question-answering dataset. SVAMP is a word-problem maths dataset containing elementary-school mathematical reasoning tasks. NQ-Open is a dataset of realistic questions aggregated from Google Search which have been chosen to be answerable without reference to a source text. For each dataset, we use 400 train examples and 400 test examples randomly sampled from the original larger dataset. Note that only some of the methods require training, for example semantic entropy does not use the training data. If the datasets themselves are already split into train and test (or validation) samples, we sample our examples from within the corresponding split.

All these datasets are free-form, rather than multiple choice, because this better captures the opportunities created by LLMs to produce free-form sentences as answers. We refer to this default scenario as our ‘sentence-length’ experiments. In Supplementary Note  7 , we also present results for confabulation detection in a ‘short-phrase’ scenario, in which we constrain model answers on these datasets to be as concise as possible.

To make the problems more difficult and induce confabulations, we do not provide the context passages for any of the datasets. When the context passages are provided, the accuracy rate is too high for these datasets for the latest generations of models to meaningfully study confabulations.

For sentence-length generations we use: Falcon 39 Instruct (7B and 40B), LLaMA 2 Chat 38 (7B, 13B and 70B) and Mistral 40 Instruct (7B).

In addition to reporting results for semantic entropy, discrete semantic entropy and naive entropy, we consider two strong baselines.

Embedding regression is a supervised baseline inspired by the P (IK) method 24 . In that paper, the authors fine-tune their proprietary LLM on a dataset of questions to predict whether the model would have been correct. This requires access to a dataset of ground-truth answers to the questions. Rather than fine-tuning the entire LLM in this way, we simply take the final hidden units and train a logistic regression classifier to make the same prediction. In contrast to their method, this is much simpler because it does not require fine-tuning the entire language model, as well as being more reproducible because the solution to the logistic regression optimization problem is not as seed-dependent as the fine-tuning procedure. As expected, this supervised approach performs well in-distribution but fails when the distribution of questions is different from that on which the classifier is trained.

The second baseline we consider is the P (True) method 24 , in which the model first samples M answers (identically to our semantic entropy approach) and then is prompted with the list of all answers generated followed by the highest-probability answer and a question asking whether this answer is “(a) True” or “(b) False”. The confidence score is then taken to be the probability with which the LLM responds with ‘a’ to the multiple-choice question. The performance of this method is boosted with a few-shot prompt, in which up to 20 examples from the training set are randomly chosen, filled in as above, but then provided with the actual ground truth of whether the proposed answer was true or false. In this way, the method can be considered as supervised ‘in-context’ because it makes use of some ground-truth training labels but can be used without retraining the model. Because of context-size constraints, this method cannot fit a full 20 few-shot examples in the context when input questions are long or large numbers of generations are used. As a result, we sometimes have to reduce the number of few-shot examples to suit the context size and we note this in the Supplementary Material .

Entailment estimator

Any NLI classification system could be used for our bidirectional entailment clustering algorithm. We consider two different kinds of entailment detector.

One option is to use an instruction-tuned LLM such as LLaMA 2, GPT-3.5 (Turbo 1106) or GPT-4 to predict entailment between generations. We use the following prompt:

We are evaluating answers to the question {question} Here are two possible answers: Possible Answer 1: {text1} Possible Answer 2: {text2} Does Possible Answer 1 semantically entail Possible Answer 2? Respond with entailment, contradiction, or neutral.

Alternatively, we consider using a language model trained for entailment prediction, specifically the DeBERTa-large model 56 fine-tuned on the NLI dataset MNLI 58 . This builds on past work towards paraphrase identification based on embedding similarity 59 , 60 and BERT-style models 61 , 62 . We template more simply, checking if DeBERTa predicts entailment between the concatenation of the question and one answer and the concatenation of the question and another answer. Note that DeBERTa-large is a relatively lightweight model with only 1.5B parameters which is much less powerful than most of the LLMs under study.

In Supplementary Note 2 , we carefully evaluate the benefits and drawbacks of these methods for entailment prediction. We settle on using GPT-3.5 with the above prompt, as its entailment predictions agree well with human raters and lead to good confabulation detection performance.

In Supplementary Note  3 , we provide a discussion of the computational cost and choosing the number of generations for reliable clustering.

Prompting templates

We use a simple generation template for all sentence-length answer datasets:

Answer the following question in a single brief but complete sentence. Question: {question} Answer:

Metrics and accuracy measurements

We use three main metrics to evaluate our method: AUROC, rejection accuracy and AURAC. Each of these is grounded in an automated factuality estimation measurement relative to the reference answers provided by the datasets that we use.

AUROC, rejection accuracy and AURAC

First, we use the AUROC (area under the receiver operating characteristic curve), which measures how reliably a classifier separates correct from incorrect answers across all confidence thresholds. The AUROC can be interpreted as the probability that a randomly chosen correct answer has been assigned a higher confidence score than a randomly chosen incorrect answer. For a perfect classifier, this is 1.

Second, we compute the ‘rejection accuracy at X %’, which is the question-answering accuracy of the model on the most-confident X % of the inputs as identified by the respective uncertainty method. If an uncertainty method works well, predictions on the confident subset should be more accurate than predictions on the excluded subset and the rejection accuracy should increase as we reject more inputs.

To summarize this statistic we compute the AURAC—the area under the curve traced out by the accuracies at all cut-off percentages X %. This should increase towards 1 as a given uncertainty method becomes more accurate and better at detecting likely-inaccurate responses but it is more sensitive to the overall accuracy of the model than the AUROC metric.
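All three metrics can be computed directly from a list of confidence scores and correctness labels. A minimal sketch (the variable names are ours):

```python
def auroc(confidences, correct):
    # Probability that a randomly chosen correct answer is scored above a
    # randomly chosen incorrect one; ties count as half.
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rejection_accuracy(confidences, correct, keep_fraction):
    # Accuracy on the most-confident keep_fraction of the inputs.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    kept = order[:max(1, round(keep_fraction * len(order)))]
    return sum(correct[i] for i in kept) / len(kept)

def aurac(confidences, correct, steps=20):
    # Average rejection accuracy over evenly spaced keep-fractions.
    fractions = [(i + 1) / steps for i in range(steps)]
    return sum(rejection_accuracy(confidences, correct, f) for f in fractions) / steps
```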

In Supplementary Note  5 , we provide the unaggregated rejection accuracies for sentence-length generations.

Assessing accuracy

For the short-phrase-length generation setting presented in Supplementary Note  7 , we simply assess the accuracy of the generations by checking if the F1 score of the commonly used SQuAD metric exceeds 0.5. There are limitations to such simple scoring rules 63 but this method is widely used in practice and its error is comparatively small on these standard datasets.
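A bare-bones version of this token-overlap F1 check is sketched below (it omits the article and punctuation normalization that the official SQuAD script also performs):

```python
def squad_f1(prediction, reference):
    # Token-level F1: harmonic mean of precision and recall over
    # overlapping tokens, counted with multiplicity.
    pred = prediction.lower().split()
    ref = reference.lower().split()
    ref_counts = {}
    for t in ref:
        ref_counts[t] = ref_counts.get(t, 0) + 1
    common = 0
    for t in pred:
        if ref_counts.get(t, 0) > 0:
            common += 1
            ref_counts[t] -= 1
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)

def is_correct(prediction, reference, threshold=0.5):
    return squad_f1(prediction, reference) > threshold
```

For example, the long answer ‘the capital of france is paris’ scores only 2/7 against the short reference ‘Paris’, which is exactly the failure mode for sentence-length generations noted below.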

For our default scenario, the longer sentence-length generations, this measure fails, as the overlap between the short reference answer and our long model answer is invariably too small. For sentence-length generations, we therefore automatically determine whether an answer to the question is correct or incorrect by using GPT-4 to compare the given answer to the reference answer. We use the template:

We are assessing the quality of answers to the following question: {question} The expected answer is: {reference answer} The proposed answer is: {predicted answer} Within the context of the question, does the proposed answer mean the same as the expected answer? Respond only with yes or no.

We make a small modification for datasets with several reference answers: line two becomes “The following are expected answers to this question:” and the final line asks “does the proposed answer mean the same as any of the expected answers?”.

In Supplementary Note 6 , we check the quality of our automated ground-truth evaluations against human judgement by hand. We find that GPT-4 gives the best results for determining model accuracy and thus use it in all our sentence-length experiments.

In this section we describe the application of semantic entropy to confabulation detection in longer model generations, specifically paragraph-length biographies.

We introduce a biography-generation dataset—FactualBio—available alongside this paper. FactualBio is a collection of biographies of individuals who are notable enough to have Wikipedia pages but not notable enough to have large amounts of detailed coverage, generated by GPT-4 (v.0613). To generate the dataset, we randomly sampled 21 individuals from the WikiBio dataset 64 . For each biography, we generated a list of factual claims contained in each biography using GPT-4, with 150 total factual claims (the total number is only coincidentally a round number). For each of these factual claims, we manually determined whether the claim was correct or incorrect. Out of 150 claims, 45 were incorrect. As before, we apply confabulation detection to detect incorrect model predictions, even though there may be model errors which are not confabulations.

Prompting and generation

Given a paragraph-length piece of LLM-generated text, we apply the following sequence of steps:

Automatically decompose the paragraph into specific factual claims using an LLM (not necessarily the same as the original).

For each factual claim, use an LLM to automatically construct Q questions which might have produced that claim.

For each question, prompt the original LLM to generate M answers.

For each question, compute the semantic entropy of the answers, including the original factual claim.

Average the semantic entropies over the questions to arrive at a score for the original factual claim.
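The five steps above can be sketched as a skeleton in which the LLM calls are abstracted away; `gen_questions`, `gen_answers` and `entropy_over` are hypothetical stand-ins for the prompting steps and the discrete semantic-entropy computation.

```python
def score_claims(claims, gen_questions, gen_answers, entropy_over,
                 q_per_claim=6, m=3):
    # Returns an average semantic-entropy score per factual claim;
    # higher scores flag likely confabulations.
    scores = {}
    for claim in claims:
        questions = gen_questions(claim)[:q_per_claim]
        entropies = []
        for question in questions:
            # Include the original claim among the answers so the estimate
            # stays grounded in what the paragraph actually asserted.
            answers = gen_answers(question, m) + [claim]
            entropies.append(entropy_over(answers))
        scores[claim] = sum(entropies) / len(entropies)
    return scores
```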

We pursue this slightly indirect way of generating answers because we find that simply resampling each sentence creates variation unrelated to the uncertainty of the model about the factual claim, such as differences in paragraph structure.

We decompose the paragraph into factual claims using the following prompt:

Please list the specific factual propositions included in the answer above. Be complete and do not leave any factual claims out. Provide each claim as a separate sentence in a separate bullet point.

We found that we agreed with the decompositions in all cases in the dataset.

We then generate six questions for each of the facts from the decomposition. We generate these questions by prompting the model twice with the following:

Following this text: {text so far} You see the sentence: {proposition} Generate a list of three questions, that might have generated the sentence in the context of the preceding original text, as well as their answers. Please do not use specific facts that appear in the follow-up sentence when formulating the question. Make the questions and answers diverse. Avoid yes-no questions. The answers should not be a full sentence and as short as possible, e.g. only a name, place, or thing. Use the format “1. {question} – {answer}”.

These questions are not necessarily well-targeted and the difficulty of this step is the main source of errors in the procedure. We generate three questions with each prompt, as this encourages diversity of the questions, each question targeting a different aspect of the fact. However, we observed that the generated questions will sometimes miss obvious aspects of the fact. Executing the above prompt twice (for a total of six questions) can improve coverage. We also ask for brief answers because the current version of GPT-4 tends to give long, convoluted and highly hedged answers unless explicitly told not to.

Then, for each question, we generate three new answers using the following prompt:

We are writing an answer to the question “{user question}”. So far we have written: {text so far} The next sentence should be the answer to the following question: {question} Please answer this question. Do not answer in a full sentence. Answer with as few words as possible, e.g. only a name, place, or thing.

We then compute the semantic entropy over these answers plus the original factual claim. Including the original fact ensures that the estimator remains grounded in the original claim and helps detect situations in which the question has been interpreted completely differently from the original context. We make a small modification to handle the fact that GPT-4 generations often include refusals to answer questions. These refusals were not something we commonly observed in our experiments with LLaMA 2, Falcon or Mistral models. If more than half of the answers include one of the strings ‘not available’, ‘not provided’, ‘unknown’ or ‘unclear’, then we treat the semantic uncertainty as maximal.
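The refusal override can be sketched as follows. Treating ‘maximal’ uncertainty as the entropy of a uniform distribution over the sampled answers is our assumption; the text only says the uncertainty is set to maximal:

```python
import math

REFUSAL_MARKERS = ("not available", "not provided", "unknown", "unclear")

def entropy_with_refusal_check(answers, entropy):
    """Return the given semantic entropy unless more than half of the
    answers look like refusals, in which case return a maximal value
    (assumed here: log of the number of answers, i.e. the entropy of a
    uniform distribution over them)."""
    refusals = sum(
        any(marker in answer.lower() for marker in REFUSAL_MARKERS)
        for answer in answers
    )
    if refusals > len(answers) / 2:
        return math.log(len(answers))
    return entropy
```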

We then average the semantic entropies for each question corresponding to the factual claim to get an entropy for this factual claim.

Despite the extra assumptions and complexity, we find that this method greatly outperforms the baselines.

To compute semantic entailment between the original claim and the regenerated answers, we rely on the DeBERTa entailment prediction model, as we find empirically that DeBERTa predictions result in a higher train-set AUROC than other methods. Because DeBERTa has slightly lower recall than GPT-3.5/4, we use a modified set-up in which we say two answers mean the same as each other if at least one of them entails the other and neither is seen to contradict the other, a kind of ‘non-defeating’ bidirectional entailment check rather than true bidirectional entailment. The good performance of DeBERTa in this scenario is not surprising, as both the factual claims and the regenerated answers are relatively short. We refer to Supplementary Notes 2 and 3 for ablations and experiments regarding our choice of entailment estimator for paragraph-length generations.
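The modified equivalence check, and the clustering it induces, can be sketched as below. Here `nli(premise, hypothesis)` is a stand-in for the DeBERTa model, returning one of 'entailment', 'neutral' or 'contradiction'; scoring clusters by the entropy of their empirical frequencies (the discrete estimator) is our assumption in this sketch:

```python
import math

def share_meaning(a, b, nli):
    """'Non-defeating' bidirectional check: two texts share a meaning if at
    least one direction is an entailment and neither direction is a
    contradiction. `nli(premise, hypothesis)` stands in for DeBERTa."""
    ab, ba = nli(a, b), nli(b, a)
    return "entailment" in (ab, ba) and "contradiction" not in (ab, ba)

def discrete_semantic_entropy(answers, nli):
    """Greedily cluster answers by meaning, then return the entropy of the
    empirical distribution over clusters."""
    clusters = []
    for answer in answers:
        for cluster in clusters:
            # Compare against the cluster's first member as representative.
            if share_meaning(answer, cluster[0], nli):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    n = len(answers)
    return -sum(len(c) / n * math.log(len(c) / n) for c in clusters)
```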

We implement two baselines. First, we implement a variant of the P(True) method, adapted to the new setting. For each factoid, we generate a question with answers in the same way as for semantic entropy. We then use the following prompt:

Question: {question} Here are some brainstormed ideas: {list of regenerated answers} Possible answer: {original answer} Is the possible answer true? Respond with “yes” or “no”.

As we cannot access the probabilities GPT-4 assigns to predicting ‘yes’ and ‘no’ as the next token, we approximate this using Monte Carlo samples. Concretely, we execute the above prompt ten times (at temperature 1) and take the fraction of answers that were ‘yes’ as our unbiased Monte Carlo estimate of the token probability GPT-4 assigns to ‘yes’.
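The Monte Carlo approximation can be sketched as below, with `sample_yes_no(prompt)` a hypothetical wrapper around a temperature-1 GPT-4 call that returns the sampled answer string:

```python
def p_true_monte_carlo(prompt, sample_yes_no, n_samples=10):
    """Approximate the probability the model assigns to 'yes' by sampling
    the prompt `n_samples` times and taking the fraction of 'yes' answers
    (an unbiased Monte Carlo estimate of the token probability)."""
    answers = [sample_yes_no(prompt) for _ in range(n_samples)]
    return sum(a.strip().lower().startswith("yes") for a in answers) / n_samples
```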

As a second, simpler, baseline we check if the model thinks the answer is true. We simply ask:

Following this text: {text so far} You see this statement: {proposition} Is it likely that the statement is true? Respond with ‘yes’ or ‘no’.

Interestingly, this method ought to perform very well if the model has good ‘self-knowledge’ (that is, if “models mostly know what they don’t know” 24 ), but in practice semantic entropy is much better at detecting confabulations.

Data availability

The data used for the short-phrase and sentence-length generations are publicly available and the released code details how to access it. We release a public version of the FactualBio dataset as part of the code base for reproducing the paragraph-length experiments.

Code availability

We release all code used to produce the main experiments. The code for short-phrase and sentence-length experiments can be found at github.com/jlko/semantic_uncertainty and https://doi.org/10.5281/zenodo.10964366 (ref. 65 ). The code for paragraph-length experiments can be found at github.com/jlko/long_hallucinations and https://doi.org/10.5281/zenodo.10964366 (ref. 65 ).

OpenAI. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).

Gemini Team, Google. Gemini: a family of highly capable multimodal models. Preprint at https://arxiv.org/abs/2312.11805 (2023).

Xiao, Y. & Wang, W. Y. On hallucination and predictive uncertainty in conditional language generation. In Proc. 16th Conference of the European Chapter of the Association for Computational Linguistics 2734–2744 (Association for Computational Linguistics, 2021).

Rohrbach, A., Hendricks, L. A., Burns, K., Darrell, T. & Saenko, K. Object hallucination in image captioning. In Proc. 2018 Conference on Empirical Methods in Natural Language Processing (eds Riloff, E., Chiang, D., Hockenmaier, J. & Tsujii, J.) 4035–4045 (Association for Computational Linguistics, 2018).

Weiser, B. Lawyer who used ChatGPT faces penalty for made up citations. The New York Times (8 Jun 2023).

Opdahl, A. L. et al. Trustworthy journalism through AI. Data Knowl. Eng . 146 , 102182 (2023).

Shen, Y. et al. ChatGPT and other large language models are double-edged swords. Radiology 307 , e230163 (2023).

Schulman, J. Reinforcement learning from human feedback: progress and challenges. Presented at the Berkeley EECS Colloquium. YouTube www.youtube.com/watch?v=hhiLw5Q_UFg (2023).

Ji, Z. et al. Survey of hallucination in natural language generation. ACM Comput. Surv. 55 , 248 (2023).

Maynez, J., Narayan, S., Bohnet, B. & McDonald, R. On faithfulness and factuality in abstractive summarization. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds Jurafsky, D., Chai, J., Schluter, N. & Tetreault, J.) 1906–1919 (Association for Computational Linguistics, 2020).

Filippova, K. Controlled hallucinations: learning to generate faithfully from noisy data. In Findings of the Association for Computational Linguistics: EMNLP 2020 (eds Webber, B., Cohn, T., He, Y. & Liu, Y.) 864–870 (Association for Computational Linguistics, 2020).

Berrios, G. Confabulations: a conceptual history. J. Hist. Neurosci. 7 , 225–241 (1998).

Lin, S., Hilton, J. & Evans, O. Teaching models to express their uncertainty in words. Transact. Mach. Learn. Res. (2022).

Evans, O. et al. Truthful AI: developing and governing AI that does not lie. Preprint at https://arxiv.org/abs/2110.06674 (2021).

Amodei, D. et al. Concrete problems in AI safety. Preprint at https://arxiv.org/abs/1606.06565 (2016).

Jiang, Z., Araki, J., Ding, H. & Neubig, G. How can we know when language models know? On the calibration of language models for question answering. Transact. Assoc. Comput. Linguist. 9 , 962–977 (2021).

Desai, S. & Durrett, G. Calibration of pre-trained transformers. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds Webber, B., Cohn, T., He, Y. & Liu, Y.) 295–302 (Association for Computational Linguistics, 2020).

Glushkova, T., Zerva, C., Rei, R. & Martins, A. F. Uncertainty-aware machine translation evaluation. In Findings of the Association for Computational Linguistics: EMNLP 2021 (eds Moens, M-F., Huang, X., Specia, L. & Yih, S.) 3920–3938 (Association for Computational Linguistics, 2021).

Wang, Y., Beck, D., Baldwin, T. & Verspoor, K. Uncertainty estimation and reduction of pre-trained models for text regression. Transact. Assoc. Comput. Linguist. 10 , 680–696 (2022).

Baker, S. & Kanade, T. Hallucinating faces. In Proc. Fourth IEEE International Conference on Automatic Face and Gesture Recognition . 83–88 (IEEE, Catalogue no PR00580, 2002).

Eliot, L. AI ethics lucidly questioning this whole hallucinating AI popularized trend that has got to stop. Forbes Magazine (24 August 2022).

Shanahan, M. Talking about large language models. Commun. Assoc. Comp. Machinery 67 , 68–79 (2024).

MacKay, D. J. C. Information-based objective functions for active data selection. Neural Comput. 4 , 590–604 (1992).

Kadavath, S. et al. Language models (mostly) know what they know. Preprint at https://arxiv.org/abs/2207.05221 (2022).

Lindley, D. V. On a measure of the information provided by an experiment. Ann. Math. Stat. 27 , 986–1005 (1956).

Xiao, T. Z., Gomez, A. N. & Gal, Y. Wat zei je? Detecting out-of-distribution translations with variational transformers. In Workshop on Bayesian Deep Learning at the Conference on Neural Information Processing Systems (NeurIPS, Vancouver, 2019).

Christiano, P., Cotra, A. & Xu, M. Eliciting Latent Knowledge (Alignment Research Center, 2021); https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit .

Negri, M., Bentivogli, L., Mehdad, Y., Giampiccolo, D. & Marchetti, A. Divide and conquer: crowdsourcing the creation of cross-lingual textual entailment corpora. In Proc. 2011 Conference on Empirical Methods in Natural Language Processing 670–679 (Association for Computational Linguistics, 2011).

Honovich, O. et al. TRUE: Re-evaluating factual consistency evaluation. In Proc. Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering 161–175 (Association for Computational Linguistics, 2022).

Falke, T., Ribeiro, L. F. R., Utama, P. A., Dagan, I. & Gurevych, I. Ranking generated summaries by correctness: an interesting but challenging application for natural language inference. In Proc. 57th Annual Meeting of the Association for Computational Linguistics 2214–2220 (Association for Computational Linguistics, 2019).

Laban, P., Schnabel, T., Bennett, P. N. & Hearst, M. A. SummaC: re-visiting NLI-based models for inconsistency detection in summarization. Trans. Assoc. Comput. Linguist. 10 , 163–177 (2022).

Joshi, M., Choi, E., Weld, D. S. & Zettlemoyer, L. TriviaQA: a large scale distantly supervised challenge dataset for reading comprehension. In Proc. 55th Annual Meeting of the Association for Computational Linguistics 1601–1611 (Association for Computational Linguistics, 2017).

Rajpurkar, P., Zhang, J., Lopyrev, K. & Liang, P. SQuAD: 100,000+ questions for machine comprehension of text. In Proc. 2016 Conference on Empirical Methods in Natural Language Processing (eds Su, J., Duh, K. & Carreras, X.) 2383–2392 (Association for Computational Linguistics, 2016).

Tsatsaronis, G. et al. An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics 16 , 138 (2015).

Lee, K., Chang, M.-W. & Toutanova, K. Latent retrieval for weakly supervised open domain question answering. In Proc. 57th Annual Meeting of the Association for Computational Linguistics 6086–6096 (Association for Computational Linguistics, 2019).

Kwiatkowski, T. et al. Natural questions: a benchmark for question answering research. Transact. Assoc. Comput. Linguist. 7 , 452–466 (2019).

Patel, A., Bhattamishra, S. & Goyal, N. Are NLP models really able to solve simple math word problems? In Proc. 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (eds Toutanova, K. et al.) 2080–2094 (Assoc. Comp. Linguistics, 2021).

Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. Preprint at https://arxiv.org/abs/2307.09288 (2023).

Penedo, G. et al. The RefinedWeb dataset for Falcon LLM: outperforming curated corpora with web data, and web data only. In Proc. 36th Conference on Neural Information Processing Systems (eds Oh, A. et al.) 79155–79172 (Curran Associates, 2023)

Jiang, A. Q. et al. Mistral 7B. Preprint at https://arxiv.org/abs/2310.06825 (2023).

Manakul, P., Liusie, A. & Gales, M. J. F. SelfCheckGPT: Zero-Resource Black-Box hallucination detection for generative large language models. In Findings of the Association for Computational Linguistics: EMNLP 2023 (eds Bouamor, H., Pino, J. & Bali, K.) 9004–9017 (Assoc. Comp. Linguistics, 2023).

Mukhoti, J., Kirsch, A., van Amersfoort, J., Torr, P. H. & Gal, Y. Deep deterministic uncertainty: a new simple baseline. In IEEE/CVF Conference on Computer Vision and Pattern Recognition 24384–24394 (Computer Vision Foundation, 2023).

Schuster, T., Chen, S., Buthpitiya, S., Fabrikant, A. & Metzler, D. Stretching sentence-pair NLI models to reason over long documents and clusters. In Findings of the Association for Computational Linguistics: EMNLP 2022 (eds Goldberg, Y. et al.) 394–412 (Association for Computational Linguistics, 2022).

Barnes, B. & Christiano, P. Progress on AI Safety via Debate. AI Alignment Forum www.alignmentforum.org/posts/Br4xDbYu4Frwrb64a/writeup-progress-on-ai-safety-via-debate-1 (2020).

Irving, G., Christiano, P. & Amodei, D. AI safety via debate. Preprint at https://arxiv.org/abs/1805.00899 (2018).

Der Kiureghian, A. & Ditlevsen, O. Aleatory or epistemic? Does it matter? Struct. Saf. 31 , 105–112 (2009).

Malinin, A. & Gales, M. Uncertainty estimation in autoregressive structured prediction. In Proceedings of the International Conference on Learning Representations https://openreview.net/forum?id=jN5y-zb5Q7m (2021).

Murray, K. & Chiang, D. Correcting length bias in neural machine translation. In Proc. Third Conference on Machine Translation (eds Bojar, O. et al.) 212–223 (Assoc. Comp. Linguistics, 2018).

Holtzman, A., Buys, J., Du, L., Forbes, M. & Choi, Y. The curious case of neural text degeneration. In Proceedings of the International Conference on Learning Representations https://openreview.net/forum?id=rygGQyrFvH (2020).

Fan, A., Lewis, M. & Dauphin, Y. Hierarchical neural story generation. In Proc. 56th Annual Meeting of the Association for Computational Linguistics (eds Gurevych, I. & Miyao, Y.) 889–898 (Association for Computational Linguistics, 2018).

Speaks, J. in The Stanford Encyclopedia of Philosophy (ed. Zalta, E. N.) (Metaphysics Research Lab, Stanford Univ., 2021).

Culicover, P. W. Paraphrase generation and information retrieval from stored text. Mech. Transl. Comput. Linguist. 11 , 78–88 (1968).

Padó, S., Cer, D., Galley, M., Jurafsky, D. & Manning, C. D. Measuring machine translation quality as semantic equivalence: a metric based on entailment features. Mach. Transl. 23 , 181–193 (2009).

Androutsopoulos, I. & Malakasiotis, P. A survey of paraphrasing and textual entailment methods. J. Artif. Intell. Res. 38 , 135–187 (2010).

MacCartney, B. Natural Language Inference (Stanford Univ., 2009).

He, P., Liu, X., Gao, J. & Chen, W. Deberta: decoding-enhanced BERT with disentangled attention. In International Conference on Learning Representations https://openreview.net/forum?id=XPZIaotutsD (2021).

Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33 , 1877–1901 (2020).

Williams, A., Nangia, N. & Bowman, S. R. A broad-coverage challenge corpus for sentence understanding through inference. In Proc. 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (eds Walker, M. et al.) 1112–1122 (Assoc. Comp. Linguistics, 2018).

Yu, L., Hermann, K. M., Blunsom, P. & Pulman, S. Deep learning for answer sentence selection. Preprint at https://arxiv.org/abs/1412.1632 (2014).

Socher, R., Huang, E., Pennington, J., Manning, C. D. & Ng, A. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Proc. 24th Conference on Neural Information Processing Systems (eds Shawe-Taylor, J. et al.) (2011).

He, R., Ravula, A., Kanagal, B. & Ainslie, J. Realformer: Transformer likes residual attention. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (eds Zhong, C., et al.) 929–943 (Assoc. Comp. Linguistics, 2021).

Tay, Y. et al. Charformer: fast character transformers via gradient-based subword tokenization. In Proceedings of the International Conference on Learning Representations https://openreview.net/forum?id=JtBRnrlOEFN (2022).

Kane, H., Kocyigit, Y., Abdalla, A., Ajanoh, P. & Coulibali, M. Towards neural similarity evaluators. In Workshop on Document Intelligence at the 32nd conference on Neural Information Processing (2019).

Lebret, R., Grangier, D. & Auli, M. Neural text generation from structured data with application to the biography domain. In Proc. 2016 Conference on Empirical Methods in Natural Language Processing (eds Su, J. et al.) 1203–1213 (Association for Computational Linguistics, 2016).

Kossen, J. jlko/semantic_uncertainty: Initial release v.1.0.0. Zenodo https://doi.org/10.5281/zenodo.10964366 (2024).

Acknowledgements

We thank G. Irving, K. Perlin, J. Richens, L. Rimell and M. Turpin for their comments or discussion related to this work. We thank K. Handa for his help with the human evaluation of our automated accuracy assessment. We thank F. Bickford Smith and L. Melo for their code review. Y.G. is supported by a Turing AI Fellowship funded by the UK government’s Office for AI, through UK Research and Innovation (grant reference EP/V030302/1), and delivered by the Alan Turing Institute.

Author information

These authors contributed equally: Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn

Authors and Affiliations

OATML, Department of Computer Science, University of Oxford, Oxford, UK

Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn & Yarin Gal

Contributions

S.F. led the work from conception to completion and proposed using bidirectional entailment to cluster generations as a way of computing entropy in LLMs. He wrote the main text, most of the Methods and Supplementary Information and prepared most of the figures. J.K. improved the mathematical formalization of semantic entropy; led the extension of semantic entropy to sentence- and paragraph-length generations; wrote the code for, and carried out, all the experiments and evaluations; wrote much of the Methods and Supplementary Information and prepared drafts of many figures; and gave critical feedback on the main text. L.K. developed the initial mathematical formalization of semantic entropy; wrote code for, and carried out, the initial experiments around semantic entropy and its variants which demonstrated the promise of the idea and helped narrow down possible research avenues to explore; and gave critical feedback on the main text. Y.G. ideated the project, proposing the idea to differentiate semantic and syntactic diversity as a tool for detecting hallucinations, provided high-level guidance on the research and gave critical feedback on the main text; he runs the research laboratory in which the work was carried out.

Corresponding author

Correspondence to Sebastian Farquhar .

Ethics declarations

Competing interests.

S.F. is currently employed by Google DeepMind and L.K. by OpenAI. For both, this paper was written under their University of Oxford affiliation. The remaining authors declare no competing interests.

Peer review

Peer review information.

Nature thanks Mirella Lapata and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Algorithm outline for bidirectional entailment clustering.

Given a set of outputs in response to a context, the bidirectional entailment clustering algorithm returns a set of sets of outputs that have been classified as sharing a meaning.

Supplementary information

Supplementary information.

Supplementary Notes 1–7, Figs. 1–10, Tables 1–4 and references. Includes a worked example of the semantic entropy calculation, a discussion of the limitations and computational cost of entailment clustering, an ablation of entailment prediction and clustering methods, a discussion of the automated accuracy assessment, unaggregated results for sentence-length generations and further results for short-phrase generations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Farquhar, S., Kossen, J., Kuhn, L. et al. Detecting hallucinations in large language models using semantic entropy. Nature 630 , 625–630 (2024). https://doi.org/10.1038/s41586-024-07421-0

Received: 17 July 2023

Accepted: 12 April 2024

Published: 19 June 2024

Issue Date: 20 June 2024

DOI: https://doi.org/10.1038/s41586-024-07421-0

research paper on dijkstra algorithm

IMAGES

  1. Dijkstra’s algorithm

    research paper on dijkstra algorithm

  2. Dijkstra’s Shortest Path Algorithm

    research paper on dijkstra algorithm

  3. Flowchart of the Dijkstra's algorithm [37]

    research paper on dijkstra algorithm

  4. C++中使用STL的Kraskar最小生成树-yiteyi-C++库

    research paper on dijkstra algorithm

  5. Dijkstra Algorithm

    research paper on dijkstra algorithm

  6. What is Dijkstra’s Algorithm?

    research paper on dijkstra algorithm

VIDEO

  1. ACTIVIDAD 3

  2. Dijkstra's Algorithm in Operational research Important Example in detail

  3. Dijkstra's Algorithm(Shortest Route Algorithm) Example#4 (Lecture:4)

  4. Dijkstra Algorithm

  5. Graph Theory: 22. Dijkstra Algorithm Examples

  6. Dijkstra Shortest path Algorithm

COMMENTS

  1. (PDF) Understanding Dijkstra Algorithm

    Dijkstra's algorithm (named after its discover, E.W. Dijkstra) solves the problem of. finding the shortest path from a point in a graph (the source) to a destination. It turns out. that one can ...

  2. Comparative Study Of Various Approaches Of Dijkstra Algorithm

    Dijkstra algorithm is not improved in terms of space and time-complexity, the algorithm is muddled to apply in network analysis for enormous test cases. To handle this huge time complexity, advanced structures are applied to enhance Dijkstra algorithm. The use of a keen and numerical procedure speeds up the enhanced Dijkstra by multiple times. In this paper four approaches using advanced data ...

  3. [2112.11927] Dijkstras algorithm with predictions to solve the single

    We study the use of machine learning techniques to solve a fundamental shortest path problem, known as the single-source many-targets shortest path problem (SSMTSP). Given a directed graph with non-negative edge weights, our goal is to compute a shortest path from a given source node to any of several target nodes. Basically, our idea is to equip an adapted version of Dijkstras algorithm with ...

  4. Using Machine Learning Predictions to Speed-up Dijkstra's Shortest Path

    In this paper, we contribute to the emerging research agenda of studying how ML-approaches can be used to design efficient algorithms for combinatorial optimisation problems. When designing such ML-based optimisation algorithms we ideally would ... the adapted Dijkstra algorithm, it can potentially save many queue operations addi-tionally ...

  5. Analysis of Dijkstra's Algorithm and A* Algorithm in Shortest Path

    Abstract. Finding the shortest path in direction effective is essential. To solve this shortest path problem, we usually using Dijkstra or A* algorithm. These two algorithms are often used in routing or road networks. This paper's objective is to compare those two algorithms in solving this shortest path problem.

  6. The Improved Dijkstra's Shortest Path Algorithm and Its Application

    Abstract. The shortest path problem exists in variety of areas. A well known shortest path algorithm is Dijkstra's, also called "label algorithm". Experiment results have shown that the "label algorithm" has the following issues: I.. Its exiting mechanism is effective to undigraph but ineffective to digraph, or even gets into an ...

  7. Understanding Dijkstra's Algorithm by Adeel Javaid :: SSRN

    Abstract. Dijkstra's algorithm (named after its discover, E.W. Dijkstra) solves the problem of finding the shortest path from a point in a graph (the source) to a destination. It turns out that one can find the shortest paths from a given source to all points in a graph in the same time, hence this problem is sometimes called the single-source ...

  8. (PDF) Improved Dijkstra Algorithm for Mobile Robot Path ...

    In this paper, an optimal collision-free algorithm is designed and implemented practically based on an improved Dijkstra algorithm. To achieve this research objectives, first, the MR obstacle-free ...

  9. A Comprehensive Study of Dijkstra's Algorithm

    Abstract. Dijkstra's technique, named after E.W. Dijkstra, presents a solution for calculating the most direct path from a starting point in a graph (source) to a destination location. However, because it may determine the shortest pathways from one source to all other points in the graph at the same time, it is also known as the single-source ...

  10. On The Optimization of Dijkstra's Algorithm

    Dijkstra algorithm is 8 (figure 2), this is the result of TORA [1] software: ... In this paper an optimization of Dijkstra algorithm is presented. The main idea is to ... 1. Taha, H. Operations research an introduction, ninth edition. Pearson publisher, 2011. 2. Nar, D. Graph theory with applications to engineering and computer science ...

  11. (PDF) Analysis of Dijkstra's Algorithm and A* Algorithm in Shortest

    Dijkstra's algorithm is one form of the greedy. algorithm. This algorithm includes a graph search algorithm used to solve the shortest path problem. with a single source on a graph that does not ...

  12. dijkstra algorithm Latest Research Papers

    This research implements Dijkstra's Algorithm written in the React Native programming language to build a Covid-19 tracking application. The system can display the closest distance with a radius of at least one meter, and the test results can map the nearest radius of 41 meters and the most immediate radius of 147 meters. ... In this paper, a ...

  13. V &RQI 6HU Simulation and Improved Dijkstra

    Dijkstra's algorithm, because based on the references, bidirectional Dijkstra's is said to have a faster search execution process than Dijkstra's algorithm. Just likes research in previous year bidirectional Dijkstra's algorithm will be implemented based on web framework vue.js, more specifically on material design vuetify.js. The process and ...

  14. An Algorithm of Shortest Path Based on Dijkstra for Huge Data

    This paper introduces the classical Dijkstra algorithm in detail, and illustrates the method of implementation of the algorithm and the disadvantages of the algorithm: the network nodes require square-class memory, so it is difficult to quantify the shortest path of the major nodes. At the same time, it describes the adjacent node algorithm which is an optimization algorithm based on Dijkstra ...

  15. Dijkstra's and A-Star in Finding the Shortest Path: a Tutorial

    As one form of the greedy algorithm, Dijkstra's can handle the shortest path search with optimum result in longer search time. Dijkstra's is contrary to A-Star, a best-first search algorithm, which can handle the shortest path search with a faster time but not always optimum. By looking at the advantages and disadvantages of Dijkstra's and A-Star, this tutorial discusses the implementation of ...

  16. dijkstra's algorithm Latest Research Papers

    Starting Point. The purpose of this study are (1) to represent the route of café location in Bumiayu in the form of graph, (2) To find a solution from the application of the Dijkstra's algorithm to find location of café in Bumiayu, and (3) To find the recommended fastest route. The method used in this research is literature study, data ...

  17. PDF Research on The Optimization of Dijkstra'S

    Matrix algorithm. IV. OPTIMIZED DIJKSTRA ALGORITHM: In view of the problems mentioned above in Dijkstra's algorithm, the selection is optimized for the shortest path node, the data storage and organization in this paper. 4.1. Analysis of Optimization Ideas 4.1.1 The Selection of the Shortest Path Nodes and Nodes Ranking

  18. PDF A Survey of Recent Applications of Improved ijkstra's Shortest Path

    on this algorithm. This paper presents a survey of improved algorithms based on Dijkstra's algorithm that are applied in different application areas. Key Words: Shortest path problem, Dijkstra's algorithm, label, path optimaization, weight, graph, complexity 1. INTRODUCTION The shortest path problem is about finding a path between

  19. Dijkstra's Algorithm Research Papers

    The most widely used algorithm is Dijkstra algorithm. This algorithm has been represented with various structural developments in order to reduce the shortest path cost. This paper highlights the idea of using a storage medium to store the solution path from Dijkstra algorithm, then, uses it to find the implicit path in an ideal time cost.

  20. PDF Shortest Path Search Algorithm in Optimal Two-Dimensional Circulant

    O.1/of the algorithm does not depend on the network size in contrast to other well-known algorithms like Dijkstra's algorithm. Such properties make it a promising solution for the use in NoCs which was con˝rmed by an experimental study while synthesizing NoC communication subsystems and comparing the consumed hardware resources with those

  21. Research on Optimal Path based on Dijkstra Algorithms

    Research on Optimization Strategy of Dijkstra Algorithm[J]. Computer Technology and Development, 2006, 16(9): 73~75. Optimization of Dijkstra optimal path algorithm

  22. (PDF) Dijkstra Algorithm Application: Shortest Distance between

    Conclusion Fig.1: Coordinate of various building represented as numbers from 0 to 13 The above paper is a simple application of shortest path algorithm in our daily life. This paper presents the way how anyone can apply Dijkstra algorithm to get shortest path to reach the destination of any area or locality in which they live if they follow the ...

  23. Research on Quadrotor UAV control and path planning based on PID

    Research on Quadrotor UAV control and path planning based on PID controller and Dijkstra algorithm. Wangsheng Xushi, Department of mechanical engineering, University College London ... According to this situation, this paper introduces a possible application to the logistics industry. Urban areas usually have a high demand for ...

  24. A Comparison of Dijkstra's Algorithm Using Fibonacci Heaps, Binary

    This paper describes the shortest path problem in weighted graphs and examines the differences in efficiency that occur when using Dijkstra's algorithm with a Fibonacci heap, binary heap, and self-balancing binary tree. Using C++ implementations of these algorithm variants, we find that the fastest method is not always the one that has the lowest asymptotic complexity. Reasons for this are ...
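
    The paper's benchmarks are C++ implementations, which are not reproduced here. To make its finding concrete — that the lowest asymptotic complexity does not always win in practice — here is a minimal Python sketch of two of the compared strategies: a binary-heap priority queue versus a plain O(V^2) array scan. The adjacency-list format and function names are assumptions for illustration; on small or dense graphs, the simple scan can outperform the heap despite its worse bound:

    ```python
    import heapq

    def dijkstra_heap(adj, src):
        """Binary-heap Dijkstra with lazy deletion: O((V + E) log V)."""
        dist = [float("inf")] * len(adj)
        dist[src] = 0
        pq = [(0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist[u]:
                continue  # stale entry
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(pq, (dist[v], v))
        return dist

    def dijkstra_scan(adj, src):
        """Array-scan Dijkstra: O(V^2), but low constant factors on small graphs."""
        n = len(adj)
        dist = [float("inf")] * n
        dist[src] = 0
        done = [False] * n
        for _ in range(n):
            # Pick the closest unfinished vertex by scanning the whole array.
            u = min((i for i in range(n) if not done[i]),
                    key=dist.__getitem__, default=None)
            if u is None or dist[u] == float("inf"):
                break
            done[u] = True
            for v, w in adj[u]:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
        return dist

    # Both variants must agree on the distances; only the data structure differs.
    adj = [[(1, 7), (2, 9), (5, 14)], [(0, 7), (2, 10), (3, 15)],
           [(0, 9), (1, 10), (3, 11), (5, 2)], [(1, 15), (2, 11), (4, 6)],
           [(3, 6), (5, 9)], [(0, 14), (2, 2), (4, 9)]]
    print(dijkstra_heap(adj, 0))  # [0, 7, 9, 20, 20, 11]
    assert dijkstra_heap(adj, 0) == dijkstra_scan(adj, 0)
    ```

    A Fibonacci heap would improve the decrease-key bound further, but, as the paper observes, its heavier constants and pointer overhead often make it slower than a binary heap in real measurements.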

  25. Research of shortest path algorithm based on the data structure

    The shortest path problem based on data structures has become one of the hot research topics in graph theory. As the basic theory for solving this problem, Dijkstra's algorithm has been widely used in engineering calculations. Aiming at the shortcomings of the traditional Dijkstra algorithm, this paper proposes an optimization method which mainly improves the node selection of the ...

  26. PDF Privacy-Preserving Dijkstra

    minimum spanning tree (MST), and maximum flow. They achieve O(V^2) work complexity for BFS, SSSP, and MST, as well as O(V^3 · E log V) work for maximum flow. Anagreh et al. [7] presented a privacy-preserving implementation of Prim's algorithm to solve MST, with O(V log V) rounds and O(V^2) work. They also generalize their MST algorithm to work for minimum spanning forests.

  27. Design and Analysis of Computer Algorithms

    Dijkstra's Algorithm; Bellman-Ford Algorithm; ...

  28. A Low-Cost Indoor Navigation and Tracking System Based on Wi ...

    In recent years, the number of smartphone users has increased dramatically. Smartphones provide a variety of services, including indoor navigation and tracking that use the Received Signal Strength Indicator (RSSI) value of Wi-Fi (Wireless Fidelity) routers to estimate user position. In this research, we developed a navigation and tracking system using a Fingerprint map and k ...

  29. A Lagrangian relaxation algorithm for stochastic fixed interval

    This paper deals with operational fixed interval scheduling problems under uncertainty caused by random delays. This stochastic programming problem has a deterministic reformulation based on network flow under the assumption that the machines are identical and the multivariate distribution of random delays follows an Archimedean copula.

  30. Detecting hallucinations in large language models using ...

    Intuitively, our method works by sampling several possible answers to each question and clustering them algorithmically into answers that have similar meanings, which we determine on the basis of ...