NEWS AND VIEWS
The number of errors produced by an LLM can be reduced by grouping its outputs into semantically similar clusters. Remarkably, this task can be performed by a second LLM, and the method’s efficacy can be evaluated by a third.
By Karin Verspoor
Karin Verspoor is in the School of Computing Technologies, RMIT University, Melbourne, Victoria 3000, Australia, and in the School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria 3010, Australia.
Text-generation systems powered by large language models (LLMs) have been enthusiastically embraced by busy executives and programmers alike, because they provide easy access to extensive knowledge through a natural conversational interface. Scientists too have been drawn to both using and evaluating LLMs — finding applications for them in drug discovery1, in materials design2 and in proving mathematical theorems3. A key concern for such uses relates to the problem of ‘hallucinations’, in which the LLM responds to a question (or prompt) with text that seems like a plausible answer, but is factually incorrect or irrelevant4. How often hallucinations are produced, and in what contexts, remains to be determined, but it is clear that they occur regularly and can lead to errors and even harm if undetected. In a paper in Nature, Farquhar et al.5 tackle this problem by developing a method for detecting a specific subclass of hallucinations, termed confabulations.
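The core of the semantic-entropy idea can be sketched in a few lines: sample several answers to the same prompt, group them into clusters of equivalent meaning, and measure the entropy of the resulting cluster distribution. The sketch below is illustrative only; Farquhar et al. judge semantic equivalence with bidirectional entailment checked by a second LLM, whereas the `naive_match` predicate here is a toy stand-in based on normalized string matching.

```python
import math

def semantic_entropy(answers, same_meaning):
    """Estimate semantic entropy over a sample of model answers.

    `answers` is a list of strings sampled from the model for one prompt;
    `same_meaning` is any predicate deciding whether two answers carry
    the same meaning (the paper uses an entailment check by a second LLM).
    """
    clusters = []  # each cluster is a list of semantically equivalent answers
    for a in answers:
        for c in clusters:
            if same_meaning(a, c[0]):
                c.append(a)
                break
        else:
            clusters.append([a])
    n = len(answers)
    # Entropy of the empirical distribution over semantic clusters:
    # many small clusters -> high entropy -> likely confabulation.
    return sum(-(len(c) / n) * math.log(len(c) / n) for c in clusters)

# Toy equivalence test: case- and punctuation-insensitive comparison.
def naive_match(x, y):
    norm = lambda s: "".join(ch for ch in s.lower() if ch.isalnum())
    return norm(x) == norm(y)

print(semantic_entropy(["Paris", "paris.", "Paris"], naive_match))
# 0.0 — the answers agree (one cluster), so the model is judged confident
print(semantic_entropy(["Paris", "Lyon", "Marseille"], naive_match))
# ≈ 1.1 — the answers scatter into three clusters, signalling a likely confabulation
```

A low score means the sampled answers collapse into one meaning; a high score means the model's answers disagree with each other, which is the signal the authors use to flag confabulations.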
Nature 630, 569-570 (2024)
doi: https://doi.org/10.1038/d41586-024-01641-0
References
1. Vert, J.-P. Nature Biotechnol. 41, 750–751 (2023).
2. Jablonka, K. M. et al. Digit. Discov. 2, 1233–1250 (2023).
3. Frieder, S. et al. Mathematical capabilities of ChatGPT. In Proc. NeurIPS 36 (eds Oh, A. et al.) (NIPS, 2023).
4. Hicks, M. T., Humphries, J. & Slater, J. Ethics Inf. Technol. 26, 38 (2024).
5. Farquhar, S., Kossen, J., Kuhn, L. & Gal, Y. Nature 630, 625–630 (2024).
6. Firth, J. R. Studies in Linguistic Analysis (Blackwell, 1957).
7. Landauer, T. K. & Dumais, S. T. Psych. Rev. 104, 211–240 (1997).
8. Bender, E. M. & Koller, A. In Proc. 58th Ann. Meet. ACL 5185–5198 (Association for Computational Linguistics, 2020).
9. Mitchell, M. & Krakauer, D. C. Proc. Natl Acad. Sci. USA 120, e2215907120 (2023).
10. Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q. & Artzi, Y. In 8th Int. Conf. Learning Represent. (ICLR, 2020); available at https://openreview.net/forum?id=SkeHuCVFDr
11. Wang, L. L. et al. In Proc. 61st Ann. Meet. ACL Vol. 1, 9871–9889 (Association for Computational Linguistics, 2023).
12. Sun, T., He, J., Qiu, X. & Huang, X. In Proc. 2022 Conf. Empirical Methods in Natural Language Processing 3726–3739 (Association for Computational Linguistics, 2022).
13. Koike, R., Kaneko, M. & Okazaki, N. Proc. AAAI Conf. Artificial Intell. 38, 21258–21266 (AAAI, 2024).
14. Li, Y. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2405.12689 (2024).
15. Taloni, A., Scorcia, V. & Giannaccare, G. Eye 38, 397–400 (2024).
16. Zhang, Y. et al. Detection vs. Anti-detection: Is Text Generated by AI Detectable? In Wisdom, Well-Being, Win-Win (eds Sserwanga, I. et al.) Lecture Notes in Computer Science Vol. 14596 (Springer, 2024).
Competing Interests
K.V. has received speaker fees and travel reimbursement for presentations on artificial intelligence, natural language processing/LLMs and AI in health care; has received research funding from the Australian Research Council, the Australian National Health and Medical Research Council and the Medical Research Future Fund; and has research partnerships with Elsevier BV. K.V. is co-founder and Victoria Node Lead of the Australian Alliance for Artificial Intelligence in Healthcare, and a member of the Standards Australia Committee IT-014-21, AI in Healthcare.
Related Articles
-
Read the paper: Detecting hallucinations in large language models using semantic entropy
-
Online tools help large language models to solve problems through reasoning
-
Large language models help computer programs to evolve
-
Subjects
- Machine learning
- Computer science