Reasoning-optimised large language models reach near-expert accuracy on board-style orthopaedic exams: A multi-model comparison on 702 multiple-choice questions

http://hdl.handle.net/10458/0002002114
1a3c8757-0d7f-4d2d-b7ae-cd040c6a5aba
File: Knee surg sports traumatol arthrosc - 2025 - Diniz - Reasoning‐optimised large language models reach near‐expert accuracy.pdf (1.4 MB)
Item type: Journal Article(1)
Date made public: 2025-12-26
Language: eng
Keywords (en): artificial intelligence; clinical decision support; large language models; medical education; orthopaedic surgery
Resource type: journal article
Access rights: open access
Authors:
  • Diniz, Pedro (Universitaire Brugmann)
  • Yokoe, Takuji (横江, 琢示; ヨコエ, タクジ), University of Miyazaki (宮崎大学); WEKO 34429; e-Rad_Researcher 50895894
  • Öttl, Felix C (Balgrist University)
  • Pereira, Hélder (Centro Hospitalar Póvoa de Varzim)
  • Henriques, Rui (Universidade de Lisboa)
  • Samuelsson, Kristian (University of Gothenburg)
Abstract (en):

The purpose of this study was to compare the accuracy, calibration, reproducibility and operating cost of seven large language models (LLMs), including four newer models capable of using advanced reasoning techniques to analyse complex medical information and generate accurate responses, on text-only orthopaedic multiple-choice questions (MCQs) and to quantify gains over GPT-4.

From Orthobullets, 702 unique, non-image MCQs (drawn from AAOS Self-Assessment Examinations, Self-Assessment-Based Questions and Orthopaedic In-Training Examination-Based Questions banks) were extracted. Each question was submitted to the following LLMs: OpenAI o3, Anthropic Claude Sonnet 4, Claude Opus 4 (with/without 'Extended Thinking') and Google Gemini 2.5 Pro. Additionally, OpenAI's GPT-4, GPT-4o and the open-weight Gemma 3 27B served as comparators. The primary outcome was overall accuracy. The secondary outcomes were topic- and difficulty-stratified accuracy, calibration (expected calibration error [ECE] and Brier score), reproducibility (flip rate on a retest question subset), latency, token use and cost. Statistical tests included paired McNemar, Cochran Q, ordinal logistic regression and Fleiss κ (Bonferroni-adjusted α = 0.05).

GPT-4 achieved 69.7% accuracy (95% CI = 66.2-72.9). All four reasoning-optimised models scored ≥14 percentage points higher (p < 3.3 × 10^-15); OpenAI o3 led with 93.6% (95% CI = 91.5-95.2), which represents a 34% relative error reduction. Accuracy tended to decline with question difficulty, yet the reasoning advantage persisted in every difficulty stratum. Claude Opus 4 showed the best calibration (ECE = 0.023), while GPT-4 exhibited overconfidence (ECE = 0.215). All models except Gemma 3 27B exhibited non-zero flip rates. Median query time ranged from 0.9 s (Gemma) to 15.9 s (Gemini 2.5 Pro), and cost ranged from 0 to 29.9 USD per 1000 queries.

Reasoning-optimised LLMs now answer text-based orthopaedic exam questions with high accuracy and substantially better confidence calibration than earlier models. However, persistent stochasti
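The accuracy figures in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes the correct-answer counts are the integers nearest to 69.7% and 93.6% of 702 (489 and 657), and it assumes a Wilson score interval, since the abstract does not name the interval method. The names `wilson_interval`, `summarise` and `CORRECT` are hypothetical.

```python
import math

N_QUESTIONS = 702  # number of MCQs stated in the abstract

# ASSUMPTION: counts are not given in the abstract; these are the integers
# nearest to 69.7% and 93.6% of 702.
CORRECT = {"GPT-4": 489, "OpenAI o3": 657}


def wilson_interval(successes, n, z=1.96):
    """Two-sided Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def summarise(successes, n):
    """Return (accuracy, CI lower, CI upper) in percent, one decimal place."""
    low, high = wilson_interval(successes, n)
    return (
        round(100 * successes / n, 1),
        round(100 * low, 1),
        round(100 * high, 1),
    )


gpt4 = summarise(CORRECT["GPT-4"], N_QUESTIONS)
o3 = summarise(CORRECT["OpenAI o3"], N_QUESTIONS)

acc_gpt4 = CORRECT["GPT-4"] / N_QUESTIONS
acc_o3 = CORRECT["OpenAI o3"] / N_QUESTIONS

# Two different "relative" comparisons of o3 against GPT-4:
relative_accuracy_gain = (acc_o3 - acc_gpt4) / acc_gpt4
relative_error_reduction = ((1 - acc_gpt4) - (1 - acc_o3)) / (1 - acc_gpt4)

print("GPT-4 (acc, CI low, CI high):", gpt4)
print("o3    (acc, CI low, CI high):", o3)
print(f"relative accuracy gain:   {relative_accuracy_gain:.1%}")
print(f"relative error reduction: {relative_error_reduction:.1%}")
```

Under these assumptions the Wilson intervals round to the confidence limits quoted in the abstract (66.2-72.9 and 91.5-95.2). The relative gain in accuracy comes out near 34%, the figure quoted in the abstract, whereas the relative reduction in error rate on the same assumed counts is close to 79%, so readers may want to check the full text for how that figure is defined.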
Bibliographic information (en): Knee surgery, sports traumatology, arthroscopy : official journal of the ESSKA
Issue date: 2025-12-17
Publisher (en): Wiley
EISSN: 14337347
DOI (isVersionOf): https://doi.org/10.1002/ksa.70222
Rights (en): © 2025 The Author(s).
Publication version: VoR
Versions

Ver.1 2025-12-26 07:04:33.998738
Powered by WEKO3