AI Discernment in Foot and Ankle Surgery Research: A Survey Investigation

Bibliographic Details
Main Authors: Steven R. Cooperman, DPM, MBA, AACFAS; Abisola Olaniyan, MBBS, MPH, PhD; Roberto Brandao, DPM, FACFAS
Format: Article
Language: English
Published: SAGE Publishing, 2024-12-01
Series: Foot & Ankle Orthopaedics
ISSN: 2473-0114
Collection: DOAJ (OA Journals)
Online Access: https://doi.org/10.1177/2473011424S00129

Description:
Category: Other

Introduction/Purpose: Artificial intelligence (AI) encompasses computer systems that emulate human intelligence. Large language models (LLMs), such as ChatGPT (OpenAI), exemplify this trend. Trained on vast datasets, LLMs use machine learning and natural language processing to generate coherent responses. However, their use in scientific and medical research raises concerns about plagiarism and accuracy, and the scientific community faces the challenge of distinguishing AI-generated content from human-authored text. This study assesses foot and ankle surgeons' ability to discern AI-generated abstracts from human-written ones in the field of foot and ankle surgery. It also examines how participant characteristics, such as experience and familiarity with AI, affect this differentiation.

Methods: A survey was developed that collected participant characteristics and presented 12 abstracts: 6 AI-generated and 6 drawn from the Journal of Foot and Ankle Surgery. Participants, blinded to how each abstract was created, judged whether it was AI- or human-generated and provided confidence scores on a 0-100 scale. The survey was administered twice to foot and ankle surgeons at the Orthopedic Foot and Ankle Center, with the second survey completed two weeks after the first. Descriptive statistics characterized participant attributes, with means and standard deviations reported. Two-sample tests of proportions assessed differences in correct identifications between AI- and human-generated abstracts. Pearson's correlation examined associations between correct identifications, participant attributes, and confidence scores. Intraclass correlation coefficients (ICCs) evaluated inter- and intra-rater reliability. Statistical significance was set at p < 0.05, and analyses were performed in Stata/SE (Version 17.0).

Results: Nine reviewers participated, with varying years of practice (0-29), number of publications (1-200), years reviewing articles (1-25), and self-reported AI familiarity scores (0-75). Of 216 responses, 109 (50.5%) correctly identified the abstract source. At the first assessment (T1), reviewers accurately identified 44% of AI-generated and 57% of human-generated abstracts, with similar results at the second assessment (T2). There was no significant difference between AI- and human-generated abstract identification rates at either time point. Correlation analysis showed mixed relationships between correct identifications and reviewer characteristics. Inter-rater reliability was moderate initially and diminished at the second assessment, and intra-rater reliability was poor throughout.

Conclusion: This study demonstrates the difficulty of detecting AI-generated text, even by foot and ankle specialists evaluating topics specific to their field. An overall accuracy of 50.5% across 216 responses shows that identification was essentially a coin flip, with moderate-to-poor inter-rater and poor intra-rater reliability. These findings highlight the difficulty of discerning AI-generated content in academic research and underscore the urgency of creating safeguards and guidelines for managing AI-generated content in the medical literature.
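
Note on the analysis: the Methods describe a standard pipeline of proportion tests, Pearson's correlation, and ICCs run in Stata/SE 17. The study data are not reproduced here, so the following is only a minimal Python sketch of equivalent tests; every count and reviewer value in it is a hypothetical placeholder chosen to approximate the reported percentages, and the pingouin ICC call merely stands in for Stata's ICC routine rather than replicating the authors' code.

# Minimal sketch (not the authors' code) of the analyses described above,
# using illustrative numbers chosen to approximate the reported figures.
import pandas as pd
from scipy.stats import pearsonr
from statsmodels.stats.proportion import proportions_ztest

# Overall accuracy vs. chance: 109 correct out of 216 responses.
z_overall, p_overall = proportions_ztest(count=109, nobs=216, value=0.5)
print(f"Overall accuracy vs. 50%: z = {z_overall:.2f}, p = {p_overall:.3f}")

# Two-sample test of proportions: AI- vs. human-generated abstracts at T1.
# 9 reviewers x 6 abstracts = 54 responses per source; the counts below are
# hypothetical values approximating the reported 44% and 57% accuracy.
correct = [24, 31]   # correct identifications (AI-generated, human-generated)
totals = [54, 54]
z_t1, p_t1 = proportions_ztest(count=correct, nobs=totals)
print(f"AI vs. human identification at T1: z = {z_t1:.2f}, p = {p_t1:.3f}")

# Pearson's correlation between a reviewer attribute (e.g., self-reported AI
# familiarity) and number of correct identifications; data are invented.
familiarity = [0, 10, 20, 25, 40, 50, 60, 70, 75]
n_correct = [11, 12, 13, 11, 14, 12, 13, 12, 11]
r, p_corr = pearsonr(familiarity, n_correct)
print(f"Familiarity vs. correct identifications: r = {r:.2f}, p = {p_corr:.3f}")

# Inter-rater reliability via intraclass correlation (pingouin), on a
# long-format table with one row per reviewer-abstract judgment
# (1 = correct, 0 = incorrect); toy data for 3 reviewers and 4 abstracts.
import pingouin as pg
long_df = pd.DataFrame({
    "reviewer": [rev for rev in range(3) for _ in range(4)],
    "abstract": [abs_id for _ in range(3) for abs_id in range(4)],
    "correct":  [1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0],
})
icc = pg.intraclass_corr(data=long_df, targets="abstract",
                         raters="reviewer", ratings="correct")
print(icc[["Type", "ICC", "CI95%"]])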