
dc.contributor.author: Falis, M
dc.contributor.author: Gema, AP
dc.contributor.author: Dong, H
dc.contributor.author: Daines, L
dc.contributor.author: Basetti, S
dc.contributor.author: Holder, M
dc.contributor.author: Penfold, RS
dc.contributor.author: Birch, A
dc.contributor.author: Alex, B
dc.date.accessioned: 2024-09-17T08:54:30Z
dc.date.issued: 2024-09-13
dc.date.updated: 2024-09-16T16:30:42Z
dc.description.abstract: Objectives: The aim of this study was to investigate GPT-3.5 in generating and coding medical documents with International Classification of Diseases (ICD)-10 codes for data augmentation on low-resource labels. Materials and Methods: Employing GPT-3.5, we generated and coded 9606 discharge summaries based on lists of ICD-10 code descriptions of patients with infrequent (or generation) codes within the MIMIC-IV dataset. Combined with the baseline training set, this formed an augmented training set. Neural coding models were trained on the baseline and augmented data and evaluated on a MIMIC-IV test set. We report micro- and macro-F1 scores on the full codeset, on the generation codes, and on their families. Weak Hierarchical Confusion Matrices determined within-family and outside-of-family coding errors in the latter two codesets. The coding performance of GPT-3.5 was evaluated on prompt-guided self-generated data and on real MIMIC-IV data. Clinicians evaluated the clinical acceptability of the generated documents. Results: Data augmentation results in slightly lower overall model performance but improves performance for the generation candidate codes and their families, including one absent from the baseline training data. Augmented models display lower out-of-family error rates. GPT-3.5 identifies ICD-10 codes by their prompted descriptions but underperforms on real data. Evaluators highlight the correctness of the generated concepts, while the documents suffer in variety, supporting information, and narrative. Discussion and Conclusion: While GPT-3.5 alone, given our prompt setting, is unsuitable for ICD-10 coding, it supports data augmentation for training neural models. Augmentation positively affects generation code families but mainly benefits codes with existing examples. Augmentation reduces out-of-family errors. Documents generated by GPT-3.5 state prompted concepts correctly but lack variety and authenticity in their narratives.
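As context for the metrics named in the abstract, the following is a minimal sketch (not the authors' implementation) of how micro- and macro-F1 can be reported both on a full multi-label ICD-10 codeset and on a restricted subset such as the low-resource "generation" codes. The code lists and prediction matrices below are hypothetical placeholders.

```python
# Minimal sketch (assumption: not the authors' code) of scoring a multi-label
# ICD-10 coder on the full codeset and on a subset of codes, as described in
# the abstract above.
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical label space and generation-code subset.
all_codes = ["I10", "E11.9", "J18.9", "C91.00", "T36.0X5A"]
generation_codes = ["C91.00", "T36.0X5A"]  # infrequent codes targeted for augmentation
gen_idx = [all_codes.index(c) for c in generation_codes]

# Toy binary indicator matrices: rows = documents, columns = codes.
y_true = np.array([[1, 0, 1, 1, 0],
                   [0, 1, 0, 0, 1],
                   [1, 1, 0, 0, 0]])
y_pred = np.array([[1, 0, 0, 1, 0],
                   [0, 1, 0, 0, 0],
                   [1, 0, 0, 1, 0]])

# Scores on the full codeset.
print("micro-F1 (all):", f1_score(y_true, y_pred, average="micro", zero_division=0))
print("macro-F1 (all):", f1_score(y_true, y_pred, average="macro", zero_division=0))

# Scores restricted to the generation codes only.
print("micro-F1 (gen):", f1_score(y_true, y_pred, labels=gen_idx, average="micro", zero_division=0))
print("macro-F1 (gen):", f1_score(y_true, y_pred, labels=gen_idx, average="macro", zero_division=0))
```

Restricting the `labels` argument to the column indices of a subset is one straightforward way to report per-subset scores (full codeset vs. generation codes and their families) from the same set of predictions.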
dc.description.sponsorship: UKRI
dc.description.sponsorship: Engineering and Physical Sciences Research Council (EPSRC)
dc.description.sponsorship: Wellcome Trust
dc.description.sponsorship: Legal and General PLC
dc.description.sponsorship: National Institute for Health and Care Research (NIHR)
dc.identifier.citation: Published online 13 September 2024
dc.identifier.doi: https://doi.org/10.1093/jamia/ocae132
dc.identifier.grantnumber: EP/S02431X/1
dc.identifier.grantnumber: EP/V050869/1
dc.identifier.grantnumber: 223499/Z/21/Z
dc.identifier.grantnumber: NIHR202639
dc.identifier.uri: http://hdl.handle.net/10871/137473
dc.identifier: ORCID: 0000-0001-6828-6891 (Dong, Hang)
dc.language.iso: en
dc.publisher: Oxford University Press (OUP) / American Medical Informatics Association
dc.relation.url: https://physionet.org/about/citi-course/
dc.relation.url: https://doi.org/10.13026/bnc2-1a81
dc.rights: © The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
dc.subject: ICD coding
dc.subject: data augmentation
dc.subject: large language model
dc.subject: clinical text generation
dc.subject: evaluation by clinicians
dc.title: Can GPT-3.5 generate and code discharge summaries?
dc.type: Article
dc.date.available: 2024-09-17T08:54:30Z
dc.identifier.issn: 1067-5027
dc.description: This is the final version. Available on open access from Oxford University Press via the DOI in this record.
dc.description: Data availability: The synthetic discharge summary data generated as part of this study will be shared on reasonable request to the corresponding author upon presenting a certificate of completion of the CITI Data or Specimens Only Research course from the Collaborative Institutional Training Initiative program (https://physionet.org/about/citi-course/). The data have been accepted for publication and will be made available via PhysioNet (https://doi.org/10.13026/bnc2-1a81).
dc.identifier.eissn: 1527-974X
dc.identifier.journal: Journal of the American Medical Informatics Association
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dcterms.dateAccepted: 2024-05-22
dcterms.dateSubmitted: 2024-01-02
rioxxterms.version: VoR
rioxxterms.licenseref.startdate: 2024-09-13
rioxxterms.type: Journal Article/Review
refterms.dateFCD: 2024-09-17T08:50:07Z
refterms.versionFCD: VoR
refterms.dateFOA: 2024-09-17T08:55:13Z
refterms.panel: B
refterms.dateFirstOnline: 2024-09-13
exeter.rights-retention-statement: No

