dc.contributor.author | Gibbons, C | |
dc.contributor.author | Richards, S | |
dc.contributor.author | Valderas, JM | |
dc.contributor.author | Campbell, J | |
dc.date.accessioned | 2018-05-15T07:10:20Z | |
dc.date.issued | 2017-03-15 | |
dc.description.abstract | BACKGROUND: Machine learning techniques may be an effective and efficient way to classify open-text reports on doctors' activity for the purposes of quality assurance, safety, and continuing professional development. OBJECTIVE: The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors' professional performance in the United Kingdom. METHODS: We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors, collected from a survey of clinicians' colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. RESULTS: Individual algorithm performance was high (F score range=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for the "popular" (recall=.97), "innovator" (recall=.98), and "respected" (recall=.87) codes and was lower for the "interpersonal" (recall=.80) and "professional" (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments classified as "respected," "professional," and "interpersonal" were related to higher doctor scores on the GMC-CQ compared with comments that were not so classified (P<.05). Scores did not vary between doctors who were rated as popular or innovative and those who were not rated at all (P>.05). CONCLUSIONS: Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high performance. Colleague open-text comments that signal respect, professionalism, and interpersonal skill may be key indicators of doctors' performance. | en_GB |
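The article does not publish code, but the abstract's METHODS and RESULTS describe a recognizable supervised text-classification workflow. The following is a minimal sketch, in Python with scikit-learn and SciPy (assumed libraries, not the authors' toolchain), of that kind of pipeline: several classifiers trained on features extracted from the comments, evaluated with 10-fold cross-validation, combined into a multi-algorithm ensemble, and followed by a t test comparing GMC-CQ scores between classified and unclassified groups. All data, variable names, and the choice of TF-IDF features and specific estimators are hypothetical illustrations.

```python
# Hedged sketch of the pipeline the abstract describes; not the authors' code.
# Assumes scikit-learn and SciPy; comments, labels, and scores are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from scipy import stats

# Hypothetical data: one binary label per theme (here, "respected" vs. not).
comments = ["A thoughtful and respected colleague",
            "Rarely communicates with the team"]
labels = [1, 0]  # 1 = comment coded "respected" by the human rater

# TF-IDF features feeding a majority-vote ensemble of several classifiers,
# echoing the multi-algorithm ensemble the abstract reports.
ensemble = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    VotingClassifier(
        estimators=[
            ("logreg", LogisticRegression(max_iter=1000)),
            ("nb", MultinomialNB()),
            ("svm", LinearSVC()),
        ],
        voting="hard",  # majority vote over the algorithms' predicted labels
    ),
)

ensemble.fit(comments, labels)
print(ensemble.predict(["A highly respected clinician"]))

# 10-fold cross-validation (the abstract's evaluation scheme), scored with F1.
# With a real corpus of 1272 coded comments this would mirror the analysis:
# scores = cross_val_score(ensemble, comments, labels, cv=10, scoring="f1")

# Independent-samples t test comparing GMC-CQ scores between doctors whose
# comments were classified under a theme and those whose comments were not.
scores_classified = [4.6, 4.8, 4.7]    # hypothetical GMC-CQ scores
scores_unclassified = [4.2, 4.1, 4.4]
t_stat, p_value = stats.ttest_ind(scores_classified, scores_unclassified)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

Hard majority voting is used here only because the abstract reports agreement for algorithms "combined into an ensemble"; the paper's actual combination rule for its 8 algorithms is not stated in this record.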
dc.description.sponsorship | We thank Karen Alexander, the National Institute for Health Research (NIHR) Adaptive Tests for Long-Term Conditions (ATLanTiC) patient and public involvement partner, for providing critical insight, comments, and editing the manuscript. Data collection and qualitative coding were funded by the UK General Medical Council as an unrestricted research award. Support for the novel work presented in this paper was given by a postdoctoral fellowship award for CG (NIHR-PDF-2014-07-028). | en_GB |
dc.identifier.citation | Vol. 19(3), e65 | en_GB |
dc.identifier.doi | 10.2196/jmir.6533 | |
dc.identifier.uri | http://hdl.handle.net/10871/32852 | |
dc.language.iso | en | en_GB |
dc.publisher | JMIR Publications | en_GB |
dc.relation.url | https://www.ncbi.nlm.nih.gov/pubmed/28298265 | en_GB |
dc.rights | ©Chris Gibbons, Suzanne Richards, Jose Maria Valderas, John Campbell. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 15.03.2017. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included. | en_GB |
dc.subject | data mining | en_GB |
dc.subject | feedback | en_GB |
dc.subject | machine learning | en_GB |
dc.subject | surveys and questionnaires | en_GB |
dc.subject | work performance | en_GB |
dc.subject | Algorithms | en_GB |
dc.subject | Clinical Competence | en_GB |
dc.subject | Feedback | en_GB |
dc.subject | Humans | en_GB |
dc.subject | Physicians | en_GB |
dc.subject | Supervised Machine Learning | en_GB |
dc.subject | Surveys and Questionnaires | en_GB |
dc.title | Supervised machine learning algorithms can classify open-text feedback of doctor performance with human-level accuracy | en_GB |
dc.type | Article | en_GB |
dc.date.available | 2018-05-15T07:10:20Z | |
dc.identifier.issn | 1439-4456 | |
exeter.place-of-publication | Canada | en_GB |
dc.description | This is the final version of the article. Available from the publisher via the DOI in this record. | en_GB |
dc.identifier.journal | Journal of Medical Internet Research | en_GB |