Show simple item record

dc.contributor.author: Gibbons, C
dc.contributor.author: Richards, S
dc.contributor.author: Valderas, JM
dc.contributor.author: Campbell, J
dc.date.accessioned: 2018-05-15T07:10:20Z
dc.date.issued: 2017-03-15
dc.description.abstract: BACKGROUND: Machine learning techniques may be an effective and efficient way to classify open-text reports on doctors' activity for the purposes of quality assurance, safety, and continuing professional development. OBJECTIVE: The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors' professional performance in the United Kingdom. METHODS: We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors, collected from a survey of clinicians' colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. RESULTS: Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for the "popular" (recall=.97), "innovator" (recall=.98), and "respected" (recall=.87) codes and was lower for the "interpersonal" (recall=.80) and "professional" (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments classified as "respected," "professional," and "interpersonal" related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Scores did not vary between doctors who were rated as popular or innovative and those who were not rated at all (P>.05). CONCLUSIONS: Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high accuracy. Colleague open-text comments that signal respect, professionalism, and strong interpersonal skills may be key indicators of doctors' performance. [en_GB]
dc.description.sponsorship: We thank Karen Alexander, the National Institute for Health Research (NIHR) Adaptive Tests for Long-Term Conditions (ATLanTiC) patient and public involvement partner, for providing critical insight and comments and for editing the manuscript. Data collection and qualitative coding were funded by the UK General Medical Council as an unrestricted research award. Support for the novel work presented in this paper was given by a postdoctoral fellowship award for CG (NIHR-PDF-2014-07-028). [en_GB]
dc.identifier.citation: Vol. 19(3), e65 [en_GB]
dc.identifier.doi: 10.2196/jmir.6533
dc.identifier.uri: http://hdl.handle.net/10871/32852
dc.language.iso: en [en_GB]
dc.publisher: JMIR Publications [en_GB]
dc.relation.url: https://www.ncbi.nlm.nih.gov/pubmed/28298265 [en_GB]
dc.rights: ©Chris Gibbons, Suzanne Richards, Jose Maria Valderas, John Campbell. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 15.03.2017. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included. [en_GB]
dc.subject: data mining [en_GB]
dc.subject: feedback [en_GB]
dc.subject: machine learning [en_GB]
dc.subject: surveys and questionnaires [en_GB]
dc.subject: work performance [en_GB]
dc.subject: Algorithms [en_GB]
dc.subject: Clinical Competence [en_GB]
dc.subject: Feedback [en_GB]
dc.subject: Humans [en_GB]
dc.subject: Physicians [en_GB]
dc.subject: Supervised Machine Learning [en_GB]
dc.subject: Surveys and Questionnaires [en_GB]
dc.title: Supervised machine learning algorithms can classify open-text feedback of doctor performance with human-level accuracy [en_GB]
dc.type: Article [en_GB]
dc.date.available: 2018-05-15T07:10:20Z
dc.identifier.issn: 1439-4456
exeter.place-of-publication: Canada [en_GB]
dc.description: This is the final version of the article. Available from the publisher via the DOI in this record. [en_GB]
dc.identifier.journal: Journal of Medical Internet Research [en_GB]
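
The abstract above describes a supervised text-classification workflow: comments coded by a human rater are used to train classifiers, which are then evaluated with 10-fold cross-validation, F scores, and recall. As a rough illustration of that workflow (not the authors' implementation), here is a minimal Python sketch using scikit-learn; the comment texts, labels, and choice of classifier are hypothetical placeholders.

# Illustrative sketch only, not the study's code or data: train a text
# classifier on human-coded comments and evaluate it with 10-fold
# cross-validation, mirroring the evaluation described in the abstract.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical open-text colleague comments, binary-coded for one theme
# ("respect"); the study coded 1272 real comments into 5 themes.
comments = (["Shows great respect to colleagues and patients."] * 10
            + ["Introduced an innovative rota system for the ward."] * 10)
labels = [1] * 10 + [0] * 10  # 1 = coded as "respect", 0 = not

# TF-IDF features feeding a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# 10-fold cross-validated F score for the "respect" code.
f_scores = cross_val_score(model, comments, labels, cv=10, scoring="f1")
print("Mean F score:", f_scores.mean())

The study trained 8 such algorithms and combined them into an ensemble; this sketch uses a single logistic regression classifier for brevity.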

