dc.contributor.author: Price, Sarah Jane
dc.date.accessioned: 2016-05-26T08:48:37Z
dc.date.issued: 2016-01-05
dc.description.abstract: Electronic medical record databases (e.g. the Clinical Practice Research Datalink, CPRD) are increasingly used in epidemiological research. The CPRD holds data in two formats. Coded data is the sole format used in almost all research. Free-text (or ‘hidden’) data may contain much clinical information but is generally unavailable to researchers. This thesis examines the consequences of omitting free-text records from research. Cases with bladder (n=4,915) or pancreatic (n=3,635) cancer were matched to controls (n=21,718, bladder; n=16,459, pancreas) on age, sex and GP practice. Coded and text-only records of attendance for haematuria, jaundice and abdominal pain in the year before cancer diagnosis were identified. The number of patients whose entire attendance record for a symptom/sign existed solely in the text was quantified. Associations between recording method (coded or text-only) and case/control status were tested (χ² test). For each symptom/sign, the positive predictive value (PPV; Bayes' theorem) and odds ratio (OR; conditional logistic regression) for cancer were estimated before and after supplementation with text-only records. Text-only recording was common: 7,951/20,958 (37%) of symptom records were in that format. For individual patients, text-only recording was more likely in controls than in cases for three features:

- visible haematuria in bladder cancer: 140/336=42% in controls vs 556/3,147=18% in cases (χ² test, p<0.001);
- jaundice in pancreatic cancer: 21/31=67% vs 463/1,565=30% (p<0.0001);
- abdominal pain in pancreatic cancer: 323/1,126=29% vs 397/1,789=22% (p<0.001).

Adding text records reduced the PPV of visible haematuria for bladder cancer from 4.0% (95% CI: 3.5–4.6%) to 2.9% (2.6–3.2%). It reduced the PPV of jaundice for pancreatic cancer from 12.8% (7.3–21.6%) to 6.3% (4.5–8.7%). Coded records suggested that non-visible haematuria occurred in 127/4,915 (2.6%) cases. This prevalence is below the threshold generally required for a feature to be studied.
Supplementation with text-only records increased this to 312/4,915 (6.4%). This allowed the first estimation of its OR (28.0, 95% CI: 20.7–37.9, p<0.0001) and PPV (1.60%, 1.22–2.10%, p<0.0001) for bladder cancer. The results suggest that GPs make strong clinical judgements about the probable significance of symptoms. They preferentially code clinical features they consider significant to a diagnosis, and they use text to record those they think are not. [en_GB]
dc.identifier.uri: http://hdl.handle.net/10871/21692
dc.language.iso: en [en_GB]
dc.publisher: University of Exeter [en_GB]
dc.subject: Clinical Practice Research Datalink [en_GB]
dc.subject: General Practice Research Database [en_GB]
dc.subject: Text records [en_GB]
dc.subject: Primary care [en_GB]
dc.subject: Cancer diagnosis [en_GB]
dc.subject: Bladder cancer [en_GB]
dc.subject: Pancreatic cancer [en_GB]
dc.subject: Bias [en_GB]
dc.title: What are we missing by ignoring text records in the Clinical Practice Research Datalink? Using three symptoms of cancer as examples to estimate the extent of data in text format that is hidden to research [en_GB]
dc.type: Thesis or dissertation [en_GB]
dc.date.available: 2016-05-26T08:48:37Z
dc.contributor.advisor: Hamilton, William
dc.contributor.advisor: Stapley, Sal
dc.contributor.advisor: Barraclough, Kevin
dc.publisher.department: Health Services Research, Institute of Health Research [en_GB]
dc.type.degreetitle: PhD in Medical Studies [en_GB]
dc.type.qualificationlevel: Doctoral [en_GB]
dc.type.qualificationname: PhD [en_GB]
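
The abstract reports two calculations. The first is a χ² test of recording method (text-only vs coded) against case/control status. The second is a PPV derived with Bayes' theorem. Both can be sketched in a few lines of Python.

This is a minimal illustration under stated assumptions, not the thesis's own code:

- The function names `chi_square_2x2` and `ppv_bayes` are hypothetical.
- The χ² test is Pearson's test without Yates' continuity correction.
- The PPV uses the odds form of Bayes' theorem (posterior odds = prior odds × likelihood ratio). The caller supplies the prior, because this record does not state which prior the thesis used.

The usage applies the χ² test to the visible haematuria counts quoted in the abstract. The PPV inputs are purely illustrative.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (1 df, no Yates correction) on the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p_value).

    Hypothetical helper for illustration; the thesis's exact test settings
    are not given in this record.
    """
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    n = row1 + row2
    # Shortcut form of the Pearson statistic for a 2x2 table.
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def ppv_bayes(p_feature_cases: float, p_feature_controls: float, prior: float) -> float:
    """Probability of cancer given the feature, using Bayes' theorem in odds
    form: posterior odds = prior odds * likelihood ratio.

    Assumption: the prior (e.g. background incidence) is supplied by the
    caller; the prior used in the thesis is not stated in this record.
    """
    if not 0 < prior < 1:
        raise ValueError("prior must lie strictly between 0 and 1")
    if p_feature_controls <= 0:
        raise ValueError("feature frequency in controls must be positive")
    likelihood_ratio = p_feature_cases / p_feature_controls
    posterior_odds = prior / (1 - prior) * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    # Visible haematuria, bladder cancer (counts from the abstract).
    # Rows: cases, controls; columns: text-only, coded.
    stat, p = chi_square_2x2(556, 3147 - 556, 140, 336 - 140)
    print(f"chi2 = {stat:.1f}, p = {p:.2g}")

    # Illustrative inputs only, not figures from the thesis:
    # feature in 50% of cases and 5% of controls, prior risk 1%.
    print(f"PPV = {ppv_bayes(0.50, 0.05, 0.01):.3f}")
```

Pearson's test is used here only as a plausible reading of the abstract's "χ² test". With small cells, such as the 31 controls with jaundice, Fisher's exact test is a common alternative. The sketch does not reproduce the odds ratios, which the thesis estimated by conditional logistic regression on the matched case-control sets.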
