Toward improved statistical methods for analyzing Cotinine-Biomarker health association data
Tobacco Induced Diseases
BioMed Central for International Society for the Prevention of Tobacco Induced Diseases
© 2011 Koru-Sengul et al; licensee BioMed Central Ltd. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
BACKGROUND: Serum cotinine, a metabolite of nicotine, is frequently used in research as a biomarker of recent tobacco smoke exposure. Historically, secondhand smoke (SHS) research has used suboptimal statistical methods to handle censored serum cotinine values, that is, measurements below the limit of detection (LOD). METHODS: Using data from the 1999-2004 National Health and Nutrition Examination Surveys (NHANES), we compared commonly used parametric and non-parametric methods for analyzing censored serum cotinine data. To illustrate the differences in associations obtained by various analytic methods, we compared parameter estimates for the association between cotinine and the inflammatory marker homocysteine using complete case analysis, single and multiple imputation, "reverse" Kaplan-Meier, and logistic regression models. RESULTS: Parameter estimates and statistical significance varied according to the statistical method used to handle censored serum cotinine values. Single imputation of censored values with 0, LOD, or LOD/√2 yielded similar estimates and significance; the multiple imputation method yielded smaller estimates than the other methods, without statistical significance. Multiple regression modelling using the "reverse" Kaplan-Meier method yielded statistically significant estimates that were larger than those from parametric methods. CONCLUSIONS: Analyses of serum cotinine data with values below the LOD require special attention. "Reverse" Kaplan-Meier was the only method inherently able to deal with censored data with multiple LODs, and may be the most accurate because it avoids the data manipulation required by other commonly used statistical methods. Additional research is needed to identify optimal statistical methods for the analysis of SHS biomarkers subject to an LOD.
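The contrast between single imputation and "reverse" Kaplan-Meier described in the abstract can be sketched on toy data. The following is a minimal illustration, not the authors' analysis code: the function names `substitute` and `reverse_km_cdf`, the cotinine values, and the two LODs are all hypothetical, and it assumes each censored observation is stored as its LOD together with a censoring flag. The reverse Kaplan-Meier step uses the standard product-limit estimator applied to left-censored data, processing detected values from largest to smallest.

```python
import math

def substitute(values, censored, fill):
    """Single imputation: replace each censored entry (stored as its LOD) with fill(LOD)."""
    return [fill(v) if c else v for v, c in zip(values, censored)]

def reverse_km_cdf(values, censored):
    """Product-limit estimate for left-censored data.

    Returns (v, estimated P(X < v)) for each distinct detected value v,
    from largest to smallest. An observation is "at risk" at v when its
    recorded value (or its LOD, if censored) is <= v.
    """
    detected = sorted({v for v, c in zip(values, censored) if not c}, reverse=True)
    prob = 1.0
    steps = []
    for v in detected:
        at_risk = sum(1 for x in values if x <= v)
        events = sum(1 for x, c in zip(values, censored) if x == v and not c)
        prob *= 1 - events / at_risk
        steps.append((v, prob))
    return steps

# Hypothetical toy data (ng/mL) with two different LODs: 0.05 and 0.015.
values   = [0.05, 0.03, 0.015, 0.2, 1.3]
censored = [True, False, True, False, False]

# The three single-imputation choices named in the abstract.
with_zero  = substitute(values, censored, lambda lod: 0.0)
with_lod   = substitute(values, censored, lambda lod: lod)
with_root2 = substitute(values, censored, lambda lod: lod / math.sqrt(2))

# Reverse Kaplan-Meier keeps the censored observations as censored.
km_steps = reverse_km_cdf(values, censored)
for value, p_below in km_steps:
    print(f"P(X < {value}) is estimated as {p_below:.3f}")
```

In this sketch the observation censored at 0.05 drops out of the risk set before the detected value 0.03 is processed, which is how the estimator accommodates more than one LOD without substituting a number for any censored value.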
This work was supported by the Flight Attendant Medical Research Institute (FAMRI) Clinical Innovator Awards to Dr. Koru-Sengul and Dr. Lee, the National Institute of Environmental Health Sciences (NIH F30 ES015969), the National Institute for Occupational Safety and Health grants (R01 OH003915), and the European Union Convergence funding (ECEHH, PCMD, University of Exeter).
This is the final version of the article. Available from BioMed Central via the DOI in this record.
Vol. 9, article 11