Last year, the scientific world lost one of its preeminent statisticians, Calyampudi
Radhakrishna Rao. Much has already been written about his more famous results: the
Cramér-Rao lower bound, the Rao-Blackwell theorem, and information geometry. (The
August 2021 issue of the International Statistical Review provides a good primer.) Instead,
this memorial column connects two applied publications from early in Rao's career to
modern data science:
(1) “Anthropometric survey of the United Provinces, 1941: A statistical study” by P.C.
Mahalanobis, D.N. Majumdar, M.W.M. Yeatts, and C.R. Rao, published in Sankhyā in
1949; and
(2) The Ancient Inhabitants of Jebel Moya by R. Mukherjee, C.R. Rao, and J.C. Trevor,
published in 1955.
Both works give the contemporary reader excellent examples of the practices that a modern
collaborative data science framework calls for, from study design to data stewardship. We
will focus on three themes integrated within such frameworks: replicability, reproducibility,
and the incorporation of data context. (Replicability is the idea that a new study, collecting
its own data, can obtain the same results as the original one.
It requires that the original study provide enough detail for someone else to “replicate” it,
starting from data collection. This differs from reproducibility, where researchers supply
enough information, including the raw data, for the existing results to be independently
regenerated. Because these are historical data sets, however, replicability can be discussed
only as a theoretical possibility.)