Global data for local science: Assessing the scale of data infrastructures in biological and biomedical research
The use of online databases to collect and disseminate data is typically portrayed as crucial to the management of ‘big science’. At the same time, databases are not deemed successful unless they facilitate the re-use of data towards new scientific discoveries, which often involves engaging with several highly diverse and inherently unstable research communities. This paper examines the tensions encountered by database developers in their efforts to foster both the global circulation and the local adoption of data. I focus on two prominent attempts to build data infrastructures in the fields of plant science and cancer research over the last decade: The Arabidopsis Information Resource and the Cancer Biomedical Informatics Grid. I show how curators’ experience of the diverse and dynamic nature of biological research led them to envision databases as catering primarily for local, rather than global, science, and to structure them as platforms where methodological and epistemic diversity can be expressed and explored rather than denied or overcome. I conclude that one way to define the scale of a data infrastructure is to consider the range and scope of the biological and biomedical questions that it helps to address, and that, from this perspective, databases have a larger scale than the science they serve, which tends to remain fragmented into a wide variety of specialised projects.