Dr Javier Lezaun: Screens and filters: curating the open archive

Egenis seminar with Dr Javier Lezaun

Event details

This paper examines the production and maintenance of ‘open data’ in scientific research. It follows the evolution of a dataset, the Tres Cantos Anti-Malarial Set (TCAMS), which comprises information about more than 13,500 compounds found to inhibit growth of the malaria parasite. This dataset was generated in 2010 from a screening of GSK’s proprietary chemical library. It is now deposited in ChEMBL, one of the largest free-access cheminformatics databases in the world. ChEMBL is a key resource in the emergence of ‘open-source drug discovery’, particularly in the field of neglected tropical diseases, and TCAMS has become a primary example of the effort to energize research collaborations by enhancing data sharing. The paper uses this case study to discuss the complex combinations of the shared and the proprietary that characterize contemporary drug discovery, and the provision of new public scientific resources more generally. In particular, the presentation pays attention to data curation, a set of routine practices meant to arrest the natural tendency of public data to decay. By focusing on the work of data maintenance, the paper explores an understanding of ‘open science’ as a labour-intensive effort.


Byrne House