Data Science and AI Resources
The IDSAI resources page shares useful resources with colleagues to support their work in Data Science and AI. If there are additional resources you would like us to share here, or you have ideas for new resources, please contact us at: firstname.lastname@example.org
If you are looking for information about the technical data science and AI training and short courses available at the University, please visit the IDSAI training page.
For more information about the IDSAI research theme "Data Governance, Algorithms & Values", please visit the research theme page.
If you have any further questions please contact the Data Governance, Algorithms & Values theme lead, Sabina Leonelli: email@example.com
Please find below a selection of articles and references regarding ethical data science and AI research:
Data management and use: Governance in the 21st century - A joint report by the British Academy and the Royal Society
Big Data in Environment and Human Health - Oxford Research Encyclopedia of Environmental Science
Locating ethics in data science: responsibility and accountability in global and distributed knowledge production systems - Sabina Leonelli
The Data Ethics Canvas of the Open Data Institute: https://theodi.org/article/data-ethics-canvas
Taddeo, Mariarosaria, and Luciano Floridi. 2018. “How AI Can Be a Force for Good.” Science 361 (6404): 751–752. https://doi.org/10.1126/science.aat5991
Data Management and Use: Governance in the 21st Century. A Joint Report of the Royal Society and the British Academy. https://royalsociety.org/~/media/policy/projects/data-governance/data-management-governance.pdf
Zook, Matthew, Solon Barocas, danah boyd, Kate Crawford, Emily Keller, Seeta Peña Gangadharan, Alyssa Goodman, et al. 2017. “Ten Simple Rules for Responsible Big Data Research.” PLOS Computational Biology 13 (3): e1005399. doi:10.1371/journal.pcbi.1005399
Leonelli, S. (2017) Biomedical Knowledge Production in the Age of Big Data. Report for the Swiss Science and Innovation Council, published online November 2017: http://www.swir.ch/images/stories/pdf/en/Exploratory_study_2_2017_Big_Data_SSIC_EN.pdf
Fleming LE, Tempini N, Gordon-Brown H, Nichols G, Sarran C, Vineis P, Leonardi G, Golding B, Haines A, Kessel A, Murray V, Depledge M, Leonelli S. (2017) Big Data in Environment and Human Health: Challenges and Opportunities. Oxford Research Encyclopedia of Environmental Science. Oxford University Press.
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018. DOI: 10.1038/sdata.2016.18
Leonelli, S. (2016) Locating Ethics in Data Science: Responsibility and Accountability in Global and Distributed Knowledge Production. Philosophical Transactions of the Royal Society: Part A. 374: 20160122. http://dx.doi.org/10.1098/rsta.2016.0122
Richards, M., Anderson, R., Hinde, S., Kaye, J., Lucassen, A., Matthews, P., Parker, M., Shotter, M., Watts, G., Wallace, S., Wise, J., 2015. The collection, linking and use of data in biomedical research and health care: ethical issues. Nuffield Council on Bioethics, London
Global Alliance for Genomics and Health (GA4GH, 2016). Framework for responsible sharing of genomic and health-related data. See https://genomicsandhealth.org/about-the-global-alliance/keydocuments/framework-responsible-sharing-genomic-and-health-related-data.
O’Brien D., Ullman J., Altman M., Gasser U., Bar-Sinai M., Nissim K., Vadhan S., Wojcik M. J. OECD Recommendations on Health Data Governance http://www.oecd.org/els/health-systems/health-data-governance.htm
High performance computing (HPC)
Isca is the University's HPC environment. It represents a £3m investment by the University, designed to serve the advanced computing requirements of all research disciplines.
The first of its kind in a UK University, Isca combines a traditional HPC cluster with a virtualised cluster environment, providing a range of node types in a single machine.
Isca is available free of charge to all research groups on all campuses. All research-active staff can access Isca from any University of Exeter campus or via a VPN connection. Isca consists of a range of compute resources: the traditional cluster (128 GB nodes) is complemented by two large-memory (3 TB) nodes, Xeon Phi accelerator nodes and GPU (Tesla K80) compute nodes. The non-traditional element of Isca includes a cluster of higher-memory nodes (256 GB), 3 TB nodes, and an OpenStack environment for the management of virtualised resources.
Contact: For further information on Isca please email the Isca team.
The University is also part of the GW4 consortium, which has partnered with Cray to deliver a Tier 2 developmental system known as Isambard. Isambard is designed to evaluate novel technologies such as GPU cards, Intel's Phi architecture and Arm processors.
Find out more about Isambard.
If you would like to request an account for Isambard you can find information on how to do this here.
The Joint Academic Data Science Endeavour (JADE) is a large GPU facility designed to meet the needs of machine learning and related data science applications. In addition to providing a leading computer resource, the JADE facility has also provided a nucleus around which a national consortium of AI researchers has formed, making it the de facto national compute facility for AI research. You can find out about getting access to JADE here.
ARCHER is the latest UK National Supercomputing Service. The ARCHER Service started in November 2013 and is expected to run for 5 years. ARCHER provides a capability resource to allow researchers to run simulations and calculations that require large numbers of processing cores working in a tightly-coupled, parallel fashion.
Find out about getting access to ARCHER.
DiRAC was established to provide distributed High Performance Computing (HPC) services to the STFC theory community. HPC-based modelling is an essential tool for the exploitation and interpretation of observational and experimental data generated by astronomy and particle physics facilities supported by STFC, as this technology allows scientists to test their theories and run simulations from the data gathered in experiments.
Find out more about DiRAC.
Reproducible research is work that can be independently verified. In practice, it means sharing the data and code that were used to generate published results - yet this is often easier said than done. 'The Turing Way' is a guide to reproducible data science that will support students and academics as they develop their code, with the aim of helping them produce work that will be regarded as gold-standard examples of trustworthy and reusable research.
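As a small illustration of what sharing the data and code can involve in practice, the sketch below records a checksum of the input data, the Python version and the random seed alongside a toy analysis, so that someone else can confirm they are rerunning the same thing. It is not taken from The Turing Way: the function names, the manifest fields and the toy analysis are all hypothetical and chosen only for illustration.

```python
import hashlib
import json
import platform
import random
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_path: Path, seed: int) -> dict:
    """Collect details another researcher would need to rerun the analysis.

    The field names are illustrative assumptions, not a standard format.
    """
    return {
        "data_file": data_path.name,
        "data_sha256": sha256_of(data_path),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "random_seed": seed,
    }


def manifest_as_json(manifest: dict) -> str:
    """Serialise the manifest with sorted keys so diffs stay stable."""
    return json.dumps(manifest, indent=2, sort_keys=True)


def run_analysis(data_path: Path, seed: int) -> float:
    """Toy analysis: mean of a seeded random sample of the numbers in the file."""
    rng = random.Random(seed)  # a local generator avoids hidden global state
    values = [float(token) for token in data_path.read_text().split()]
    sample = rng.sample(values, k=min(3, len(values)))
    return sum(sample) / len(sample)
```

Calling `run_analysis` twice with the same file and seed returns the same number, and publishing the output of `manifest_as_json(build_manifest(...))` next to the results lets a reader check that their copy of the data matches the one that was analysed.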
As part of Data Science Week, Dr Kirstie Whitaker, Turing Programme Lead for Tools, Practices and Systems, discussed 'The Turing Way' with colleagues at Exeter.
You can view the recording of her presentation at Data Science Week here (this includes a link to the slideset she used).
Kirstie also shared a number of links to useful resources for The Turing Way which are available below:
• The Turing Way book: https://the-turing-way.netlify.com
• Github repository: https://github.com/alan-turing-institute/the-turing-way
• Contributing guidelines: https://github.com/alan-turing-institute/the-turing-way/blob/master/CONTRIBUTING.md
• Online Collaboration Cafe outline and schedule: https://github.com/alan-turing-institute/the-turing-way/blob/master/project_management/online-collaboration-cafe.md
• Gitter chat room: https://gitter.im/alan-turing-institute/the-turing-way
• Twitter: https://twitter.com/turingway
• Mailing list: https://tinyletter.com/TuringWay