Training and Resources
Technical Short Courses and Training - Data Science and AI
The IDSAI will be sharing details of technical short courses and training related to data science and AI available at the University on this page.
If you have any suggestions or requests for courses please contact us at firstname.lastname@example.org
Please select one of the buttons below to find out more about the training offers currently available.
If you need some inspiration for how data science and/or AI can be applied to the research project / problem that you are working on, then take a look at some examples of DS / AI in action from our IDSAI theme leads below:
Using Naturally Occurring Data to Understand Post-Natal Depression
Miriam Koschate & Elahe Naserian
Post-natal depression (PND) affects between 10% and 15% of mothers and about 10% of fathers, and can have detrimental consequences for the family as a whole as well as for the new-born child well into their adulthood. Prospective longitudinal data are difficult and expensive to collect, leading to many studies being either cross-sectional or retrospective. PND is still highly stigmatised, making self-report answers less reliable. The rising use of online forums, such as Netmums, Mumsnet and Reddit, by millions of parents and parents-to-be creates anonymous, naturally occurring, longitudinal data in the form of online posts. Computational analysis of such posts may provide a new way of examining changes during pregnancy and after birth.
Uncovering Individualised Treatment Effect: Evidence from Educational Trials
Oliver Hauser & Zhimin Xiao, Charlie Kirkwood, Daniel Li, Benjamin Jones, Steve Higgins
The use of large-scale Randomised Controlled Trials (RCTs) is fast becoming "the gold standard" for testing the causal effects of policy, social, and educational interventions. RCTs are typically evaluated, and ultimately judged, by the economic, educational, and statistical significance of the Average Treatment Effect (ATE) in the study sample. However, many interventions have heterogeneous treatment effects across different individuals, which the ATE does not capture. One way to identify heterogeneous treatment effects is to conduct subgroup analyses, such as focusing on low-income Free School Meal pupils, as required for projects funded by the Education Endowment Foundation (EEF) in England. These subgroup analyses, as we demonstrate in 48 EEF-funded RCTs involving over 200,000 students, are usually not standardised across studies and offer flexible degrees of freedom to researchers, potentially leading to mixed results. Here, we develop and deploy a machine-learning and regression-based framework for systematic estimation of Individualised Treatment Effect (ITE), which can show where a seemingly ineffective and uninformative intervention worked, for whom, and by how much. Our findings have implications for decision-makers in education, public health, and medical trials.
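To illustrate the difference between the ATE and individualised effects, the sketch below simulates a small randomised trial in which the average effect is close to zero even though the intervention helps some pupils and harms others. It is not the authors' framework: it uses a simple two-model regression approach (sometimes called a T-learner) on synthetic data, and the covariate, effect sizes and function names are all hypothetical assumptions made for illustration.

```python
import numpy as np


def simulate_trial(n=4000, seed=0):
    """Synthetic RCT (illustrative only): the true effect depends on a covariate x."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)   # hypothetical pupil-level covariate
    t = rng.integers(0, 2, size=n)       # randomised assignment: 0 = control, 1 = treated
    tau = 2.0 * x                        # true individual effect; averages to roughly zero
    y = 1.0 + 0.5 * x + tau * t + rng.normal(0.0, 0.5, size=n)
    return x, t, y, tau


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns the array [a, b]."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def estimate_effects(x, t, y):
    """Return the difference-in-means ATE and per-individual effect estimates."""
    ate = y[t == 1].mean() - y[t == 0].mean()
    treated = fit_line(x[t == 1], y[t == 1])   # outcome model for the treated arm
    control = fit_line(x[t == 0], y[t == 0])   # outcome model for the control arm
    # Individual effect = predicted outcome if treated minus predicted outcome if not.
    ite = (treated[0] - control[0]) + (treated[1] - control[1]) * x
    return ate, ite


x, t, y, tau = simulate_trial()
ate, ite = estimate_effects(x, t, y)
print(f"Estimated ATE: {ate:.3f}")
print(f"Share of pupils with a positive estimated effect: {(ite > 0).mean():.2f}")
```

In this simulated setting the ATE alone would suggest that the intervention does nothing, while the per-individual estimates recover who benefits and by roughly how much. Real trials need more flexible models, uncertainty estimates and safeguards against overfitting.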
Q-Step is a UK-wide £19.5 million programme to promote a step-change in quantitative training in the British social sciences. Exeter Q-Step brings together a range of activities related to training as well as curriculum development in quantitative methods. The Q-Step workshops in Applied Data Analysis seek to provide additional support to students interested in Quantitative Methods for the Social Sciences. They aspire to raise interest in Applied Data Analysis amongst undergraduates and postgraduates, and embed quantitative literacy in the culture of the University. They provide a number of resources available to all colleagues:
Workshops - workshops are open to all. Upcoming courses include:
- Introduction to Bayesian Analysis
- Bayesian analysis with JAGS/Topics in Bayesian analysis
- Introduction to Open-Source Intelligence (OSINT)
For more information about these courses please visit the Exeter Q-Step site.
The Q-Step learning resources page - slides and captured workshops from Q-Step are available on the Centre's GitHub page. These are available for all colleagues to access. Materials available include:
- Computational methods: Python, R, SPSS, SQL, NVivo
- Quantitative research methods
- Qualitative research methods
Stats Support Desk - Q-Step now provide an online Stats helpdesk service, available for colleagues to book appointments on Tuesdays between 16:00 and 18:00 (UG and MSc students will take priority at times of peak demand). These appointments are held via Skype for Business and can be booked via: https://statshelpdesk.youcanbook.me
National Centre for Research Methods - Academics across the UK, including in Exeter, have been awarded £2.8 million until 2024 to help social scientists conduct world-leading research through innovative training and capacity-building activities.
The training will be aimed at postgraduate students and early career researchers at any stage of their career, and will be delivered by staff at the University of Exeter Q-Step Centre and international experts. The funding, from the Economic and Social Research Council, will pay for a comprehensive programme of cutting-edge research methods training across the UK. The training will be delivered through e-books, videos and interactive slide decks. Check this page and the Q-Step page for announcements of training activities.
The Exeter Data Analytics Hub is a team of academics based at the University of Exeter across the Exeter and Penryn campuses, who offer a range of workshops in the fields of statistics, data science, machine learning, programming and more. They provide technical and analytical support for early career researchers based at the University of Exeter. Courses include:
- Python for Scientific Research
- Introduction to R
- Machine Learning
- Spatial Data Analysis
For more information about these courses and workshops please visit the Exeter Data Analytics Hub site.
The Doctoral College provide a wide range of training courses and professional development opportunities available to postgraduate and early career researchers across the University. Information about training opportunities can be found here.
You can book on to all of the courses through My Career Zone.
There are a number of online resources available through the Doctoral College's ELE pages including:
Where can I find R help and training?
The Exeter R User group runs user-group meetings featuring external speakers from both academia and industry, and has members of the group present their knowledge of R packages, best practice and specialist topics. The group is free to join, and meetings are open to anyone with an interest in R (regardless of their knowledge level) and are an excellent opportunity to network with other R users.
The Health Statistics team also offer Stats Clinics, where a member of the team will be available to assist with your stats-related queries. Attendees will be asked to complete a questionnaire prior to their Stats Clinic appointment, to enable the team to offer the best advice. Please check their events page for upcoming dates.
Data Science and AI Resources
The IDSAI aims to share useful resources with colleagues to support their work in the areas of Data Science and AI. If there are additional resources you would like us to share here, or you have ideas for new resources, please contact us at: email@example.com
Please select one of the buttons below to find out more about the resources available.
For more information about the IDSAI's research theme: Data Governance, Algorithms & Values please visit the research theme page.
If you have any further questions please contact the Data Governance, Algorithms & Values theme lead, Sabina Leonelli: firstname.lastname@example.org
Please find below a selection of articles and references regarding ethical data science and AI research:
Data management and use: Governance in the 21st century - A joint report by the British Academy and the Royal Society
Big Data in Environment and Human Health - Oxford Research Encyclopedia of Environmental Science
Locating ethics in data science: responsibility and accountability in global and distributed knowledge production systems - Sabina Leonelli
The Data Ethics Canvas of the Open Data Institute: https://theodi.org/article/data-ethics-canvas
Taddeo, Mariarosaria, and Luciano Floridi. 2018. “How AI Can Be a Force for Good.” Science 361 (6404): 751–752. https://doi.org/10.1126/science.aat5991
Data Management and Use: Governance in the 21st Century. A Joint Report of the Royal Society and the British Academy. https://royalsociety.org/~/media/policy/projects/data-governance/data-management-governance.pdf
Zook, Matthew, Solon Barocas, danah boyd, Kate Crawford, Emily Keller, Seeta Peña Gangadharan, Alyssa Goodman, et al. 2017. “Ten Simple Rules for Responsible Big Data Research.” PLOS Computational Biology 13 (3): e1005399. doi:10.1371/journal.pcbi.1005399
Leonelli, S. (2017) Biomedical Knowledge Production in the Age of Big Data. Report for the Swiss Science and Innovation Council, published online November 2017: http://www.swir.ch/images/stories/pdf/en/Exploratory_study_2_2017_Big_Data_SSIC_EN.pdf
Fleming LE, Tempini N, Gordon-Brown H, Nichols G, Sarran C, Vineis P, Leonardi G, Golding B, Haines A, Kessel A, Murray V, Depledge M, Leonelli S. (2017) Big Data in Environment and Human Health: Challenges and Opportunities. Oxford Encyclopaedia for Environment and Human Health. Oxford University Press.
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018. DOI: http://dx.doi.org/10.1038/sdata.2016.18
Leonelli, S. (2016) Locating Ethics in Data Science: Responsibility and Accountability in Global and Distributed Knowledge Production. Philosophical Transactions of the Royal Society: Part A. 374: 20160122. http://dx.doi.org/10.1098/rsta.2016.0122
Richards, M., Anderson, R., Hinde, S., Kaye, J., Lucassen, A., Matthews, P., Parker, M., Shotter, M., Watts, G., Wallace, S., Wise, J., 2015. The collection, linking and use of data in biomedical research and health care: ethical issues. Nuffield Council on Bioethics, London
Global Alliance for Genomics and Health (GA4GH, 2016). Framework for responsible sharing of genomic and health-related data. See https://genomicsandhealth.org/about-the-global-alliance/keydocuments/framework-responsible-sharing-genomic-and-health-related-data.
O’Brien D., Ullman J., Altman M., Gasser U., Bar-Sinai M., Nissim K., Vadhan S., Wojcik M. J. OECD Recommendations on Health Data Governance http://www.oecd.org/els/health-systems/health-data-governance.htm
High performance computing (HPC)
Isca is the University's HPC environment. It represents a £3m investment by the University, designed to serve the advanced computing requirements of all research disciplines.
The first of its kind in a UK University, Isca combines a traditional HPC cluster with a virtualised cluster environment, providing a range of node types in a single machine.
Isca is available free of charge to all research groups on all campuses. All research-active staff are able to access Isca, from any University of Exeter campus or via a VPN connection. Isca consists of a range of compute resources: the traditional cluster (128 GB nodes) is complemented by two large memory (3 TB) nodes, Xeon Phi accelerator nodes and GPU (Tesla K80) compute nodes. The non-traditional element of Isca includes a cluster of higher memory nodes (256 GB), 3 TB nodes, and an OpenStack environment for the management of virtualised resources.
Contact: For further information on Isca please email the Isca team.
The University is also part of the GW4 consortium, which has partnered with Cray to deliver a Tier 2 developmental system known as Isambard. Isambard is designed to evaluate novel technologies such as GPU cards, Intel's Phi architecture and Arm processors.
Find out more about Isambard.
If you would like to request an account for Isambard you can find information on how to do this here.
The Joint Academic Data Science Endeavour (JADE) is a large GPU facility designed to meet the needs of machine learning and related data science applications. In addition to providing a leading computer resource, the JADE facility has also provided a nucleus around which a national consortium of AI researchers has formed, making it the de facto national compute facility for AI research. You can find out about getting access to JADE here.
ARCHER is the latest UK National Supercomputing Service. The ARCHER Service started in November 2013 and is expected to run for 5 years. ARCHER provides a capability resource to allow researchers to run simulations and calculations that require large numbers of processing cores working in a tightly-coupled, parallel fashion.
Find out about getting access to ARCHER.
DiRAC was established to provide distributed High Performance Computing (HPC) services to the STFC theory community. HPC-based modelling is an essential tool for the exploitation and interpretation of observational and experimental data generated by astronomy and particle physics facilities supported by STFC, as this technology allows scientists to test their theories and run simulations from the data gathered in experiments.
Find out more about DiRAC.
Reproducible research is work that can be independently verified. In practice, it means sharing the data and code that were used to generate published results - yet this is often easier said than done. 'The Turing Way' is a guide to reproducible data science that will support students and academics as they develop their code, with the aim of helping them produce work that will be regarded as gold-standard examples of trustworthy and reusable research.
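As a minimal sketch of what sharing data and code can involve in practice, the example below records a checksum of each input dataset, the random seed and the Python version alongside an analysis, so that a later run can be checked against the original. This is not taken from 'The Turing Way'; the file name, manifest fields and function names are hypothetical and chosen only for illustration.

```python
import hashlib
import json
import platform
import random


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest of raw file contents."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(inputs: dict, seed: int) -> dict:
    """Describe what is needed to re-run the analysis (hypothetical manifest layout)."""
    return {
        "python_version": platform.python_version(),
        "seed": seed,
        "input_sha256": {
            name: sha256_of_bytes(blob) for name, blob in sorted(inputs.items())
        },
    }


def run_analysis(seed: int) -> float:
    """Stand-in analysis: the mean of a seeded random sample."""
    rng = random.Random(seed)  # a local generator avoids hidden global state
    sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    return sum(sample) / len(sample)


# In real use the bytes would be read from a data file; an inline example is used here.
manifest = build_manifest({"survey.csv": b"id,score\n1,0.5\n2,0.7\n"}, seed=42)
result = run_analysis(manifest["seed"])
print(json.dumps(manifest, indent=2))
print("Re-running with the recorded seed matches:", run_analysis(manifest["seed"]) == result)
```

Publishing a manifest like this next to the code lets readers confirm that they are using the same data and settings before comparing results.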
As part of Data Science Week, Dr Kirstie Whitaker, Turing Programme Lead for Tools, Practices and Systems, discussed 'The Turing Way' with colleagues at Exeter.
You can view the recording of her presentation at Data Science Week here (this includes a link to the slideset she used).
Kirstie also shared a number of links to useful resources for The Turing Way which are available below:
• The Turing Way book: https://the-turing-way.netlify.com
• GitHub repository: https://github.com/alan-turing-institute/the-turing-way
• Contributing guidelines: https://github.com/alan-turing-institute/the-turing-way/blob/master/CONTRIBUTING.md
• Online Collaboration Cafe outline and schedule: https://github.com/alan-turing-institute/the-turing-way/blob/master/project_management/online-collaboration-cafe.md
• Gitter chat room: https://gitter.im/alan-turing-institute/the-turing-way
• Twitter: https://twitter.com/turingway
• Mailing list: https://tinyletter.com/TuringWay