Training and resources

Technical short courses and training - Data Science and AI

The IDSAI will be sharing details of technical short courses and training related to data science and AI available at the University on this page.

If you have any suggestions or requests for courses please contact us at idsai@exeter.ac.uk

An Introduction to Bayesian Modelling

As part of Data Science Week 2020, Dr Oliver Stoner (IDSAI Research Fellow) delivered an Introduction to Bayesian Modelling for colleagues.

You can view the recording of this session here.

An Introduction to Optimisation

As part of Data Science Week 2020, Dr George De Ath (Research Fellow in the Department of Computer Science) delivered an Introduction to Optimisation for colleagues.

You can view the recording of this session here.

Applying Data Science and AI to Research Questions

If you need inspiration for how data science or AI could be applied to your research project or problem, take a look at these examples of data science and AI in action from our IDSAI theme leads:

Using Naturally Occurring Data to Understand Post-Natal Depression

Miriam Koschate & Elahe Naserian

Post-natal depression (PND) affects between 10% and 15% of mothers and about 10% of fathers, and can have detrimental consequences for the family as a whole, and for the new-born child well into adulthood. Prospective longitudinal data are difficult and expensive to collect, so many studies are either cross-sectional or retrospective. PND is also still highly stigmatised, which makes self-report answers less reliable. The rising use of online forums, such as Netmums, Mumsnet and Reddit, by millions of parents and parents-to-be creates anonymous, naturally occurring longitudinal data in the form of online posts. Computational analysis of such posts may provide a new way of examining changes during pregnancy and after birth.

Read more.

Uncovering Individualised Treatment Effect: Evidence from Educational Trials

Oliver Hauser & Zhimin Xiao, Charlie Kirkwood, Daniel Li, Benjamin Jones, Steve Higgins

The use of large-scale Randomised Controlled Trials (RCTs) is fast becoming "the gold standard" of testing the causal effects of policy, social, and educational interventions. RCTs are typically evaluated — and ultimately judged — by the economic, educational, and statistical significance of the Average Treatment Effect (ATE) in the study sample. However, many interventions have heterogeneous treatment effects across different individuals, not captured by the ATE. One way to identify heterogeneous treatment effects is to conduct subgroup analyses, such as focusing on low-income Free School Meal pupils as required for projects funded by the Education Endowment Foundation (EEF) in England. These subgroup analyses, as we demonstrate in 48 EEF-funded RCTs involving over 200,000 students, are usually not standardised across studies and offer flexible degrees of freedom to researchers, potentially leading to mixed results. Here, we develop and deploy a machine-learning and regression-based framework for systematic estimation of Individualised Treatment Effect (ITE), which can show where a seemingly ineffective and uninformative intervention worked, for whom, and by how much. Our findings have implications for decision-makers in education, public health, and medical trials.

Read the working paper here.
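
To illustrate why an Average Treatment Effect can hide effects that differ between groups of individuals, here is a minimal sketch in Python. It is not the machine-learning and regression-based framework described in the working paper: it only compares difference-in-means estimates for the whole sample and for two subgroups. The data are a hypothetical toy trial constructed for this example, and the names `make_toy_trial`, `difference_in_means` and the `fsm` flag are assumptions made for illustration.

```python
from statistics import mean


def make_toy_trial():
    """Build a hypothetical, fully deterministic toy trial (not real data).

    Treatment raises outcomes by 4 points in one subgroup and lowers them
    by 4 points in the other, so the effects cancel out on average.
    """
    records = []
    for i in range(200):
        treated = i % 2 == 0          # alternate treatment and control
        fsm = (i // 2) % 2 == 0       # hypothetical subgroup flag
        baseline = 50 + (i % 5)       # baseline scores balanced across arms
        effect = 4 if fsm else -4     # subgroup-specific treatment effect
        outcome = baseline + (effect if treated else 0)
        records.append({"treated": treated, "fsm": fsm, "outcome": outcome})
    return records


def difference_in_means(records):
    """Estimate a treatment effect as mean(treated) minus mean(control)."""
    treated = [r["outcome"] for r in records if r["treated"]]
    control = [r["outcome"] for r in records if not r["treated"]]
    return mean(treated) - mean(control)


records = make_toy_trial()
overall = difference_in_means(records)
in_subgroup = difference_in_means([r for r in records if r["fsm"]])
out_subgroup = difference_in_means([r for r in records if not r["fsm"]])

print("Whole sample:", overall)
print("Subgroup (fsm):", in_subgroup)
print("Subgroup (not fsm):", out_subgroup)
```

By construction, the whole-sample estimate in this toy example is zero even though the intervention changes outcomes in both subgroups. A real analysis would also need to account for sampling variation and for how the subgroups were chosen.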

Exeter Q-Step Centre

Q-Step is a UK-wide £19.5 million programme to promote a step-change in quantitative training in the British social sciences. Exeter Q-Step brings together a range of activities related to training and curriculum development in quantitative methods. The Q-Step workshops in Applied Data Analysis provide additional support to students interested in quantitative methods for the social sciences. They aim to raise interest in applied data analysis amongst undergraduates and postgraduates, and to embed quantitative literacy in the culture of the University. Exeter Q-Step provides a number of resources available to all colleagues:

Workshops - workshops are open to all. Upcoming courses include:

Introduction to Bayesian Analysis
Bayesian analysis with JAGS/Topics in Bayesian analysis
Introduction to Open-Source Intelligence (OSINT)

For more information about these courses please visit the Exeter Q-Step site.

The Q-Step learning resources page - slides and captured workshops from Q-Step are available on the Centre's GitHub page. These are available for all colleagues to access. Materials available include:

Computational methods: Python, R, SPSS, SQL, NVivo
Quantitative research methods
Qualitative research methods

Stats Support Desk - Q-Step now provide an online Stats helpdesk service. Colleagues can book appointments on Tuesdays between 16:00 and 18:00 (UG and MSc students will take priority at times of peak demand). These appointments are held via Skype for Business and can be booked via: https://statshelpdesk.youcanbook.me

National Centre for Research Methods - Academics across the UK, including in Exeter, have been awarded £2.8 million until 2024 to help support social scientists in conducting world-leading research through innovative training and capacity-building activities.

The training will be aimed at postgraduate students and early career researchers at any stage of their career, and will be delivered by staff at the University of Exeter Q-Step Centre and international experts. The funding, from the Economic and Social Research Council, will pay for a comprehensive programme of cutting-edge research methods training across the UK. The training will be delivered through e-books, videos and interactive slide decks. Check this page and the Q-Step page for announcements of training activities.

Exeter Data Analytics Hub

The Exeter Data Analytics Hub is a team of academics based at the University of Exeter, across the Exeter and Penryn campuses, who offer a range of workshops in statistics, data science, machine learning, programming and more. They provide technical and analytical support for early career researchers based at the University of Exeter. Courses include:

Python for Scientific Research
Introduction to R
Machine Learning
Spatial Data Analysis

For more information about these courses and workshops please visit the Exeter Data Analytics Hub site.

Doctoral College

The Doctoral College provide a wide range of training courses and professional development opportunities available to postgraduate and early career researchers across the University. Information about training opportunities can be found here.

You can book on to all of the courses through PGR iTrent.

There are a number of online resources available through the Doctoral College's ELE pages including:

Data Analysis Resources and Support
Data visualisation training and support
Software training resources for:
- NVivo
- Python
- R
- SPSS
- Stata
- ArcGIS

Health Statistics: R Help & Training

Where can I find R help and training?

Courses are occasionally run by Lauren Rodgers in the Health Statistics Team and the Bioinformatics Hub. Please check their events page for upcoming dates.

The Exeter R User group runs meetings featuring external speakers from both academia and industry, as well as members of the group presenting their knowledge of R packages, best practice and specialist topics. The group is free to join, and meetings are open to anyone with an interest in R (regardless of their knowledge level). They are an excellent opportunity to network with other R users.

The Health Statistics team also offer Stats Clinics, where a member of the team will be available to assist with your stats-related queries. Attendees will be asked to complete a questionnaire before their Stats Clinic appointment, to enable the team to offer the best advice. Please check their events page for upcoming dates.

Data Science and AI resources

The IDSAI aims to share useful resources with colleagues to support their work in the areas of Data Science and AI. If there are additional resources you would like us to share on here or you have ideas for new resources please contact us at: idsai@exeter.ac.uk

Data Science and AI Ethics

For more information about the IDSAI's research theme: Data Governance, Algorithms & Values please visit the research theme page.

If you have any further questions please contact the Data Governance, Algorithms & Values theme lead, Sabina Leonelli: s.leonelli@exeter.ac.uk

Please find below a selection of articles and references regarding ethical data science and AI research:

Data management and use: Governance in the 21st century - A joint report by the British Academy and the Royal Society

Big Data in Environment and Human Health - Oxford Research Encyclopedia of Environmental Science

Locating ethics in data science: responsibility and accountability in global and distributed knowledge production systems - Sabina Leonelli

The Data Ethics Canvas of the Open Data Institute: https://theodi.org/article/data-ethics-canvas

Taddeo, Mariarosaria, and Luciano Floridi. 2018. “How AI Can Be a Force for Good.” Science 361 (6404): 751–752. https://doi.org/10.1126/science.aat5991

Data Management and Use: Governance in the 21st Century. A Joint Report of the Royal Society and the British Academy. https://royalsociety.org/~/media/policy/projects/data-governance/data-management-governance.pdf

Zook, Matthew, Solon Barocas, danah boyd, Kate Crawford, Emily Keller, Seeta Peña Gangadharan, Alyssa Goodman, et al. 2017. “Ten Simple Rules for Responsible Big Data Research.” PLOS Computational Biology 13 (3): e1005399. doi:10.1371/journal.pcbi.1005399

Leonelli, S. (2017) Biomedical Knowledge Production in the Age of Big Data. Report for the Swiss Science and Innovation Council, published online November 2017: http://www.swir.ch/images/stories/pdf/en/Exploratory_study_2_2017_Big_Data_SSIC_EN.pdf

Fleming LE, Tempini N, Gordon-Brown H, Nichols G, Sarran C, Vineis P, Leonardi G, Golding B, Haines A, Kessel A, Murray V, Depledge M, Leonelli S. (2017) Big Data in Environment and Human Health: Challenges and Opportunities. Oxford Encyclopaedia for Environment and Human Health. Oxford University Press.

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018. DOI: http://dx.doi.org/10.1038/sdata.2016.18

Leonelli, S. (2016) Locating Ethics in Data Science: Responsibility and Accountability in Global and Distributed Knowledge Production. Philosophical Transactions of the Royal Society: Part A. 374: 20160122. http://dx.doi.org/10.1098/rsta.2016.0122

Richards, M., Anderson, R., Hinde, S., Kaye, J., Lucassen, A., Matthews, P., Parker, M., Shotter, M., Watts, G., Wallace, S., Wise, J., 2015. The collection, linking and use of data in biomedical research and health care: ethical issues. Nuffield Council on Bioethics, London

Global Alliance for Genomics and Health (GA4GH, 2016). Framework for responsible sharing of genomic and health-related data. See https://genomicsandhealth.org/about-the-global-alliance/keydocuments/framework-responsible-sharing-genomic-and-health-related-data.

O’Brien D., Ullman J., Altman M., Gasser U., Bar-Sinai M., Nissim K., Vadhan S., Wojcik M. J. OECD Recommendations on Health Data Governance http://www.oecd.org/els/health-systems/health-data-governance.htm

High Performance Computing (HPC)

ISCA

Isca is the University's HPC environment. It represents a £3m investment by the University, designed to serve the advanced computing requirements of all research disciplines.
The first of its kind in a UK university, Isca combines a traditional HPC cluster with a virtualised cluster environment, providing a range of node types in a single machine.
Isca is available free of charge to all research groups on all campuses. All research-active staff can access Isca from any University of Exeter campus or via a VPN connection. Isca consists of a range of compute resources: the traditional cluster (128 GB nodes) is complemented by two large-memory (3 TB) nodes, Xeon Phi accelerator nodes and GPU (Tesla K80) compute nodes. The non-traditional element of Isca includes a cluster of higher-memory nodes (256 GB), 3 TB nodes, and an OpenStack environment for the management of virtualised resources.

Contact: For further information on ISCA please email the ISCA team.

Isambard

The University is also part of the GW4 consortium, which has partnered with Cray to deliver a Tier 2 developmental system known as Isambard. Isambard is designed to evaluate novel technologies such as GPU cards, Intel's Phi architecture and Arm processors.
Find out more about Isambard.

If you would like to request an account for Isambard you can find information on how to do this here.

JADE

The Joint Academic Data Science Endeavour (JADE) is a large GPU facility designed to meet the needs of machine learning and related data science applications. In addition to providing a leading computer resource, the JADE facility has also provided a nucleus around which a national consortium of AI researchers has formed, making it the de facto national compute facility for AI research. You can find out about getting access to JADE here.

ARCHER

ARCHER is the latest UK National Supercomputing Service. The ARCHER Service started in November 2013 and is expected to run for 5 years. ARCHER provides a capability resource to allow researchers to run simulations and calculations that require large numbers of processing cores working in a tightly-coupled, parallel fashion.
Find out about getting access to ARCHER2.

DiRAC

DiRAC was established to provide distributed High Performance Computing (HPC) services to the STFC theory community. HPC-based modelling is an essential tool for exploiting and interpreting the observational and experimental data generated by astronomy and particle physics facilities supported by STFC, as it allows scientists to test their theories and run simulations using the data gathered in experiments.
Find out more about DiRAC.

The Turing Way

Reproducible research is work that can be independently verified. In practice, it means sharing the data and code that were used to generate published results - yet this is often easier said than done. 'The Turing Way' is a guide to reproducible data science that will support students and academics as they develop their code, with the aim of helping them produce work that will be regarded as gold-standard examples of trustworthy and reusable research.

As part of Data Science Week, Dr Kirstie Whitaker, Turing Programme Lead for Tools, Practices and Systems, discussed 'The Turing Way' with colleagues at Exeter.

You can view the recording of her presentation at Data Science Week here (this includes a link to the slideset she used).

Kirstie also shared a number of links to useful resources for The Turing Way which are available below:

• The Turing Way book: https://the-turing-way.netlify.com
• GitHub repository: https://github.com/alan-turing-institute/the-turing-way
• Contributing guidelines: https://github.com/alan-turing-institute/the-turing-way/blob/master/CONTRIBUTING.md
• Online Collaboration Cafe outline and schedule: https://github.com/alan-turing-institute/the-turing-way/blob/master/project_management/online-collaboration-cafe.md
• Gitter chat room: https://gitter.im/alan-turing-institute/the-turing-way
• Twitter: https://twitter.com/turingway
• Mailing list: https://tinyletter.com/TuringWay