Health Statistics for Data Scientists

Module title	Health Statistics for Data Scientists
Module code	HPDM096
Academic year	2023/4
Credits	15
Module staff	Dr Harry Green

Duration: Term	1	2	3
Duration: Weeks	10	0	0

Number students taking module (anticipated)	20

Module description

This module provides a broad introduction to statistical modelling for data scientists. The module starts by considering the different stages of a statistical investigation and emphasising the importance of problem formulation. It highlights the benefits of exploratory data analysis based on descriptive statistics and graphs. Key concepts in probability theory and the role of statistical distributions in modelling health data will be covered. The core part of the module provides a foundation in regression modelling, including simple linear regression, logistic regression, survival analysis and models that account for complex temporal and hierarchical data structures. In the later part of the module, you will learn advanced techniques for making causal inferences about the effectiveness of health interventions, including instrumental variable analysis, regression discontinuity designs and the difference-in-differences method. Throughout this module, you will gain practical experience of statistical computing using the R software environment and exposure to case studies based on real-world health data.

Module aims - intentions of the module

The aim of the module is to provide a modern statistical framework for answering health research questions through interrogation of health datasets derived from randomised trials, electronic health records or other observational studies. The module will equip you with the theoretical underpinning and computational skills needed for advanced regression modelling of health data. Both frequentist and Bayesian approaches to modelling will be considered and contrasted. The module will consider potential sources of bias when key variables are unmeasured or contain missing values and explore a range of advanced statistical methods for strengthening causal inferences in real-world health evaluations. The module will emphasise the fundamental role of the statistician as a problem solver and consider the different stages of the “problem solving” cycle. Case studies will be used to help you develop an appreciation of modelling strategy and to give you practical experience of interpreting model findings in the context of real health problems.

Intended Learning Outcomes (ILOs)

ILO: Module-specific skills

On successfully completing the module you will be able to...

1. Examine and apply fundamental concepts in statistical modelling and inference, including conditional probability, statistical distributions, sampling variability, estimators, bias and likelihood functions
2. Critically evaluate the Bayesian and frequentist frameworks for statistical inference including their strengths, limitations and differences
3. Examine the theoretical basis of linear regression, generalised linear models and survival analysis
4. Apply a range of regression and causal inference methods to address health data science problems
5. Critically evaluate the strengths and limitations of different statistical methods, including regression models and causal inference methods, within a health data science project

ILO: Discipline-specific skills

On successfully completing the module you will be able to...

6. Formulate health research questions as statistical problems
7. Draw conclusions from the results of a data analysis and justify those conclusions, appropriately acknowledging uncertainty in the results

ILO: Personal and key skills

On successfully completing the module you will be able to...

8. Use the R software environment for statistical computing
9. Understand and critically appraise academic research papers in the research field
10. Effectively communicate arguments, evidence and conclusions using a variety of formats in a manner appropriate to the intended audience

Syllabus plan

Whilst the module’s precise content may vary from year to year, an example of an overall structure is as follows:

Formulating statistical problems
Statistical computing using R
Exploratory data analysis
Probability theory and statistical distributions
Interval estimation and hypothesis testing including parametric and non-parametric methods
Likelihoods and maximum likelihood estimation
Bayesian and frequentist inference
Monte Carlo simulation
Power and sample size calculations
Linear regression modelling
Generalised linear models
Longitudinal and survival analysis
Multilevel models for hierarchical data
Missing data mechanisms and multiple imputation
Causal inference for healthcare evaluations including adjustment methods for addressing measured and unmeasured confounding

Learning activities and teaching methods (given in hours of study time)

Scheduled Learning and Teaching Activities	Guided independent study	Placement / study abroad
35	115	0

Details of learning activities and teaching methods

Category	Hours of study time	Description
Scheduled Learning and Teaching	15	Lectures (10 x 1.5 hours)
Scheduled Learning and Teaching	20	Computer based workshops (10 x 2 hours)
Guided independent study	115	Background reading and preparation for module assessments

Formative assessment

Form of assessment	Size of the assessment (eg length / duration)	ILOs assessed	Feedback method
Multiple choice questions given as part of the workshops and self-assessed	10 questions for each workshop session	All	Oral feedback from staff where required

Summative assessment (% of credit)

Coursework	Written exams	Practical exams
80	0	20

Details of summative assessment

Form of assessment	% of credit	Size of the assessment (eg length / duration)	ILOs assessed	Feedback method
Group presentation	20	10 minutes (in groups of 3 or 4)	4-10	Written
Written assignment	80	2000-2500 words	1-10	Written


Details of re-assessment (where required by referral or deferral)

Original form of assessment	Form of re-assessment	ILOs re-assessed	Timescale for re-assessment
Group presentation (20%)	Individual presentation (10 minutes)	4-10	Typically within six weeks of the result
Written assignment (80%), 2000-2500 words	Written assignment	1-10	Typically within six weeks of the result

Re-assessment notes

Please refer to the TQA section on Referral/Deferral: http://as.exeter.ac.uk/academic-policy-standards/tqa-manual/aph/consequenceoffailure/

Indicative learning resources - Basic reading

Basic reading:

Essential Medical Statistics. Kirkwood and Stern, Blackwell Science. (Available online: http://encore.exeter.ac.uk/iii/encore/record/C__Rb3519976 )
Introductory Statistics with R, Second Edition. Dalgaard, P. Springer (2008).

http://www.academia.dk/BiologiskAntropologi/Epidemiologi/PDF/Introductory_Statistics_with_R__2nd_ed.pdf

An Introduction to Generalized Linear Models, Third Edition. Dobson, AJ and Barnett, AG, Chapman & Hall (2008).

https://reneues.files.wordpress.com/2010/01/an-introduction-to-generalized-linear-models-second-edition-dobson.pdf

An Introduction to Statistical Learning with Applications in R. Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani.

http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf

R for Data Science. Garrett Grolemund and Hadley Wickham.

https://r4ds.had.co.nz/

Data Analysis Using Regression and Multilevel/Hierarchical Models. Gelman and Hill, Cambridge University Press (2007).

https://faculty.psau.edu.sa/filedownload/doc-12-pdf-a1997d0d31f84d13c1cdc44ac39a8f2c-original.pdf

Indicative learning resources - Web based and electronic resources

ELE page: https://vle.exeter.ac.uk/course/view.php?id=8441

Key words search

probability, statistical distribution, estimator, likelihood, regression, survival analysis, Cox model, mixed effects model, bias, confounding, causation, Bayesian methods

Credit value	15
Module ECTS	7.5
Module pre-requisites	None
Module co-requisites	HPDM092 Fundamentals of Research Design
NQF level (module)	7
Available as distance learning?	No
Origin date	19/12/2019
Last revision date	02/05/2023