Health Statistics for Data Scientists
Module title | Health Statistics for Data Scientists |
---|---|
Module code | HPDM096 |
Academic year | 2023/4 |
Credits | 15 |
Module staff | Dr Harry Green |
Duration: Term | 1 | 2 | 3 |
---|---|---|---|
Duration: Weeks | 10 | 0 | 0 |
Number of students taking module (anticipated) | 20 |
---|---|
Module description
This module provides a broad introduction to statistical modelling for data scientists. The module starts by considering the different stages of a statistical investigation and emphasising the importance of problem formulation. The module highlights the benefits of exploratory data analysis based on descriptive statistics and graphs. Key concepts in probability theory and the role of statistical distributions in modelling health data will be covered. The core part of the module provides a foundation in regression modelling, covering simple linear regression, logistic regression, survival analysis and models that account for complex temporal and hierarchical data structures. In the later part of the module, you will learn advanced techniques for making causal inferences about the effectiveness of health interventions. These will include instrumental variable analysis, regression discontinuity designs and the difference-in-differences method. Throughout this module, you will gain practical experience of statistical computing using the R software environment and exposure to case studies based on real-world health data.
Module aims - intentions of the module
The aim of the module is to provide a modern statistical framework for answering health research questions through interrogation of health datasets derived from randomised trials, electronic health records or other observational studies. The module will equip you with the theoretical underpinning and computational skills needed for advanced regression modelling of health data. Both frequentist and Bayesian approaches to modelling will be considered and contrasted. The module will consider potential sources of bias when key variables are unmeasured or contain missing values and explore a range of advanced statistical methods for strengthening causal inferences in real-world health evaluations. The module will emphasise the fundamental role of the statistician as a problem solver and consider the different stages of the “problem solving” cycle. Case studies will be used to help you develop an appreciation of modelling strategy and to give you practical experience of interpreting model findings in the context of real health problems.
Intended Learning Outcomes (ILOs)
ILO: Module-specific skills
On successfully completing the module you will be able to...
- 1. Examine and apply fundamental concepts in statistical modelling and inference, including conditional probability, statistical distributions, sampling variability, estimators, bias and likelihood functions
- 2. Critically evaluate the Bayesian and frequentist frameworks for statistical inference including their strengths, limitations and differences
- 3. Examine the theoretical basis of linear regression, generalized linear models and survival analysis
- 4. Apply a range of regression and causal inference methods to address health data science problems
- 5. Critically evaluate the strengths and limitations of different statistical methods, including regression models and causal inference methods, within a health data science project
ILO: Discipline-specific skills
On successfully completing the module you will be able to...
- 6. Formulate health research questions as statistical problems
- 7. Draw conclusions from the results of a data analysis and justify those conclusions, appropriately acknowledging uncertainty in the results
ILO: Personal and key skills
On successfully completing the module you will be able to...
- 8. Use the R software environment for statistical computing
- 9. Understand and critically appraise academic research papers in the research field
- 10. Effectively communicate arguments, evidence and conclusions using a variety of formats in a manner appropriate to the intended audience
Syllabus plan
Whilst the module’s precise content may vary from year to year, an example of an overall structure is as follows:
- Formulating statistical problems
- Statistical computing using R
- Exploratory data analysis
- Probability theory and statistical distributions
- Interval estimation and hypothesis testing including parametric and non-parametric methods
- Likelihoods and maximum likelihood estimation
- Bayesian and frequentist inference
- Monte Carlo simulation
- Power and sample size calculations
- Linear regression modelling
- Generalised linear models
- Longitudinal and survival analysis
- Multilevel models for hierarchical data
- Missing data mechanisms and multiple imputation
- Causal inference for healthcare evaluations including adjustment methods for addressing measured and unmeasured confounding
Learning activities and teaching methods (given in hours of study time)
Scheduled Learning and Teaching Activities | Guided independent study | Placement / study abroad |
---|---|---|
35 | 115 | 0 |
Details of learning activities and teaching methods
Category | Hours of study time | Description |
---|---|---|
Scheduled Learning and Teaching | 15 | Lectures (10 x 1.5 hours) |
Scheduled Learning and Teaching | 20 | Computer based workshops (10 x 2 hours) |
Guided independent study | 115 | Background reading and preparation for module assessments |
Formative assessment
Form of assessment | Size of the assessment (eg length / duration) | ILOs assessed | Feedback method |
---|---|---|---|
Multiple choice questions will be given as part of the workshops, and will be self-assessed | 10 questions for each workshop session | All | Oral feedback from staff where required |
Summative assessment (% of credit)
Coursework | Written exams | Practical exams |
---|---|---|
80 | 0 | 20 |
Details of summative assessment
Form of assessment | % of credit | Size of the assessment (eg length / duration) | ILOs assessed | Feedback method |
---|---|---|---|---|
Group presentation | 20 | 10 minutes (in groups of 3 or 4) | 4-10 | Written |
Written assignment | 80 | 2000-2500 words | 1-10 | Written |
Details of re-assessment (where required by referral or deferral)
Original form of assessment | Form of re-assessment | ILOs re-assessed | Timescale for re-assessment |
---|---|---|---|
Group presentation (20%) | Individual presentation 10 minutes | 4-10 | Typically within six weeks of the result |
Written assignment (80%), 2000-2500 words | Written assignment | 1-10 | Typically within six weeks of the result |
Re-assessment notes
Please refer to the TQA section on Referral/Deferral: http://as.exeter.ac.uk/academic-policy-standards/tqa-manual/aph/consequenceoffailure/
Indicative learning resources - Basic reading
Basic reading:
- Essential Medical Statistics. Kirkwood and Stern, Blackwell Science. (Available online: http://encore.exeter.ac.uk/iii/encore/record/C__Rb3519976 )
- Introductory Statistics with R, Second Edition. Dalgaard, P. Springer (2008).
- An Introduction to Generalized Linear Models, Third Edition. Dobson, AJ and Barnett, AG, Chapman & Hall (2008).
- An Introduction to Statistical Learning with Applications in R. Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf
- R for Data Science. Garrett Grolemund and Hadley Wickham. https://r4ds.had.co.nz/
- Data Analysis Using Regression and Multilevel/Hierarchical Models. Gelman and Hill, Cambridge University Press (2007). https://faculty.psau.edu.sa/filedownload/doc-12-pdf-a1997d0d31f84d13c1cdc44ac39a8f2c-original.pdf
Indicative learning resources - Web based and electronic resources
ELE page: https://vle.exeter.ac.uk/course/view.php?id=8441
Credit value | 15 |
---|---|
Module ECTS | 7.5 |
Module pre-requisites | None |
Module co-requisites | HPDM092 Fundamentals of Research Design |
NQF level (module) | 7 |
Available as distance learning? | No |
Origin date | 19/12/2019 |
Last revision date | 02/05/2023 |