# Health Statistics for Data Scientists

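| Module title | Health Statistics for Data Scientists |
|---|---|
| Module code | HPDM096 |
| Academic year | 2023/4 |
| Credits | 15 |
| Module staff | Dr Harry Green |
| Duration (weeks) | Term 1: 10; Term 2: 0; Term 3: 0 |
| Number of students taking module (anticipated) | 20 |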

## Module description

This module provides a broad introduction to statistical modelling for data scientists. The module starts by considering the different stages of a statistical investigation and emphasising the importance of problem formulation. The module highlights the benefits of exploratory data analysis based on descriptive statistics and graphs. Key concepts in probability theory and the role of statistical distributions in modelling health data will be covered. The core part of the module provides a foundation in regression modelling, including simple linear regression, logistic regression, survival analysis and models that account for complex temporal and hierarchical data structures. In the later part of the module, you will learn advanced techniques for making causal inferences about the effectiveness of health interventions. These will include instrumental variable analysis, regression discontinuity designs and the difference-in-differences method. Throughout this module, you will gain practical experience of statistical computing using the R software environment and exposure to case studies based on real-world health data.

## Module aims - intentions of the module

The aim of the module is to provide a modern statistical framework for answering health research questions through interrogation of health datasets derived from randomised trials, electronic health records or other observational studies. The module will equip you with the theoretical underpinning and computational skills needed for advanced regression modelling of health data. Both frequentist and Bayesian approaches to modelling will be considered and contrasted. The module will consider potential sources of bias when key variables are unmeasured or contain missing values, and will explore a range of advanced statistical methods for strengthening causal inferences in real-world health evaluations. The module will emphasise the fundamental role of the statistician as a problem solver and consider the different stages of the “problem solving” cycle. Case studies will be used to help you develop an appreciation of modelling strategy and to give you practical experience of interpreting model findings in the context of real health problems.

## Intended Learning Outcomes (ILOs)

### ILO: Module-specific skills

On successfully completing the module you will be able to...

• 1. Examine and apply fundamental concepts in statistical modelling and inference, including conditional probability, statistical distributions, sampling variability, estimators, bias and likelihood functions
• 2. Critically evaluate the Bayesian and frequentist frameworks for statistical inference including their strengths, limitations and differences
• 3. Examine the theoretical basis of linear regression, generalized linear models and survival analysis
• 4. Apply a range of regression and causal inference methods to address health data science problems
• 5. Critically evaluate the strengths and limitations of different statistical methods, including regression models and causal inference methods, within a health data science project

### ILO: Discipline-specific skills

On successfully completing the module you will be able to...

• 6. Formulate health research questions as statistical problems
• 7. Draw conclusions from the results of a data analysis and justify those conclusions, appropriately acknowledging uncertainty in the results

### ILO: Personal and key skills

On successfully completing the module you will be able to...

• 8. Use the R software environment for statistical computing
• 9. Understand and critically appraise academic research papers in the research field
• 10. Effectively communicate arguments, evidence and conclusions using a variety of formats in a manner appropriate to the intended audience

## Syllabus plan

Whilst the module’s precise content may vary from year to year, an example of an overall structure is as follows:

• Formulating statistical problems
• Statistical computing using R
• Exploratory data analysis
• Probability theory and statistical distributions
• Interval estimation and hypothesis testing including parametric and non-parametric methods
• Likelihoods and maximum likelihood estimation
• Bayesian and frequentist inference
• Monte Carlo simulation
• Power and sample size calculations
• Linear regression modelling
• Generalised linear models
• Longitudinal and survival analysis
• Multilevel models for hierarchical data
• Missing data mechanisms and multiple imputation
• Causal inference for healthcare evaluations including adjustment methods for addressing measured and unmeasured confounding

## Learning activities and teaching methods (given in hours of study time)

| Scheduled Learning and Teaching Activities | Guided independent study | Placement / study abroad |
|---|---|---|
| 35 | 115 | 0 |

## Details of learning activities and teaching methods

| Category | Hours of study time | Description |
|---|---|---|
| Scheduled Learning and Teaching | 15 | Lectures (10 x 1.5 hours) |
| Scheduled Learning and Teaching | 20 | Computer-based workshops (10 x 2 hours) |
| Guided independent study | 115 | Background reading and preparation for module assessments |

## Formative assessment

| Form of assessment | Size of the assessment (e.g. length / duration) | ILOs assessed | Feedback method |
|---|---|---|---|
| Multiple choice questions, given as part of the workshops and self-assessed | 10 questions for each workshop session | All | Oral feedback from staff where required |

## Summative assessment (% of credit)

| Coursework | Written exams | Practical exams |
|---|---|---|
| 80 | 0 | 20 |

## Details of summative assessment

| Form of assessment | % of credit | Size of the assessment (e.g. length / duration) | ILOs assessed | Feedback method |
|---|---|---|---|---|
| Group presentation | 20 | 10 minutes (in groups of 3 or 4) | 4-10 | Written |
| Written assignment | 80 | 2000-2500 words | 1-10 | Written |

## Details of re-assessment (where required by referral or deferral)

| Original form of assessment | Form of re-assessment | ILOs re-assessed | Timescale for re-assessment |
|---|---|---|---|
| Group presentation (20%) | Individual presentation, 10 minutes | 4-10 | Typically within six weeks of the result |
| Written assignment (80%), 2000-2500 words | Written assignment | 1-10 | Typically within six weeks of the result |

## Indicative learning resources - Basic reading

• An Introduction to Generalized Linear Models, Third Edition. Dobson, AJ and Barnett, AG, Chapman & Hall (2008).

https://reneues.files.wordpress.com/2010/01/an-introduction-to-generalized-linear-models-second-edition-dobson.pdf

• An Introduction to Statistical Learning with Applications in R. James, G, Witten, D, Hastie, T and Tibshirani, R.

http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf

## Indicative learning resources - Web based and electronic resources

### Key words search

probability, statistical distribution, estimator, likelihood, regression, survival analysis, Cox model, mixed effects model, bias, confounding, causation, Bayesian methods

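| Credit value | 15 |
|---|---|
| ECTS value | 7.5 |
| Pre-requisite modules | None |
| Co-requisite modules | HPDM092 Fundamentals of Research Design |
| NQF level (FHEQ) | 7 |
| Available as distance learning? | No |
| Origin date | 19/12/2019 |
| Last revision date | 02/05/2023 |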