
# Statistical Data Modelling - 2023 entry

MODULE TITLE: Statistical Data Modelling

CREDIT VALUE: 15

MODULE CODE: MTHM506

MODULE CONVENER: Dr Oscar Rodriguez De Rivera Ortega (Coordinator)

| DURATION: TERM | 1 | 2 | 3 |
| --- | --- | --- | --- |
| DURATION: WEEKS | 5 (October start) / 0 (January start) | 0 (October start) / 5 (January start) | |

Number of Students Taking Module (anticipated): 50
DESCRIPTION - summary of the module content

Statistical modelling lies at the heart of modern data analysis and is a vital part of data science, particularly when decision making is involved. Simple statistical models include linear regression, familiar from most foundation courses in statistics. This module places linear regression into the very broad framework of Bayesian statistical data modelling, which has become one of the most popular approaches to data analysis. Bayesian inference will be introduced as a unifying modelling framework, and the module will introduce modelling concepts such as Generalized Linear Models, Generalized Additive Models, Hierarchical Models, Multi-Level Models, Discrete Mixture Models, Models for Flawed Data and predictive model validation. These will provide you with a toolbox and the ability to analyse any real-world data set, including binary data, count data, contingency tables, data with temporal and spatial structure, as well as data that are missing or partially missing. We will use the statistical software R as the main platform to fit this wide range of models, and will use it in practical sessions so that, as well as gaining a sound theoretical basis, you will develop an understanding of how to apply the techniques discussed in the module in practical data analysis.

AIMS - intentions of the module

Statistical data modelling offers a systematic and rigorous way of describing data and thus the mechanisms and processes that generated them. Uncertainty is formally quantified in terms of probability. This module will formally define statistical data modelling as a process by which we can use the data, as well as subjective judgement, to construct a mathematical description of the data. It will then argue that Bayesian inference is truly a unifying framework with which we can build and check the validity of statistical data models, while fully quantifying the different sources of uncertainty that result in the apparently haphazard nature of real data sets. The module will introduce well-established but fairly restrictive models such as GLMs, and then move on to present more state-of-the-art approaches such as GAMs and Bayesian Hierarchical Models, as well as a conceptual framework for correcting flaws in observational data sets (such as censoring). The module will introduce a plethora of real data sets spanning a wide range of applications such as public health, weather, climate, ecology, biology, epidemiology, natural hazards and many others.

INTENDED LEARNING OUTCOMES (ILOs) (see assessment section below for how ILOs will be assessed)

On successful completion of this module you should be able to:

Module Specific Skills and Knowledge

1 Show understanding of the many different types of data structures that can commonly occur and the need to respect the nature of the data in building statistical models;

2 Demonstrate awareness of, and ability to apply, the unifying power of Bayesian inference for data analysis and its use in inference (e.g. quantifying relationships) and prediction;

3 Reveal awareness of, and ability to apply, related modern developments in statistical modelling techniques, including nonparametric and semi-parametric formulations (GAMs), Bayesian hierarchical modelling and models for flawed data;

4 Utilise appropriate software and a suitable computer language for advanced modelling of data;

Discipline Specific Skills and Knowledge

5 Demonstrate understanding and appreciation of, and aptitude in, the mathematical definition of stochastic models for data perceived to arise at random;

6 Apply simulation-based numerical integration methods in the context of Bayesian statistical modelling;

7 Appreciate and apply the concept of piecewise processes and their use in semi-parametric statistical models;

8 Demonstrate understanding of the multivariate Normal distribution and its use in Bayesian statistical modelling;

Personal and Key Transferable / Employment Skills and Knowledge

9 Show advanced data analysis skills and be able to communicate associated reasoning and interpretations effectively in writing;

10 Apply relevant computer software competently;

11 Use learning resources appropriately;

12 Exemplify self-management and time-management skills;

13 Gain experience in problem solving using data analysis.
SYLLABUS PLAN - summary of the structure and academic content of the module

- Introduction of linear regression as a special case of a statistical model and of statistical modelling as a method;

- Value of Bayesian inference as a unifying modelling framework;

- Posterior predictive model checking;

- Generalised linear models (GLMs): definition and historical use;

- Generalised Additive Models (GAMs): definition and a method to capture space-time structures;

- Normal approximation to the posterior and connection to maximum likelihood;

- Hierarchical Models: definition and links to random effects and multi-level models;

- Discrete mixture models and zero-inflation;

- Models for flawed data.

LEARNING AND TEACHING
LEARNING ACTIVITIES AND TEACHING METHODS (given in hours of study time)

| Scheduled Learning & Teaching Activities | Guided Independent Study |
| --- | --- |
| 30 | 120 |

DETAILS OF LEARNING ACTIVITIES AND TEACHING METHODS

| Category | Hours of study time | Description |
| --- | --- | --- |
| Scheduled learning and teaching | 20 | Lectures |
| Scheduled learning and teaching | 10 | Hands-on practical sessions |
| Guided Independent Study | 36 | Post lecture study and reading |
| Guided Independent Study | 84 | Formative and summative coursework preparation |

ASSESSMENT
FORMATIVE ASSESSMENT - for feedback and development purposes; does not count towards module grade

| Form of Assessment | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method |
| --- | --- | --- | --- |
| Unassessed Practical Modelling Exercises | 20 exercises | 1-13 | Verbal, in class |

SUMMATIVE ASSESSMENT (% of credit)

| Coursework | Written Exams |
| --- | --- |
| 100 | 0 |

DETAILS OF SUMMATIVE ASSESSMENT

| Form of Assessment | % of Credit | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method |
| --- | --- | --- | --- | --- |
| Coursework – practical modelling exercises and theoretical problems | 50 | 10 Hours | 1-13 | Written and oral |
| Coursework – data analysis project | 50 | 20 Hours | 1-13 | Written and oral |

DETAILS OF RE-ASSESSMENT (where required by referral or deferral)

| Original Form of Assessment | Form of Re-assessment | ILOs Re-assessed | Time Scale for Re-assessment |
| --- | --- | --- | --- |
| CW - Practical modelling exercises 1* | CW - Practical modelling exercises 1 | 1-13 | Ref/Def Period |
| CW - data analysis group project* | CW - data analysis individual project | 1-13 | Ref/Def Period |

RE-ASSESSMENT NOTES

Deferrals: Reassessment will be by coursework in the deferred element only. For deferred candidates, the module mark will be uncapped.

Referrals: Reassessment will be by a single piece of coursework worth 100% of the module only. As it is a referral, the mark will be capped at 50%.

RESOURCES
INDICATIVE LEARNING RESOURCES - The following list is offered as an indication of the type & level of information that you are expected to consult. Further guidance will be provided by the Module Convener.

| Type | Author | Title | Edition | Publisher | Year | ISBN |
| --- | --- | --- | --- | --- | --- | --- |
| Set | Aitkin, M., Francis, B., Hinde, J. and Darnell, R. | Statistical Modelling in R | | Oxford University Press | 2008 | 9780199219131 |
| Set | Crawley, M.J. | The R Book | | Wiley | 2007 | 9780470510247 |
| Set | Faraway, J.J. | Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models | | Chapman & Hall | 2006 | 158488424X |
| Set | Wood, Simon N. | Generalized Additive Models: An Introduction with R | | Chapman & Hall/CRC | 2006 | 978-1584884743 |
| Set | Gelman, A. and Hill, J. | Data Analysis Using Regression and Multilevel/Hierarchical Models | | Cambridge University Press | 2007 | 052168689X |
| Set | Krzanowski W.J. | An Introduction to Statistical Modelling | | Arnold | 1998 | 000-0-340-69185-9 |

CREDIT VALUE: 15

ECTS VALUE: 7.5
PRE-REQUISITE MODULES None None
NQF LEVEL (FHEQ): 7

AVAILABLE AS DISTANCE LEARNING: No

Monday 14th September 2020 Friday 9th December 2022

KEY WORDS SEARCH: None Defined