COMM115 | 2025/6 | University of Exeter

MODULE TITLE	Data Science at Scale	CREDIT VALUE	15
MODULE CODE	COMM115	MODULE CONVENER	Unknown

DURATION: TERM	1	2	3
DURATION: WEEKS	12

Number of Students Taking Module (anticipated)	40

DESCRIPTION - summary of the module content

Data science and some machine learning technologies rely on large amounts of data to be effective, and many commercial and scientific applications require the analysis of large quantities of heterogeneous, noisy data on distributed machines. This module will examine the ways in which algorithms for data science can be implemented for large data and will discuss new algorithms specifically designed for large-scale data. You will also work with large-scale distributed and cloud systems for storing and computing with big data.

AIMS - intentions of the module

Through theory and practice this module aims to equip you with an understanding of the principles of distributed computing, particularly on cloud-based systems, the ways in which data can be stored and accessed to allow efficient computation, and efficient algorithms for large-scale computation.

The study of distributed cloud computing will provide you with the underpinning knowledge required to develop and implement machine learning and artificial intelligence algorithms on distributed high-performance computing systems.

INTENDED LEARNING OUTCOMES (ILOs) (see assessment section below for how ILOs will be assessed)

On successful completion of this module you should be able to:

Module Specific Skills and Knowledge:

1. Explain the common challenges encountered in large-scale data science projects.
2. Demonstrate competence in applying a range of abstraction and programming models for large-scale data processing.
3. Analyse and use a range of data storage models for parallel query processing.
4. Apply cloud and distributed systems for data processing.
5. Design and implement algorithms for machine learning on large-scale distributed systems.

Discipline Specific Skills and Knowledge:

6. Describe several different programming paradigms and associated data structures.
7. Critically analyse and compare a variety of data science methods and their applications to real problems.

Personal and Key Transferable/Employment Skills and Knowledge:

8. Plan and write a technical report.
9. Adapt existing technical knowledge to learning new methods.

SYLLABUS PLAN - summary of the structure and academic content of the module

Introduction: the size of data and impediments to efficient computation;
Data storage and retrieval: relational databases and NoSQL systems;
Distributed systems and data: cloud computing and supercomputing; data distribution and consistency;
The MapReduce paradigm and implementations;
Algorithms for large-scale learning: stochastic gradient descent, large-scale linear algebra;
Stream processing;
Future architectures; co-design of hardware and algorithms.

LEARNING AND TEACHING

LEARNING ACTIVITIES AND TEACHING METHODS (given in hours of study time)

Scheduled Learning & Teaching Activities	35	Guided Independent Study	115	Placement / Study Abroad	0

DETAILS OF LEARNING ACTIVITIES AND TEACHING METHODS

Category	Hours of study time	Description
Scheduled Learning and Teaching	20	Lectures
Scheduled Learning and Teaching	15	Workshops and tutorials
Guided Independent Study	115	Coursework; private study; reading

ASSESSMENT

FORMATIVE ASSESSMENT - for feedback and development purposes; does not count towards module grade

Form of Assessment	Size of the assessment e.g. duration/length	ILOs assessed	Feedback method
Practical exercise	10	All	Answers to exercises and oral feedback

SUMMATIVE ASSESSMENT (% of credit)

Coursework	30	Written Exams	70	Practical Exams	0

DETAILS OF SUMMATIVE ASSESSMENT

Form of Assessment	% of credit	Size of the assessment e.g. duration/length	ILOs assessed	Feedback method
Written exam – closed book	70	2 hours	1-6	Orally, on request
Continuous assessment	30	30 hours	2-5, 7-9	Written

DETAILS OF RE-ASSESSMENT (where required by referral or deferral)

Original form of assessment	Form of re-assessment	ILOs re-assessed	Time scale for re-assessment
Written exam – closed book	Written exam – closed book	1-6	Referral/deferral period
Continuous assessment	Continuous assessment	2-5, 7-9	Referral/deferral period

RE-ASSESSMENT NOTES

Reassessment will be by coursework and/or written exam in the failed or deferred element only. For referred candidates, the module mark will be capped at 50%. For deferred candidates, the module mark will be uncapped.

RESOURCES

INDICATIVE LEARNING RESOURCES - The following list is offered as an indication of the type & level of information that you are expected to consult. Further guidance will be provided by the Module Convener.

Basic reading:

Kleppmann, M., 2016. Designing data-intensive applications: the big ideas behind reliable, scalable, and maintainable systems. 1st ed. Sebastopol, CA: O'Reilly Media.
White, T., 2015. Hadoop: the definitive guide: storage and analysis at internet scale. 4th ed. Sebastopol, CA: O'Reilly Media.
Narkhede, N., Shapira, G. and Palino, T., 2016. Kafka: the definitive guide. 1st ed. Sebastopol, CA: O'Reilly Media.
Chambers, B. and Zaharia, M., 2018. Spark: the definitive guide. 1st ed. Sebastopol, CA: O'Reilly Media.

Web-based and electronic resources:

ELE

CREDIT VALUE	15	ECTS VALUE	7.5

PRE-REQUISITE MODULES	None
CO-REQUISITE MODULES	None

NQF LEVEL (FHEQ)	7	AVAILABLE AS DISTANCE LEARNING	No
ORIGIN DATE	Monday 11th November 2024	LAST REVISION DATE	Thursday 29th May 2025

KEY WORDS SEARCH	Data science, distributed computing, cloud computing

Data Science at Scale - 2025 entry