Data Science at Scale - 2025 entry

MODULE TITLE Data Science at Scale CREDIT VALUE 15
MODULE CODE COMM115 MODULE CONVENER Unknown
DURATION: TERM 1 2 3
DURATION: WEEKS 12
Number of Students Taking Module (anticipated) 40
DESCRIPTION - summary of the module content

Data science and some machine learning technologies rely on large amounts of data to be effective, and many commercial and scientific applications require the analysis of large quantities of heterogeneous, noisy data on distributed machines. This module will examine the ways in which algorithms for data science can be implemented for large data and will discuss new algorithms specifically designed for large-scale data. You will also work with large-scale distributed and cloud systems for storing and computing with big data.

AIMS - intentions of the module

Through theory and practice, this module aims to equip you with an understanding of the principles of distributed computing, particularly on cloud-based systems; the ways in which data can be stored and accessed to allow efficient computation; and efficient algorithms for large-scale computation.

Studying distributed cloud computing will provide you with the underpinning knowledge required to develop and implement machine learning and artificial intelligence algorithms on distributed high-performance computing systems.

INTENDED LEARNING OUTCOMES (ILOs) (see assessment section below for how ILOs will be assessed)

On successful completion of this module you should be able to:

Module Specific Skills and Knowledge:

  1. Explain the common challenges encountered in large-scale data science projects.

  2. Demonstrate competence in applying a range of abstraction and programming models for large-scale data processing.

  3. Analyse and use a range of data storage models for parallel query processing.

  4. Apply cloud and distributed systems for data processing.

  5. Design and implement algorithms for machine learning on large-scale distributed systems.

Discipline Specific Skills and Knowledge:

  6. Describe several different programming paradigms and associated data structures.

  7. Critically analyse and compare a variety of data science methods and their applications to real problems.

Personal and Key Transferable / Employment Skills and Knowledge:

  8. Plan and write a technical report.

  9. Adapt existing technical knowledge to learning new methods.

SYLLABUS PLAN - summary of the structure and academic content of the module
  • Introduction: the size of data and impediments to efficient computation;
  • Data storage and retrieval: relational databases and NoSQL systems;
  • Distributed systems and data: cloud computing and supercomputing; data distribution and consistency;
  • The MapReduce paradigm and implementations;
  • Algorithms for large-scale learning: stochastic gradient descent, large-scale linear algebra;
  • Stream processing;
  • Future architectures; co-design of hardware and algorithms.
LEARNING AND TEACHING
LEARNING ACTIVITIES AND TEACHING METHODS (given in hours of study time)
Scheduled Learning & Teaching Activities 35 Guided Independent Study 115 Placement / Study Abroad 0
DETAILS OF LEARNING ACTIVITIES AND TEACHING METHODS
Category | Hours of study time | Description
Scheduled Learning and Teaching | 20 | Lectures
Scheduled Learning and Teaching | 15 | Workshops and tutorials
Guided Independent Study | 115 | Coursework; private study; reading

ASSESSMENT
FORMATIVE ASSESSMENT - for feedback and development purposes; does not count towards module grade
Form of Assessment | Size of the assessment (e.g. duration/length) | ILOs assessed | Feedback method
Practical exercise | 10 | All | Answers to exercises and oral feedback

SUMMATIVE ASSESSMENT (% of credit)
Coursework 30 Written Exams 70 Practical Exams 0
DETAILS OF SUMMATIVE ASSESSMENT
Form of Assessment | % of credit | Size of the assessment (e.g. duration/length) | ILOs assessed | Feedback method
Written exam – closed book | 70 | 2 hours | 1-6 | Orally, on request
Continuous assessment | 30 | 30 hours | 2-5, 7-9 | Written

DETAILS OF RE-ASSESSMENT (where required by referral or deferral)
Original form of assessment | Form of re-assessment | ILOs re-assessed | Time scale for re-assessment
Written exam – closed book | Written exam – closed book | 1-6 | Referral/deferral period
Continuous assessment | Continuous assessment | 2-5, 7-9 | Referral/deferral period

RE-ASSESSMENT NOTES

Reassessment will be by coursework and/or written exam in the failed or deferred element only. For referred candidates, the module mark will be capped at 50%. For deferred candidates, the module mark will be uncapped.

RESOURCES
INDICATIVE LEARNING RESOURCES - The following list is offered as an indication of the type & level of information that you are expected to consult. Further guidance will be provided by the Module Convener.

Basic reading:

  • Kleppmann, M., 2016. Designing data-intensive applications: the big ideas behind reliable, scalable, and maintainable systems. 1st ed. Sebastopol, CA: O'Reilly Media.
  • White, T., 2015. Hadoop: the definitive guide: storage and analysis at internet scale. 4th ed. Sebastopol, CA: O'Reilly Media.
  • Narkhede, N., Shapira, G. and Palino, T., 2016. Kafka: the definitive guide. 1st ed. Sebastopol, CA: O'Reilly Media.
  • Chambers, B. and Zaharia, M., 2018. Spark: the definitive guide. 1st ed. Sebastopol, CA: O'Reilly Media.

Web-based and electronic resources:

  • ELE

CREDIT VALUE 15 ECTS VALUE 7.5
PRE-REQUISITE MODULES None
CO-REQUISITE MODULES None
NQF LEVEL (FHEQ) 7 AVAILABLE AS DISTANCE LEARNING No
ORIGIN DATE Monday 11th November 2024 LAST REVISION DATE Thursday 29th May 2025
KEY WORDS SEARCH Data science, distributed computing, cloud computing

Please note that all modules are subject to change. Please get in touch if you have any questions about this module.