Data Science at Scale - 2025 entry
| MODULE TITLE | Data Science at Scale | CREDIT VALUE | 15 |
|---|---|---|---|
| MODULE CODE | COMM115 | MODULE CONVENER | Unknown |

| DURATION: TERM | 1 | 2 | 3 |
|---|---|---|---|
| DURATION: WEEKS | 12 |

| Number of Students Taking Module (anticipated) | 40 |
|---|---|

Data science and some machine learning technologies rely on large amounts of data to be effective, and many commercial and scientific applications require the analysis of large quantities of heterogeneous, noisy data on distributed machines. This module will examine how algorithms for data science can be implemented for large data sets and will discuss new algorithms designed specifically for large-scale data. You will also work with large-scale distributed and cloud systems for storing and computing with big data.
Through theory and practice, this module aims to equip you with an understanding of: the principles of distributed computing, particularly on cloud-based systems; the ways in which data can be stored and accessed to allow efficient computation; and efficient algorithms for large-scale computation.
Studying distributed cloud computing will give you the underpinning knowledge required to develop and implement machine learning and artificial intelligence algorithms on distributed high-performance computing systems.
On successful completion of this module you should be able to:
Module Specific Skills and Knowledge:
1. Explain the common challenges encountered in large scale data science projects.
2. Demonstrate competence in applying a range of abstraction and programming models for large-scale data processing.
3. Analyse and use a range of data storage models for parallel query processing.
4. Apply cloud and distributed systems for data processing.
5. Design and implement algorithms for machine learning on large scale distributed systems.

Discipline Specific Skills and Knowledge:

6. Describe several different programming paradigms and associated data structures.
7. Critically analyse and compare a variety of data science methods and their applications to real problems.

Personal and Key Transferable/ Employment Skills and Knowledge:

8. Plan and write a technical report.
9. Adapt existing technical knowledge to learning new methods.
- Introduction: the size of data and impediments to efficient computation;
- Data storage and retrieval: relational databases and NoSQL systems;
- Distributed systems and data: cloud computing and supercomputing; data distribution and consistency;
- The MapReduce paradigm and implementations;
- Algorithms for large scale learning: stochastic gradient descent, large scale linear algebra;
- Stream processing;
- Future architectures; co-design of hardware and algorithms.

| Scheduled Learning & Teaching Activities | 35 | Guided Independent Study | 115 | Placement / Study Abroad | 0 |
|---|---|---|---|---|---|

| Category | Hours of study time | Description |
|---|---|---|
| Scheduled Learning and Teaching | 20 | Lectures |
| Scheduled Learning and Teaching | 15 | Workshops and tutorials |
| Guided Independent Study | 115 | Coursework; private study; reading |

| Form of Assessment | Size of the assessment e.g. duration/length | ILOs assessed | Feedback method |
|---|---|---|---|
| Practical exercise | 10 | All | Answers to exercises and oral feedback |

| Coursework | 30 | Written Exams | 70 | Practical Exams | 0 |
|---|---|---|---|---|---|

| Form of Assessment | % of credit | Size of the assessment e.g. duration/length | ILOs assessed | Feedback method |
|---|---|---|---|---|
| Written exam – closed book | 70 | 2 hours | 1-6 | Orally, on request |
| Continuous assessment | 30 | 30 hours | 2-5, 7-9 | Written |

| Original form of assessment | Form of re-assessment | ILOs re-assessed | Time scale for re-assessment |
|---|---|---|---|
| Written exam – closed book | Written Exam – closed book | 1-6 | Referral/deferral period |
| Continuous assessment | Continuous assessment | 2-5, 7-9 | Referral/deferral period |

Reassessment will be by coursework and/or written exam in the failed or deferred element only. For referred candidates, the module mark will be capped at 50%. For deferred candidates, the module mark will be uncapped.
The following list indicates the kind of information that you are expected to consult. Further guidance will be provided by the Module Convener.
- Kleppmann, M., 2016. Designing data-intensive applications: the big ideas behind reliable, scalable, and maintainable systems. 1st ed. Sebastopol, CA: O'Reilly Media.
- White, T., 2015. Hadoop: the definitive guide: storage and analysis at internet scale. 4th ed. Sebastopol, CA: O'Reilly Media.
- Narkhede, N., Shapira, G. and Palino, T., 2016. Kafka: the definitive guide. 1st ed. Sebastopol, CA: O'Reilly Media.
- Chambers, B. and Zaharia, M., 2018. Spark: the definitive guide. 1st ed. Sebastopol, CA: O'Reilly Media.
Web-based and electronic resources:
- ELE

| CREDIT VALUE | 15 | ECTS VALUE | 7.5 |
|---|---|---|---|

| PRE-REQUISITE MODULES | None |
|---|---|
| CO-REQUISITE MODULES | None |

| NQF LEVEL (FHEQ) | 7 | AVAILABLE AS DISTANCE LEARNING | No |
|---|---|---|---|
| ORIGIN DATE | Monday 11th November 2024 | LAST REVISION DATE | Thursday 29th May 2025 |

| KEY WORDS SEARCH | Data science, distributed computing, cloud computing |
|---|---|

Please note that all modules are subject to change. Please get in touch if you have any questions about this module.


