Fundamentals of Data Science - 2022 entry
| MODULE TITLE | Fundamentals of Data Science | CREDIT VALUE | 30 |
|---|---|---|---|
| MODULE CODE | MTHM601 | MODULE CONVENER | Dr Tim Hughes (Coordinator) |
| DURATION: TERM | 1 | 2 | 3 |
|---|---|---|---|
| DURATION: WEEKS | 11 | 0 | 0 |
| Number of Students Taking Module (anticipated) | 50 |
|---|---|

This module develops core skills in data science, modelling and programming. In the world of big data, the ability to extract information from data as a basis for evidence-based decision making and policy is becoming increasingly important across a wide variety of sectors, including climate, health, technology and the environment. This module will equip you with the tools required to collate, import and manipulate data, together with methods for inference. You will be introduced to different types and sources of data and to the tools for performing data analysis, from producing informative graphical summaries to generating sophisticated visualisations. These techniques are crucial both as the basis for communication and for informing complex modelling. The material will be placed in a contemporary, cutting-edge setting through the use of locally curated and global open-source datasets, and will draw on the flexible and freely available programming environments of Python and R.

Module Specific Skills and Knowledge:

1. Demonstrate the ability to import, manipulate and summarise data, including an understanding of the relative merits of different methods of formatting;
2. Demonstrate an understanding of how the source of data and the way it is collected affect subsequent data analyses;
3. Demonstrate effective use of Python and/or R/RStudio to facilitate data wrangling and unsupervised and supervised data analyses;

Discipline Specific Skills and Knowledge:

4. Demonstrate effective and efficient data processing and programming skills;
5. Demonstrate competence in data visualisation;
6. Demonstrate an understanding of the methodology and practical use of a range of data analysis techniques, including unsupervised and supervised machine learning and statistical modelling methods;
7. Demonstrate an understanding of common pitfalls in data processing and analysis and how to avoid them;
8. Demonstrate appreciation and understanding of relevant datasets in application areas;

Personal and Key Transferable / Employment Skills and Knowledge:

9. Data and statistical analysis skills;
10. Use of Python, R/RStudio and other software;
11. Effective use of learning resources;
12. Report writing and presentation.

- Data collection, pre-processing and communication:
  - Cleansing;
  - Visualisation;
  - Handling missing, corrupted, uncertain and/or biased data;
- Effective programming:
  - Coding in R/RStudio and Python;
  - Computer hardware;
  - Version control, collaborative and high-performance computing;
  - Reproducible programming;
- Analysis:
  - Fundamentals of probability, linear algebra and calculus;
  - Fundamentals of statistical modelling;
  - Sampling and sampled data;
  - Inference, confidence intervals, and hypothesis testing;
  - Regression analysis and model selection;
  - Spatial-temporal and hierarchical models;
  - Introduction to machine learning: supervised methods (e.g., classification and regression) and unsupervised methods (e.g., clustering and dimensionality reduction);
- Application areas:
  - Datasets for ecology and evolution: populations, infectious diseases, biodiversity, genetics;
  - Datasets for renewable energy: solar, wind, marine (resource and generation data), electricity/heat consumption, smart grid;
  - Datasets for environment and sustainability: sustainable development indices, health, weather and climate, land and marine pollution.


| Scheduled Learning & Teaching Activities | 60 | Guided Independent Study | 240 | Placement / Study Abroad | 0 |
|---|---|---|---|---|---|

| Category | Hours of study time | Description |
|---|---|---|
| Scheduled Learning and Teaching Activities | 30 | Lectures and tutorials |
| Scheduled Learning and Teaching Activities | 30 | Hands-on practical sessions |
| Guided Independent Study | 120 | Self-study and background reading |
| Guided Independent Study | 120 | Assessed data analyses, quizzes, report writing and preparation for presentations |

| Form of Assessment | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method |
|---|---|---|---|
| Exercises | Several quizzes/exercise sheets | 1-11 | Oral, during tutorial sessions |
| Practicals | Several practical sheets for self-directed and guided learning | 1-11 | Oral, during tutorial sessions |

| Coursework | 100 | Written Exams | 0 | Practical Exams | 0 |
|---|---|---|---|---|---|

| Form of Assessment | % of Credit | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method |
|---|---|---|---|---|
| Exercises | 50 | Several quizzes/ exercise sheets (4 expected) | 1-11 | Written, oral or automated feedback |
| Report | 50 | Approx. 10-15 pages | 1-12 | Written |
| Original Form of Assessment | Form of Re-assessment | ILOs Re-assessed | Time Scale for Re-assessment |
|---|---|---|---|
| Exercises | Coursework (100%) | 1-11 | To be agreed by consequences of failure meeting |
| Report | Coursework (100%) | All | To be agreed by consequences of failure meeting |
information that you are expected to consult. Further guidance will be provided by the Module Convener
Web-based and electronic resources:
- ELE – College to provide hyperlink to appropriate pages
Other resources:
- Recent articles and open-source codes provided by the tutors.
Reading list for this module:
| Type | Author | Title | Edition | Publisher | Year | ISBN |
|---|---|---|---|---|---|---|
| Set | James, G., Witten, D., Hastie, T., Tibshirani, R. | An Introduction to Statistical Learning: with Applications in R | | Springer | 2013 | 978-1461471370 |
| Set | Rogers, S., Girolami, M. | A First Course in Machine Learning | 2nd | CRC Press | 2016 | B01N7ZEBK8 |
| Set | Murphy, K. | Machine Learning: A Probabilistic Perspective | 1st | MIT Press | 2012 | 978-0262018029 |
| Set | Hastie, T., Tibshirani, R., Friedman, J. | The Elements of Statistical Learning: Data Mining, Inference, and Prediction | 2nd | Springer | 2009 | 978-0387848587 |
| Set | Bishop, C. | Pattern Recognition and Machine Learning | 1st | Springer | 2006 | 978-0387310732 |
| Set | Géron, A. | Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow | | O'Reilly | 2019 | 978-1492032649 |
| Set | Raschka, S., Mirjalili, V. | Python Machine Learning: Machine Learning and Deep Learning with Python, Scikit-learn, and TensorFlow | 2nd | Packt Publishing | 2017 | 978-1787125933 |

| CREDIT VALUE | 30 | ECTS VALUE | 15 |
|---|---|---|---|
| PRE-REQUISITE MODULES | None |
|---|---|
| CO-REQUISITE MODULES | None |
| NQF LEVEL (FHEQ) | 7 | AVAILABLE AS DISTANCE LEARNING | No |
|---|---|---|---|
| ORIGIN DATE | Monday 14th December 2020 | LAST REVISION DATE | Wednesday 25th May 2022 |
| KEY WORDS SEARCH | Data processing; Data visualisation; Programming; Statistical modelling; Machine learning; Applied data analysis |
|---|---|

Please note that all modules are subject to change. Please get in touch if you have any questions about this module.


