Study information

Coding for Machine Learning and Data Science

Module title: Coding for Machine Learning and Data Science
Module code: HPDM139
Academic year: 2021/2
Credits: 15
Module staff

Dr Thomas Monks (Convenor)

Duration: Term 1, 2, 3
Duration: Weeks

10

Number of students taking module (anticipated)

25

Module description

Data science and machine learning are exciting, rapidly evolving disciplines that offer huge potential for the future of health care, medicine and wider areas of science. To keep up with the pace of change, a modern data scientist requires fundamental skills in coding. This module will:

• Boost your Python coding skills to a level where you are ready to undertake research and applied projects in data science and machine learning in health, medicine and general industry.
• Introduce you to the complexity of working with real-world data in a health and medicine context.
• Introduce key machine learning concepts in supervised learning including an introduction to deep learning.
• Teach you coding skills that are transferable outside of health and medicine.

Module aims - intentions of the module

This module is suitable for students from a wide range of quantitative backgrounds who have some existing computer coding experience but wish to take these skills to a higher level. It will provide students working in health, medicine and wider scientific fields with the fundamental coding skills to conduct modern data science and machine learning.

The module is organised in two halves. In the first half, you will take a hands-on approach to improving your existing Python skills, build a working knowledge of Python’s data science libraries (NumPy, Pandas and Matplotlib), develop skills in data wrangling and gain an appreciation of a reproducible workflow. In the second half, you will develop skills in machine learning used in research and practice. You will focus on working with complex data and be introduced to key machine learning infrastructure in Python.

The module will be suitable for students with varying levels of existing coding skills. The content will boost the skills of those students who have had no formal training in computing (e.g. those who have learnt online in their own time). In addition, the module will reinforce the skills of students who have had formal training (e.g. in a computer science degree) and tailor them towards large, complex health data challenges.

Intended Learning Outcomes (ILOs)

ILO: Module-specific skills

On successfully completing the module you will be able to...

  • 1. Demonstrate competence in the fundamentals of coding in the Python programming language and produce code to a standard suitable for cutting-edge research and industry applications.
  • 2. Analyse and manipulate complex data sets in health and demonstrate competence in building statistical and computational models to work with them in Python.

ILO: Discipline-specific skills

On successfully completing the module you will be able to...

  • 3. Apply a wide range of supervised machine learning algorithms to model outcomes in complex datasets.
  • 4. Critically appraise data science problems and evaluate the tools that are needed to solve them.

ILO: Personal and key skills

On successfully completing the module you will be able to...

  • 5. Use a wide range of Python tools, including modern data science tools, to conduct quantitative analyses.
  • 6. Explain and demonstrate the steps to follow in a reproducible scientific workflow using modern data science tools.
  • 7. Explain the importance of coding for high quality data science and machine learning research.

Syllabus plan

Whilst the module’s precise content may vary from year to year, an example of an overall structure is as follows:

  • An introduction to Linux and OpenStack
  • The basics and advanced concepts of coding in standard Python
  • An introduction to Jupyter notebooks for data science and machine learning
  • Reproducible workflows in Python and an introduction to GitHub
  • An introduction to NumPy, Pandas and Matplotlib
  • Advanced data wrangling in Python, NumPy and Pandas
  • An introduction to regression and classification in scikit-learn (sklearn)
  • An introduction to deep learning in Python for supervised learning
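
As a rough illustration of how the regression and reproducible-workflow topics above might fit together, here is a minimal sketch. It is not taken from the module materials: the function names (`simulate_data`, `fit_line`), the simulated data and the seed value are all assumptions made for this example. It uses only the Python standard library, rather than NumPy or sklearn, so that it runs without extra installs.

```python
import random
import statistics


def simulate_data(n, slope, intercept, noise_sd, seed):
    """Generate n noisy (x, y) pairs scattered around a known straight line."""
    # A dedicated, seeded generator: the same seed always yields the same data.
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical example values: a true line y = 1 + 2x with Gaussian noise.
xs, ys = simulate_data(n=200, slope=2.0, intercept=1.0, noise_sd=0.5, seed=42)
slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Passing the seed explicitly, instead of relying on global random state, is one simple way to make an analysis repeatable: anyone who reruns the script with the same seed gets the same data and the same fitted line.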

Learning activities and teaching methods (given in hours of study time)

Scheduled Learning and Teaching Activities | Guided independent study | Placement / study abroad
35 | 115 | -

Details of learning activities and teaching methods

Category | Hours of study time | Description
Scheduled Learning and Teaching | 10 | Lectures (10 x 1 hour lectures)
Scheduled Learning and Teaching | 20 | Workshops / tutorials (10 x 2 hours)
Scheduled Learning and Teaching | 5 | Pre-recorded lectures on reproducible workflow (5 x 1 hour lectures)
Guided Independent Study | 115 | Background reading and preparation for module assessments

Formative assessment

Form of assessment | Size of the assessment (e.g. length / duration) | ILOs assessed | Feedback method
Computer lab exercises | 20 hours | 1-7 | Written answers to exercises; verbal
Seminar discussion | 2 hours | 1-7 | Verbal

Summative assessment (% of credit)

Coursework | Written exams | Practical exams
100 | 0 | 0

Details of summative assessment

Form of assessment | % of credit | Size of the assessment (e.g. length / duration) | ILOs assessed | Feedback method
Coding assignment 1 | 50 | 1000 words | 1, 4-7 | Written
Coding assignment 2 | 50 | 1000 words | 2, 3-7 | Written

Details of re-assessment (where required by referral or deferral)

Original form of assessment | Form of re-assessment | ILOs re-assessed | Timescale for re-assessment
Coding assignment 1 (50%), 1000 words | Coding assignment 1 | 1, 4-7 | Typically within six weeks of the assignment.
Coding assignment 2 (50%), 1000 words | Coding assignment 2 | 2, 3-7 | Typically within six weeks of the assignment.

Re-assessment notes

Please refer to the TQA section on Referral/Deferral: http://as.exeter.ac.uk/academic-policy-standards/tqa-manual/aph/consequenceoffailure/

Indicative learning resources - Basic reading

Basic reading:

• Lutz (2013). Learning Python. 5th Edition. O’Reilly.
• McKinney (2017). Python for data analysis. 2nd Edition. O’Reilly.

Advanced reading:

• James, Witten, Hastie, Tibshirani (2017). An introduction to statistical learning. Corrected 7th printing. Springer.
• Géron (2020). Hands-on machine learning with Scikit-Learn, Keras and TensorFlow. 2nd Edition (updated for TensorFlow 2.0).

 

Indicative learning resources - Web based and electronic resources

• ELE – College to provide hyperlink to appropriate pages

Key words search

Python, Coding, Machine Learning, Data Science

Credit value: 15
Module ECTS

7.5

Module pre-requisites

None

Module co-requisites

None

NQF level (module)

7

Available as distance learning?

No

Origin date

12/01/2021

Last revision date

12/01/2021