
Study information

Advanced Bioinformatics, Interpretation, Statistics and Data Quality Assurance

Module title: Advanced Bioinformatics, Interpretation, Statistics and Data Quality Assurance
Module code: HPDM046Z
Academic year: 2024/5
Module staff

Dr Matthew Wakeling (Convenor)

Module description

This module is available either via blended learning with contact days on-campus, or as fully distance learning via our online platform. There may be some variation in scheduled teaching and learning activities depending on your mode of study.
The main challenge in applying genomic data lies in its analysis and interpretation. In this module you will build on the knowledge and understanding gained in the Bioinformatics, Interpretation, Statistics and Data Quality Assurance module. You will learn how to use programming and scripting via the command line, as well as the 'Galaxy' interface, to formulate more complex research questions and analyse NHS data sets. You will gain a greater understanding of the different approaches to sequence data assembly and alignment, and to the analysis of copy number variants and structural variants.

Module aims - intentions of the module

The module will cover more advanced principles of informatics and bioinformatics applied to clinical genomics, how to find major genomic and genetic data resources for use in more complex data analysis, and use of programming and scripting via the command line. Theoretical sessions will be coupled with practical assignments of analysing and annotating predefined data sets. Upon completion of this module you will be eligible to base your MSc research project on data from the ‘100,000 Genomes Project’.

Intended Learning Outcomes (ILOs)

ILO: Module-specific skills

On successfully completing the module you will be able to...

  • 1. Apply programming and scripting knowledge via the command line and the ‘Galaxy’ interface, together with more advanced statistical methods, to the handling and analysis of sequencing data in both diagnostic and research settings.
  • 2. Demonstrate practical experience of the bioinformatics pipeline through the ‘Genomics England’ programme.
  • 3. Critically evaluate the principles applied to quality control of sequencing data and filtering strategies to identify single nucleotide, copy number and structural variation in sequencing data.

ILO: Discipline-specific skills

On successfully completing the module you will be able to...

  • 4. Justify and defend the place of Professional Best Practice Guidelines in the diagnostic setting for the reporting of genomic variation.
  • 5. Interrogate major data sources and integrate with clinical data, to assess the pathogenic and clinical significance of the genome result.

ILO: Personal and key skills

On successfully completing the module you will be able to...

  • 6. Critically reflect on personal practice and make connections between known and unknown areas, to allow for personal development, adaptation and change.
  • 7. Respond to innovation and new technologies and be able to evaluate these in the context of best practice and the need for improved service delivery.
  • 8. Communicate accurately and effectively with peers, tutors and the public.

Syllabus plan

Whilst the module's precise content may vary from year to year, an example of an overall structure is as follows:

  • Use of programming and scripting knowledge via the command line.
  • Aligning genome sequencing reads to a reference sequence using up-to-date alignment programs (e.g. BWA) and comparison to de novo assembly.
  • Use of tools to call copy number and structural variants, and annotation of variant-call files using established databases.
  • Assessment of data quality through application of quality control measures and control data sets.
  • Variant filtering strategies, in the context of clinical data, using publicly available control data sets.
  • Use of multiple database sources, in silico tools and literature for pathogenicity evaluation, and familiarity with the statistical programs that support this.
  • Use of tools (e.g. IGV, Ensembl, UCSC) to view the alignment of reads to the reference genome, to assess coverage at the gene and variant level, and to provide visual read-level support for a variant.
  • Principles of orthogonal validation and confirmation for different types of variant.
  • Principles of integration of laboratory and clinical information, and place of best-practice guidelines for indicating the clinical significance of results.
  • How to analyse genomic data to identify epigenetic and other variation that modifies phenotype.
  • Practice in examples of analysis of genomic data in the Training Embassy within the 'Genomics England' Data Centre.


Learning activities and teaching methods (given in hours of study time)

Scheduled Learning and Teaching Activities | Guided independent study | Placement / study abroad

Details of learning activities and teaching methods

Category | Hours of study time | Description
Scheduled learning and teaching activities | 6 | Lectures (on-campus or online)
Scheduled learning and teaching activities | 12 | Workshops (on-campus or online)
Guided independent study | 5 | Preparation for scheduled learning and teaching
Guided independent study | 5 | Tutor guided online discussion forum
Guided independent study | 10 | Tutor guided online workshop
Guided independent study | 10 | Preparation of case based project
Guided independent study | 22 | Answering online problem solving questions (including preparation)
Guided independent study | 80 | Online resources and independent guided literature research

Formative assessment

Form of assessment | Size of the assessment (e.g. length / duration) | ILOs assessed | Feedback method
Participation in online workshop | Weekly (1-2 hours weekly) | 1-8 | Written
Participation in online discussion forum | Weekly (1 hour weekly) | 1-8 | Written

Summative assessment (% of credit)

Coursework | Written exams | Practical exams

Details of summative assessment

Form of assessment | % of credit | Size of the assessment (e.g. length / duration) | ILOs assessed | Feedback method
Case-based report | 100 | 2500 words (including code) | 1-5 | Written

Details of re-assessment (where required by referral or deferral)

Original form of assessment | Form of re-assessment | ILOs re-assessed | Timescale for re-assessment
Case-based report (100%) | Report (2500 words, including code) | 1-5 | Typically within six weeks of the result

Re-assessment notes

Please refer to the TQA section on Referral/Deferral.


Indicative learning resources - Basic reading

Genomic medicine is moving so fast that most of the up-to-date information on bioinformatics is found not in books but in articles and other online resources.

The following books provide good basic background knowledge to build on during the course.

  • Lesk, A. (2014). Introduction to Bioinformatics (4th Edition). Oxford University Press.
  • Schwartz, R., Foy, B. and Phoenix, T. (2011). Learning Perl: Making Easy Things Easy and Hard Things Possible. Beijing: O'Reilly Media.
  • Grolemund, G. (2014). Hands-On Programming with R: Write Your Own Functions and Simulations. Beijing: O'Reilly Media.

Indicative learning resources - Web based and electronic resources


Key words search

Alignment, data analysis, bioinformatics, sequencing, filtering

Credit value: 15