Data Processing with AWK and UNIX


Big data has become a familiar resource in many fields of scientific research. It is now common for small research groups to have access to many gigabytes, or even terabytes, of data. Unix remains the fundamental platform for running the applications used to process scientific data of all kinds, large and small. Knowing how to quickly retrieve, manage, process and interpret data of all sizes and forms in the Unix environment is an increasingly critical skill.


Using Unix and its associated tools, analysts can:

  1. Quickly and effectively evaluate available hardware and software resources for a given task
  2. Transfer or download large scientific datasets and quickly prepare them for downstream analysis using lightweight built-in Unix tools such as AWK
  3. Install, compile, run and subsequently monitor software whilst making maximum use of available hardware resources to further analyse data

Workshop Aims

The workshop briefly revisits the command line basics covered in the prerequisite Unix workshop, then moves quickly on to learning and practising the methods required to:

  • Assess the hardware and software resources available on a given Unix system
  • Fetch real datasets from a variety of sources
  • Filter, clean and summarise datasets using AWK and other native Unix tools, and understand which tool is the best fit for a given task
  • Compile, install and run software in Unix for further downstream analyses
  • Monitor and manage running processes effectively and get the most out of available system resources
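
As a flavour of the filtering and summarising aim above, here is a minimal sketch of an AWK one-liner. The sample data, column names and the 500-read threshold are invented for illustration and are not taken from the workshop material.

```shell
# Hypothetical tab-separated data: sample ID, tissue, read count.
# Skip the header row, keep samples with at least 500 reads,
# then total the reads per tissue and sort the result by tissue name.
summary=$(printf 'id\ttissue\treads\nS1\tliver\t1200\nS2\tbrain\t800\nS3\tliver\t300\n' |
    awk -F'\t' 'NR > 1 && $3 >= 500 { total[$2] += $3 }
        END { for (t in total) print t "\t" total[t] }' |
    sort)

echo "$summary"
```

AWK handles the row filter and the per-group totals in a single pass, while `sort` is left to do the ordering, which reflects the idea of choosing the best-fit tool for each step.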


The workshop will begin with introductory talks and then move into self-paced work through the workshop material, with demonstrators on hand to help when needed.


  • Experience with the Unix command line and basic programming is required. This can be gained from any of the Unix-based Biomedical Informatics Hub workshops, such as the Unix Workshop, Genomics, RNA-seq, Introduction to Python or Introduction to R.
  • Individuals who have not taken any of these workshops, but have solid experience of the Unix environment and some programming, are welcome to attend.
  • The workshop is open to researchers across the University.





1pm – 4:30pm


Dr Marcus Tuke


B12 Hatherly