Study information

Text as Data

Module title: Text as Data
Module code: POLM150
Academic year: 2022/3
Credits: 15
Module staff

Dr Travis Coan (Convenor)

Duration: Term123
Duration: Weeks

11

Number of students taking module (anticipated)

15

Module description

Effective analytics in the era of “big data” requires researchers to have a wide range of tools at their disposal. This module focuses on a set of tools that are essential for data analysts working on social scientific questions: text analysis. Text analytic methods are increasingly important for researchers in the policy domain, as textual information offers an abundant, and potentially vital, data source for policy analysis and evaluation. This module provides you with a practical introduction to collecting, processing, and analyzing “text as data.” You will formulate your own policy-relevant research questions and apply the tools introduced in the module to answer them. In doing so, you will work through the full process of analyzing text as data, from collecting textual information to text classification and analysis.

Although there are no formal pre-requisites for taking the module, some (even limited) programming experience will be helpful. You will use the Python programming language to implement most of the tools introduced throughout the term. No prior knowledge of Python is assumed. You are encouraged to reach out to the module convenor with questions regarding Python or programming more generally.

Module aims - intentions of the module

The module has three primary aims. First, it provides an applied introduction to the use of text analysis in social scientific research. You are introduced to the entire research “pipeline” for a typical text-based project, including: a) collecting textual information online (e.g., web scraping), b) key approaches to text preprocessing and “feature extraction,” and c) supervised and unsupervised approaches to text classification. These methods are essential for data scientists interested in social science questions. Second, the module introduces you to the Python programming language. Python is a popular language for scientific computing, and knowledge of Python will place you at a competitive advantage in industry, government, or when pursuing further education. Third, the module assessments reinforce the importance of research design and thus provide you with another opportunity to hone critical research skills.
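As a rough illustration of step (b) of that pipeline, the sketch below tokenizes two made-up sentences and turns them into bag-of-words counts using only the Python standard library. The function names (`tokenize`, `bag_of_words`), the stop-word list, and the example texts are hypothetical and are not taken from the module materials.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real projects use larger ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(text):
    """Count non-stop-word tokens: a simple form of feature extraction."""
    return Counter(t for t in tokenize(text) if t not in STOP_WORDS)

# Made-up example documents.
docs = [
    "The minister defended the new housing policy.",
    "Critics of the policy called for a review of housing costs.",
]
features = [bag_of_words(d) for d in docs]

# Shared vocabulary across documents, in alphabetical order.
vocabulary = sorted(set().union(*features))

# Document-term matrix: one row per document, one column per vocabulary term.
matrix = [[f[term] for term in vocabulary] for f in features]
```

A matrix of this kind is the usual input to the classification methods in step (c); in practice a dedicated library would normally replace the hand-written helpers.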

Intended Learning Outcomes (ILOs)

ILO: Module-specific skills

On successfully completing the module you will be able to...

  • 1. apply appropriate tools for collecting and preprocessing textual information;
  • 2. understand and apply a variety of text analysis methods to answer questions in social science and public policy;
  • 3. critically evaluate the strengths and weaknesses of particular text analytic tools for answering research questions in the social and policy sciences;

ILO: Discipline-specific skills

On successfully completing the module you will be able to...

  • 4. employ text analytic methods to empirically evaluate theories and hypotheses in the social and policy sciences;
  • 5. evaluate the role of text analysis for supporting policy analysis and evaluation;
  • 6. construct arguments based on textual data for both written and oral presentation;
  • 7. demonstrate a strong command of research design through written and oral assessments;

ILO: Personal and key skills

On successfully completing the module you will be able to...

  • 8. gain a solid foundation in the Python programming language;
  • 9. communicate effectively in speech and writing;
  • 10. work independently and within a limited time frame to complete a specified task.

Syllabus plan

Although the module’s precise content may vary from year to year, it is envisaged that the syllabus will cover the following topics:

  • Programming in Python
  • Collecting textual information online
  • Preprocessing text for analysis and “feature selection”
  • Dictionary-based methods for text classification
  • Supervised and unsupervised learning for text classification
  • Ideological scaling
  • Using text-based measures in regression models
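To give a flavour of the dictionary-based methods listed above, here is a minimal sketch that assigns a text to whichever topic dictionary matches the most tokens. The dictionaries, the `classify` function, and the example sentence are hypothetical illustrations, not part of the module content.

```python
import re

# Hypothetical topic dictionaries; real ones are built and validated carefully.
DICTIONARIES = {
    "economy": {"tax", "budget", "inflation", "jobs"},
    "health": {"hospital", "nurses", "vaccine", "patients"},
}

def classify(text):
    """Return the topic whose dictionary matches the most tokens, or None."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Score each topic by the number of tokens found in its dictionary.
    scores = {
        topic: sum(token in words for token in tokens)
        for topic, words in DICTIONARIES.items()
    }
    # On a tie, max() keeps the topic listed first in DICTIONARIES.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

label = classify("The budget raises tax to fund more nurses.")
```

Counting matches is the simplest possible scoring rule; weighting terms or normalizing by document length are common refinements.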

Learning activities and teaching methods (given in hours of study time)

Scheduled Learning and Teaching Activities | Guided independent study | Placement / study abroad
22 | 128 |

Details of learning activities and teaching methods

Category | Hours of study time | Description
Scheduled Learning and Teaching Activity | 22 | 11 x 2 hour lectures
Guided Independent Study | 40 | Activities to familiarize you with the Python programming language
Guided Independent Study | 30 | Reading and preparing for lectures
Guided Independent Study | 58 | Research and analysis for final essay and presentation

Formative assessment

Form of assessment | Size of the assessment (e.g. length / duration) | ILOs assessed | Feedback method
Practicals | 4 short assignments to reinforce programming skills | 1-5, 8-10 | Written

Summative assessment (% of credit)

Coursework | Written exams | Practical exams
85 | 0 | 15

Details of summative assessment

Form of assessment | % of credit | Size of the assessment (e.g. length / duration) | ILOs assessed | Feedback method
Research proposal | 10 | 800 words | 2-7, 9-10 | Written
Research essay | 75 | 4,000 words | 1-10 | Written
Presentation (individual) | 15 | 10 minutes | 1-10 | Written

Details of re-assessment (where required by referral or deferral)

Original form of assessment | Form of re-assessment | ILOs re-assessed | Timescale for re-assessment
Research proposal | 1-2 page proposal | 2-7, 9-10 | August/September reassessment period
Research essay | 4,000 word essay | 1-10 | August/September reassessment period
Presentation (individual) | 10 minute presentation | 1-10 | Spring term

Indicative learning resources - Basic reading

Basic reading:

  • Swaroop C H, A Byte of Python. https://python.swaroopch.com.
  • Dipanjan Sarkar, Text Analytics with Python: A Practical Real-World Approach (New York, NY: Springer).
  • Justin Grimmer and Brandon M. Stewart (2013) “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts,” Political Analysis 21 (3): 267-297.
  • Michael Alvarez (ed.), Computational Social Science (Cambridge, UK: Cambridge University Press).

Indicative learning resources - Web based and electronic resources

Indicative learning resources - Other resources

Key words search

policy analytics, data analytics, text analysis, data science

Credit value: 15
Module ECTS

7.5

Module pre-requisites

None

Module co-requisites

None

NQF level (module)

7

Available as distance learning?

No

Origin date

30/05/2017

Last revision date

03/11/2020