Text as Data
| Module title | Text as Data |
|---|---|
| Module code | POLM150 |
| Academic year | 2020/1 |
| Credits | 15 |
| Module staff | Dr Travis Coan (Convenor) |
| Duration: Term | 1 | 2 | 3 |
|---|---|---|---|
| Duration: Weeks | 11 |
| Number students taking module (anticipated) | 15 |
Module description
Effective analytics in the era of “big data” requires researchers to have a wide range of tools at their disposal. This module focuses on one set of tools that is essential for data analysts working on social scientific questions: text analysis. Text analytic methods are increasingly important for researchers in the policy domain, as textual information offers an abundant, and potentially vital, data source for policy analysis and evaluation. This module provides you with a practical introduction to collecting, processing, and analyzing “text as data.” You will formulate your own policy-relevant research questions and apply the tools introduced in the module to answer them. In doing so, you will gain hands-on experience with the full process of analyzing text as data, from collecting textual information to text classification and analysis.
Although there are no formal pre-requisites for taking the module, some (even limited) programming experience will be helpful. You will use the Python programming language to implement most of the tools introduced throughout the term. No prior knowledge of Python is assumed. You are encouraged to contact the module convenor with any questions about Python or programming more generally.
Module aims - intentions of the module
There are three primary aims of the module. First, the module will provide an applied introduction to the use of text analysis in social scientific research. You will be introduced to the entire research “pipeline” for a typical text-based project, including: a) collecting textual information online (e.g., web scraping), b) key approaches to text preprocessing and “feature extraction,” and c) supervised and unsupervised approaches to text classification. These methods are essential for data scientists interested in social science questions. Second, the module introduces you to the Python programming language. Python is a popular language for scientific computing, and knowledge of Python will give you a competitive advantage in industry, in government, or when pursuing further education. Third, the module assessments aim to reinforce the importance of research design and thus provide you with a further opportunity to hone critical research skills.
Intended Learning Outcomes (ILOs)
ILO: Module-specific skills
On successfully completing the module you will be able to...
- 1. apply appropriate tools for collecting and preprocessing textual information;
- 2. understand and apply a variety of text analysis methods to answer questions in social science and public policy;
- 3. critically evaluate the strengths and weaknesses of particular text analytic tools for answering research questions in the social and policy sciences;
ILO: Discipline-specific skills
On successfully completing the module you will be able to...
- 4. employ text analytic methods to empirically evaluate theories and hypotheses in the social and policy sciences;
- 5. evaluate the role of text analysis for supporting policy analysis and evaluation;
- 6. construct arguments based on textual data for both written and oral presentation;
- 7. demonstrate a strong command of research design through written and oral assessments;
ILO: Personal and key skills
On successfully completing the module you will be able to...
- 8. demonstrate a solid foundation in the Python programming language;
- 9. communicate effectively in speech and writing;
- 10. work independently and within a limited time frame to complete a specified task.
Syllabus plan
Although the module’s precise content may vary from year to year, it is envisaged that the syllabus will cover the following topics:
- Programming in Python
- Collecting textual information online
- Preprocessing text for analysis and “feature selection”
- Dictionary-based methods for text classification
- Supervised and unsupervised learning for text classification
- Ideological scaling
- Using text-based measures in regression models
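To give a concrete sense of what the preprocessing, “feature selection,” and dictionary-based classification topics involve, the following is a minimal sketch in Python using only the standard library. It is illustrative rather than drawn from module materials: the stopword list, the two-category dictionary, and the function names `preprocess`, `bag_of_words`, and `dictionary_classify` are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; real projects use fuller lists.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "this", "will", "be"}

# Hypothetical two-category dictionary for illustration only.
DICTIONARY = {
    "positive": {"support", "benefit", "improve", "success"},
    "negative": {"oppose", "harm", "fail", "crisis"},
}


def preprocess(text: str) -> list[str]:
    """Lowercase, replace non-letters with spaces, split on whitespace, drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def bag_of_words(tokens: list[str]) -> Counter:
    """Feature extraction: map each token to its frequency in the document."""
    return Counter(tokens)


def dictionary_classify(features: Counter) -> str:
    """Return the category whose dictionary terms occur most often, or 'neutral' on a tie."""
    scores = {
        label: sum(features[term] for term in terms)
        for label, terms in DICTIONARY.items()
    }
    best = max(scores, key=scores.get)
    # If more than one category shares the top score, the document is labelled neutral.
    if list(scores.values()).count(scores[best]) > 1:
        return "neutral"
    return best


if __name__ == "__main__":
    doc = "This policy will improve schools and benefit families, despite the crisis."
    tokens = preprocess(doc)
    print(tokens)
    print(dictionary_classify(bag_of_words(tokens)))
```

Running the script prints the cleaned tokens and the label `positive`, because two terms from the positive list outweigh one from the negative list. The supervised and unsupervised methods on the syllabus replace the hand-built dictionary with categories or patterns learned from the data.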
Learning activities and teaching methods (given in hours of study time)
| Scheduled Learning and Teaching Activities | Guided independent study | Placement / study abroad |
|---|---|---|
| 22 | 128 | |
Details of learning activities and teaching methods
| Category | Hours of study time | Description |
|---|---|---|
| Scheduled Learning and Teaching Activity | 22 | 11 x 2-hour lectures |
| Guided Independent Study | 40 | Activities to familiarize you with the Python programming language |
| Guided Independent Study | 30 | Reading and preparing for lectures |
| Guided Independent Study | 58 | Research and analysis for final essay and presentation |
Formative assessment
| Form of assessment | Size of the assessment (eg length / duration) | ILOs assessed | Feedback method |
|---|---|---|---|
| Practicals | 4 short assignments to reinforce programming skills. | 1-5,8-10 | Written |
Summative assessment (% of credit)
| Coursework | Written exams | Practical exams |
|---|---|---|
| 85 | 0 | 15 |
Details of summative assessment
| Form of assessment | % of credit | Size of the assessment (eg length / duration) | ILOs assessed | Feedback method |
|---|---|---|---|---|
| Research proposal | 10 | 800 words | 2-7, 9-10 | Written |
| Research essay | 75 | 4,000 words | 1-10 | Written |
| Presentation (individual) | 15 | 10 minutes | 1-10 | Written |
Details of re-assessment (where required by referral or deferral)
| Original form of assessment | Form of re-assessment | ILOs re-assessed | Timescale for re-assessment |
|---|---|---|---|
| Research proposal | 1-2 page proposal | 2-7, 9-10 | August/September reassessment period |
| Research essay | 4,000-word essay | 1-10 | August/September reassessment period |
| Presentation (individual) | 10-minute presentation | 1-10 | Spring term |
Indicative learning resources - Basic reading
Basic reading:
- Swaroop C H, A Byte of Python. https://python.swaroopch.com.
- Dipanjan Sarkar, Text Analytics with Python: A Practical Real-World Approach (New York, NY: Springer).
- Justin Grimmer and Brandon M. Stewart (2013) “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts,” Political Analysis 21 (3): 267-297.
- Michael Alvarez (ed.), Computational Social Science (Cambridge, UK: Cambridge University Press).
Indicative learning resources - Web based and electronic resources
- Learn Python interactively online using Codecademy’s free Python course: https://www.codecademy.com/learn/python
Indicative learning resources - Other resources
- For more information on downloading and installing Python: https://wiki.python.org/moin/BeginnersGuide/Download
| Credit value | 15 |
|---|---|
| Module ECTS | 7.5 |
| Module pre-requisites | None |
| Module co-requisites | None |
| NQF level (module) | 7 |
| Available as distance learning? | No |
| Origin date | 30/05/2017 |
| Last revision date | 03/11/2020 |


