
Latest Module Specifications

Current Academic Year 2025 - 2026


Module Title
Module Code (ITS): CA6012
Faculty / School
NFQ Level / Credit Rating
Description

This module will introduce students to important issues that arise in Natural Language Processing (NLP) because NLP systems are trained on human data, are used by humans, and directly affect human lives. Students will study (i) the particular properties that characterise human languages, (ii) issues arising from training NLP systems on human-generated data, (iii) implications of building NLP systems for use by humans, and (iv) responsibilities arising from the real impact that NLP systems, like other AI systems, have on people's lives. Students will learn about the fundamentals of theoretical and applied linguistics, the ethical issues that arise in language data collection and the safeguards that need to be put in place, and how NLP systems can be responsibly and informatively evaluated by human users, with results meaningfully analysed and responsibly reported. Students will be introduced to the different ways in which NLP systems directly and indirectly affect people, and to what developers can do to assess and mitigate those impacts. Students will engage with these topics through lectures, hands-on exercises and a research project.

Learning Outcomes

1. Reflect on the basic properties of different human languages and how they result in different challenges for NLP system development
2. Reflect on and critically assess the role linguistics plays in NLP and comparatively assess different arguments about the need for linguistic knowledge in building NLP systems
3. Critically assess different ways of collecting language data and make informed choices about suitable methods in given contexts
4. Design, test and implement ethically and legally appropriate methods for collecting language data
5. Reflect on and apply in practice knowledge about bias in data and how it translates into bias in NLP systems
6. Reflect on and critique current research on diagnosing and fixing racial, gender and other types of bias in NLP systems
7. Critically assess different methods for evaluating NLP systems and make informed decisions about what evaluation methods to use in different contexts
8. Design, test and implement ethically and legally appropriate methods for human evaluation of NLP systems, including responsible reporting of results
9. Reflect on and critique wider philosophical and sociological perspectives on the role of AI and NLP in society, their benefits and dangers
10. Reflect on and apply in practice learning relating to working in a team to develop an NLP project proposal that includes ethically and legally appropriate data collection, addressing potential bias in the data, responsible vetting and selection of existing resources, and responsible evaluation and reporting of results.


Workload: Full-time hours per semester

Type                   Hours   Description
Lecture                24      2-hour lecture once weekly
Laboratory             24      2-hour lab once a week
Assignment Completion  80      Practical tasks, project plan, case study and essay
Independent Study      59.5    Work carried out by students outside the lectures and labs (reading background material, preparing for discussions, finishing lab assignments)
Total Workload         187.5
Assessment Breakdown

Practical/skills evaluation (20% of total; as required)
Practical tasks: During the weekly practical sessions, students will carry out assessed tasks, collected at four points in the module, including preparing discussion topics and exploring/revising code.

Group project (30% of total; once per semester)
Project proposal: Working in groups, students identify an NLP application task and develop a project proposal and work plan for it that includes ethically and legally appropriate data collection, addressing potential bias in the data, responsible vetting and selection of existing resources, and a comprehensive plan for responsible evaluation and reporting of results.

Group assignment (20% of total; once per semester)
Case study: Students will select a set of existing NLP systems that perform comparable tasks and test the systems to diagnose any of the potential issues related to bias covered in the lectures. Students will each select, test and report results for one NLP system individually (10%) and contribute to the comparative analysis and group report (10%).

Essay (30% of total; once per semester)
Essay: Students select a topic relevant to the module content and write a 1250-word essay exploring the topic in depth.
Reassessment Requirement Type
Resit arrangements are explained by the following categories:
RC1: A resit is available for both* components of the module.
RC2: No resit is available for a 100% coursework module.
RC3: No resit is available for the coursework component where there is a coursework and summative examination element.

* ‘Both’ is used in the context of the module having a coursework/summative examination split; where the module is 100% coursework, there will also be a resit of the assessment.

Pre-requisite: None
Co-requisite: None
Compatibles: None
Incompatibles: None

All module information is indicative and subject to change. For further information, students are advised to refer to the University's Marks and Standards and Programme Specific Regulations at: http://www.dcu.ie/registry/examinations/index.shtml

Indicative Content and Learning Activities

Part I What is language data like
In the first part of the module, students will be introduced to the fundamentals of theoretical and applied linguistics from the perspective of their historical and current relevance to NLP. In practical exercises, students will, for example, explore the difference that knowledge about the properties of different languages can make to the controllability, performance and transparency of NLP systems.
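
As an illustration of the kind of property explored here, the short Python sketch below (not part of the module materials; the example words and the naive tokeniser are assumptions chosen for illustration) shows how a tokenisation strategy that works reasonably for English breaks down for a language written without word boundaries and hides the internal structure of a morphologically rich word.

    # Minimal sketch: one tokenisation strategy does not transfer across languages.
    def whitespace_tokenize(text: str) -> list[str]:
        """Split on whitespace: a reasonable first pass for English."""
        return text.split()

    english = "The systems were evaluated by human users"
    japanese = "システムは人間のユーザーによって評価された"  # written without spaces between words
    turkish = "evlerimizden"                                  # "from our houses": stem + plural + possessive + case

    print(whitespace_tokenize(english))   # 7 tokens, roughly one per word
    print(whitespace_tokenize(japanese))  # a single "token": whitespace carries no signal here
    print(whitespace_tokenize(turkish))   # 1 token hiding several meaningful morphemes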

Part II Where does language data come from
In the second part of the module, students will learn how the very large amounts of data required for building current NLP systems are collected (including from user-generated online content), processed and annotated. Students will explore the issues that arise when such resources are not available (as is the case for most of the world's languages) and how these can be addressed. The ethical and legal rights of the people who create the collected data will be examined, along with approaches and techniques designed to safeguard privacy and ensure ethical responsibility, including 'ethics by design' and emerging AI legal frameworks. Students will, for example, explore techniques for data collection and design plans for improving their alignment with current ethical and legal requirements.
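
To make the idea of 'ethics by design' concrete, here is a small Python sketch in which direct identifiers are replaced with placeholders at the moment user-generated text is collected, before anything is stored. The patterns, placeholders and example post are illustrative assumptions, not a vetted procedure.

    # Illustrative safeguard: pseudonymise user-generated text before storage.
    # These simplified patterns would not, on their own, satisfy a real ethical/legal review.
    import re

    REDACTIONS = [
        (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
        (re.compile(r"https?://\S+"), "<URL>"),
        (re.compile(r"@\w+"), "<HANDLE>"),
        (re.compile(r"\+?\d[\d\s-]{7,}\d"), "<PHONE>"),
    ]

    def pseudonymise(post: str) -> str:
        for pattern, placeholder in REDACTIONS:
            post = pattern.sub(placeholder, post)
        return post

    raw = "Contact me at jane.doe@example.com or @jane_d, details at https://example.com/profile"
    print(pseudonymise(raw))
    # Contact me at <EMAIL> or <HANDLE>, details at <URL>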

Part III How do data and other factors affect the system
This part of the module will systematically explore the growing body of work on the different forms of bias displayed by NLP systems trained on human-generated data, including techniques for diagnosing bias in systems and current research efforts to develop techniques for automatically debiasing systems. Other factors that will be explored include application task construal and system design choices. In practical exercises, students will, for example, explore how NLP system biases lead to gender, racial and other identities being overwritten.
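
One common diagnostic pattern in this area is counterfactual probing: run the same template through a system with only an identity term swapped and compare the outputs. The Python sketch below is a toy illustration; the lexicon scorer stands in for a real NLP system and the planted weights are assumptions, so only the probing pattern itself is the point.

    # Toy counterfactual bias probe over a template with a swapped identity term.
    def toy_sentiment(text: str) -> float:
        """Stand-in 'system' whose lexicon has leaked an identity/sentiment association."""
        lexicon = {"brilliant": 1.0, "terrible": -1.0, "doctor": 0.3, "nurse": 0.1,
                   "he": 0.1, "she": -0.1}  # the he/she weights are the planted bias
        return sum(lexicon.get(w, 0.0) for w in text.lower().split())

    TEMPLATE = "{pronoun} is a brilliant {profession}"

    for profession in ["doctor", "nurse"]:
        scores = {p: toy_sentiment(TEMPLATE.format(pronoun=p, profession=profession))
                  for p in ["he", "she"]}
        print(f"{profession}: he={scores['he']:+.2f} she={scores['she']:+.2f} "
              f"gap={scores['he'] - scores['she']:+.2f}")
    # A systematic non-zero gap across many templates and terms would flag the system for closer analysis.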

Part IV How do systems affect people
The final part of the module will survey the increasing spread of NLP-based systems into many areas of our daily lives, from automatic processing of college and job applications, to ubiquitous product and service recommender systems, and voice interaction with devices. Students will learn about the degree to which different groups of people are disadvantaged by systems that have bias built into them, not only via the data they are trained on but also via their application task and design. The importance of responsible evaluation and reporting of results will be further explored in this context. Students will learn about impact assessment and system evaluation in real-world contexts, exploring and applying their learning in a case study scenario.
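
A simple example of responsible reporting in this setting is disaggregated evaluation: performance is reported per user group rather than as a single aggregate, so that gaps of the kind documented for speech recognition remain visible. The Python sketch below uses made-up illustrative records, not data from any real system.

    # Disaggregated evaluation: per-group accuracy alongside the overall figure.
    from collections import defaultdict

    records = [  # (group, prediction_correct) -- made up for illustration
        ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
        ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
    ]

    totals, correct = defaultdict(int), defaultdict(int)
    for group, ok in records:
        totals[group] += 1
        correct[group] += ok

    print(f"overall accuracy: {sum(correct.values()) / len(records):.2f}")   # 0.50
    for group in sorted(totals):
        print(f"{group}: accuracy {correct[group] / totals[group]:.2f} (n={totals[group]})")
    # group_a 0.75 vs group_b 0.25 -- the aggregate alone would be a misleading report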

Indicative Reading List

Books:
  • Emily M. Bender (2013). Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax (Synthesis Lectures on Human Language Technologies). Morgan & Claypool Publishers. ISBN 9781627050111.
  • Emily M. Bender, Alex Lascarides (2019). Linguistic Fundamentals for Natural Language Processing II: 100 Essentials from Semantics and Pragmatics (Synthesis Lectures on Human Language Technologies). Morgan & Claypool Publishers. ISBN 9781681730738.


Articles:
  • Karen Spärck Jones (2007). What about the Linguistics? Last Words. Computational Linguistics, 33(3). https://aclanthology.org/J07-3008.pdf
  • ACL Lifetime Achievement Award: The Dawn of Statistical ASR and MT (2009). Computational Linguistics, 35(4). https://aclanthology.org/J09-4004.pdf
  • Book review: Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax by Emily M. Bender. Computational Linguistics. https://aclanthology.org/J15-1007.pdf
  • Jessica Vitak, Katie Shilton, Zahra Ashktorab (2016). In CSCW '16: Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing. https://doi.org/10.1145/2818048.2820078
  • Hua Ai, Antoine Raux, Dan Bohus, Maxine Eskenazi, Diane Litman (2007). In Proceedings of SIGdial 2007. https://aclanthology.org/2007.sigdial-1.23.pdf
  • Emily M. Bender, Batya Friedman (2018). Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science. Transactions of the Association for Computational Linguistics, 6.
  • Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, Monojit Choudhury (2020). The State and Fate of Linguistic Diversity and Inclusion in the NLP World. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
  • John W. Ayers, Theodore L. Caputi, Camille Nebeker, Mark Dredze (2018). Don't quote me: reverse identification of research participants in social media studies. npj Digital Medicine, 1(30). https://doi.org/10.1038/s41746-018-0036-2
  • Samuel Läubli, Sheila Castilho, Graham Neubig, Rico Sennrich, Qinlan Shen, Antonio Toral (2020). A set of recommendations for assessing human-machine parity in language translation. Journal of Artificial Intelligence Research, 67. http://dx.doi.org/10.1613/jair.1.11371
  • Racial disparities in automated speech recognition (2020). Proceedings of the National Academy of Sciences, 117(14). https://doi.org/10.1073/pnas.1915768117
  • Automatically Neutralizing Subjective Bias in Text (2020). Proceedings of the AAAI Conference on Artificial Intelligence, 34(01). https://doi.org/10.1609/aaai.v34i01.5385
  • B. Shneiderman (2016). Proceedings of the National Academy of Sciences, 113(48). https://doi.org/10.1073/pnas.1618211113
  • Dirk Hovy, Shannon L. Spruit (2016). https://aclanthology.org/P16-2096
  • Angwin, J. & Larson (2016). Bias in criminal risk scores is mathematically inevitable, researchers say. ProPublica, December 30.
  • Emma Strubell, Ananya Ganesh, Andrew McCallum (2019). Energy and Policy Considerations for Deep Learning in NLP. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.
  • Kalluri, P. (2020). Don't ask if artificial intelligence is good or fair, ask how it shifts power. Nature, 583(7815), 169. https://doi.org/10.1038/d41586-020-02003-2
  • Nikolaos M. Siafakas (2021). Do we need a Hippocratic Oath for artificial intelligence scientists? AI Magazine, Winter 2021. https://doi.org/10.1609/aimag.v42i4.15090
Other Resources

  • Video: The Trouble with Bias. NeurIPS 2017 keynote. https://www.youtube.com/watch?v=fMym_BKWQzk
