CAISA Lab

MA-INF 4115 INTRODUCTION TO NATURAL LANGUAGE PROCESSING

Winter Semester 2023 – 2024

Updates

17.10.2023: The first exercise starts on Wednesday, 25.10.2023 at 2:15 PM.
17.10.2023: The first lecture starts on Thursday, 26.10.2023 at 10:15 AM.

Logistics

  • Lectures: Thursdays, 10:15 AM - 11:45 AM, in B-IT-Max 0.109 (Friedrich-Hirzebruch-Allee 6). The Zoom link is posted on eCampus.

  • Exercises: Wednesdays in B-IT-Max 0.109. Choose one of the following exercise groups to attend. The Zoom link is posted on eCampus.

    • Group 1: 2:15 PM - 3:45 PM (Vahid)
    • Group 2: 4:00 PM - 5:30 PM (Ulvi)
  • Course Materials: Uploaded to eCampus every week.

  • Contact: Students should ask all course-related questions in our discussion forum on eCampus. For external inquiries, emergencies, or personal matters, you can email us at itnlp.uni.bonn@gmail.com.

  • Office Hours: Please email us first to arrange an in-person meeting.

    • Prof. Dr. Lucie Flek: Friedrich-Hirzebruch-Allee 6 (B-IT) – Room: 2.123
    • Vahid Sadiri Javadi: Friedrich-Hirzebruch-Allee 6 (B-IT) – Room: 2.120

Content

What is the Introduction to NLP course about?

This course provides a technical perspective on NLP: methods for building computer software that understands and manipulates human language. Contemporary data-driven approaches are emphasized, with a focus on machine learning techniques. The covered applications vary in complexity and include, for example, Entity Recognition, Argument Mining, and Emotion Analysis.

Through lectures, exercises, and a final project, you will gain a thorough introduction to cutting-edge research in NLP, from the linguistic basis of computational language methods to recent advances in deep learning and large language models.

Recommended participation requirements:

  • Basic programming knowledge in Python
  • Basics of Machine Learning
  • Basic knowledge of Python Libraries for ML (NumPy, Scikit-Learn, Pandas)
  • Basics of Probability, Linear Algebra and Statistics

Coursework

Assignments (prerequisite for the exam): Uploaded to eCampus.

  • Credit:
    • Assignment 1 (10%): Word Operations
    • Assignment 2 (20%): Text Classification (Scikit-Learn)
    • Assignment 3 (20%): Word Vectors (SpaCy)
    • Assignment 4 (30%): Fine-tuning with LLMs (Hugging Face)
    • Assignment 5 (20%): Hidden Markov Model
  • Deadlines: All assignments are due at 11:59 PM on the Tuesday before the exercise class. All deadlines are listed in the schedule.
  • Submission: Assignments should be submitted via eCampus. Further instructions are given in each assignment file. Please do not email us your assignments.
  • Collaboration: Working on assignments in a group of 2 students is allowed. Please name your file with both students' names. File name:
  • Grade/Feedback: You will receive your graded assignment every week on eCampus.
    NOTE: You need to achieve at least 50% of the credits to be allowed to take the exam.
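
As a rough illustration of the 50% rule, the sketch below combines the assignment weights listed above into a single credit total. It assumes that the credit total is the weighted sum of per-assignment scores, each given as a fraction between 0 and 1, and that a missing assignment counts as 0; the course page does not state the exact rule. The names `total_credits` and `admitted_to_exam` and the example scores are hypothetical.

```python
# Assignment weights in percent, copied from the credit list above.
ASSIGNMENT_WEIGHTS = {1: 10, 2: 20, 3: 20, 4: 30, 5: 20}


def total_credits(scores):
    """Return the weighted credit total in percent.

    `scores` maps an assignment number to the achieved fraction (0.0 to 1.0).
    Assumption: an assignment that was not submitted counts as 0.
    """
    return sum(weight * scores.get(number, 0.0)
               for number, weight in ASSIGNMENT_WEIGHTS.items())


def admitted_to_exam(scores, threshold=50.0):
    """Check the total against the 50% threshold from the NOTE above."""
    return total_credits(scores) >= threshold


# Hypothetical student: full marks on assignment 1, half marks on 2 to 4,
# and assignment 5 not submitted.
example = {1: 1.0, 2: 0.5, 3: 0.5, 4: 0.5}
print(total_credits(example), admitted_to_exam(example))  # 45.0 False
```

Under these assumptions, the example student ends up at 45% and would still need points from assignment 5 to reach the threshold.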

Final Project (40%)

  • Project Types: Students choose one of the following project types:
    • Default Project: Students choose one of the datasets we listed here, formulate a real-world problem (PF), and try to solve it (PS) by training a model or fine-tuning a pre-trained LLM.
      Submission: [Code + Report for final results]
    • Resource Creation Project: To answer the question "How can an NLP dataset be generated from any internet source?", students design a pipeline to build and annotate a dataset. They should define at least two downstream NLP tasks for their dataset.
      Submission: [Code + Dataset + Report for formulated NLP tasks]
    • Robustness and Reproducibility Project: To measure the ability of a model or an NLP system to perform consistently and accurately across a wide range of inputs and conditions, students collect and annotate an evaluation set with 100 – 200 instances and test at least two models with the new evaluation set.
      Submission: [Code + Evaluation set + Report for final results]
  • Project components:
    • Problem Formulation (PF) (10%):
      [Guideline for Default Project]
      [Guideline for Resource Creation Project]
      [Guideline for Robustness and Reproducibility Project]
    • Problem Solving (PS) (15%)
    • Project Poster (PP) (5%):
      [Guideline for Poster]
    • Project Report (PR) (10%):
      [Guideline for Default Project]
      [Guideline for Resource Creation Project]
      [Guideline for Robustness and Reproducibility Project]
  • Submission: Depending on which project type the students choose, they submit each project component on eCampus in the following format:
    PF: A PDF file with this name: Team_.pdf
    PP: A PDF file with this name: Team_.pdf
    PS + PR: A ZIP file containing all the necessary files with this name: Team_.zip
  • Deadlines: All deadlines for PF, PP, and PS + PR are listed in the schedule.
  • Mentors: Every team has a mentor, who gives feedback and advice during the project.
  • Computing resources:
    • CS Faculty: You can add your Student ID to this list. GSG will provide you with additional computing resources on behalf of the CAISA lab.
    • Saturn Cloud: You can use 150 free hours a month of 64GB RAM and GPU instances. Check this out.
    • Google Colaboratory: Colab is a hosted Jupyter Notebook service that provides free access to computing resources, including GPUs and TPUs. Check this out.
  • Using external resources: You can use any machine learning or deep learning framework you like (Scikit-learn, PyTorch, TensorFlow, etc.). You may use any existing code, libraries, etc., and consult papers, books, online references, etc. for your project. However, you must cite your sources in your final project report.
  • Team:
    • Team size: Students should do final projects in teams of 3 to 5 people. Larger teams are expected to do correspondingly larger projects.
    • Building a team: You can either find your teammates on your own or ask us to find teammates for you. You may join the CS Master Bonn Discord Server.
    • Submission:
      • Please send us the list of your team members at itnlp.uni.bonn@gmail.com in the following format:
        Subject: ITNLP - WS2023 - |Matr. Nr.|
        Team Speaker: |Name|, |Matr. Nr.|, |Mail Addr.|
        Team Members: |Name|, |Matr. Nr.|, |Mail Addr.| |Name|, |Matr. Nr.|, |Mail Addr.|
      • If you need a teammate, please email us at itnlp.uni.bonn@gmail.com in the following format:
        Subject: ITNLP - WS2023 - Looking for a team
        |Name|, |Matr. Nr.|, |Mail Addr.|
    • Deadline: Listed in the schedule.
    • Contribution: In the final report we ask for a statement of what each team member contributed to the project. Team members will typically get the same grade, but we may differentiate in extreme cases of unequal contribution. You can contact us in confidence in the event of unequal contribution.

Exam (60%)

  • Exam dates: To be announced as soon as we receive the rooms and dates from the examination office.
  • Allowed material: A calculator is permitted.
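
The percentages given for the final project (40%) and the exam (60%) can be combined as in the sketch below. It assumes that the overall result is the plain weighted sum of the project components (PF, PS, PP, PR) and the exam, with each score given as a fraction between 0 and 1; the course page does not spell out the exact grading formula, and the name `overall_percentage` and the example scores are hypothetical.

```python
# Weights in percent, copied from the "Project components" list and the
# "Exam (60%)" heading.
PROJECT_WEIGHTS = {"PF": 10, "PS": 15, "PP": 5, "PR": 10}
EXAM_WEIGHT = 60


def overall_percentage(project_scores, exam_score):
    """Return the overall result in percent.

    `project_scores` maps each project component to a fraction (0.0 to 1.0);
    `exam_score` is the exam result as a fraction.
    Assumption: the overall result is a plain weighted sum.
    """
    project = sum(weight * project_scores[name]
                  for name, weight in PROJECT_WEIGHTS.items())
    return project + EXAM_WEIGHT * exam_score


# Hypothetical team member: half of the points on every project component
# and 75% on the exam.
scores = {"PF": 0.5, "PS": 0.5, "PP": 0.5, "PR": 0.5}
print(overall_percentage(scores, exam_score=0.75))  # 65.0
```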

Allocation

  • 3 + 1 SWS
  • Master in Media Informatics: 6 ECTS credits
  • Master in Computer Science at the University of Bonn: MA-INF 4115, 6 CP
  • Students must register for the exam on POS/BASIS.

Literature

  • J. Eisenstein: Introduction to Natural Language Processing
  • D. Jurafsky, J. H. Martin: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
  • S. Bird, E. Klein, E. Loper: Natural Language Processing with Python

Schedule

Week | Date | Description | Events | Deadlines
Week 1 | Lecture (Thu Oct 26) | - | - | -
Week 1 | Exercise (Wed Oct 25) | Introduction & Python basics | - | -
Week 2 | Lecture (Thu Nov 2) | - | - | -
Week 2 | Exercise (Wed Nov 1) | HOLIDAY | - | -
Week 3 | Lecture (Thu Nov 9) | - | - | -
Week 3 | Exercise (Wed Nov 8) | Word operations and feature extraction using Pandas & Sklearn | Assignment 1 OUT | Team Members DUE
Week 4 | Lecture (Thu Nov 16) | - | - | -
Week 4 | Exercise (Wed Nov 15) | Linear classification using TF-IDF | Assignment 2 OUT | Assignment 1 DUE
Week 5 | Lecture (Thu Nov 23) | - | - | -
Week 5 | Exercise (Wed Nov 22) | Word embeddings using spaCy | Assignment 3 OUT | Assignment 2 DUE
Week 6 | Lecture (Thu Nov 30) | - | - | -
Week 6 | Exercise (Wed Nov 29) | Q & A: PF + PS | - | Problem Formulation, Assignment 3 DUE
Week 7 | Lecture (Thu Dec 7) | - | - | -
Week 7 | Exercise (Wed Dec 6) | Transformers and Generative Models I | - | -
Week 8 | Lecture (Thu Dec 14) | - | - | -
Week 8 | Exercise (Wed Dec 13) | Transformers and Generative Models II | Assignment 4 OUT | -
Week 9 | Lecture (Thu Dec 21) | - | - | -
Week 9 | Exercise (Wed Dec 20) | POS tagging & HMMs | Assignment 5 OUT | Assignment 4 DUE
Week 10 | Lecture (Thu Jan 11) | - | - | -
Week 10 | Exercise (Wed Jan 10) | Project development | - | Assignment 5 DUE
Week 11 | Lecture (Thu Jan 18) | - | - | -
Week 11 | Exercise (Wed Jan 17) | Project development | - | -
Week 12 | Lecture (Thu Jan 25) | - | - | Poster DUE
Week 12 | Exercise (Wed Jan 24) | Project development | - | -
Week 13 | Lecture (Thu Feb 1) | - | - | -
Week 13 | Exercise (Wed Jan 31) | Project Presentation (Poster) | - | -