Week 2

Lecture 2: Words, tokenization, tagged text

This week we will look at

  • some basic linguistics concepts related to words
  • the processes of tokenization and normalization
  • tagged text
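
As a rough illustration of these three topics, here is a minimal sketch in plain Python. It is not taken from the lecture or the readings: the regular expression, the helper names (tokenize, normalize, parse_tagged) and the example sentence are all assumptions made for illustration, and real tokenizers and taggers (for example those in NLTK) handle many more cases.

```python
import re

# Assumed pattern: a word, optionally with an internal apostrophe part
# (as in "didn't"), or a single non-space punctuation character.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def normalize(tokens):
    """Lower-case tokens and drop tokens with no letter or digit."""
    return [t.lower() for t in tokens if any(ch.isalnum() for ch in t)]


def parse_tagged(text):
    """Read 'word/TAG' items into (word, tag) pairs."""
    return [tuple(item.rsplit("/", 1)) for item in text.split()]


sentence = "The cat didn't sit on the mat."
tokens = tokenize(sentence)          # keeps case and punctuation
normalized = normalize(tokens)       # lower-cased, punctuation removed
types = set(normalized)              # distinct word forms

# Tagged text in a slash notation, here with Penn Treebank style tags.
tagged = parse_tagged("The/DT cat/NN sat/VBD")

print(len(tokens), len(normalized), len(types))
print(tagged)
```

In this sketch "The" and "the" count as two different tokens but, after normalization, as one type, which is the distinction between tokens and types that the readings discuss.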

Slides

Recording

Mandatory reading

Jurafsky and Martin, Speech and Language Processing, 3rd ed. (edition of 16 Oct. 2019!)

  • Ch. 2 Regular expressions etc.
    • Sec. 2.0
    • Sec. 2.2 Words
    • Sec. 2.3 Corpora
    • Sec. 2.4 Normalization, except 2.4.3 and the technical details of 2.4.1
  • Ch. 8 Part-of-speech tagging
    • Sec. 8.1 and 8.2

NLTK Book

  • Ch. 3, sec. 6 Normalizing Text
  • Ch. 3, sec. 8 Segmentation
  • Ch. 5, sec. 1 Using a tagger
  • Ch. 5, sec. 2 Tagged corpora

Wikipedia

Recommended reading

Wikipedia

Lab-session 1, Tuesday 25 August at Sed

Note that the group sessions have moved to Sed, which has 28 seats!

Bring either your laptop or a keyboard and mouse!

Lab setup

Program

Solutions

Recording

Published Aug. 21, 2020 2:24 PM - Last modified Sep. 11, 2020 5:36 PM