Overview LING 5200 Computational Corpus Linguistics Martha Palmer

What’s a corpus?
• McEnery & Wilson:
  - (i) (loosely) any body of text
  - (ii) (most commonly) a body of machine-readable text
  - (iii) (more strictly) a finite collection of machine-readable text, sampled to be maximally representative of a language or variety
LING 5200, 2006. Based on Kevin Cohen’s LING 5200.

What’s corpus linguistics?
• “the study of language based on examples of ‘real life’ language use” (McEnery & Wilson)
  - A methodology, not a branch of linguistics
• Biber et al.:
  - Uses computers
  - “Natural” texts
  - Large & principled collection
  - Both quantitative and qualitative
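To make the "quantitative" side concrete, here is a minimal sketch of the simplest corpus measurement, a word-frequency count. The sample sentence is invented for illustration and is not drawn from any real corpus; the tokenizing pattern is a deliberately crude assumption.

```python
import re
from collections import Counter

# Invented sample text; a real corpus would be read from files.
text = "the cat sat on the mat and the dog sat too"

# Crude tokenization: runs of lowercase letters and apostrophes.
tokens = re.findall(r"[a-z']+", text.lower())

# Count how often each word form occurs.
counts = Counter(tokens)
print(counts.most_common(2))  # [('the', 3), ('sat', 2)]
```

The qualitative side starts where the counts stop: looking at the actual contexts in which the frequent words occur.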

What was Chomsky’s complaint?
• Linguistics should model competence, not performance. What are the underlying rules that allow us to generate language?
• Context: structuralists believed in collecting linguistic data about a language without taking meaning and communication into consideration. Mirrors the debate between the rationalists and the empiricists. But does Chomsky account for meaning? (see Searle)

Which linguistic branches can make use of corpus linguistics?
• Phonetics
• Phonology
• Morphology
• Syntax
• Semantics
• Pragmatics
• Psycholinguistics
• Computational Lx
• Descriptive Lx
• Historical Lx
• Sociolinguistics

Corpus linguistics in context
[Diagram: Natural Language Processing, Corpus Linguistics, and Computational Linguistics, with the labels "data", "applications", and "models"]

What’s LING 5200 Corpus Linguistics?
• Tools
• Techniques

Overview
• Quick intro to Unix
• A little corpus design
• Quick tour of corpora and annotation
• Tools for working with corpora
• Programming in Python
• Some software engineering

Why Python?
• It works
• Many advantages
• It’s a bona fide programming language
• You’ll need it for CSCI 5832

Administrative things
• Textbooks: Unix, Python
• Office hours: Mon 5-6, Tues 1-2
• verbs.colorado.edu/mpalmer/ling5200
• Prerequisites: none
• Grades: homeworks/project
• Accounts on babel

Logging on for the first time
• First thing to do: change your password.
    passwd
• Give it your current password, then your new password.
• Repeat the new one (to catch typos).

Connecting with another computer
    ssh -l your_name babel.colorado.edu
You are prompted to log in.

Logging on for the first time, again
• First thing to do: change your password.
    passwd
• Give it your current password, then your new password.
• Repeat the new one. (Why?)

Where am I?
• Type pwd
• You see something like this:
    /home/mpalmer
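Since the course moves on to Python later, it may help to see the same question asked from a script. This is a minimal sketch using only the standard library; the directory it prints depends on where you run it.

```python
import os
from pathlib import Path

# os.getcwd() returns the current working directory as a string,
# the same information that `pwd` prints in the shell.
current = os.getcwd()
print(current)

# Path.cwd() reports the same location as a Path object.
same_place = Path.cwd() == Path(current)
print(same_place)
```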

What's that mean?

Important directories
[Directory tree diagram: / contains bin, home, etc, and usr; home contains mpalmer, which contains ling5200; usr contains local, which contains bin. An RCS directory also appears at the lowest level.]
Example paths highlighted in the tree:
    /home/mpalmer/ling5200
    /usr/local/bin
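A path is just the list of directories you pass through on the way down from the root. As a rough illustration, Python's standard pathlib module can take one of the example paths apart; nothing here touches the real file system, so the directory does not need to exist.

```python
from pathlib import PurePosixPath

# One of the example paths, treated purely as text.
path = PurePosixPath("/home/mpalmer/ling5200")

print(path.parts)   # ('/', 'home', 'mpalmer', 'ling5200')
print(path.parent)  # /home/mpalmer
print(path.name)    # ling5200
```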

Navigating directories
• ls to list contents, cd to change directory
  - Directories are just like Windows folders
• /home/mpalmer shortcut: ~
• “the directory above this one”: ..
• “this directory”: .
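The shortcuts above can be checked without moving anywhere. The sketch below resolves them as plain text with the standard posixpath module, assuming the example home directory /home/mpalmer; the last line prints the home directory of whoever runs the script, so its output will differ from machine to machine.

```python
import os
import posixpath

home = "/home/mpalmer"  # example home directory, assumed for illustration

# ".." climbs one level; "." stays where it is.
up = posixpath.normpath(posixpath.join(home, "ling5200", ".."))
here = posixpath.normpath(posixpath.join(home, "."))
print(up)    # /home/mpalmer
print(here)  # /home/mpalmer

# "~" expands to the home directory of the current user.
print(os.path.expanduser("~"))
```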

What's in the neighborhood?
• Type ls
• You see a list of the directories and files contained within the current directory:
    Homework_1.txt  tools  buglog.txt
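For comparison, here is a small Python sketch of the same listing. It builds a throwaway directory containing the three example names, so it runs anywhere; the temporary directory is an assumption made for the demonstration and is removed afterwards.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Recreate the example listing: two files and one directory.
    for filename in ("Homework_1.txt", "buglog.txt"):
        open(os.path.join(workdir, filename), "w").close()
    os.mkdir(os.path.join(workdir, "tools"))

    # os.listdir returns names in arbitrary order, so sort them.
    listing = sorted(os.listdir(workdir))
    print(listing)
```

Note that os.listdir, like plain ls, gives names only; it does not say which entries are files and which are directories.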

I'd like to go somewhere else…
• Type pwd
• Type cd
  - Where are you?
• Type cd ..
  - Where are you?
• Type cd your_user_id
  - Where are you?

Unix is a verb-initial language
    cd ..
• cd: "go"
• ..: where to go

    cd
• cd: "go"
• If no argument, I assume you mean "home"
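The verb-first pattern is easy to see if you split a command line the way a shell roughly does. This is only a sketch: parse_command is a hypothetical helper written for this example, and the default home directory is the example one used earlier.

```python
import shlex

def parse_command(line, home="/home/mpalmer"):
    """Split a command line into its verb and its arguments.

    Hypothetical helper for illustration; assumes a non-empty line.
    """
    verb, *args = shlex.split(line)
    # With no argument, cd falls back to the home directory.
    if verb == "cd" and not args:
        args = [home]
    return verb, args

print(parse_command("cd .."))  # ('cd', ['..'])
print(parse_command("cd"))     # ('cd', ['/home/mpalmer'])
```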

Making a new directory
• Type cd
• Type ls
• Type mkdir ling5200
• Type ls
• Go to the directory you just made (how?)
• Type pwd
• Type ls
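Once you have done the exercise by hand, the same sequence can be scripted. The sketch below replays it in Python inside a temporary scratch directory, which stands in for your home directory (an assumption, so that nothing in your real account is touched).

```python
import os
import tempfile

start = os.getcwd()
with tempfile.TemporaryDirectory() as scratch:
    try:
        os.chdir(scratch)                       # cd (scratch plays "home")
        before = os.listdir(".")                # ls
        os.mkdir("ling5200")                    # mkdir ling5200
        after = os.listdir(".")                 # ls
        os.chdir("ling5200")                    # go to the new directory
        where = os.path.basename(os.getcwd())   # pwd (last component only)
        contents = os.listdir(".")              # ls
    finally:
        # Step back out so the scratch directory can be removed.
        os.chdir(start)

print(before, after, where, contents)
```

The first listing is empty, the second shows the new directory, and the new directory itself starts out empty.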