More Xkwic and Tgrep
LING 5200 Computational Corpus Linguistics
Martha Palmer
March 2, 2006

Resources
  • Laura is bugging me to make a CU Corpora page, like this one: http://www.stanford.edu/dept/linguistics/corpora/cas-home.html
  • TGREP: http://www.stanford.edu/dept/linguistics/corpora/cas-tut-tgrep.html
LING 5200, 2006. Based on Kevin Cohen’s LING 5200.

Searching with pos tags and !
  • [word = "[tT]he" & !(pos = "DT")];
  • wsj
  • [ !(word = "water" | pos = "NN")];
  • [ !(word = "water") & !(pos = "NN")];
  • [ word != "water" & pos != "NN" ];
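By De Morgan's law, the three "water" queries on this slide should select the same tokens. The following is a minimal Python sketch of that equivalence, not Xkwic itself; the token list is hypothetical and invented for illustration.

```python
# Hypothetical toy tokens as (word, pos) pairs; not taken from the WSJ corpus.
tokens = [("water", "NN"), ("water", "VB"), ("rain", "NN"), ("fell", "VBD")]

def q1(word, pos):
    # [ !(word = "water" | pos = "NN") ]
    return not (word == "water" or pos == "NN")

def q2(word, pos):
    # [ !(word = "water") & !(pos = "NN") ]
    return (not (word == "water")) and (not (pos == "NN"))

def q3(word, pos):
    # [ word != "water" & pos != "NN" ]
    return word != "water" and pos != "NN"

# One hit list per query, in the order q1, q2, q3.
hits = [[t for t in tokens if q(*t)] for q in (q1, q2, q3)]
print(hits[0])
```

All three hit lists come out identical here: only the token that is neither the word "water" nor tagged NN survives.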

Operator precedence
  • The precedence of the (logical) operators is defined by the following list: if operator x is listed before operator y, x has precedence over y. Operators are evaluated left to right.
  • =, !, &, |
  • [ ! word = "water" & ! pos = "NN" ]; disambiguates as
  • [ !(word = "water") & !(pos = "NN")];
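Python's comparison, `not`, `and`, and `or` happen to bind in the same relative order as the list above (=, !, &, |), so Python can stand in for a quick sanity check of the disambiguation. This is only an analogy under that assumption, not a model of the Xkwic parser; the function names are made up for the sketch.

```python
from itertools import product

def unparenthesized(word, pos):
    # Mirrors [ ! word = "water" & ! pos = "NN" ]
    return not word == "water" and not pos == "NN"

def parenthesized(word, pos):
    # Mirrors [ !(word = "water") & !(pos = "NN") ]
    return (not (word == "water")) and (not (pos == "NN"))

def wrong_reading(word, pos):
    # What the query would mean if the first ! applied to everything to its right.
    return not (word == "water" and not (pos == "NN"))

# Every combination of matching / non-matching word and tag.
cases = list(product(["water", "rain"], ["NN", "VB"]))
same = all(unparenthesized(w, p) == parenthesized(w, p) for w, p in cases)
differs = any(unparenthesized(w, p) != wrong_reading(w, p) for w, p in cases)
print(same, differs)
```

The unparenthesized and parenthesized forms agree on every case, while the alternative reading gives a different answer for at least one token.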

Searching sequences with | and ?
  • "Bill" [pos = "NP"];
  • [pos = "NP"];
  • ([pos = "NP"]) | ([pos = "NP"] "of" [pos = "NP"]);
  • ([pos = "NP"] "of"? [pos = "NP"]);
  • Note: First match applies

Corpus Position: wild cards and contexts
  • "give" []* "up";
  • "give" []{0,5} "up";
  • "give" []* "up" within 7;
  • "Clinton" expand to 5;
  • "Clinton" expand left to 5;
  • "Clinton" expand right to 5;
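The bounded wildcard `[]{0,5}` allows up to five arbitrary tokens between the two words. A rough Python sketch of that idea follows; `find_gapped` and the example sentence are hypothetical, and keeping only the nearest closing word is a simplification that may not match how Xkwic reports matches.

```python
def find_gapped(tokens, first, last, max_gap):
    """Return (start, end) index pairs where `first` is followed by `last`
    with at most `max_gap` tokens in between (end index is inclusive)."""
    spans = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        # Candidate positions for the closing word: i+1 .. i+1+max_gap.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            if tokens[j] == last:
                spans.append((i, j))
                break  # keep only the nearest closing word for this start
    return spans

# Hypothetical sentence, invented for illustration.
sentence = "they will not give the whole plan up so easily".split()
spans = find_gapped(sentence, "give", "up", 5)
print(spans)
```

With a smaller limit, for example `find_gapped(sentence, "give", "up", 2)`, the same sentence yields no match because three tokens separate the two words.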

Assignments and Intersect
  • Q1 = "rain";
  • Q2 = [pos="NN"];
  • intersect Q1 Q2;
  • Q1 = [pos = "JJ"] [pos = "NN"];
  • Q2 = "acid" "rain";
  • intersect Q1 Q2;
  • [word = "acid" & pos = "JJ"] [word = "rain" & pos = "NN"]
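The slide's examples suggest that intersecting two named results keeps exactly the matches found by both, which is why the last query is offered as a single-query equivalent. A small Python sketch under that assumption, using sets of match spans; the `match` helper and the tagged text are hypothetical.

```python
# Hypothetical tagged text as (word, pos) pairs, invented for illustration.
corpus = [("acid", "JJ"), ("rain", "NN"), ("will", "MD"), ("rain", "VB"),
          ("heavy", "JJ"), ("rain", "NN")]

def match(corpus, *tests):
    """Return the set of (start, end) spans where consecutive tokens
    satisfy the given per-token tests, one test per position."""
    n = len(tests)
    return {(i, i + n - 1)
            for i in range(len(corpus) - n + 1)
            if all(test(corpus[i + k]) for k, test in enumerate(tests))}

# Q1 = [pos = "JJ"] [pos = "NN"];
q1 = match(corpus, lambda t: t[1] == "JJ", lambda t: t[1] == "NN")
# Q2 = "acid" "rain";
q2 = match(corpus, lambda t: t[0] == "acid", lambda t: t[0] == "rain")
# intersect Q1 Q2; modelled as plain set intersection of spans
both = q1 & q2

# [word = "acid" & pos = "JJ"] [word = "rain" & pos = "NN"]
combined = match(corpus,
                 lambda t: t[0] == "acid" and t[1] == "JJ",
                 lambda t: t[0] == "rain" and t[1] == "NN")
print(both, combined)
```

On this toy text the intersection and the combined query return the same single span.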

Structural restrictions
  • "give" []* "up" within s;
  • ("gain" []* "profit") | ("profit" []* "gain") within 3 s;
  • ("gain" []* "profit") | ("profit" []* "gain") within article;
  • "Clinton" expand left to 2 s;

Defining structural restrictions
  • Nounphrase = [pos = "DT"] [pos = "JJ"] [pos = "NN"];
  • Nounphrase;
  • [pos = "JJ"]
  • Go back to select

For fun
  • <s> [pos = "V.*"][pos = "PN.*"] </s>
  • <s> []* [pos = "V.*"][pos = "PN.*"] </s>
  • ([pos = "V.*"] [pos = "PN.*"]) within s
  • Not a question, not beginning of sentence…

less is more
  • less <filename>
  • cat ??/* | less
  • Switches
      SPACE - next screenful
      b - previous screenful
      /<reg exp pattern> (e.g. /RNR) - search for pattern
      ?<reg exp pattern> - search backwards for pattern
      q - quit

Searching for a word
  • tgrep Halloween - what happens?
  • Why don’t you have to specify a file?
      babel> grep tgrep .cshrc
      # tgrep stuff
      #setenv TGREP_CORPUS /corpora/treebank2/tbl_075/tgrepabl/brwn_cmb.crp
      setenv TGREP_CORPUS /corpora/treebank2/tgrepabl/wsj_mrg.crp
  • Count results: tgrep research | wc -l
  • cat ??/* | grep Halloween | wc -l

Tgrep Switches
  • -a Match on all patterns in a sentence
  • -w Return the whole sentence
  • -n Put the entire string on one line
  • -t Print only the terminals

Viewing it in sentential context
  • tgrep -wn Halloween | more
  • tgrep -wn research | more (20,865 hits)
  • Can also use less

Viewing it in sentential context
  • tgrep -wn research | more

Searching by POS
  • tgrep NNS | more
  • Another way to do your sanity check

See more data?
  • tgrep NNS | grep . | more

Sentential context (again)
  • tgrep -wn NNS | more

Searching by syntactic constituent
  • tgrep NP | more

Single-line outputs
  • tgrep -n NP | more

Viewing tree-like output
  • tgrep -w NP | head -20

Searching for relations between nodes
  • tgrep 'NP < CC' | head -16

tgrep -g (whole language)
  • A < B - A immediately dominates B
  • A > B - A is immediately dominated by B
  • A << B - A dominates B
  • A >> B - A is dominated by B
  • A . B - A immediately precedes B
  • A .. B - A precedes B
  • A <<, B - B is the leftmost descendant of A
  • A <<` B - B is the rightmost descendant of A
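To make the difference between immediate dominance (`<`) and dominance (`<<`) concrete, here is a minimal Python sketch over a nested-tuple tree. It illustrates the two relations only and is not how tgrep is implemented; the tree, the tuple convention, and all helper names are assumptions made for this example.

```python
# A tree is (label, child, child, ...); a leaf is a plain string.
# Hypothetical parse, invented for illustration.
tree = ("S",
        ("NP", ("NNP", "Clinton"), ("CC", "and"), ("NNP", "Gore")),
        ("VP", ("VBD", "spoke")))

def label(node):
    return node if isinstance(node, str) else node[0]

def children(node):
    return [] if isinstance(node, str) else list(node[1:])

def subtrees(node):
    """Yield the node itself and every node below it."""
    yield node
    for child in children(node):
        yield from subtrees(child)

def immediately_dominates(node, b):
    # A < B: some child of the node is labelled b
    return any(label(c) == b for c in children(node))

def dominates(node, b):
    # A << B: some node strictly below this one is labelled b
    return any(label(d) == b for c in children(node) for d in subtrees(c))

def find(tree, a, relation, b):
    """Nodes labelled `a` that stand in `relation` to a node labelled `b`."""
    return [n for n in subtrees(tree) if label(n) == a and relation(n, b)]

np_over_cc = find(tree, "NP", immediately_dominates, "CC")  # like 'NP < CC'
s_over_cc = find(tree, "S", immediately_dominates, "CC")    # like 'S < CC'
s_dom_cc = find(tree, "S", dominates, "CC")                 # like 'S << CC'
print(len(np_over_cc), len(s_over_cc), len(s_dom_cc))
```

Here the NP has CC as a child, so it matches the `<` check; S only contains CC further down, so it matches `<<` but not `<`.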

Alternation
  • Node names can be ORed
  • e.g. tgrep 'Clinton|Gore' | head

Character classes
  • Regular expressions
  • tgrep '/[Cc]hild/' | egrep . | head

Working towards that weird example…
  • tgrep '/[Pp]resident/' | head

Combining alternation and a regular expression
  • tgrep 'Clinton|Gore|/[Pp]resident/' | head

Searching for a transitive verb
  • tgrep -w 'VP << like < NP << DT' | more

Verbs + Particles
  • tgrep -w 'VP << kick' > kick
  • tgrep 'VP << /kick.*/ <2 PRT' kick
  • tgrep 'VP <1 VB <2 PRT' kick
  • tgrep -nw 'VP <1 /VB.*/ <2 PRT' kick
  • tgrep 'VP <1 (VB < kick) <2 PRT' kick
  • tgrep 'VP <1 (/VB.*/ < kick) <2 PRT' kick
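The `<1` and `<2` relations pick out the first and second child of a node. A rough Python analogue of the last pattern follows, using a regular expression for the verb as well (as in the slide's `/kick.*/` variant). The trees and helper names are hypothetical, and treating the regular expressions as unanchored searches is an assumption of this sketch.

```python
import re

# A tree is (label, child, child, ...); a leaf is a plain string.
# Hypothetical parses, invented for illustration.
vp_particle = ("VP", ("VBD", "kicked"), ("PRT", ("RP", "off")),
               ("NP", ("DT", "the"), ("NN", "campaign")))
vp_plain = ("VP", ("VBD", "kicked"), ("NP", ("DT", "the"), ("NN", "ball")))

def label(node):
    return node if isinstance(node, str) else node[0]

def children(node):
    return [] if isinstance(node, str) else list(node[1:])

def nth_child(node, n):
    """Return the n-th child (1-based, as in <1 and <2), or None."""
    kids = children(node)
    return kids[n - 1] if 1 <= n <= len(kids) else None

def is_verb_particle(node, verb_pattern):
    """Rough analogue of 'VP <1 (/VB.*/ < /verb_pattern/) <2 PRT'."""
    first, second = nth_child(node, 1), nth_child(node, 2)
    if first is None or second is None:
        return False
    return (label(node) == "VP"
            and re.search(r"VB.*", label(first)) is not None
            and any(re.search(verb_pattern, label(c)) for c in children(first))
            and label(second) == "PRT")

print(is_verb_particle(vp_particle, "kick.*"),
      is_verb_particle(vp_plain, "kick.*"))
```

Only the first tree has a particle phrase as the second child of the VP, so only it passes the check.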