Maia Petee AntConc Analysis: COCA News Corpus


Corpus
§ A sample corpus downloaded from the corpus.byu.edu website
§ Comprised of equal samples of COCA (1990–2012): Spoken, News, and Magazine genres
§ 1.7 million words total
§ Technical difficulties with AntConc necessitated a smaller corpus

Cleaning the corpus
§ Stop word list used
§ Lemma list used
§ Many small files combined into one; duplicate lines in the accompanying lemma lists cleaned with a regex:
§ ^(.*?)$\s+?^(?=.*^\1$)
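The duplicate-line cleanup can be sketched in Python. This is a minimal sketch and not necessarily the tool used for the slides. It assumes the pattern on the slide is the widely circulated duplicate-line regex `^(.*?)$\s+?^(?=.*^\1$)` with its backslashes restored, and that it was run with "dot matches newline" enabled. The function name and the sample lemma list are made up for illustration.

```python
import re

# Assumed reconstruction of the slide's pattern (backslashes restored).
# MULTILINE lets ^ and $ match at every line boundary; DOTALL lets the
# lookahead's ".*" scan across later lines in search of a repeat.
DUPLICATE_LINE = re.compile(r"^(.*?)$\s+?^(?=.*^\1$)", re.MULTILINE | re.DOTALL)


def drop_duplicate_lines(text: str) -> str:
    """Delete each line that occurs again later, so the last copy survives."""
    return DUPLICATE_LINE.sub("", text)


# Hypothetical lemma-list fragment with one repeated entry.
lemma_list = "be\nam\nbe\nis\n"
print(drop_duplicate_lines(lemma_list))
```

Because the lookahead rescans the rest of the file for every line, this approach slows down sharply on long lists; an order-preserving pass such as `dict.fromkeys(lines)` would be the usual alternative when speed matters.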

Concordance: “Catastrophe”
Seen alongside “decline,” “mortality,” and “failures”

Concordance: “Stun”

Concordance: “Hunger”

N-Grams

N-Gram           Occurrences
New York         624
the first        623
the same         480
the world        469
of course        271
United States    260
last year        241
years ago        230
the country      222
thank you        218
the president    213
the past         209
this year        209
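A count of this kind can be approximated outside AntConc. The sketch below assumes two-word n-grams (every entry in the table is a bigram) and an already tokenized text; the function name and the sample sentence are invented for illustration and are not drawn from the corpus.

```python
from collections import Counter


def count_ngrams(tokens, n=2):
    """Count every run of n adjacent tokens."""
    # zip over n staggered copies of the token list yields each n-gram once.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Made-up sample text, split on whitespace.
tokens = "the first snow of the year fell on the first day".split()
counts = count_ngrams(tokens)
print(counts.most_common(3))
```

Sorting the counter with `most_common()` gives the same descending-frequency layout as the table.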

Content Words
1. people
2. time
3. year(s)
4. footage
5. way
6. good
7. day
8. world
9. game
10. man
11. long
12. season
13. life
14. big
15. state
16. old
17. team
18. home
19. president
20. money

As with the n-grams, a few areas of focus become clear. The first is topics of interest to the most people in the most general sense, i.e. “time,” “years,” “day,” “season,” “life,” and “home.” The broad genres of finance, government, and sports are represented by “money,” “president,” “state,” and “team,” and simple descriptors for these things by “long,” “old,” “big,” and “good.”
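A ranking like the one above depends on the stop word list and lemma list applied during cleaning. The sketch below shows one way the two lists could interact; the tiny `STOP_WORDS` set and `LEMMAS` mapping are hypothetical stand-ins, not the lists used for this analysis.

```python
from collections import Counter

# Hypothetical stand-ins for the stop word list and lemma list.
STOP_WORDS = {"the", "a", "of", "and", "in", "to"}
LEMMAS = {"years": "year"}


def content_word_counts(tokens, stop_words=STOP_WORDS, lemmas=LEMMAS):
    """Count words after dropping stop words and folding forms into lemmas."""
    counts = Counter()
    for token in tokens:
        word = token.lower()
        if word in stop_words:
            continue  # function words carry no topical information
        counts[lemmas.get(word, word)] += 1  # "years" is counted as "year"
    return counts


# Made-up sample sentence.
tokens = "The people of the state waited years and years in the state".split()
ranking = content_word_counts(tokens)
print(ranking.most_common())
```

Folding “years” into “year” is what produces a combined entry such as “year(s)” in the list.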

Collocations: “People”
To search for collocates, the window was constrained to three words to the left and three to the right. Minimum collocate frequency was set at four, to select for relatively common topics of news reporting.
§ If the names of newscasters Kaledin, Trippi, and Ingraham (used here to note who is speaking in transcriptions of newscasts) are removed from the list, a trend of reporting stories that focus on human tragedy becomes clear. The top five collocates for “people” are “disabilities,” “pollution,” “innocent,” “dying,” and “Haitian,” not all of which are immediately linkable to negative news. However, consider how “innocent” is usually used in news broadcasts: to refer to the injury, demise, or harm of those who do not deserve it. “Haitian,” similarly, refers to the people of one of the world’s poorest countries, who are frequently reported on specifically for the poverty and corruption they have to endure.
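The window and frequency settings described above can be sketched as follows. This is a simplified illustration that keeps only raw co-occurrence counts; AntConc can also rank collocates by an association statistic, which this sketch does not attempt. The function name, parameter names, and sample text are assumptions made for the example.

```python
from collections import Counter


def collocates(tokens, node, left=3, right=3, min_freq=4):
    """Count words within `left`/`right` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Up to three words before the node and three after, excluding the node.
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    # Keep only collocates that meet the minimum frequency threshold.
    return {word: n for word, n in counts.items() if n >= min_freq}


# Made-up sample: the same five-word sentence repeated four times.
tokens = ("innocent people were hurt today " * 4).split()
result = collocates(tokens, "people")
print(result)
```

Raising `min_freq` trims the list toward the most recurrent topics, which is the effect the threshold of four is meant to have here.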

Collocations: “Time”
The second content word, “time,” is less telling about the pervasively negative nature of the news. Removing the meta-analytical proper noun “Newsweek” gives us “consuming,” “limits,” “spend,” “rounds,” and “ordinary” as the top five collocates. “Consuming” is present because of the common adjective “time-consuming,” and “limits” alongside “time” refers to deadlines for various initiatives. “Spend some time” is used most frequently to describe either neutral or desirable ways of passing the days, specifically “spending some time” with a wife and children (the subjects of these sentences are usually men in positions of prominence, so the implication is that spending time with family doesn’t happen enough). Spending time is also used in broadcasts to introduce the subject of a particular segment (“spend some time with X”), and to entreat viewers not to be anxious about things out of their control (“don’t spend too much time worrying…”). “Rounds” is used twice to refer to ammunition (in the context of “varmint hunting”), and once to refer to a golf game. “Ordinary,” interestingly enough, is used here in a religious context. “Ordinary time” refers to a lengthy period in the Catholic Christian liturgical calendar, and for the duration of this period sermons are printed in the newspaper.

Collocations: “Year”
Two of the top collocates of “year,” “olds” and “old,” are there mostly in a functional rather than a content-rich sense; they refer to the age of a subject or group that is being talked about (“6-year-olds,” “20-year-old woman,” etc.). For the purposes of this analysis, they will be disregarded. The other collocates of “year,” namely “minors,” “hander,” “rookie,” “contract,” and “freshman,” are entirely sports-related. “Minors” here refers to the minor leagues, “hander” to which hand a player bats with, “rookie” to a player new to the game in that particular year, “contract” to an agreement to keep a player on a team for the specified number of years, and “freshman” to college sports players in their first year.

Keyword List
Because it is difficult to generalize from a keyword list 790 words long, screenshots from the beginning, middle, and end of the list are included. The top four keywords aren’t “words” at all: they’re the letters after the apostrophe in contractions. “S” is the top letter, followed by “p,” “n,” and “t.” Data cleaning using regex needs to be done to keep these from appearing as data points. Keywords related to the political, financial, and business spheres, as seen in previous steps, are common.
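One possible form of that regex cleanup is sketched below; it is an illustration of the idea, not the cleaning that was actually run. It assumes the stray letters come from a tokenizer that splits words at the apostrophe, so the pattern keeps contractions and possessives together as single tokens. A second helper drops single-letter entries from a list that has already been generated. All names here are hypothetical.

```python
import re

# A word is a run of letters, optionally followed by apostrophe + letters,
# so "isn't" and "president's" stay whole instead of leaving "t" or "s" behind.
WORD = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def tokenize(text):
    """Lowercase the text and return apostrophe-aware word tokens."""
    return WORD.findall(text.lower())


def drop_fragments(keywords):
    """Remove single-letter entries from an existing keyword list."""
    # Note: this also drops the real one-letter words "a" and "i".
    return [k for k in keywords if len(k) > 1]


print(tokenize("It's the president’s plan, isn't it?"))
print(drop_fragments(["s", "t", "president", "n", "money"]))
```

Fixing the tokenization before generating the keyword list is the cleaner option, since filtering afterwards cannot restore the counts of the contracted words themselves.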

Keywords

Keywords, Cont’d