CORPUS LINGUISTICS
• Corpus linguistics is the study of language as expressed in samples (corpora) of "real world" text.
• It is an approach to deriving a set of abstract rules by which a natural language is governed or relates to another language.
• Corpora were originally compiled by hand; they are now largely derived through automated processes.

Corpus
• "Corpus", derived from the Latin word meaning "body", may be used to refer to any text in written or spoken form.
• In modern linguistics, the term refers to large collections of texts that represent a sample of a particular variety or use of language(s) and are presented in machine-readable form.

Scope of Studies
• The possible words, structures or uses in a language
• The probable occurrence of these features in a language
• The description and explanation of the nature, structure and use of language, with particular attention to matters such as language acquisition, variation and change

Types of Corpora
• Spoken (transcribed) language
• Written language, from modern or old texts
• Texts from one language or from several languages
• Whole books, newspapers, journals and speeches, or extracts of varying length
• Online data

• Corpus linguistics is now seen as the study of linguistic phenomena through large collections of machine-readable texts: corpora.
• Corpora are used in a number of research areas, ranging from the descriptive study of the syntax of a language to language learning.

List of Corpora

Examples of Corpora
Brown Corpus
• The Brown Corpus of Standard American English was the first of the modern, computer-readable, general corpora.
• It was compiled by W. N. Francis and H. Kucera at Brown University, Providence, RI.
• The corpus consists of one million words of American English texts printed in 1961. The texts were sampled from 15 different text categories to make the corpus a good standard reference.
• The LOB corpus (British English) and the Kolhapur Corpus (Indian English) are two examples of corpora built to match the Brown Corpus. Corpora that are so similar in structure are a valuable resource for researchers interested in, for example, comparing different language varieties.

BNC (British National Corpus)
• The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources. It is designed to represent a wide cross-section of British English from the later part of the 20th century.
• The latest edition is the BNC XML Edition, released in 2007.

Sample Corpus
• Sample

MALAY CORPUS
• http://mcp.anu.edu.au/
• http://dbp.gov.my/korpus_DBP.pdf
• http://www.ukessays.com/essays/linguistics/malay-speech-corpus-linguisticsessay.php

Quranic Corpus
• http://corpus.quran.com/

Corpora of CMC (computer-mediated communication)
• http://www.cmc-corpora.de/
• http://michael-beisswenger.de/pub/hsk-corpora.pdf

Role of the Computer in Corpus Linguistics
• To store huge amounts of text
• To quickly retrieve huge amounts of text
• To retrieve words, phrases or whole texts in context
• To sort linguistic items
• To increase reliability in searching, counting and sorting linguistic items
• To provide accurate probabilities of occurrence for specific linguistic items
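The counting, sorting and probability tasks listed above can be illustrated with a minimal Python sketch. It makes several assumptions. The tokenizer is deliberately naive: it lower-cases the text and keeps only letters and apostrophes. "Probability of occurrence" is treated as relative frequency, a word's count divided by the total number of tokens. The function names `tokenize` and `frequency_table` are hypothetical and do not come from any corpus tool.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_table(tokens: list[str]) -> list[tuple[str, int, float]]:
    """Return (word, raw count, relative frequency) triples, most frequent first."""
    counts = Counter(tokens)
    total = len(tokens)
    # Sort by descending count, breaking ties alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Relative frequency = count / total tokens, used here as the "probability of occurrence".
    return [(word, n, n / total) for word, n in ranked]


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog sat on the cat."
    for word, n, p in frequency_table(tokenize(sample)):
        print(f"{word:<5} {n:>2} {p:.3f}")
```

Real corpus tools use far more careful tokenization, for example handling hyphens, numbers and multi-word units. The counting and sorting logic stays the same.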

Corpus-Related Research
• Computational Linguistics
• Cultural Studies
• Discourse Analysis and Pragmatics
• Grammar/Syntax
• Historical Linguistics
• Language Acquisition
• Language Teaching
• Language Variation
• Lexicography
• Linguistics
• Machine Translation
• Natural Language Processing (NLP)
• Psycholinguistics
• Semantics
• Social Psychology
• Sociolinguistics
• Speech
• Stylistics

Computational Linguistics
• The use of computers to process or produce human language
• Corpora are used as a resource for solving various language-processing problems.

Cultural Studies
• Comparable corpora make it possible to compare language use across countries. The results can point to cultural differences.

Grammar/Syntax
• Large corpora allow language to be studied as it is actually produced, that is, the performance of speakers and writers.
• By confronting a grammar with unrestricted corpus data, the grammar can be tested for correctness and completeness.

Historical Linguistics
• Machine-readable corpora from different periods allow historical linguists to study the development of a language over time.

Language Acquisition
• Corpora can provide data from learners of a target language from different countries, of different ages, etc.

Language Teaching
• Corpora are used for data-driven learning
• More suited to higher-level learners
• Learners can investigate idiolect, idiosyncrasy, or particular aspects of grammar usage
READING ASSIGNMENT
Corpus Linguistics: What It Is and How It Can Be Applied to Teaching
Daniel Krieger, dannykrieger99 [at] hotmail.com
Siebold University of Nagasaki (Nagasaki, Japan)
http://iteslj.org/Articles/Krieger-Corpus.html

Language Variation
• To study or compare how language varies between different text types, domains, regions, speakers, writers, etc.

Lexicography
• Corpora are used to produce dictionaries and grammar books. Examples: Collins COBUILD, the British National Corpus (BNC) and the Longman Corpus Network.

Linguistics
• To provide traditional linguistic descriptions

Psycholinguistics
• Corpora contribute to forming hypotheses about the way language is processed by the mind.

Semantics
• To study the meanings of words or utterances by looking at the contexts in which the words or phrases occur
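The contextual approach to meaning described above is often put into practice by counting collocates, the words that appear near a target ("node") word. The following is a minimal Python sketch under several assumptions. It uses a naive lower-cased regex tokenizer and a fixed window of tokens on each side of the node. It ignores sentence boundaries, and it excludes only the node's own position, so other occurrences of the node inside the window are counted. The function name `collocates` is hypothetical.

```python
import re
from collections import Counter


def collocates(text: str, node: str, window: int = 2) -> Counter:
    """Count words occurring within `window` tokens either side of each occurrence of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    node = node.lower()
    counts: Counter = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Context span: up to `window` tokens to the left and right of the current position.
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        counts.update(left + right)
    return counts


if __name__ == "__main__":
    text = ("She sat on the river bank. He works at the bank downtown. "
            "The bank raised interest rates.")
    print(collocates(text, "bank").most_common(5))
```

Raw collocate counts favour very frequent words such as "the". Corpus studies therefore usually compare them against expected frequencies with an association measure before drawing conclusions about meaning.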

Sociolinguistics
• To study how language use varies with speakers’ and writers’ age, sex, social class, etc.

Speech
• For use in speech science and speech technology
• To compare spoken and written language
• To teach computers to produce and understand speech
• Example: the London-Lund Corpus (LLC)

Stylistics
• To find specific features of text types
• To compare different texts
• To detect changes of style in an author’s writings

Computational Stylistics
• The style of a text is a function of the aggregate of the ratios between the frequencies of its phonological, grammatical and lexical items, and the frequencies of the corresponding items in a contextually related norm.
• Computers are used to study the stylistic characteristics of particular texts, authors, genres, periods, etc.
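The definition above compares item frequencies in a text with those in a norm. A simplified operationalization is sketched below in Python under stated assumptions. Only lexical items (word tokens) are considered. Frequencies are relative frequencies, so texts of different lengths can be compared. The "norm" is any reference token list supplied by the user, and the tiny example norm is made up for illustration. The sketch returns one ratio per item and leaves open how the ratios would be aggregated into an overall style profile. The function names are hypothetical.

```python
from collections import Counter
from typing import Optional


def relative_frequencies(tokens: list[str]) -> dict[str, float]:
    """Map each token type to its count divided by the total number of tokens."""
    total = len(tokens)
    return {word: n / total for word, n in Counter(tokens).items()}


def style_ratios(text_tokens: list[str], norm_tokens: list[str],
                 items: list[str]) -> dict[str, Optional[float]]:
    """For each item, divide its relative frequency in the text by its relative frequency in the norm."""
    text_rf = relative_frequencies(text_tokens)
    norm_rf = relative_frequencies(norm_tokens)
    ratios: dict[str, Optional[float]] = {}
    for item in items:
        norm_value = norm_rf.get(item, 0.0)
        # An item that never occurs in the norm has no defined ratio, so it is marked None.
        ratios[item] = text_rf.get(item, 0.0) / norm_value if norm_value else None
    return ratios


if __name__ == "__main__":
    text = "i think i will go and i think i will stay".split()
    norm = "we will go and they will stay and i will see".split()
    for item, ratio in style_ratios(text, norm, ["i", "will", "and", "think"]).items():
        print(item, ratio)
```

A ratio above 1 means the item is more frequent in the text than in the norm, and a ratio below 1 means it is less frequent. In practice the norm would be a large, contextually related reference corpus rather than a single sentence.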

Forensic Linguistics
• Forensic linguistics is the application of linguistic knowledge, methods and insights to the forensic context of law, language, crime investigation, trial and judicial procedure.
• It is a branch of applied linguistics.
• There are three main areas of application for linguists working in forensic contexts: 1) understanding the language of the written law, 2) understanding language use in forensic and judicial processes, and 3) providing linguistic evidence.

BASIC TOOL: Concordancer
• An example of software used in corpus linguistics
• What is a concordancer?
• Examples of concordance programs
• How does it assist corpus linguistics, and teaching and learning?
• A simple demonstration of how to use a concordancer
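As a simple demonstration of what a concordancer does, the Python sketch below produces Key Word In Context (KWIC) lines. Each occurrence of a search word is shown with a few words of context on either side, aligned in a central column. This is not the behaviour of any particular concordance program. It assumes regex tokenization, a context window counted in words, case-insensitive matching, and an optional sort on the right-hand context (one common way concordancers reveal recurring patterns). The function name `kwic` and its parameters are hypothetical.

```python
import re


def kwic(text: str, keyword: str, width: int = 3, sort_by_right: bool = False) -> list[str]:
    """Build Key Word In Context (KWIC) lines for every case-insensitive match of `keyword`."""
    tokens = re.findall(r"[A-Za-z']+", text)
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    if sort_by_right:
        # Sorting on the right-hand context groups recurring patterns that follow the keyword.
        hits.sort(key=lambda hit: hit[2].lower())
    # Right-justify the left context so every keyword lines up in one central column.
    pad = max((len(left) for left, _, _ in hits), default=0)
    return [f"{left.rjust(pad)}  [{kw}]  {right}" for left, kw, right in hits]


if __name__ == "__main__":
    sample = "The corpus was tagged. A corpus can be large. Each corpus needs metadata."
    for line in kwic(sample, "corpus", width=2, sort_by_right=True):
        print(line)
```

Dedicated concordancers add features such as wildcard searches, sorting on the left context, and linking each line back to its source text. The central KWIC display they produce follows this same idea.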

Studies on Corpus Linguistics
• http://ieeexplore.ieee.org/xpl/abstract.Authors.jsp?reload=true&arnumber=5278382
• International Journal of Education and Development using Information and Communication Technology (IJEDICT), 2011, Vol. 7, Issue 3, pp. 96-101
• EDICT-2011-1303.pdf

Journal of Corpus Linguistics
• International Journal of Corpus Linguistics: http://benjamins.com/catalog/ijcl
• EDICT-2011-1303.pdf

Reflection
• In what ways could the availability of corpora enrich your studies as a BENL student?
• Include a suggestion for a possible corpus linguistics topic for your MA thesis.