CORPUS LINGUISTICS 1 A revision of corpus linguistics

  • Slides: 33
CORPUS LINGUISTICS
1) A revision of corpus linguistics
2) Language corpora in the ESL/EFL classroom

WHAT IS A CORPUS?
“A corpus can be defined as a collection of texts assumed to be representative of a given language put together so that it can be used for linguistic analysis. Usually the assumption is that the language stored in a corpus is naturally-occurring, that it is gathered according to explicit design criteria, with a specific purpose in mind, and with a claim to represent natural chunks of language selected according to a specific typology.” (Tognini-Bonelli 2001: 2)

“nowadays the term 'corpus' nearly always implies the additional feature of 'machine-readable'”. McEnery & Wilson, Corpus Linguistics. Online manual.

English language corpora: General vs. Specific

ENGLISH CORPORA: GENERAL LANGUAGE CORPORA
First generation corpora:
-Brown Corpus of Written American English
-Lancaster-Oslo/Bergen (LOB) Corpus of Written British English
-500 texts of around 2000 words each
-no spoken data
-wide variety of written texts

ENGLISH CORPORA: GENERAL LANGUAGE CORPORA
Second generation corpora:
-Bank of English
  -monitor corpus
  -both spoken and written text
  -different regional varieties of English
-British National Corpus (BNC)
  -90 million written words
  -10 million spoken words
  -freely accessible: Mark Davies' interface

OTHER TYPES OF ENGLISH LANGUAGE CORPORA
-speech corpora:
  -sound recordings
  -SPOKEN ENGLISH CORPUS
  -detailed description of spoken phenomena: phonology, prosody (stress, tone units…), etc.
-multimedia corpora:
  -transcripts synchronised with audio/video recordings
  -TALKBANK website: SANTA BARBARA CORPUS OF SPOKEN AMERICAN ENGLISH (SBCSAE)

audiovisual element; some markup for context; space for our own annotation

OTHER TYPES OF ENGLISH LANGUAGE CORPORA
-parsed corpora:
  -syntactically analysed
  -SURFACE AND UNDERLYING STRUCTURAL ANALYSES OF NATURALISTIC ENGLISH CORPUS (SUSANNE)
-historical corpora:
  -English of earlier periods
  -may cover specific historical periods or genres
  -track and describe how language has evolved
  -A REPRESENTATIVE CORPUS OF HISTORICAL ENGLISH REGISTERS (ARCHER)

OTHER TYPES OF ENGLISH LANGUAGE CORPORA
-specialised corpora:
  -focus on concrete genres/domains
  -BUSINESS LETTERS CORPUS (BLC)
-lingua franca corpora:
  -ENGLISH AS A LINGUA FRANCA IN ACADEMIC SETTINGS (ELFA) CORPUS
  -intercultural exchanges among speakers who use English as a lingua franca

OTHER TYPES OF ENGLISH LANGUAGE CORPORA
-developmental language corpora:
  -output of non-adult native speakers of English
  -speakers not yet as proficient as those in adult native-speaker corpora
  -POLYTECHNIC OF WALES (POW) CORPUS
-ESL/EFL learner corpora:
  -output of learners of English
  -one and the same L1 background or different mother tongues
  -JAPANESE EFL LEARNER CORPUS (JEFLL)

WORDSMITH: FLEXIBLE CORPUS
-Computer program which permits users to compile their own corpus
-Texts must be in .txt format
-Any text can be subjected to the same process of analysis that official corpora undergo: concordance lines, word lists, etc.
-No need to pre-process such texts in advance
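
To make "word lists" and "concordance lines" concrete, here is a minimal Python sketch of both operations on plain text. It is not how WordSmith itself is implemented; the function names (`tokenize`, `word_list`, `concordance`) and the sample sentence are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def word_list(tokens):
    """Frequency-ranked word list, most frequent word first."""
    return Counter(tokens).most_common()

def concordance(tokens, node, width=3):
    """Key-word-in-context lines: up to `width` tokens on each side of every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

# Hypothetical mini-corpus; in practice the text would be read from a .txt file.
sample = ("I made a mistake. We all make mistakes, "
          "but she never makes the same mistake twice.")
tokens = tokenize(sample)

print(word_list(tokens)[:3])          # the three most frequent word forms
for line in concordance(tokens, "mistake"):
    print(line)                       # one context line per occurrence
```

The sketch counts word forms, not lemmas, so "mistake" and "mistakes" are listed separately; a real tool usually offers lemmatisation as an option.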

Corpus linguistics
-Insights into the internal workings of real language
-Knowledge in turn also used in other fields of enquiry
-Planning, designing, compiling and tagging
-Frequency lists and concordance lines (+further analysis)
-Sinclair’s (2003) “degeneralisation”:
  -sceptical about 'received' descriptions
  -patterns found in the data: more precise or alternative descriptions
-Corpus-based dictionaries and grammars
  -how lexis and grammar are “really” used
  -COLLINS COBUILD LEARNER'S DICTIONARY
  -THE LONGMAN GRAMMAR OF SPOKEN AND WRITTEN ENGLISH

CORPORA IN THE ESL/EFL CLASSROOM: PEDAGOGICAL FOUNDATIONS
-Mixture of instructional and naturalistic language learning (LL)
-Fulfilment of both the input and output hypotheses
-“Scaffolding” (though loosely speaking)
-Insights concerning English culture(s)
-Student-centred and related to constructivism: mastering corpora = learning autonomy

CORPUS-BASED ESL/EFL ACTIVITIES
-Focus on lexis, grammar and register
  -introductory notions concerning collocation, colligation, and formal vs. informal
-For already motivated students: BNC

Activity one: contractions, formal or informal? spoken or written? The key * ? ’? ?

Quotation marks!

Activities two and three: Corpora as a source of knowledge concerning collocation and colligation

[v*] mistakes

[aj*] powerful, not strong!!!
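
Queries such as `[v*] mistakes` ask the corpus interface for any verb immediately followed by "mistakes"; `[aj*]` does the same for adjectives. The idea can be sketched in Python on a tiny hand-tagged sample. Everything in the sample is hypothetical: the sentences are invented, the noun "computer" is an assumption (the slides do not say which noun was searched), and the tags only imitate the BNC's CLAWS5 style, in which verb tags begin with "V" and adjective tags with "AJ".

```python
from collections import Counter

# Hypothetical hand-tagged tokens as (word, tag) pairs, CLAWS5-style tags.
tagged = [
    ("people", "NN0"), ("make", "VVB"), ("mistakes", "NN2"),
    ("she", "PNP"), ("made", "VVD"), ("mistakes", "NN2"),
    ("they", "PNP"), ("make", "VVB"), ("mistakes", "NN2"),
    ("a", "AT0"), ("powerful", "AJ0"), ("computer", "NN1"),
    ("a", "AT0"), ("strong", "AJ0"), ("coffee", "NN1"),
    ("a", "AT0"), ("powerful", "AJ0"), ("computer", "NN1"),
]

def left_collocates(tagged_tokens, tag_prefix, node):
    """Count words whose tag starts with `tag_prefix` and that directly precede `node`."""
    counts = Counter()
    # Pair every token with the token that follows it.
    for (word, tag), (next_word, _) in zip(tagged_tokens, tagged_tokens[1:]):
        if next_word == node and tag.lower().startswith(tag_prefix.lower()):
            counts[word] += 1
    return counts.most_common()

print(left_collocates(tagged, "v", "mistakes"))   # verbs before "mistakes"
print(left_collocates(tagged, "aj", "computer"))  # adjectives before "computer"
```

On real data the ranked output is what lets students see for themselves which verb or adjective the noun typically attracts.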

Activity four: meaning via collocations and co-text

For non-motivated students: WordSmith
-Contact with the English language: input (at least lexis-wise)
-Popular culture: MUSIC IN ENGLISH!!!

Activity one: music corpora, lexis, and the BNC for grammar accuracy

author corpus reference corpus

Select the text you want a list of

Save both lists to compare them with the KeyWords tool

author corpus list reference corpus list
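
Comparing the author corpus list with the reference corpus list amounts to asking which words are unusually frequent in the author corpus. A minimal Python sketch follows, using log-likelihood as the keyness statistic. That choice is an assumption: the slides do not say which statistic or settings were used, and the word counts below are invented for illustration.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Keyness score: a, b = word frequency in study / reference corpus;
    c, d = total size (in tokens) of study / reference corpus."""
    e1 = c * (a + b) / (c + d)   # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)   # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study, reference):
    """Words relatively more frequent in `study`, ranked by log-likelihood."""
    c, d = sum(study.values()), sum(reference.values())
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c > b / d:        # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical word lists: song lyrics (author corpus) vs. a general reference list.
author_list = Counter({"love": 30, "baby": 20, "the": 50})
reference_list = Counter({"love": 10, "baby": 2, "the": 488})

for word, score in keywords(author_list, reference_list):
    print(f"{word}\t{score:.2f}")
```

A word with the same relative frequency in both lists scores zero, which is why function words such as "the" rarely surface as keywords.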

That was all! The nightmare is over! Thank you for listening! ^.^