Combining intuition with corpus linguistic analysis: A study of lexical chunks in four Chinese undergraduate students’ writing
Maria Leedham
m.e.leedham@open.ac.uk
FLaRN 2010
BACKGROUND TO STUDY 2
Chunking through intuition: Study 1
RQ:
• To what extent can NSs and NNSs chunk NNS speech?
Data:
• transcripts of 2 intermediate-level Japanese students’ speech
• students were recorded 3 times with a 2-month gap between each
• total of approx. 1500 words across the 6 transcripts
Method:
• Step 1: 3 NS linguists asked to underline chunks in the 6 transcripts (training, examples and practice given first)
• Step 2: Japanese students asked to identify chunks in their own transcripts
• Step 3: author chunks transcripts with assistance from WordSmith Tools
(Leedham, 2006)
Example of chunked transcript from Study 1
Key: italics – words classified by the NNS as a chunk; underline – words 2 or 3 out of the 3 NSs classified as a chunk
1 ahh…first err I, I learned, learnt? (mmhmm I learnt) err (2.0) I should err.. I
2 should be more positive? (right) positive… in UK because ahh…when, when I
3 went to London err… last Sunday (mhmm) ahh (2.0) some, some of the
4 underground line (mm) line was no service (oh dear) ((speaker laughs)) I was
5 really surprised and, because it can, cannot be (mm) in Japan (mm) you know,
6 sun- in, in Sunday, on? (mm) on Sunday many, many people (mm) come to
7 London (mm) and go around some place (mm)... so everyone need to, need a
8 train (mm) so, but maybe four or five lines… was not, no service (mm) so…
9 I… I have to think err what I should do ((speaker laughs)) and no, I’ve never, I
10 have never been to London that, so, this was the first time I’ve been to London
11 (mm) so…
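The key above underlines a word when 2 or 3 of the 3 NS raters classified it as part of a chunk. That majority rule can be sketched in Python as follows. The data layout (one set of word positions per rater), the function name `majority_chunk_positions` and the sample marks are illustrative assumptions, not taken from the study.

```python
from collections import Counter

def majority_chunk_positions(rater_marks, min_votes=2):
    """Return word positions marked as chunk material by at least min_votes raters."""
    votes = Counter()
    for marks in rater_marks:
        votes.update(set(marks))  # each rater counts once per position
    return sorted(pos for pos, n in votes.items() if n >= min_votes)

# Hypothetical marks from three raters over one stretch of the transcript.
words = "so this was the first time I've been to London".split()
rater_marks = [
    {2, 3, 4, 5},     # rater A: "was the first time"
    {3, 4, 5},        # rater B: "the first time"
    {3, 4, 5, 8, 9},  # rater C: "the first time" and "to London"
]

agreed = majority_chunk_positions(rater_marks)
print([words[i] for i in agreed])
```

Raising `min_votes` to 3 would keep only the words all three raters agreed on, which is one way to see how quickly agreement shrinks.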
Findings from Study 1
Findings:
• little inter- or intra-rater reliability
• many ‘missing’ chunks (e.g. ‘of course’, ‘you know’) both across and within raters
• frustrating and time-consuming task for NSs
• BUT… the Japanese students could do this task AND also offered insights into when/why… (e.g. student M: “I used to say that but now I know it’s not usual”.)
• the more time spent looking for chunks, the more will be found
Coda:
• a further recording, transcribing & awareness-raising cycle suggests that this resulted in uptake
• both students found it highly motivating to record and analyse transcripts of their talk
Chunking through intuition: Study 1
Method:
• Step 1: 3 NS linguists asked to underline chunks in the 6 transcripts (training, examples and practice given first)
• Step 2: Japanese students asked to identify chunks in their own transcripts
• Step 3: author chunks transcripts with assistance from WordSmith Tools v.5
STUDY 2:
Outline
1. Research questions
2. The students and the texts
3. The two methods
4. Findings
   4.1 Method 1
   4.2 Method 2
5. Conclusions and Implications
Research Questions
1. What can a study of lexical chunks reveal about these Chinese students’ writing?
2. What does each method contribute?
The Students
Criteria:
- L1 Chinese (Mandarin or Cantonese)
- All secondary education in home country
- Contributions from years 1 & 2 and year 3 of undergraduate study
Wei
• Male
• BSc Engineering
Ping
• Female
• BA Hospitality, Leisure & Tourism Management (HLTM)
Feng
• Female
• BSc Food Science with Business
Hong
• Male
• BA HLTM
The texts
Reference corpora
Combining intuition and corpus searches
Method 1: Manual analysis
• Read all 4 Chinese students’ texts
• Read twice, with 6 months between
• Read equivalent, randomly selected English students’ texts
• Noted ‘salient’ features, then searched corpora of the individual’s texts, the discipline, all Chinese students’ writing, all English students’ writing
Method 2: Key n-gram searches
• Used WordSmith Tools, v.5 (Scott, 2008)
• Searched for key n-grams in the corpus of texts from each student, using relevant discipline corpus from L1 English as reference
• Setting p = 0.00001, deleted short n-grams within longer n-grams
• Compiled key n-gram lists
• Looked at concordance lines and texts for more context
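The step "deleted short n-grams within longer n-grams" can be illustrated with a short Python sketch. It assumes the rule is "drop any n-gram that occurs as a contiguous run inside a longer n-gram on the same list". The slides do not spell out the exact rule or how WordSmith Tools applies it, and the function names and candidate list here are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_ngrams(tokens, min_n=2, max_n=5):
    """Frequency of every n-gram with min_n <= n <= max_n."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts

def is_within(short, long):
    """True if `short` is a contiguous part of the strictly longer `long`."""
    if len(short) >= len(long):
        return False
    span = len(short)
    return any(long[i:i + span] == short for i in range(len(long) - span + 1))

def drop_nested(ngram_list):
    """Keep only n-grams not contained in a longer n-gram on the list."""
    return [g for g in ngram_list
            if not any(is_within(g, other) for other in ngram_list)]

# Hypothetical candidate list built from sequences quoted in the slides.
candidates = [
    ("the", "hospitality", "industry"),
    ("in", "the", "hospitality", "industry"),
    ("on", "the", "other", "hand"),
]
print(drop_nested(candidates))
```

A stricter variant would delete the shorter n-gram only when its frequency matches that of the longer one, so that a short sequence with an independent life of its own is kept.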
Formulaic sequences in sample of Wei’s writing (Engineering)
Introduction
A design methodology for a gearbox is presented in this report. The input horse power, the input speed and net reductions in the gearbox are the parameters to be specified. A gearbox takes an input shaft rotating and converts it via a gear train into up to three outputs, the process of designing a gearbox is to figure out which ratios are needed and to implement those ratios in the form of positioning various sizes of connected gears. The specification of the gearbox depends on its area of application.
In this report, a gearbox is designed for a commercial meat slicer which has its final shaft rotating at between 80 and 100 rev/min. The input of the meat slicer is a constant speed AC motor running at 1800 rev/min and delivering 1.2 kW. A few points have to be considered on this system, the size of the gearbox is severe restricted, since it has to go onto a work surface where there is severe competition for space. And the motor may be in-line or at right angles to the grinder. Furthermore, the duty is expected to be up to 6 hours per day.
Idiosyncratic language
• In one word computer based tools contribute an…
• In one word the overall system can be described… (Wei, years 2 & 3)
• In light of this, it is suggested that buying IHG…
• In light of this, it can be suggested that…
• In light of this, it is recommended that buying IHG… (Ping, year 3, in 1 text)
• … but simply writing a responsible tourism policy is no longer enough. It is a must to show practical action, … (Hong, Year 1)
• a winning city, the authorities of Liverpool have to rebuild its image to get rid of the negative picture. (Hong, Year 2)
• …and boost its marketing campaigns in order to catch the world’s eyes on Scotland. (Hong, year 3)
Vague language
• In catering services, restaurants in Oxford and Bath are more or less the same. (Hong, Year 1)
• From those tables, the same thing as section 3.1 could be found … (Wei, Year 1)
• …a measurement system for measuring low-lever force, a kind of cantilever rig which is called…
• A kind of variable inductance sensor has been chosen…
• …Furthermore, with processing data, a kind of filter is always needed to separate certain… (Wei, year 2, same assignment)
• At that time, I found that this hotel is a little bit out of my expectation. (Hong, Year 2)
Vague language
• L1 English students use ‘a bit of a’ + N, e.g. ‘a bit of a problem’, ‘a bit of a shock’, ‘a bit of a dog’s breakfast’
• Often this is from reflective writing:
‘The conclusion was also a bit of a victim in my editings, bringing it down to one small sentence for each of the areas of discussion’. (6101c Cybernetics Year 3 essay)
Chunks with – and without – ‘I’ & ‘we’
• From the experiment, it was known that the mechanical properties of carbon steel AN and carbon steel N….
• It was found out the mechanical properties of carbon steel AN was incorrect in this experiment, …
• Meanwhile, if we clipped the current probe round one of the motor supply leads, and connected it to Ch1 of the oscilloscope, we could get two copies of the transient starting current of the motor from the oscilloscope. From these two copies, we could calculated… (Wei, Year 1)
Chunks with – and without – ‘I’ & ‘we’
L1 English students
Linkers
• This can create a positive image for Scotland, on the other hand, (Ping, Year 3)
• …In other words, people are buying expectations... (Hong, Year 3)
• As a consequence, it can attract many travelers… (Hong, Year 2)
• On the contrary, the predominance of SMEs... (Ping, Year 2)
• First of all, the dimension of the brake disc is decided. (Wei, Year 3)
• What is more, Bath is served by a large number of local bus services… (Hong, Year 1)
References to data
• ‘as shown in table’ (Wei x 2, Ping x 2)
• ‘according to’ (Wei x 4)
• ‘as illustrated in table + NUMBER’ (Ping x 2)
Summary of method 1 findings
Salient chunks in the Chinese students’ writing were:
• Idiosyncratic chunks (‘in light of the’)
• Vague language (‘a bit of’) – though note English students’ use of ‘a little bit of’
• High use of chunks with ‘we’ and low use of chunks with ‘I’ – partly due to English students’ reflective writing
• Use of favoured linkers (‘on the other hand’)
• Reference to data in tables and figures (‘according to the equation’)
• BUT… very difficult to intuit chunks in unfamiliar disciplines
Method 2: Key n-gram searches
• Used WordSmith Tools, version 5 (Scott, 2008)
• Searched for key n-grams (= ‘key clusters’) in the corpus of texts from each of the 4 students
• Relevant discipline corpus from L1 English used as reference corpus
• p = 0.00001, deleted short n-grams within longer n-grams
• Compiled a key n-gram list for each student
• Grouped these key n-grams into themes
• Looked at concordance lines for more context
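A rough Python sketch of what a key n-gram search with p = 0.00001 involves is given below. The slides do not say which keyness statistic was used, so the sketch assumes a simplified two-term log-likelihood score, one common choice in corpus linguistics. It also assumes that only n-grams used relatively more often by the student are wanted. All function names, counts and corpus sizes are hypothetical.

```python
import math
from statistics import NormalDist

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Simplified two-term log-likelihood keyness score (assumed statistic)."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    score = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score

def critical_value(p):
    """Chi-square critical value for 1 degree of freedom at significance level p."""
    return NormalDist().inv_cdf(1 - p / 2) ** 2

def key_ngrams(study_counts, size_study, ref_counts, size_ref, p=0.00001):
    """N-grams significantly more frequent in the study corpus than in the reference."""
    threshold = critical_value(p)
    keys = []
    for gram, f_study in study_counts.items():
        f_ref = ref_counts.get(gram, 0)
        score = log_likelihood(f_study, size_study, f_ref, size_ref)
        overused = f_study / size_study > f_ref / size_ref  # positive keyness only
        if score >= threshold and overused:
            keys.append((gram, round(score, 2)))
    return sorted(keys, key=lambda item: item[1], reverse=True)

# Hypothetical counts: one student's texts against a discipline reference corpus.
study = {("on", "the", "other", "hand"): 30, ("of", "the"): 50}
reference = {("on", "the", "other", "hand"): 5, ("of", "the"): 500}
print(key_ngrams(study, 10_000, reference, 100_000))
```

With these made-up figures ‘of the’ has the same relative frequency in both corpora and is not key, while ‘on the other hand’ clears the threshold. The p value sets the cut-off only; it is one of the parameters the analyst has to choose.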
N-grams
Idiosyncratic language
Ping's year 2 proposal
‘aim of the’, ‘of the assignment is to design’, ‘to develop an understanding of’ (Wei)
Discipline-specific n-grams
• ‘Marriott Liverpool city centre’, ‘the Liverpool tourism industry’, ‘the tourism industry’ (Hong)
• ‘the hospitality industry’, ‘recruitment and selection’, ‘in the hospitality industry’ (Ping)
Passive voice
• ‘be worked out’, ‘can be calculated’ (Wei)
• ‘there will be’, ‘it is believed that’ (Ping)
References to data
• ‘with reference to appendix’, ‘please see appendix’ (Ping)
• ‘in the appendix’, ‘briefing sheet in appendix’, ‘is shown as’, ‘tables of data’, ‘were recorded as below’, ‘was calculated with eq.’ (Wei)
Favoured linkers decrease over time
Summary of method 2 findings
• Many of the same findings from method 1
  – idiosyncratic chunks
  – some linkers, esp. ‘on the other hand’
  – low use of chunks with ‘I’
  – references to data
• Also… discipline-specific chunks
• Easy to compare one student’s texts with the discipline reference corpus & each L1 reference corpus
• Similar findings occur within the Chinese students overall
• NB Keyness measures difference
Intuitive reading: finds semantically whole units (formulaic sequences)
Plus
• A person can recognise single instances that a computer would miss
• The text is read as a complete document - as intended by the writer
Minus
• Time-consuming and tiring
• Problem of inter-rater reliability
• Problem of intra-rater consistency
• Hard to replicate
Key n-grams analysis: finds frequent chunks (n-grams)
Plus
• Large quantities of data can be analysed quickly
• Accurate
• Easily replicable
Minus
• Single chunks are missed
• Arbitrary parameters
• Conflation of writing from lots of individuals
• Sense of text as complete document is lost
Combining methods…
• Combine the two methods through a recursive process of reading texts and checking the sequences in a corpus, also searching for key n-grams for less intuitive sequences.
“ultimately, the most revealing insights… will be gained from a closer look at the texts, the speakers, and the situational variables; quantitative analysis alone can never provide a satisfactory picture” (Simpson, 2004: 41).
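The "checking the sequences in a corpus" half of this recursive process can be sketched as a simple keyword-in-context search in Python. This only illustrates the idea of a concordance line and does not reproduce what WordSmith Tools does. The tokeniser, the function names and the two-sentence mini corpus (reusing examples from the slides) are assumptions.

```python
import re

def tokenize(text):
    """Lower-case word tokens; internal apostrophes and hyphens are kept."""
    return re.findall(r"[a-z0-9]+(?:['’-][a-z0-9]+)*", text.lower())

def concordance(tokens, phrase, width=5):
    """Return one keyword-in-context line per occurrence of `phrase`."""
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return []
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append(f"{left} [{' '.join(target)}] {right}".strip())
    return lines

# Hypothetical mini corpus built from two of the examples quoted earlier.
corpus = tokenize(
    "In one word the overall system can be described. "
    "In light of this, it is suggested that buying IHG"
)
for line in concordance(corpus, "in one word", width=4):
    print(line)
```

A sequence noticed while reading can be passed to `concordance` to see how often, and in what company, the same writer or the reference corpus uses it before deciding whether it is worth reporting.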
References
• Foster, P. (2001). “Rules and routines: A consideration of their role in the task-based language production of native and non-native speakers”, in M. Bygate, P. Skehan, and M. Swain (eds.), Task-Based Learning: Language Teaching, Learning and Assessment. London: Longman.
• Heuboeck, A., Holmes, J. & Nesi, H. (2007). The BAWE Corpus Manual. Retrieved from http://www.coventry.ac.uk/researchnet/d/505/a/5160
• Leedham, M. (2006). “Do I speak better? – A longitudinal study of lexical chunking in the spoken language of two Japanese students”. In The East Asian Learner.
• Scott, M. (2008). WordSmith Tools v.5. Oxford University Press.
• Wray, A. (2002). Formulaic Language and the Lexicon. Cambridge University Press.
• BAWE corpus: ESRC project number RES-000-23-0800