RESEARCH DESIGN & CORPUS COMPILATION

• Corpus design is intrinsic to the analysis and a fundamental part of it.
• It is guided by the research question (RQ) and affects the results.
• Design criteria are interpretative and must be explicit (why you chose the texts you did, and how and why you organised them the way you did).
• Different purposes = different corpora.

Corpus design
• What?
• Which?
• When?
• Where?
• How?
• Why?

What
• Choosing discourse type(s)
• There are epistemological considerations
• And there are practical considerations
• General vs. topical

Which
• Choosing variables
• You need to keep at least one variable constant, or your corpora are not really comparable
• e.g. same time period, different newspapers
• or same kind of newspaper, different time periods

Comparison
• You are comparing and looking for patterns.
• One occurrence of anything is not enough. A pattern is:
  a) a figure that emerges from a homogeneous background by means of differentiation;
  b) the accumulation of similar things;
  c) recurring regularities of form.

Comparative analysis
• Looking at: DIFFERENCE and SIMILARITY
• across corpora
• within corpora

Parameters across corpora
• mode (written vs. spoken)
• discourse type (e.g. factual vs. fiction)
• time (diachronic studies)
• variety (e.g. British English vs. American English)
• geography (e.g. national vs. local newspapers)
• political tendency (e.g. Democrats' vs. Republicans' speeches)
• individual (e.g. George Eliot vs. Thomas Hardy)
• ...

Parameters within corpora
• sub-corpora (e.g. headlines vs. articles; news vs. comment)
• specific lexical items (e.g. moral vs. ethic; boy vs. girl; immigrant vs. asylum seeker vs. refugee...)

Collections of texts – not one text
• Integral output of a source-unit (e.g. a whole edition of a newspaper)
• The corpus of works by one author (not a single text)

Topic-based corpus
• Search-term(s) based collection
• You gather texts by searching a database for all the texts containing the search term(s)
• Identify the list of search items so that the coverage of the topic is as complete as possible

Time-based
• Historical linguistics
• Diachronic change/stability of language
• Modern diachronic analysis
• See the MD-CADS edition of Corpora for examples (Partington 2010)

Research questions
• All the choices we make in the corpus design and data collection phase (e.g. what to collect, how to collect it, from which platform, in which format, etc.) depend on the RQ!

Practical considerations
• availability
• access
• collection speed
• storage format

The research question
• All the choices we make in the corpus design and data collection phase (e.g. what to collect, how to collect it, from which platform, in which format, etc.) depend on the RQ!

RQ example 1
• 1. How are Muslims represented in the British press?
• What are the appropriate search terms?
• muslim*, moslem*, islam*...?
• Consider synonyms and near-synonyms, alternative spellings, etc.
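As an illustration of how wildcard search terms like the ones above could be applied to a set of texts, here is a minimal Python sketch. It assumes that `*` stands for "any word ending" and that the texts are already held in memory as plain strings; the function names (`term_to_regex`, `compile_terms`, `matching_texts`) are hypothetical and not part of any database interface. Real newspaper databases have their own query syntax, so treat this only as a way of checking what a term list would and would not catch.

```python
from __future__ import annotations

import re

# Search terms taken from the slide; "*" is assumed to mean "any word ending".
SEARCH_TERMS = ["muslim*", "moslem*", "islam*"]


def term_to_regex(term: str) -> str:
    """Turn a wildcard term such as 'muslim*' into a regex fragment."""
    # Escape the term literally, then let the escaped '*' match word characters.
    return re.escape(term).replace(r"\*", r"\w*")


def compile_terms(terms: list[str]) -> re.Pattern:
    """Compile all terms into one case-insensitive, whole-word pattern."""
    alternatives = "|".join(term_to_regex(t) for t in terms)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def matching_texts(texts: dict[str, str], terms: list[str]) -> list[str]:
    """Return the ids of the texts that contain at least one search term."""
    pattern = compile_terms(terms)
    return [text_id for text_id, body in texts.items() if pattern.search(body)]
```

Running the pattern over a small hand-checked sample is a quick way to spot missing spellings or near-synonyms before collecting the full corpus.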

RQ example 2
• 2. How is religion represented in the British press?
• How many terms do I need to add?
• How many terms can I add?

RQ example 3
• 3. How much attention does the British press give to religion?
• A search-term based corpus will not tell you.
• How will you find out?
• How will you delimit the work? (By limiting and defining the RQ a bit more, e.g. by defining a time period or the type of newspapers under consideration.)

Storage
• FOLDERS
• Folders and file names (a repository of information, a sort of level 0 of mark-up)
• FILES become our definition of what a text is
• Unit of analysis

Best practice
• Distribute information between FOLDER and FILE according to the structure of your corpus (and to your RQ)
• Avoid having more than 2 or 3 levels of folders
• Keep names short but dense with information

• Example 1: Do newspapers use the same language at a distance of 20 years?
• Which of the British broadsheets has changed the most?

Storage for example 1
• CORPUS
  • year 1
    • Newspaper 1
      • y1_n1_f2
      • y1_n1_f3
      • y1_n1_f4
      • ...
    • N2
    • N3
  • year 2
    • N1
    • N2
    • N3
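The year / newspaper / file layout above can be generated and read back programmatically. The following Python sketch is one possible way to do it, under assumptions that are not in the slides: the folder names `year1` and `N1`, the `.txt` extension, and the helper names `corpus_path` and `parse_file_name` are all hypothetical. Only the `y1_n1_f2` file-name pattern comes from the example.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# File-name pattern from the slide: y<year>_n<newspaper>_f<file>.
FILE_NAME = re.compile(r"^y(?P<year>\d+)_n(?P<newspaper>\d+)_f(?P<file>\d+)$")


def corpus_path(year: int, newspaper: int, file_no: int,
                root: str = "CORPUS") -> PurePosixPath:
    """Build CORPUS/year<y>/N<n>/y<y>_n<n>_f<f>.txt (folder names are assumed)."""
    name = f"y{year}_n{newspaper}_f{file_no}"
    return PurePosixPath(root) / f"year{year}" / f"N{newspaper}" / f"{name}.txt"


def parse_file_name(path: str | PurePosixPath) -> dict[str, int]:
    """Recover year, newspaper and file number from the file name alone."""
    match = FILE_NAME.match(PurePosixPath(path).stem)
    if match is None:
        raise ValueError(f"file name does not follow the y_n_f scheme: {path}")
    return {key: int(value) for key, value in match.groupdict().items()}
```

Because the file name repeats the information carried by the folders, a text can still be identified if it is copied out of the folder structure, which is the sense in which names act as a "level 0" of mark-up.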

• Example 2: How are science and religion represented in political discourse?

• Solution 1: Science corpus, Religion corpus
• Solution 2: Democrat corpus, Republican corpus

How much?
• The bigger the better, BUT the size also depends on the purpose!
• I ask for a minimum of 100,000 words
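To check a corpus against the 100,000-word minimum, a rough word count is usually enough. The sketch below is a minimal Python example under stated assumptions: texts are stored as UTF-8 `.txt` files somewhere under one root folder, and a "word" is a run of letters or digits, optionally joined by an apostrophe or hyphen. Corpus tools tokenise differently, so their counts will not match this one exactly. The names `count_words`, `corpus_size` and `meets_minimum` are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

MIN_WORDS = 100_000  # minimum size requested in the slide

# Assumed definition of a word: letters/digits, allowing internal ' ’ or -.
WORD = re.compile(r"\w+(?:['’-]\w+)*")


def count_words(text: str) -> int:
    """Count word tokens in one text using the simple pattern above."""
    return len(WORD.findall(text))


def corpus_size(root: str | Path, pattern: str = "*.txt") -> int:
    """Sum the word counts of every matching file under the root folder."""
    return sum(
        count_words(path.read_text(encoding="utf-8"))
        for path in Path(root).rglob(pattern)
    )


def meets_minimum(n_words: int, minimum: int = MIN_WORDS) -> bool:
    """Return True if the corpus reaches the required minimum size."""
    return n_words >= minimum
```

For example, `meets_minimum(corpus_size("CORPUS"))` would report whether a corpus stored under a folder named `CORPUS` is large enough.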

• The transformation of texts into textual resources is a process of interpretation, and therefore compilers have the responsibility typically associated with an editor.
• The questions we ask (and those we do not ask) affect the answers we can get, so it is important to keep track of our expectations and choices and the reasons behind them.

Epistemological reflexivity: you need to ask yourself
• How has the research question defined and limited what can be ‘found’?
• How has the design of the study and the method of analysis ‘constructed’ the data and the findings?
• How could the research question have been investigated differently?
• To what extent would this have given rise to a different understanding of the phenomenon under investigation?

Reflexivity is an unavoidable aspect of research:
• epistemological reflexivity encourages us to reflect upon the assumptions (about the world, about knowledge) that we have made in the course of the research,
• and it helps us to think about the implications of such assumptions for the research and its findings (Nightingale and Cromby, 1999: 228).

• Principles of accountability
• Replicability
• These principles are important in research, and you need to learn to ask yourself how your research follows them
• We will be looking at all these issues again and in more detail

Exam
• The exam includes: a first draft, consisting of the abstract and corpus description, presented to the group
• A final draft, consisting of the abstract, a copy of your corpus and its description, sent to me
• A presentation on the day of the exam
• Don’t forget that you need proof of B2 competence in English to be able to register for the exam.