Workshop: AntConc as a corpus-linguistic program for ad-hoc LSP purposes. Presentation for the NDSU Conference by Birthe Mousten, MA, Ph.D., 19 July 2016
ABSTRACT Corpus linguistics is a field associated with tracking language for mega-corpora such as those behind the Oxford, Longman and Webster dictionaries. Even though such dictionaries are based on huge corpora, they often fail to answer even fairly simple questions about technical and scientific terminology and lexicography. This knowledge representation gap does not mean that corpus linguistics is useless for LSP purposes; it reflects the lack of LSP input in the corpora used for dictionary tracking. This is where an ad-hoc corpus comes into the picture. An ad-hoc corpus is collected on the fly, typically for a certain LSP task at a certain time. It can therefore be used as a tool by technical writers and translators who need to swiftly map the lexicographic, terminological and genre characteristics of a new, delimited field. The ad-hoc corpus tool is the freeware program AntConc. Join me for an ad-hoc LSP task.
CORPUS LINGUISTICS - OVERVIEW • Articles about ad-hoc corpus linguistics • Mega-corpora • Ad-hoc corpora • Traditional corpus use • Cross-linguistic corpus use • Specialized language corpora • Training with AntConc – freeware program
CORPUS LINGUISTICS - ARTICLES Our humble start:
CORPUS LINGUISTICS - TUTORIAL
CORPUS LINGUISTICS – CASE STUDY
Mega corpora American National Corpora: http://corpus.byu.edu/ British National Corpus: http://www.natcorp.ox.ac.uk/ Wortschatz: http://wortschatz.uni-leipzig.de/ KorpusDK: http://ordnet.dk/
OVERVIEW AMERICAN NATIONAL CORPORA
BRITISH NATIONAL CORPUS
GERMAN NATIONAL CORPUS
DANISH NATIONAL CORPUS
TASK LATER: KNOWLEDGE ABOUT INSULIN ANALOGS • You will be asked to work with insulin analogs later, so why not test that one right now? • We will search the American mega-corpora for insulin analog and see where it gets us. • Let us try the American mega-corpus first.
AMERICAN NOW CORPUS Result: 8 hits = 8 texts; on closer inspection maybe only 4, of which one is not American English but probably Indian English, and one is international English. The three hits from the FDA are probably from only one text. So in practice, from an American point of view, there is the FDA text and the Seeking Alpha text. Not very impressive: it shows the need for further searching to work with the area. However, why not take the FDA text for our corpus now that we have it? So we click the text and…
THE FIRST REFERENCE RENDERS THIS: … and get this text, which we copy for our text corpus.
OUR CORPUS TEXT NO. 1 The same text copied to Word.
STARTING OUR COLLECTION FOR THE CORPUS 1) Copy the text 2) Open Word 3) Save the text in Word in a file in a folder: - Save as –> source/title/date (or your chosen parameters) - Save as .txt by choosing plain text => (all HTML codes removed)
Save as: my chosen folder; file name: source, text, date; file type: plain text
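For those who prefer a script to Word's Save As dialog, the saving step can be sketched in Python. This is a minimal sketch under assumptions, not part of the workshop: the function names and the `source_title_date.txt` naming pattern are hypothetical, and Latin-1 is chosen only because the workshop sets AntConc to Western Latin 1.

```python
import re
from pathlib import Path


def corpus_filename(source: str, title: str, date: str) -> str:
    """Build a 'source_title_date.txt' name (an assumed naming convention)."""
    parts = [re.sub(r"[^A-Za-z0-9]+", "-", p).strip("-") for p in (source, title, date)]
    return "_".join(parts) + ".txt"


def save_plain_text(folder: Path, source: str, title: str, date: str, text: str) -> Path:
    """Write the text as Latin-1 plain text into the corpus folder.

    Characters that Latin-1 cannot represent are replaced with '?'.
    """
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / corpus_filename(source, title, date)
    path.write_text(text, encoding="latin-1", errors="replace")
    return path
```

Calling `save_plain_text(Path("my_corpus"), source, title, date, text)` for each copied text gives one plain-text file per source, ready to be loaded as a folder.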
BUILDING UP THE CORPUS • I have to search elsewhere for text, and why not the largest big-data corpus in the world: Google. • I use Google advanced search – it is easier and quicker. • But a small step before that – I read my wiki • https://www.google.ca/advanced_search
WIKIPEDIA To help me make a very precise search on Google, I sneak a peek at Wikipedia to see whether it knows anything about insulin analog: An insulin analog is an altered form of insulin, different from any occurring in nature, but still available to the human body for performing the same action as human insulin in terms of glycemic control. Through genetic engineering of the underlying DNA, the amino acid sequence of insulin can be changed to alter its ADME (absorption, distribution, metabolism, and excretion) characteristics. Officially, the U.S. Food and Drug Administration (FDA) refers to these as "insulin receptor ligands", although they are more commonly referred to as insulin analogs. These modifications have been used to create two types of insulin analogs: those that are more readily absorbed from the injection site and therefore act faster than natural insulin injected subcutaneously, intended to supply the bolus level of insulin needed at mealtime (prandial insulin); and those that are released slowly over a period of between 8 and 24 hours, intended to supply the basal level of insulin during the day and particularly at nighttime (basal insulin). The first insulin analog approved for human therapy (insulin Lispro rDNA) was manufactured by Eli Lilly and Company.
THEN GOOGLE ADVANCED I want these parameters: my key concept; most recent year
GOOGLE ADVANCED RESULTS OK – let’s get to it then. Give yourself 10 minutes to copy and paste into your folder.
TEN MINUTES LATER I HAVE MY CORPUS Then I am ready for my corpus work.
Now – our task. We just got a task from a company, say Eli Lilly or Novo Nordisk, about writing or translating something about insulin analogs. How can a corpus help you? 1) Register 2) Collocations 3) Definitions 4) Synonyms 5) Knowledge! 6) Etc.
Getting the program Find Laurence Anthony’s website – just Google the name. By the way, the address is here: http://www.antlab.sci.waseda.ac.jp/software.html#antpconc Press: AntConc 3.4.4 (or the Mac or other version that is compatible with your computer). NB: The language encoding must be Western Latin 1! (Check under Global Settings – Language encoding – set it to Western Latin 1.) Please join me here!
Loading your corpus into the program Guide: • Press AntConc 3.4.4 (the language version must be Latin 1) • Load your folder (Windows Explorer method) • Then you are ready to search.
ANTCONC OPENED – BUT EMPTY AntConc is now open. 1) Press File 2) Open Directory 3) Load your corpus folder in Windows Explorer
YOUR CORPUS LIST
PRESS WORD LIST
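What the Word List button does can be approximated in a few lines of Python. This is only a rough sketch: the tokenizer (lower-cased runs of letters) and the Latin-1 reading of the files are assumptions, and AntConc's own token definition may differ.

```python
import re
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into runs of letters (a simplification)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def word_list(folder: str) -> Counter:
    """Count word frequencies over every .txt file in the corpus folder."""
    counts = Counter()
    for path in sorted(Path(folder).glob("*.txt")):
        # Latin-1 is assumed because the files were saved for AntConc that way.
        counts.update(tokenize(path.read_text(encoding="latin-1")))
    return counts
```

`word_list("my_corpus").most_common(20)` then lists the twenty most frequent word forms with their counts, which is the kind of ranking the Word List tab shows.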
LANTUS – WHAT IS THAT? Only three sources: -> Product name? Check in texts.
WHAT IS HYPOGLYCEMIA? If you click some of the words, you go directly into the texts. Scroll down -> different sources use the word => ESP word
THE OR TRICK Finding: • Content knowledge • Alternatives • Synonyms (Try also: aka, referred to, known as, and the opening parenthesis "(")
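The OR trick is an ordinary keyword-in-context (KWIC) search, which can be sketched as follows. The function name and the 40-character context window are assumptions made for illustration; AntConc's Concordance tab does the same job interactively.

```python
import re


def kwic(text: str, term: str, width: int = 40) -> list[tuple[str, str, str]]:
    """Return (left context, match, right context) for each whole-word hit of term."""
    # The lookarounds keep 'or' from matching inside words such as 'for'.
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines
```

Running `kwic(text, "or")`, and then the same call with "aka", "referred to" or "known as", lists each hit with its surroundings, so alternatives and synonyms can be read off the right-hand context.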
THE ( TRICK – FINDING PARENTHETICAL INFO Which kind of info would you find in parentheses, and what does that data tell you?
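Outside AntConc, the parenthesis trick can be sketched with a regular expression. This sketch assumes simple, non-nested parentheses; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the content of one pair of parentheses without nested parentheses.
PAREN = re.compile(r"\(([^()]+)\)")


def parentheticals(text: str) -> Counter:
    """Count every parenthetical expression found in the text."""
    return Counter(content.strip() for content in PAREN.findall(text))
```

On the Wikipedia passage about insulin analogs, this would pick up items such as "FDA", "prandial insulin" and "basal insulin", which are typically abbreviations, synonyms and explanations.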
FINDING COLLOCATIONS - RIGHT Set the sorting parameters at the bottom: 1R, 2R, 3R. Findings: metabolic changes, metabolic control, metabolic decompensation, metabolic deterioration. All of them are concepts in their own right.
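The effect of the sort settings can be sketched in Python: collect the words next to the search word and sort the hits by the first, then the second, then the third neighbour. The function names and the simple letter-run tokenizer are assumptions; only the sorting idea is taken from the slide.

```python
import re


def tokens(text: str) -> list[str]:
    """Lower-cased runs of letters (a simplified tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def sorted_concordance(words: list[str], node: str, side: str = "R", span: int = 3):
    """Return the context of each hit of node, sorted by position 1, 2, 3 on one side."""
    hits = []
    for i, word in enumerate(words):
        if word != node:
            continue
        if side == "R":
            context = words[i + 1:i + 1 + span]          # 1R, 2R, 3R
        else:
            context = words[max(0, i - span):i][::-1]    # 1L, 2L, 3L (nearest first)
        hits.append(tuple(context))
    return sorted(hits)
```

With `side="R"` the hits for metabolic line up by the following word, so recurring pairs such as metabolic control group together; `side="L"` does the same for the preceding words.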
FINDING COLLOCATIONS - LEFT Set the sorting parameters at the bottom: 1L, 2L, 3L. Findings: good metabolic, poor metabolic, rapid metabolic … but, for instance, not bad metabolic!
STATISTICS & COLLOCATION – FOR THE NERDS Shows ranking, frequency left, frequency right, statistics and the collocate. Note, for instance, regimen, which occurs with dosing as both a left and a right collocate. The same meaning?
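The frequency columns of such a collocate table can be reproduced with a short Python sketch. It counts left and right neighbours only; the statistic column is deliberately left out, and the function name and the one-word span are assumptions.

```python
from collections import Counter


def collocates(words: list[str], node: str, span: int = 1):
    """Return rows (collocate, total, freq_left, freq_right), most frequent first."""
    left, right = Counter(), Counter()
    for i, word in enumerate(words):
        if word != node:
            continue
        left.update(words[max(0, i - span):i])     # words before the node
        right.update(words[i + 1:i + 1 + span])    # words after the node
    rows = [(c, left[c] + right[c], left[c], right[c]) for c in set(left) | set(right)]
    # Sort by total frequency (descending), then alphabetically for stable output.
    return sorted(rows, key=lambda row: (-row[1], row[0]))
```

A collocate with counts in both the left and the right column, as regimen has with dosing, is a cue to open the concordance lines and check whether both orders mean the same thing.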
FILE VIEW – FOR BETTER CONTEXT
CONCORDANCE PLOT FOR PRECAUTIONS Shows in which texts a word occurs and how its use is distributed throughout each text. Precautions tends to occur near the front of the texts. A possible genre tool?
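The idea behind the plot can be sketched in Python: express each hit as a fraction of the text length and mark it on a bar. The function names and the 50-cell bar are assumptions for illustration, not AntConc's implementation.

```python
import re


def plot_positions(text: str, term: str) -> list[float]:
    """Return each hit's position as a fraction of the text length (0.0 = start)."""
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
    return [m.start() / len(text) for m in pattern.finditer(text)]


def plot_bar(positions: list[float], width: int = 50) -> str:
    """Draw one text as a row of dots with a '|' wherever a hit falls."""
    cells = ["."] * width
    for p in positions:
        cells[min(int(p * width), width - 1)] = "|"
    return "".join(cells)
```

Printing `plot_bar(plot_positions(text, "precautions"))` for each file gives one bar per text; marks clustered at the left end correspond to the front tendency noted for precautions.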
YOUR TURN • Let your imagination run loose. • What do you find out? • What can it be used for? • Is it useful in the first place?
MY OPINION – GOOD The program is intuitive to use – I never learned it from anyone. General Microsoft processes work in AntConc. Good as an ad-hoc writing and translation tool. Good for register work. Good for collocations. Good for proof of what you are doing. Forget about your intuitive ideas and check how it works. Define a problem and devise your own solution method. Even 10 texts, as in our case, can provide a wealth of information. Ten times faster and 100 times more reliable than tiresomely opening and reading x number of Google documents. Must nowadays necessarily replace any old-fashioned pencil-and-paper work.
MY OPINION - LIMITATIONS • The quality of the findings depends on the user: quality in => quality out • You have to get started in order to like it • You have to learn approx. five shortcuts in order not to tire out
• Thank you for joining me in this. • I wish you the best of luck with the rest of the conference. • If you want to contact me, please write to me: • bmo@dac.au.dk / bmo@expo-com.dk