Typology: Language Sampling
Anna Siewierska & Dik Bakker

Empirical Cycle
[Figure: cycle connecting languages (L), DATA, PROVISIONAL definition of categories C1 … C3, hypotheses, and TEST]

Overview
1. Collecting language data
2. Why a sample?
3. Types of biases in samples
4. Two strategies
5. Samples in typological literature
6. The DV method

Data collecting
Languages of the world: n ≈ 7000
SAMPLE (50 – 500)

Data collecting
Why not all languages in our database?
- Too many
- Only <1000 well described (grammar), <2000 partial sketch
- Not (always) necessary
- Sometimes even wrong
- Impossible even in principle

All Languages: impossible
Extant languages: 7000
Extinct languages: 500 (Ruhlen 1991)
- Latin, Cl. Greek, Gothic, Hebrew, Hittite, …
- Cl. Turkic, Cl. Tibetan, Archaic Chinese, …
- Manx, Cornish, …
- Illinois, Mohican, Massachusett, Carolina, …
Problem? No native speaker intuitions …
Unrecorded extinct languages X1, X2, X3, …, Xn: ?

All Languages: impossible
Homo sapiens: 200,000 BP
Great Leap Forward: 40,000 BP
Average n of lgs: 6000
Diachronic change: 1000 years
X lgs: (40,000 / 1000) * 6000 = 240,000

All Languages: impossible
Extant languages: 7000
Extinct languages: 500
X1, X2, X3, …, Xn: 240,000
Human languages: 247,500
Extant + known extinct: 3.0% of all human languages
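
The back-of-the-envelope estimate on these slides can be replayed in a few lines. This is only a sketch of the arithmetic: the variable names are illustrative, and reading "Diachronic change 1000 year" as one full turnover of the language stock per 1000 years is an assumption.

```python
# Figures taken from the slides; names are illustrative, not from the source.
YEARS_SINCE_GREAT_LEAP = 40_000   # Great Leap Forward, years BP
TURNOVER_YEARS = 1_000            # assumed: one full turnover per 1000 years
AVERAGE_LANGUAGES = 6_000         # average number of languages at any time
EXTANT = 7_000
EXTINCT_KNOWN = 500               # Ruhlen (1991)
EXTANT_DOCUMENTED = 1_500

# X lgs: (40,000 / 1000) * 6000
unknown_extinct = (YEARS_SINCE_GREAT_LEAP // TURNOVER_YEARS) * AVERAGE_LANGUAGES
human_languages = EXTANT + EXTINCT_KNOWN + unknown_extinct


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


known_share = share(EXTANT + EXTINCT_KNOWN, human_languages)
documented_share = share(EXTANT_DOCUMENTED, human_languages)
print(unknown_extinct, human_languages, known_share, documented_share)
```

The two shares correspond to the 3.0% and 0.6% figures shown on the slides.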

All Languages: impossible
Extant, documented: 1500 (0.6%, spoken anno 2000)
Extinct, documented: <100
X1, X2, X3, …, Xn: 240,000
Human languages: 247,500
Typology: Universals of "Human Language"
Uniformitarianism (Lass 1997)

All Languages: impossible
Extant, documented: 1500 (0.6%, spoken anno 2000)
X1, X2, X3, …, Xn: 240,000
Human languages: 247,500
Typology: Variety among human languages

Variety: rare types
- Clicks (only in one family – Khoisan: 30 lgs)
- Active nominal marking (Pomo, Laz)
- Opposite person hierarchy Acc-Erg (Tib. Burm.)
- Tripartite agreement on ditransitives
- Syntactic ergativity (Aus, Maya)
- Adverbial agreement with focal (Aus, Cauc)
- OSV main clause order (S. Am)
N.B. combination of (rare) features (cf. Greenberg)
"Rara et Rarissima"

Data collecting
Why not all languages in our database?
- Too many
- Only <1000 well described (grammar), <2000 partial sketch
- Not (always) necessary
- Sometimes even wrong
- Impossible even in principle
  → Problematic for variety
  → Possibly not for universality

Data collecting
Why not all languages in our database?
- Too many
- Only <1000 well described (grammar), <2000 partial sketch
- Not (always) necessary
- Sometimes even wrong

Too many languages
Samples in the typological literature:
Greenberg (1963) – Word order                  30
Hawkins (1983) – Word order                   225
Tomlin (1986) – Word order                    402
Nichols (1992) – Head/Dep marking             174
Bybee (1994) – Tense/Aspect/Mood               76
Siewierska & Bakker (1990-) – Pers. Agr.      450
Dryer (1985-) – Word order                   1200
Typical Ph.D. project (1 person, 3 years): 50 - 100

Data collecting
Why not all languages in our database?
- Too many → sample inevitable
- Only <1000 well described (grammar), <2000 partial sketch
- Not (always) necessary
- Sometimes even wrong

Data collecting
Why not all languages in our database?
- Only <1000 well described (grammar), <2000 partial sketch
- Not (always) necessary
- Sometimes even wrong

Lack of material
Bibliographic bias:
- (very) old
- scarce
- theory specific (Tagmemics; GG)
- restricted to phonology and morphology
- biased selection of the world's languages:

Lack of material
Further types of bias:
- Genetic
  Indo-European, Ugric, Bantu: ++
  Australian, Amerindian, Papuan: --

Lack of material
Further types of bias:
- Genetic
- Areal
  Sprachbund: Balkan, Circum-Baltic, C. America, S.E. Asia, …

Lack of material
Further types of bias:
- Genetic
- Areal
- Typological
  Parametric variables (Hawkins 1983):
  Adposition: Prep
  Prep [ Dem Num Adj Gen Rel N ]NP
  Prepositional Noun Modifier Hierarchy:
  Prep → ((NDem OR NNum → NA) AND (NA → NGen) AND (NGen → NRel))
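
The hierarchy on this slide is a nested implication, so it can be checked mechanically against a language's word-order settings. The sketch below assumes that the arrows lost from the slide are implications and that "NDem OR NNum" groups together as the antecedent of the first one; the feature dictionaries are invented illustrations, not data on real languages.

```python
def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q."""
    return (not p) or q


def satisfies_prnmh(lang: dict) -> bool:
    """Check Prep -> ((NDem or NNum -> NA) and (NA -> NGen) and (NGen -> NRel)).

    `lang` maps the feature names Prep, NDem, NNum, NA, NGen, NRel to booleans
    (True = the language has that order). The grouping is an assumption.
    """
    consequent = (
        implies(lang["NDem"] or lang["NNum"], lang["NA"])
        and implies(lang["NA"], lang["NGen"])
        and implies(lang["NGen"], lang["NRel"])
    )
    return implies(lang["Prep"], consequent)


# Hypothetical feature sets, for illustration only.
consistent = {"Prep": True, "NDem": False, "NNum": False,
              "NA": True, "NGen": True, "NRel": True}
violating = {"Prep": True, "NDem": True, "NNum": False,
             "NA": False, "NGen": True, "NRel": True}
postpositional = {"Prep": False, "NDem": True, "NNum": True,
                  "NA": False, "NGen": False, "NRel": False}

print(satisfies_prnmh(consistent), satisfies_prnmh(violating),
      satisfies_prnmh(postpositional))
```

A language without prepositions satisfies the hierarchy vacuously, which is why the third example passes whatever its noun-modifier orders are.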

Lack of material
Further types of bias:
- Genetic
- Areal
- Typological
- Cultural
  Linguistic relativity (Sapir; Whorf)
  Lucy (1992): count nouns vs classifiers ~ counting tasks

Lack of material
Further types of bias:
- Genetic
- Areal
- Typological
- Cultural
- Community size
  Small → high genetic drift (Kimura 1983)
  Also linguistic drift? (Dahl: hunter/gatherer)
  N.B. OSV/OVS only in < 3000 languages

Lack of material
Further types of bias:
- Genetic
- Areal
- Typological
- Cultural
- Community size
- Language contact
  Borrowed phenomenon measured twice
  BUT: contact may also create new types

Data collecting
Why not all languages in our database?
- Only <1000 well described (grammar), <2000 partial sketch (= bibliographical bias)
  → Cater for biases by stratifying for the relevant dimensions
- Not (always) necessary
- Sometimes even wrong

Data collecting
Why not all languages in our database?
- Not (always) necessary
- Sometimes even wrong

Small is beautiful
A good sample may be better than a large sample.
Sample type and size depend on the goal of the project:
- Establish the probability of a language type (e.g. prepositional vs postpositional) → Probability sample
- Explore the existing variety on a certain dimension (e.g. case systems; combination of order patterns) → Variety sample

Small is beautiful
1. Probability sample
- Only independent cases
Control for:
- genetic relations
- language contact
But: relative stability of relevant variables
- Reflexive passive (Romance vs Slavic)

Small is beautiful
Samples in the typological literature:
Greenberg (1963) – Word order                  30
Hawkins (1983) – Word order                   225
Tomlin (1986) – Word order                    402
Nichols (1992) – Head/Dep marking             174
probab
Bybee (1994) – Tense/Aspect/Mood               76
Siewierska & Bakker (1990-) – Pers. Agr.      450
Dryer (1985-) – Word order                   1200

Large may be better
2. Variety sample
- Maximum (all?) different cases
Cater for:
- variation in genetic/areal groups
- typically cyclical
- stop when no new cases found
Research parameters typically unknown!
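
The cyclical procedure (add languages, look for new cases, stop when none turn up) can be sketched as a simple loop. This is a toy illustration under stated assumptions: the batches, the language labels A to G and their word-order types are invented, and stopping after a single round without new types is just one possible reading of "stop when no new cases found".

```python
def grow_variety_sample(batches, type_of):
    """Add batches of languages until a whole batch yields no new type.

    batches: iterable of lists of language names (hypothetical input).
    type_of: function mapping a language name to its type on the
             dimension under study.
    Returns the sampled languages and the set of types found.
    """
    seen_types = set()
    sample = []
    for batch in batches:
        new_types = {type_of(lang) for lang in batch} - seen_types
        if not new_types:
            break  # a full round without new cases: stop extending
        sample.extend(batch)
        seen_types |= new_types
    return sample, seen_types


# Invented toy data: language labels and basic word-order types.
TYPES = {"A": "SOV", "B": "SVO", "C": "SOV", "D": "VSO",
         "E": "SVO", "F": "SOV", "G": "OSV"}
BATCHES = [["A", "B"], ["C", "D"], ["E", "F"], ["G"]]

sample, found = grow_variety_sample(BATCHES, TYPES.get)
print(sample, sorted(found))
```

In this toy run the loop stops at the third batch and never reaches language G, whose type is rare. That is the practical reason a variety sample is hard to make too large.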

Probability vs Variety
Probability sample:
- relatively small (30 – 150)
- may be too large (double cases)
Variety sample:
- relatively large (> 200)
- cannot be too large (just superfluous)

Sampling in the literature
Introductions to Typology (pages on sampling; number of pages):
Comrie (1981)    9-12   (4)
Croft (1990)    18-26   (9)
Whaley (1997)   36-43   (8)
Song (2001)     17-38  (22)

Probability sampling
Bell (1978)
- genetic, areal and typological bias
- 478 genetic groups (> 3000 year depth)
- per family: n of lgs proportional to n of groups
- problems:
  - sample < 478: selection
  - small families 'disappear'

Probability sampling
Perkins (1980)
- Bell stratified for culture (Murdock 1967)
- 50 languages with optimal genetic and cultural distance
- good for probability, too small for variety

Probability sampling
Dryer (1989)
- ~ Bell, but:
- 322 established genera, 3500 – 4000 years deep
- variable values established per genus, not per language (mainly stable, else the most frequent)
- 5 macro-areas, counting genera per area:

Probability sampling
       Africa  Eurasia  Aus-NG  N. Amer  S. Amer
SOV      22      26       29      26       18
SVO      21      19        6       6        5
SOV > SVO
Good for universal preferences on stable variables
Unclear how to generalize to other types of sampling, with languages central
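
The area-by-area comparison behind "SOV > SVO" is easy to make explicit. The sketch below uses the genus counts from the slide and assumes two things not stated there: that the fifth, unlabelled column is South America, and that a preference counts only when it holds in every macro-area.

```python
# Genus counts per macro-area, copied from the slide.
# "S. Amer" as the label of the fifth column is an assumption.
AREAS = ["Africa", "Eurasia", "Aus-NG", "N. Amer", "S. Amer"]
GENERA = {
    "SOV": [22, 26, 29, 26, 18],
    "SVO": [21, 19, 6, 6, 5],
}


def areas_preferring(a: str, b: str, counts=GENERA) -> list:
    """Return the macro-areas in which type a has more genera than type b."""
    return [area for area, x, y in zip(AREAS, counts[a], counts[b]) if x > y]


def is_consistent_preference(a: str, b: str, counts=GENERA) -> bool:
    """True if type a outnumbers type b in every macro-area."""
    return len(areas_preferring(a, b, counts)) == len(AREAS)


print(areas_preferring("SOV", "SVO"))
print(is_consistent_preference("SOV", "SVO"))
```

Counting genera rather than languages keeps one large family from dominating the result, and requiring agreement across areas guards against a purely areal effect.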

Variety sampling
Characteristics:
- Create variety samples of any size
- Free choice of classification used (Gen/Ar/Typ)
- Stratification on other parameter (Gen: Ar/Typ)
- Generate new samples + evaluate existing samples
- Fully formalized and computer implemented

Variety sampling
Central idea:
- classifications express linguistic (dis)similarities between languages
- established on the basis of expert knowledge
- subject to cyclical improvement and refinement
- best starting point for explorative research into variation among languages

Variety sampling
[Figure: family trees with example languages]
- Afro-Asiatic: HBR, ARB
- Amerindian: QUE, GUA
- Caucasian: GEO, CHE
- Dravidian: KAN, TAM
Minimum sample: 1 language per family
Select language with the best description (for the purpose)
Includes all ISOLATES: Basque, Burushaski, Ket, Nahali, …

Variety sampling
Minimum sample: 1 language per family = Basic Sample
- Ruhlen (1991): 27
- Ethnologue (2005): 120
- Murdock (1967): 50

Variety sampling
Afro-Asiatic: DV=3; Amerindian: DV=6; Caucasian: DV=2; Dravidian
Extending the Basic Sample to preferred size:
e.g. extend Ruhlen-based sample from 27 → 50
KEY: relative complexity of family tree

Variety sampling
Adjusting DV values to full tree structure:
- Recursively down the trees
- Lower levels contribute relatively less to DV

Variety sampling
Formula for weight per level:
C_k = C_{k-1} + (N_k - N_{k-1}) * (MAX - (k-1)) / MAX
See Rijkhoff & Bakker (1998)
DV adjusted to full tree structure: Afro-Asiatic 3 → 55.5; Amerindian 6 → 178.4; Caucasian 2 → 8.5
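
The recurrence on the slide can be run level by level. The sketch below makes several assumptions that the slide does not spell out: N_k is the number of nodes at level k of a family tree, C_0 = 0 and N_0 = 0, and MAX is the depth of the deepest tree in the classification. How the C values are turned into the final DV is not shown on the slide and is not attempted here; see Rijkhoff & Bakker (1998).

```python
def level_contributions(nodes_per_level, max_depth):
    """Apply C_k = C_{k-1} + (N_k - N_{k-1}) * (MAX - (k-1)) / MAX.

    nodes_per_level[k-1] is N_k (assumed: nodes at tree level k).
    max_depth is MAX (assumed: depth of the deepest tree overall).
    Starts from C_0 = 0 and N_0 = 0 (assumption).
    Returns [C_1, C_2, ...].
    """
    c_prev, n_prev = 0.0, 0
    contributions = []
    for k, n_k in enumerate(nodes_per_level, start=1):
        weight = (max_depth - (k - 1)) / max_depth  # shrinks with depth
        c_k = c_prev + (n_k - n_prev) * weight
        contributions.append(c_k)
        c_prev, n_prev = c_k, n_k
    return contributions


# Hypothetical tree: 2 nodes at level 1, 4 at level 2, 8 at level 3; MAX = 4.
print(level_contributions([2, 4, 8], max_depth=4))
```

Each new level adds only the increase in the number of nodes, scaled by a weight that falls from 1 at the top level towards 1/MAX at the deepest one. That is how lower levels come to contribute relatively less.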

Variety sampling
Afro-Asiatic: DV=55.5; Amerindian: DV=178.4; Caucasian: DV=8.5; Dravidian
Computer program:
→ Number of lgs per family given sample size

Variety sampling
RUHLEN (1991)
                          Sample size
Family (n=5273)          30   50  100  250   Share of family (250 sample)
Afro-Asiatic (258)        1    2    6   16
Amerind (854)             2    7   18   51   5.9%
Austric (1186)            2    5   14   39   3.3%
Caucasian (38)            1    1    1    3
Chukchi (5)               1    1
Indo-European (180)       1    2    4   11   6.1%

Variety sampling
Computer program:
→ Number of lgs per family given sample size
→ Optimal distribution over subbranches (maximum distance, maximum variety)

Variety sampling
Amerind
Main Branch (n=854)    Total     DV    in 250 sample (n = 51)
Central                   60    19.1      6  (= 10%)
Ge-Pano-Carib            193    29.3      9
Northern                 232    45.5     14
Eq-Tucanoan              268    45.0     14
Chibchan-Paezan           71    16.9      5
Andean                    30     9.9      3
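
The per-branch numbers in this table are consistent with distributing the 51 Amerind slots in proportion to each branch's DV. The sketch below uses a largest-remainder rounding rule to show this; that rule is an assumption chosen because it matches the table, and the actual program may well proceed differently.

```python
import math

# DV per Amerind main branch, copied from the slide.
BRANCH_DV = {
    "Central": 19.1,
    "Ge-Pano-Carib": 29.3,
    "Northern": 45.5,
    "Eq-Tucanoan": 45.0,
    "Chibchan-Paezan": 16.9,
    "Andean": 9.9,
}


def allocate(total: int, weights: dict) -> dict:
    """Split `total` slots proportionally to `weights` (largest remainder)."""
    weight_sum = sum(weights.values())
    quotas = {name: total * w / weight_sum for name, w in weights.items()}
    counts = {name: math.floor(q) for name, q in quotas.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining slots to the branches with the largest fractions.
    by_remainder = sorted(quotas, key=lambda n: quotas[n] - counts[n],
                          reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


allocation = allocate(51, BRANCH_DV)
print(allocation)
```

Because the split follows DV rather than the number of languages, Andean (30 languages) and Central (60 languages) both end up with a tenth or more of their languages sampled, while far larger branches receive proportionally less.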

Variety sampling
Amerind (51 / 854)
Andean (3 / 30)
[Figure: Andean subtree with branches NORTH, URA, CAHUA, QUECH, AYMA, SOUTH]

Variety sampling: output
Typical output:
Classification: Ruhlen 91
Criterion 1: Diversity Value: dynamic/global/average
Sample size: 100 (1.90 % of 5273)
Afro-Asiatic (55.53/6/258)            6
Altaic (15.07/2/62)                   2
Amerind (178.44/6/854)               18
Australian (67.58/30/262)             7
…
Na-Dene (9.44/2/41)                   1
Niger-Kordofanian (90.38/2/1068)      9
…
Basque (1.00/0/0)                     1
Etruscan (1.00/0/0)                   1
Gilyak (1.00/0/0)                     1

Variety sampling: output
Classification: Ruhlen 91
Criterion 1: Diversity Value: dynamic/global/average
Sample size: 100 (1.90 % of 5273)
Niger-Kordofanian (2/1068)                      9
  Niger-Congo (2/1036)                          8
    Niger-Congo Proper (2/1007)                 7
      Central Niger-Congo (2/961)               6
        South Central Niger-Congo (3/755)       3
          Eastern (9/703)                       1
          Western (2/47)                        1
          Ijo-Defaka (2/5)                      1
        North Central Niger-Congo (4/206)       3
      West Atlantic (3/46)                      1
    Mande (3/29)                                1
  Kordofanian (2/32)                            1
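
Read as a nested tree, the Niger-Kordofanian output can be checked for internal consistency: at every node, the language totals and the sampled counts of the subbranches should add up to those of the parent. The nesting below, and the assignment of the trailing counts 3, 1, 1, 1, 3 to South Central, Eastern, Western, Ijo-Defaka and North Central, are a reconstruction from the flattened slide text, not something the source states.

```python
# (name, number of languages, languages sampled, subbranches)
# Nesting and some counts are reconstructed; treat them as assumptions.
TREE = ("Niger-Kordofanian", 1068, 9, [
    ("Niger-Congo", 1036, 8, [
        ("Niger-Congo Proper", 1007, 7, [
            ("Central Niger-Congo", 961, 6, [
                ("South Central Niger-Congo", 755, 3, [
                    ("Eastern", 703, 1, []),
                    ("Western", 47, 1, []),
                    ("Ijo-Defaka", 5, 1, []),
                ]),
                ("North Central Niger-Congo", 206, 3, []),
            ]),
            ("West Atlantic", 46, 1, []),
        ]),
        ("Mande", 29, 1, []),
    ]),
    ("Kordofanian", 32, 1, []),
])


def inconsistent_nodes(node) -> list:
    """Return names of nodes whose children do not sum to the node's totals."""
    name, n_lgs, n_sampled, children = node
    bad = []
    if children:
        lgs_ok = sum(child[1] for child in children) == n_lgs
        sampled_ok = sum(child[2] for child in children) == n_sampled
        if not (lgs_ok and sampled_ok):
            bad.append(name)
        for child in children:
            bad.extend(inconsistent_nodes(child))
    return bad


print(inconsistent_nodes(TREE))
```

That both sets of numbers add up at every level lends some support to this reading of the slide.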

Variety sampling
Side effect of large (variety) sample: Hidden diachrony

Variety sampling
Problems:
- works only on tree-shaped classifications
- time depth in genetic trees: unbalanced
- not good for probability samples
- Creoles? Extinct languages?

Round off
Two Sample Strategies:
1. Probability sample
- relatively small
- control for Gen/Ar/Typ bias
2. Variety sample
- relatively large
- may be stratified for bias parameters
- may have diachronic dimension

Round off
Sample Types:
1. Probability sample
2. Variety sample
3. Random sample: when bias is unimportant
4. Convenience sample: when bibliographical constraints kick in ...

?