Data Mining Sample Questions

Data Mining lectures, Dr. Mohammad Hossein Nadimi, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University
Schemas for multidimensional data warehouses:
• Star schema
• Snowflake schema
• Fact constellation schema
In a star schema:
• Fact table: a large central table that holds the bulk of the data, with no redundancy.
• Dimension tables: a set of smaller tables, one for each dimension.
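To make the fact/dimension split concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (fact_sales, dim_item, dim_location), their columns, and the rows are hypothetical, chosen only for illustration. The query shows a typical star join: the central fact table is joined to a dimension table and its measure is aggregated by a dimension attribute.

```python
import sqlite3

# Hypothetical star schema: a central fact table whose foreign keys point to
# small dimension tables (one per dimension).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
    CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
    CREATE TABLE fact_sales (
        item_key INTEGER REFERENCES dim_item(item_key),
        location_key INTEGER REFERENCES dim_location(location_key),
        units_sold INTEGER,
        dollars_sold REAL
    );
    INSERT INTO dim_item VALUES (1, 'TV', 'Brand X'), (2, 'Phone', 'Brand Y');
    INSERT INTO dim_location VALUES (10, 'City A', 'Country P'), (20, 'City B', 'Country P');
    INSERT INTO fact_sales VALUES (1, 10, 3, 900.0), (2, 10, 5, 1500.0), (1, 20, 2, 600.0);
""")

# Star join: the fact table is joined to one dimension and a measure is summed per city.
rows = conn.execute("""
    SELECT l.city, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_location AS l ON f.location_key = l.location_key
    GROUP BY l.city
    ORDER BY l.city
""").fetchall()
print(rows)  # [('City A', 8), ('City B', 2)]
```

Queries that need more dimensions simply add one join per dimension table; the fact table stays the single hub.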
7) Describe the four different views that exist regarding a data warehouse.
11) Describe the types of OLAP servers.
1) Relational OLAP servers (ROLAP)
2) Multidimensional OLAP servers (MOLAP)
3) Hybrid OLAP servers (HOLAP)
12) Name the data warehousing tools.
1) Data access and retrieval tools
Scan D for the count of each candidate in C1:
C1: {A}: 4, {B}: 5, {C}: 4, {D}: 4, {E}: 2, {F}: 3
Compare each candidate's support with min_sup to obtain L1 (the minimum support count here is 3: {E}, with support 2, is dropped, as are {C, F} and {D, F} at the next level):
L1: {A}: 4, {B}: 5, {C}: 4, {D}: 4, {F}: 3
Generate C2 from L1 and scan D again for the count of each candidate:
C2: {A, B}: 4, {A, C}: 3, {A, D}: 3, {A, F}: 3, {B, C}: 4, {B, D}: 4, {B, F}: 3, {C, D}: 4, {C, F}: 2, {D, F}: 2
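The "generate C2 from L1" step is Apriori's join-and-prune candidate generation. Below is a minimal Python sketch, assuming itemsets are given as tuples of item names; the function name apriori_gen is our own label. It reproduces the ten C2 candidates above and, applied to the L2 listed next, yields C3 and then C4.

```python
from itertools import combinations

def apriori_gen(frequent_prev, k):
    """Return candidate k-itemsets C_k built from the frequent (k-1)-itemsets L_{k-1}."""
    prev = sorted(set(tuple(sorted(s)) for s in frequent_prev))
    prev_set = set(prev)
    candidates = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            # Join step: merge two (k-1)-itemsets that agree on their first k-2 items.
            if a[:k - 2] != b[:k - 2]:
                continue
            cand = a + (b[-1],)  # stays sorted because prev is sorted and a < b
            # Prune step: every (k-1)-subset of a candidate must itself be frequent.
            if all(sub in prev_set for sub in combinations(cand, k - 1)):
                candidates.append(cand)
    return candidates

L1 = [("A",), ("B",), ("C",), ("D",), ("F",)]
C2 = apriori_gen(L1, 2)
print(len(C2), C2)

L2 = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "F"),
      ("B", "C"), ("B", "D"), ("B", "F"), ("C", "D")]
C3 = apriori_gen(L2, 3)
print(C3)
```

For C3 the prune step does real work: {A, C, F}, {A, D, F}, {B, C, F} and {B, D, F} are joined but discarded because {C, F} or {D, F} is not in L2.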
Compare with min_sup to obtain L2:
L2: {A, B}: 4, {A, C}: 3, {A, D}: 3, {A, F}: 3, {B, C}: 4, {B, D}: 4, {B, F}: 3, {C, D}: 4
C3: {A, B, C}, {A, B, D}, {A, B, F}, {A, C, D}, {B, C, D}
L3: {A, B, C}: 3, {A, B, D}: 3, {A, B, F}: 3, {A, C, D}: 3, {B, C, D}: 4
C4: {A, B, C, D}
L4: {A, B, C, D}: 3
A frequent itemset is closed if no proper superset has the same support, and maximal if no proper superset is frequent, so every maximal frequent itemset is also closed.
1) Closed frequent itemsets: {B}, {A, B}, {A, B, F}, {B, C, D}, {A, B, C, D}
2) Maximal frequent itemsets: {A, B, F}, {A, B, C, D}
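The closed and maximal lists can be checked mechanically from the support counts in the tables above. This minimal sketch assumes, as those tables imply, that these 19 itemsets are all the frequent itemsets at a minimum support count of 3; closed_and_maximal is our own helper name.

```python
# Support counts of all frequent itemsets (minimum support count 3), from the tables.
support = {frozenset(s): c for s, c in [
    ("A", 4), ("B", 5), ("C", 4), ("D", 4), ("F", 3),
    ("AB", 4), ("AC", 3), ("AD", 3), ("AF", 3), ("BC", 4), ("BD", 4), ("BF", 3), ("CD", 4),
    ("ABC", 3), ("ABD", 3), ("ABF", 3), ("ACD", 3), ("BCD", 4),
    ("ABCD", 3),
]}

def closed_and_maximal(support):
    """Split frequent itemsets into closed and maximal ones.

    Only frequent supersets need checking: an infrequent superset has support
    below the minimum, hence strictly below the support of any frequent itemset.
    """
    closed, maximal = [], []
    for s, sup in support.items():
        supersets = [t for t in support if s < t]
        if all(support[t] < sup for t in supersets):
            closed.append("".join(sorted(s)))
        if not supersets:
            maximal.append("".join(sorted(s)))
    order = lambda x: (len(x), x)
    return sorted(closed, key=order), sorted(maximal, key=order)

closed, maximal = closed_and_maximal(support)
print("closed:", closed)    # ['B', 'AB', 'ABF', 'BCD', 'ABCD']
print("maximal:", maximal)  # ['ABF', 'ABCD']
```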
14) a) For the following database, draw the FP-tree and apply the FP-growth algorithm to it.
• Min-Sup = 2
L = {(a: 8), (b: 7), (c: 6), (d: 5), (e: 3)}
Header table (item ID, support count; each entry also holds a node-link into the tree):
a: 8, b: 7, c: 6, d: 5, e: 3
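The header table can be reproduced by building the FP-tree. The transaction database itself is not shown in the text, so the ten transactions below are a hypothetical reconstruction from the vertical TID-sets in part b) of this question. Node and build_fp_tree are our own minimal sketch; each header entry keeps a list of the tree nodes for its item, which plays the role of the node-link chain.

```python
from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_sup):
    """Return (root, header, order); header[item] lists that item's nodes (node-links)."""
    counts = Counter(item for t in transactions for item in t)
    # L: frequent items in descending support order (ties broken by name).
    order = sorted((i for i, c in counts.items() if c >= min_sup),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root = Node(None, None)
    header = {item: [] for item in order}
    for t in transactions:
        node = root
        # Insert the transaction's frequent items in L order, sharing common prefixes.
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header, order

# Hypothetical reconstruction of TIDs 1..10 from the vertical data format.
transactions = [
    {"a", "b"}, {"b", "c", "d"}, {"a", "c", "d", "e"}, {"a", "d", "e"}, {"a", "b", "c"},
    {"a", "b", "c", "d"}, {"a"}, {"a", "b", "c"}, {"a", "b", "d"}, {"b", "c", "e"},
]
root, header, order = build_fp_tree(transactions, min_sup=2)
for item in order:
    print(item, sum(n.count for n in header[item]), "nodes:", len(header[item]))
```

Summing the counts along each node-link chain recovers the header-table supports (a: 8 down to e: 3).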
Mining the FP-tree, item by item (conditional pattern base; conditional FP-tree; frequent patterns generated):
• e: {a, d: 1}, {a, c, d: 1}, {b, c: 1}; root → a: 2 → (c: 1 → d: 1, d: 1) and root → c: 1 (b is dropped, count 1 < 2); {a, e: 2}, {c, e: 2}, {d, e: 2}, {a, d, e: 2}
• d: {a, b, c: 1}, {a, b: 1}, {a, c: 1}, {a: 1}, {b, c: 1}; root → a: 4 → (b: 2 → c: 1, c: 1) and root → b: 1 → c: 1; {a, d: 4}, {b, d: 3}, {c, d: 3}, {a, b, d: 2}, {a, c, d: 2}, {b, c, d: 2}
• c: {a, b: 3}, {a: 1}, {b: 2}; root → a: 4 → b: 3 and root → b: 2; {a, c: 4}, {b, c: 5}, {a, b, c: 3}
• b: {a: 5}; root → a: 5; {a, b: 5}
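The table above is one level of FP-growth's recursion. The following minimal sketch (our own build_tree and fp_growth helpers) runs the whole recursion: for each item, least frequent first, it collects the conditional pattern base from the node-links and mines the conditional FP-tree built from it. The ten transactions are a hypothetical reconstruction from the vertical TID-sets in part b), since the database is not shown in the text.

```python
from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(weighted_paths, min_sup):
    """Build an FP-tree from (items, count) pairs; return node-links and item order."""
    counts = Counter()
    for items, c in weighted_paths:
        for i in items:
            counts[i] += c
    order = sorted((i for i in counts if counts[i] >= min_sup), key=lambda i: (-counts[i], i))
    rank = {i: r for r, i in enumerate(order)}
    root, header = Node(None, None), defaultdict(list)
    for items, c in weighted_paths:
        node = root
        for i in sorted((i for i in items if i in rank), key=rank.get):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += c
    return header, order

def fp_growth(weighted_paths, min_sup, suffix=()):
    """Yield (itemset, support) for every frequent itemset."""
    header, order = build_tree(weighted_paths, min_sup)
    for item in reversed(order):  # least frequent item first
        pattern = (item,) + suffix
        yield frozenset(pattern), sum(n.count for n in header[item])
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for n in header[item]:
            path, p = [], n.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, n.count))
        # Recurse on the conditional FP-tree built from that base.
        yield from fp_growth(base, min_sup, pattern)

transactions = [
    {"a", "b"}, {"b", "c", "d"}, {"a", "c", "d", "e"}, {"a", "d", "e"}, {"a", "b", "c"},
    {"a", "b", "c", "d"}, {"a"}, {"a", "b", "c"}, {"a", "b", "d"}, {"b", "c", "e"},
]
patterns = list(fp_growth([(t, 1) for t in transactions], min_sup=2))
for itemset, sup in patterns:
    if len(itemset) > 1:
        print(sorted(itemset), sup)
```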
14) b) Apply the Eclat algorithm to the database of the previous question.
• Min-Sup = 2
1-itemsets in vertical data format (itemset: TID-set):
a: 1, 3, 4, 5, 6, 7, 8, 9
b: 1, 2, 5, 6, 8, 9, 10
c: 2, 3, 5, 6, 8, 10
d: 2, 3, 4, 6, 9
e: 3, 4, 10
2-itemsets in vertical data format:
{a, b}: 1, 5, 6, 8, 9
{a, c}: 3, 5, 6, 8
{a, d}: 3, 4, 6, 9
{a, e}: 3, 4
{b, c}: 2, 5, 6, 8, 10
{b, d}: 2, 6, 9
{b, e}: 10
{c, d}: 2, 3, 6
{c, e}: 3, 10
{d, e}: 3, 4
3-itemsets in vertical data format:
{a, b, c}: 5, 6, 8
{a, b, d}: 6, 9
{a, c, d}: 3, 6
{a, d, e}: 3, 4
{b, c, d}: 2, 6
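Eclat works depth-first on the vertical format: the TID-set of a larger itemset is the intersection of the TID-sets of two itemsets that share a prefix, and its support is the size of that set. A minimal recursive sketch follows (eclat is our own function name). It assumes e's TID-set is {3, 4, 10}, as implied by e's support count of 3 together with the TID-sets of {b, e}, {c, e} and {d, e}.

```python
def eclat(prefix, items, min_sup, out):
    """Depth-first Eclat: `items` holds (item, TID-set) pairs that extend `prefix`."""
    for i, (item, tids) in enumerate(items):
        itemset = prefix + (item,)
        out[itemset] = tids
        # Extensions: intersect this TID-set with the TID-set of each later item.
        extensions = []
        for other, other_tids in items[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_sup:
                extensions.append((other, common))
        if extensions:
            eclat(itemset, extensions, min_sup, out)
    return out

# 1-itemsets in vertical format (e's TID-set inferred as {3, 4, 10}).
vertical = {
    "a": {1, 3, 4, 5, 6, 7, 8, 9},
    "b": {1, 2, 5, 6, 8, 9, 10},
    "c": {2, 3, 5, 6, 8, 10},
    "d": {2, 3, 4, 6, 9},
    "e": {3, 4, 10},
}
min_sup = 2
start = [(item, tids) for item, tids in sorted(vertical.items()) if len(tids) >= min_sup]
frequent = eclat((), start, min_sup, {})
for itemset, tids in frequent.items():
    if len(itemset) > 1:
        print(itemset, sorted(tids))
```

Infrequent intersections such as {b, e} = {10} are cut immediately, so no candidate built on them is ever formed.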
Answer: compute the distance between each pair of the three patients, Jack, Mary and Jim.
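The patient records for this answer are not reproduced in the text. Assuming the attributes are asymmetric binary (for example, symptoms or test results where only a positive value, coded 1, is informative), the usual distance is d(i, j) = (r + s) / (q + r + s), where q counts attributes equal to 1 for both objects, r those equal to 1 only for i, and s those equal to 1 only for j; negative matches are ignored. The sketch below uses three hypothetical 0/1 vectors, not the patients' actual data.

```python
from itertools import combinations

def asymmetric_binary_distance(x, y):
    """d(i, j) = (r + s) / (q + r + s); negative matches (both 0) are not counted."""
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    if q + r + s == 0:
        return 0.0  # both objects all-negative: treat them as identical
    return (r + s) / (q + r + s)

# Hypothetical binary profiles (1 = positive / present).
patients = {
    "p1": [1, 0, 1, 0, 0, 0],
    "p2": [1, 0, 1, 0, 1, 0],
    "p3": [1, 1, 0, 0, 0, 0],
}
for a, b in combinations(patients, 2):
    print(a, b, round(asymmetric_binary_distance(patients[a], patients[b]), 2))
```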
18) Suppose we have the following sample data. Draw its dissimilarity matrix.
For this data, d(i, j) = 0 when objects i and j take the same value and d(i, j) = 1 when they differ; the dissimilarity matrix follows directly from this rule.
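The 0/1 rule above is the single-attribute case of the usual nominal dissimilarity d(i, j) = (p - m) / p, where p is the number of nominal attributes and m the number on which i and j match. A minimal sketch that builds the full matrix follows; the sample values are hypothetical, since the question's data is not reproduced in the text.

```python
def nominal_dissimilarity(objects):
    """Return the n x n matrix d(i, j) = (p - m) / p for tuples of nominal values."""
    n = len(objects)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            p = len(objects[i])
            m = sum(1 for a, b in zip(objects[i], objects[j]) if a == b)
            matrix[i][j] = (p - m) / p
    return matrix

# Hypothetical data: four objects described by a single nominal attribute.
data = [("red",), ("green",), ("blue",), ("red",)]
M = nominal_dissimilarity(data)
for i, row in enumerate(M):
    print(row[: i + 1])  # lower triangle, as dissimilarity matrices are usually drawn
```

Because the matrix is symmetric with a zero diagonal, only the lower triangle is printed.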
19) Suppose we have two vectors x = (1, 1, 0, 0) and y = (0, 1, 1, 0). Compute the similarity between x and y using the cosine similarity equation:
sim(x, y) = (x · y) / (||x|| × ||y||) = 1 / (√2 × √2) = 0.5
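A quick numerical check of the cosine similarity computation in plain Python (the function name is our own):

```python
import math

def cosine_similarity(x, y):
    """sim(x, y) = (x . y) / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

x = (1, 1, 0, 0)
y = (0, 1, 1, 0)
print(cosine_similarity(x, y))  # approximately 0.5 (floating-point rounding)
```

Only the second coordinate is shared, so the dot product is 1, while each vector has norm √2.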
21) Using the training data set below, compute the probability of playing tennis under the following conditions:
<Outlk = sun, Temp = cool, Humid = high, Wind = strong>
Answer:
P(yes) = 9/14, P(no) = 5/14
P(Wind = strong | yes) = 3/9, P(Wind = strong | no) = 3/5
…
P(yes) P(sun | yes) P(cool | yes) P(high | yes) P(strong | yes) = 0.005
P(no) P(sun | no) P(cool | no) P(high | no) P(strong | no) = 0.021
• Since 0.021 > 0.005, this new instance is classified as "no".
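The computation above is the naive Bayes rule: for each class y, multiply the prior P(y) by P(x_k | y) for every attribute value of the new instance, then predict the class with the larger product. The sketch below estimates these probabilities as plain relative frequencies (no smoothing) and runs on a small hypothetical data set, not the 14-row tennis table, which is not reproduced in the text. Function names are our own.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> counts of values
    for row, label in zip(rows, labels):
        for k, value in enumerate(row):
            value_counts[(k, label)][value] += 1
    return class_counts, value_counts

def naive_bayes_scores(class_counts, value_counts, x):
    """Return P(y) * prod_k P(x_k | y) for every class y (relative frequencies)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total
        for k, value in enumerate(x):
            score *= value_counts[(k, label)][value] / n_label
        scores[label] = score
    return scores

# Hypothetical training data: (outlook, wind) -> play?
rows = [("sun", "strong"), ("sun", "weak"), ("rain", "weak"),
        ("rain", "strong"), ("sun", "weak"), ("rain", "strong")]
labels = ["no", "yes", "yes", "no", "yes", "yes"]
scores = naive_bayes_scores(*train_naive_bayes(rows, labels), ("sun", "strong"))
print(scores, "->", max(scores, key=scores.get))
```

Here P(yes) × P(sun | yes) × P(strong | yes) = 4/6 × 2/4 × 1/4 = 1/12 and the "no" product is 2/6 × 1/2 × 2/2 = 1/6, so the instance is labeled "no", mirroring how the slide compares 0.005 with 0.021.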
Officer Drew IS a female!
c) FOIL's information gain
R0: ∅ → +, with p0 = 100 and n0 = 400
R1: p1 = 4, n1 = 1
R2: p1 = 30, n1 = 10
R3: p1 = 100, n1 = 90
FOIL_Gain = p1 × [log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0))], which gives 4 × [log2(4/5) - log2(1/5)] = 8 for R1, 30 × [log2(30/40) - log2(1/5)] ≈ 57.21 for R2, and 100 × [log2(100/190) - log2(1/5)] ≈ 139.59 for R3.
R3 is the best rule, and R1 is not a good rule.
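FOIL's gain compares the log-precision of the extended rule with that of R0, weighted by the number of positive tuples the extended rule still covers. A short Python check of the three rules, using the counts from the slide (the function name is our own):

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL_Gain = p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))."""
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

p0, n0 = 100, 400  # positive and negative tuples covered by R0
rules = {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}
gains = {name: foil_gain(p0, n0, p1, n1) for name, (p1, n1) in rules.items()}
for name, gain in gains.items():
    print(name, round(gain, 2))  # R1 8.0, R2 57.21, R3 139.59
```

R1 has the highest accuracy (4/5) but covers only 4 positives, which is why its gain is the smallest.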
d) The likelihood ratio statistic, 2 × Σ f × log2(f / e), where f is the observed and e the expected frequency of each class among the tuples a rule covers.
Expected frequencies of positive and negative tuples for R1 (5 tuples covered):
pos: 5 × 100/500 = 1, neg: 5 × 400/500 = 4
The likelihood ratio for R1 is 2 × [4 × log2(4/1) + 1 × log2(1/4)] = 12
Expected frequencies of positive and negative tuples for R2 (40 tuples covered):
pos: 40 × 100/500 = 8, neg: 40 × 400/500 = 32
The likelihood ratio for R2 is 2 × [30 × log2(30/8) + 10 × log2(10/32)] = 80.85
Expected frequencies of positive and negative tuples for R3 (190 tuples covered):
pos: 190 × 100/500 = 38, neg: 190 × 400/500 = 152
The likelihood ratio for R3 is 2 × [100 × log2(100/38) + 90 × log2(90/152)] = 143.09
R3 is the best rule, and R1 is not a good rule.
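The three likelihood-ratio values can be reproduced with a few lines of Python. This sketch (our own likelihood_ratio helper) uses base-2 logarithms to match the worked answers, computes each expected count from the overall class distribution (100 positive, 400 negative tuples), and treats a zero observed count as contributing nothing.

```python
import math

def likelihood_ratio(pos, neg, total_pos, total_neg):
    """2 * sum(f * log2(f / e)) over the positive and negative classes."""
    covered = pos + neg
    total = total_pos + total_neg
    stat = 0.0
    for f, class_total in ((pos, total_pos), (neg, total_neg)):
        e = covered * class_total / total  # expected count under the overall distribution
        if f > 0:  # 0 * log(0) is taken as 0
            stat += f * math.log2(f / e)
    return 2 * stat

rules = {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}
ratios = {name: likelihood_ratio(p, n, 100, 400) for name, (p, n) in rules.items()}
for name, value in ratios.items():
    print(name, round(value, 2))  # R1 12.0, R2 80.85, R3 143.09
```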
26) By what criteria can classification and prediction methods be evaluated?
1) Predictive accuracy
2) Speed
3) Robustness
4) Interpretability
5) Scalability
6) Goodness of rules
X1 = Acid Durability (seconds), X2 = Strength (kg/square meter). Square distance of each training tuple to the query instance (3, 7):
(7, 7): (7 - 3)² + (7 - 7)² = 16
(7, 4): (7 - 3)² + (4 - 7)² = 25
(3, 4): (3 - 3)² + (4 - 7)² = 9
(1, 4): (1 - 3)² + (4 - 7)² = 13
3) Sort the distances and determine the nearest neighbors based on the K-th minimum distance (K = 3). Rank by minimum distance:
(7, 7): rank 3
(7, 4): rank 4
(3, 4): rank 1
(1, 4): rank 2
4) Gather the category Y of the nearest neighbors. Note that in the second row the last column has no category, because that tuple's rank (4) is greater than K = 3.
(X1, X2) | Rank | Included in 3-nearest neighbors? | Y = category of nearest neighbor
(7, 7) | 3 | Yes | Bad
(7, 4) | 4 | No | -
(3, 4) | 1 | Yes | Good
(1, 4) | 2 | Yes | Good
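The steps above (square distances, ranking, collecting the K = 3 nearest categories) can be sketched in Python as follows; the function names are our own. The last step, a simple majority vote over the three neighbors (two Good, one Bad), is the usual way k-NN turns these categories into a prediction, and gives Good for the query (3, 7).

```python
from collections import Counter

# Training tuples from the slide: ((X1 acid durability, X2 strength), category Y).
# The category of (7, 4) is not shown on the slide; it ranks 4th, outside the
# 3 nearest neighbors, so it is left as None and never reaches the vote when k = 3.
training = [((7, 7), "Bad"), ((7, 4), None), ((3, 4), "Good"), ((1, 4), "Good")]
query = (3, 7)

def squared_distance(p, q):
    # Squared Euclidean distance ranks neighbors the same way as the true distance.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def knn_classify(training, query, k):
    ranked = sorted(training, key=lambda row: squared_distance(row[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0], ranked

label, ranked = knn_classify(training, query, k=3)
print([squared_distance(p, query) for p, _ in training])  # [16, 25, 9, 13]
print([p for p, _ in ranked], "->", label)  # [(3, 4), (1, 4), (7, 7), (7, 4)] -> Good
```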
33) Name the hierarchical clustering methods and briefly describe each one.
Thank you.