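Typical knowledge discovery systems
§ SAS Enterprise Miner (SAS Institute)
§ Intelligent Miner (IBM)
§ Clementine (Integral Solutions Ltd.)
§ DBMiner (Simon Fraser University, Canada)
§ MSMiner (Institute of Computing Technology, Chinese Academy of Sciences)
§ and others
2021/9/12 Shi Zhongzhi, Advanced Artificial Intelligence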
Weka
Authors: Ian H. Witten / Eibe Frank
Subtitle: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Pages: 525
Publisher: Morgan Kaufmann
Published: 2005-06-08
9/12/2021 University of Waikato
WEKA: the bird (Chinese name: 秧鸡, a rail)
Copyright: Martin Kramer (mkramer@wxs.nl)
WEKA: versions
§ There are several versions of WEKA:
  § WEKA 3.4: "book version", compatible with the description in the data mining book
  § WEKA 3.6: "GUI version", adds graphical user interfaces
  § WEKA 3.7: "development version", with many improvements
§ This talk is based on a snapshot of WEKA 3.3
WEKA only deals with "flat" files

    @relation heart-disease-simplified
    @attribute age numeric
    @attribute sex {female, male}
    @attribute chest_pain_type {typ_angina, asympt, non_anginal, atyp_angina}
    @attribute cholesterol numeric
    @attribute exercise_induced_angina {no, yes}
    @attribute class {present, not_present}
    @data
    63,male,typ_angina,233,no,not_present
    67,male,asympt,286,yes,present
    67,male,asympt,229,yes,present
    38,female,non_anginal,?,no,not_present
    ...
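To make the format concrete, here is a minimal Python sketch of a reader for the simple subset of ARFF used in this example: a header of @relation/@attribute lines, then comma-separated @data rows, with ? marking a missing value. It assumes no quoted values, no sparse rows and no string or date attributes, and it is not WEKA's own ARFF loader; parse_arff and SAMPLE are hypothetical names.

```python
# Minimal reader for a simple ARFF subset (assumption: no quoting, no sparse rows,
# no string/date attributes, "%" comments only at the start of a line).

def parse_arff(text):
    """Return (relation, attributes, rows); attributes are (name, "numeric" or [nominal values])."""
    relation, attributes, rows, in_data = None, [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if not in_data:
            if lower.startswith("@relation"):
                relation = line.split(None, 1)[1]
            elif lower.startswith("@attribute"):
                _, name, spec = line.split(None, 2)
                spec = spec.strip()
                if spec.startswith("{"):
                    # nominal attribute: {v1, v2, ...}
                    attributes.append((name, [v.strip() for v in spec.strip("{}").split(",")]))
                else:
                    attributes.append((name, spec.lower()))
            elif lower.startswith("@data"):
                in_data = True
            continue
        values = [v.strip() for v in line.split(",")]
        # every data row must supply one value per declared attribute
        if len(values) != len(attributes):
            raise ValueError(f"expected {len(attributes)} values, got {len(values)}: {line}")
        row = []
        for (name, kind), value in zip(attributes, values):
            if value == "?":
                row.append(None)  # "?" marks a missing value
            elif kind == "numeric":
                row.append(float(value))
            elif isinstance(kind, list) and value not in kind:
                raise ValueError(f"{value!r} is not a legal value of {name}")
            else:
                row.append(value)
        rows.append(row)
    return relation, attributes, rows

SAMPLE = """@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attrs, rows = parse_arff(SAMPLE)
print(relation, len(attrs), len(rows))  # heart-disease-simplified 6 4
print(rows[3])  # [38.0, 'female', 'non_anginal', None, 'no', 'not_present']
```

Because the reader counts values per row, a row that omits one attribute value is rejected with a ValueError instead of being silently misaligned.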
Explorer: pre-processing the data
§ Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
§ Data can also be read from a URL or from an SQL database (using JDBC)
§ Pre-processing tools in WEKA are called "filters"
§ WEKA contains filters for:
  § Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
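WEKA ships these operations as filter classes. The plain-Python functions below (normalize and discretize_equal_width are hypothetical names) only sketch what two such filters compute, min-max normalization to [0, 1] and equal-width discretization into bins; they are not WEKA's implementations. Missing values (None here, corresponding to ARFF's ?) pass through unchanged.

```python
# Illustrative sketches of two preprocessing techniques; not WEKA's filter classes.

def normalize(values):
    """Min-max normalization: rescale numeric values to [0, 1]; None (missing) stays None."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    return [None if v is None else (0.0 if span == 0 else (v - lo) / span) for v in values]

def discretize_equal_width(values, bins):
    """Equal-width discretization: map each value to a bin index 0..bins-1."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    result = []
    for v in values:
        if v is None:
            result.append(None)   # keep missing values missing
        elif width == 0:
            result.append(0)      # all values equal: a single bin
        else:
            # the maximum lands exactly on the upper edge, so clamp it into the last bin
            result.append(min(int((v - lo) / width), bins - 1))
    return result

cholesterol = [233.0, 286.0, 229.0, None]
print(normalize(cholesterol))
print(discretize_equal_width(cholesterol, 3))  # [0, 2, 0, None]
```

Equal-width binning is the simplest unsupervised choice; it is sensitive to outliers because a single extreme value stretches every bin.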
Data warehouse: OLAP
§ ROLAP: Relational OLAP
§ MOLAP: Multidimensional OLAP
§ HOLAP: Hybrid OLAP
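The difference between the first two styles can be sketched in a few lines of Python, assuming a tiny made-up sales fact table (FACTS, rolap_rollup and molap_cube are hypothetical names). A ROLAP-style engine aggregates relational rows on demand, much like SQL GROUP BY, while a MOLAP-style engine precomputes a multidimensional grid of cells that queries then read directly. HOLAP mixes the two, typically keeping detail data relational and aggregates multidimensional; that mix is not shown here.

```python
# ROLAP-style vs MOLAP-style aggregation on a hypothetical fact table.
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (region, quarter, sales)
FACTS = [
    ("north", "Q1", 10), ("north", "Q2", 15),
    ("south", "Q1", 7), ("south", "Q2", 3), ("south", "Q2", 5),
]

def rolap_rollup(facts):
    """ROLAP style: scan relational rows at query time, like SELECT region, SUM(sales) ... GROUP BY region."""
    totals = defaultdict(int)
    for region, _quarter, sales in facts:
        totals[region] += sales
    return dict(totals)

def molap_cube(facts):
    """MOLAP style: precompute every (region, quarter) cell once; queries then read cells."""
    regions = sorted({r for r, _, _ in facts})
    quarters = sorted({q for _, q, _ in facts})
    cube = {cell: 0 for cell in product(regions, quarters)}  # dense grid, empty cells stay 0
    for region, quarter, sales in facts:
        cube[(region, quarter)] += sales
    return regions, quarters, cube

regions, quarters, cube = molap_cube(FACTS)
print(rolap_rollup(FACTS))  # {'north': 25, 'south': 15}
print(cube[("south", "Q2")])  # 8
# Rolling the cube up over quarters gives the same totals as the ROLAP query
print({r: sum(cube[(r, q)] for q in quarters) for r in regions})
```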
Integrated data mining tools: the data mining task model
$\mathrm{DMTask} = (V, R)$
$V = \{x \mid x \in \mathrm{StepObjects}\}$
$R = \{\langle x, y \rangle \mid P(x, y) \land x, y \in V\}$
[Figure: task graph connecting Step 1 through Step 5]
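One way to read DMTask = (V, R) is as a directed graph whose vertices are step objects and whose edges are the pairs in R. The sketch below assumes P(x, y) means "step x must finish before step y" and uses a hypothetical edge set for Steps 1 to 5; under that assumption, any valid execution order is a topological order of the graph, computed here with Python's standard graphlib module (Python 3.9+). execution_order is a hypothetical name.

```python
# DMTask = (V, R) as a directed graph of steps.
# Assumption: P(x, y) is read as "step x must finish before step y"; the edges are illustrative only.
from graphlib import CycleError, TopologicalSorter

V = {"Step 1", "Step 2", "Step 3", "Step 4", "Step 5"}
# R: the ordered pairs <x, y> for which P(x, y) holds (hypothetical)
R = {("Step 1", "Step 2"), ("Step 1", "Step 3"), ("Step 2", "Step 4"),
     ("Step 3", "Step 4"), ("Step 4", "Step 5")}

def execution_order(vertices, relation):
    """Return one order in which the steps can run while respecting every pair in R."""
    graph = {v: set() for v in vertices}  # node -> set of predecessors
    for x, y in relation:
        if x not in vertices or y not in vertices:
            raise ValueError(f"pair <{x}, {y}> uses a step outside V")
        graph[y].add(x)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError("R contains a cycle, so no execution order exists") from err

order = execution_order(V, R)
print(order)  # Step 1 first, Step 5 last; Steps 2 and 3 may appear in either order
```

A cycle in R would mean no step can start first, which is why the sketch rejects it rather than returning a partial order.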
Integrated data mining tools: the data mining task model
BNF definition of a step object:

    <StepObject>     ::= <Attribute_List>; <Method_List>
    <Attribute_List> ::= [<Attribute> | <Attribute>; <Attribute_List>]
    <Attribute>      ::= <Name>, <Value>
    <Method_List>    ::= [<Method> | <Method>; <Method_List>]
    <Method>         ::= <Name>, <Script>
    <Name>           ::= [<char> | <string>]
    <Value>          ::= [<char> | <string> | <integer> | <float>]
    <Script>         ::= <DML_Sentence>*
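The grammar maps naturally onto a small typed data structure. In the Python sketch below, the class names mirror the nonterminals, but all class and field names are hypothetical, and each <DML_Sentence> is kept as an opaque string because the grammar does not define its contents. The validate method checks only what the BNF implies: each list has at least one element, names are non-empty, and values are characters/strings, integers or floats.

```python
# Typed representation of <StepObject>; names are hypothetical, DML sentences are opaque strings.
from dataclasses import dataclass, field
from typing import List, Union

Value = Union[str, int, float]  # <Value> ::= <char> | <string> | <integer> | <float>

@dataclass
class Attribute:  # <Attribute> ::= <Name>, <Value>
    name: str
    value: Value

@dataclass
class Method:  # <Method> ::= <Name>, <Script>
    name: str
    script: List[str] = field(default_factory=list)  # <Script> ::= <DML_Sentence>*

@dataclass
class StepObject:  # <StepObject> ::= <Attribute_List>; <Method_List>
    attributes: List[Attribute]
    methods: List[Method]

    def validate(self):
        """Check the constraints the BNF implies and return self for chaining."""
        if not self.attributes or not self.methods:
            raise ValueError("<Attribute_List> and <Method_List> each need at least one element")
        for a in self.attributes:
            if not a.name or not isinstance(a.value, (str, int, float)):
                raise ValueError(f"bad attribute: {a!r}")
        for m in self.methods:
            if not m.name:
                raise ValueError(f"method without a name: {m!r}")
        return self

step = StepObject(
    attributes=[Attribute("name", "clean_data"), Attribute("threshold", 0.5)],
    methods=[Method("run", ["<DML sentence placeholder>"])],
).validate()
print(step.attributes[1].value)  # 0.5
```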
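Distributed parallel programming techniques in the cloud computing era
§ Distributed parallel data processing
  § Google MapReduce
  § Hadoop MapReduce
§ Distributed file systems
  § Google File System
  § Hadoop Distributed File System
§ Distributed databases
  § Google Bigtable
  § Hadoop HBase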
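The MapReduce programming model behind the first pair of systems can be illustrated with the classic word-count example. The single-process Python sketch below only shows the data flow (map, then shuffle/group by key, then reduce); Google MapReduce and Hadoop MapReduce run these phases across many machines and read input from a distributed file system such as GFS or HDFS. The function names are hypothetical and are not Hadoop's API.

```python
# Single-process sketch of the MapReduce data flow, using word count.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record (doc_id is the input key, unused here)."""
    for word in text.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between the phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values that share one key."""
    return word, sum(counts)

def word_count(documents: Dict[str, str]) -> Dict[str, int]:
    intermediate = (pair for doc_id, text in documents.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(w, c) for w, c in shuffle(intermediate).items())

docs = {"d1": "map reduce map", "d2": "reduce hdfs"}
print(word_count(docs))  # {'map': 2, 'reduce': 2, 'hdfs': 1}
```

Because each mapper call sees one record and each reducer call sees one key, both phases can be spread over many workers; only the shuffle needs to move data between them.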
www.intsci.ac.cn/shizz/
Questions?!