ICT 619 Intelligent Systems Topic 6 Data Mining

Data Mining § § § Introduction Business Applications of Data Mining Activities Data Mining

Why data mining? § “Customers who bought this title also bought … “ -

Why data mining? (cont’d) § Useful information (business intelligence) to gain competitive advantage can

Data mining in perspective § OLAP with data warehouses tells us what is happening

Data mining in perspective (cont’d) § Statistical methods do not scale up to today's

Business Applications of Data Mining § Fastest growing segment of business intelligence market §

Business Applications of Data Mining (cont’d) § Data mining for customer relationship management §

Data Mining Activities § Two broad groups – directed and undirected data mining §

Different types of data mining tasks ICT 619 10

Data Mining Tasks Classification § Assigns a given object to a predefined category (class)

Data Mining Tasks Estimation § Continuously varying outcomes § Eg income, probability of a

Data Mining Tasks (cont’d) Finding affinity grouping or association rules § Finds out, which

Data Mining Tasks (cont’d) Description and visualisation § To help increase our understanding of

Data Mining Techniques § Our aim is a basic understanding of data mining techniques

Decision Trees § Visual representation of a reasoning process § Particularly suitable for solving

Decision Trees (cont’d) § Each leaf node is labelled with a class label §

Decision Trees (cont’d) § Each edge originating from an internal node is labelled with

How decision trees work § Each record with N attributes is a point in

How decision trees work For example, in the example decision tree, any selfemployed person

How decision trees work (cont’d) Overfitting in decision trees § A decision tree that

How decision trees are built Recursive partitioning § An iterative process of splitting the

How decision trees are built (cont’d) Recursive partitioning (cont’d) § The most important task

How decision trees are built (cont’d) Recursive partitioning (cont’d) § The splitting process is

How decision trees are built (cont’d) Different types of decision trees § Types depend

Algorithms for building decision trees § Most notable are - CHAID, C 4. 5/C

Application of decision trees § Useful when the data mining task is classification of

Automatic Cluster Detection § Aims to discover structure in a complex data set as

K-means clustering § Available in a wide variety of commercial data mining tools §

K-means clustering (cont’d) § Records are assigned to the clusters through an iterative process

K-means clustering (cont’d) § The centroid of a cluster of records calculated by taking

K-means clustering (cont’d) § In the k-means method, the original choice of the value

Interpreting clusters § Automatic clustering is undirected data mining § - We look for

Application of clusters Clustering is used § When natural groupings are suspected, Eg, groups

Artificial Neural Networks § Main generic application of artificial neural networks is pattern recognition

Artificial Neural Networks (cont’d) § SOM learning algorithms are unsupervised § Clusters are represented

Application of neural nets § Artificial neural networks can produce very good results §

How to Apply Data Mining Four ways of utilising data mining expertise in business:

Purchasing Software Two possibilities: § Software may be an actual model Eg, in the

Tasks for the data mining model builder § Model building software automate the process

Hiring Outside Experts § Recommended approach if § Organization in early stages of integrating

Hiring Outside Experts (cont’d) Outside expertise for data mining is likely to be available

Developing In-house Expertise for data mining § Applies particularly to companies which have many

Data Mining Development Methodology § Best practice yet to emerge (Hirji 2001) § A

Data Mining Development Methodology (cont’d) Cabena’s five-stage model: § Business objective determination § Clearly

A Case Study (Hirji 2001) § Involved a large fast food outlet § Brought

A Case Study (Hirji 2001) (cont’d) § Case study used IBM’s Intelligent Miner for

A Case Study (Hirji 2001) (cont’d) Interactive data mining and results analysis stage §

REFERENCES § Berry, M. , & Linoff, G. Mastering Data Mining, Wiley Computer Publishing,

Slides: 49

Download presentation

ICT 619 Intelligent Systems Topic 6: Data Mining ICT 619 S 2 -05

Data Mining § § § Introduction Business Applications of Data Mining Activities Data Mining Techniques How to Apply Data Mining Development Methodology ICT 619 2

Why data mining? § “Customers who bought this title also bought … “ - from Amazon. com Why? – More effective (targeted) marketing How? – Targeting through association § Abundance of business data typically in terabytes - point-of-sale (POS) devices, customer call detail databases, web log files in e-commerce etc § Data is being collected mostly for improving efficiency of underlying operations But not for analysis ICT 619 3

Why data mining? (cont’d) § Useful information (business intelligence) to gain competitive advantage can be extracted by "mine"-ing data § Examples: underlying trends, associations or patterns in market behaviour § According to (Hirji 2001), “ … data mining is the analysis and non-trivial extraction of data from databases for the purpose of discovering new and valuable information, in the form of patterns and rules, from relationships between data elements. ” ICT 619 4

Data mining in perspective § OLAP with data warehouses tells us what is happening and how § Data mining tells us what is likely to happen § Data mining is knowledge discovery in (commercial) databases - KDD § Data mining is a process rather than a product ICT 619 5

Data mining in perspective (cont’d) § Statistical methods do not scale up to today's problems § New "intelligent" tools are needed § Data mining draws from artificial intelligence/soft computing, database theory, data visualization, marketing, statistics, and so on § Our objectives: § § § Understand the role of data mining in business Distinguish between different data mining techniques Understand how to go about making use of data mining ICT 619 6

Business Applications of Data Mining § Fastest growing segment of business intelligence market § Increasingly an integral and necessary component of an organization’s portfolio of analytical techniques § Data mining for marketing § Uses data on customer behaviour to identify target groups for marketing § Reduces cost by avoiding groups unlikely to respond ICT 619 7

Business Applications of Data Mining (cont’d) § Data mining for customer relationship management § Anticipating customers’ needs and responding to them proactively § Data mining in R&D § Can lower costs during the R&D phase of the product life cycle by analysing voluminous test data § Bioinformatics - data mining in biology and medicine ICT 619 8

Data Mining Activities § Two broad groups – directed and undirected data mining § Directed data mining § We know what we are looking for § We aim to find the value of a pre-identified target variable in terms of a collection of input variables, eg, classifying insurance claims § Undirected data mining § Finds patterns in data § Leaves it to the user to find the significance of these patterns § Eg, identifying groups of customers with similar buying patterns ICT 619 9

Different types of data mining tasks ICT 619 10

Data Mining Tasks Classification § Assigns a given object to a predefined category (class) based on the object’s attributes (features) § Objects to be classified are generally database records. § Discrete outcomes – yes/no, low/medium/high etc, § Examples of classification tasks: § Assigning keywords to articles § Classifying credit applicants as low, medium and high risk § Assigning customers to predefined customer segments ICT 619 11

Data Mining Tasks Estimation § Continuously varying outcomes § Eg income, probability of a customer leaving (known in data mining circles as churning) § Outcomes can also be used for classification by ranking and thresholding Prediction § Classification or estimation task performed to predict some future behaviour § Examples include: - Predicting which customers will churn in the next six months - Predicting the size of a balance that will be transferred ICT 619 12

Data Mining Tasks (cont’d) Finding affinity grouping or association rules § Finds out, which things go together, eg, in a supermarket shopping trolley § Used for arranging items in shelves or catalogues § Identifying cross-selling opportunities Clustering § Segments a group of diverse records into subgroups or clusters containing similar records § No predefined classes in clustering; records grouped based on similarities in their attributes § Eg, people with similar buying habits § Data miner must interpret clusters and decide what to do ICT 619 13

Data Mining Tasks (cont’d) Description and visualisation § To help increase our understanding of people, products or processes that produced the data § A good description can provide an explanation of their behaviour § Data visualisation can be very effective in explaining things by exploiting our ability to utilise visual clues ICT 619 14

Data Mining Techniques § Our aim is a basic understanding of data mining techniques to find out § § § When to apply them How to interpret their results How to evaluate their performance § Three major approaches are: § Decision trees § Automatic cluster detection § Artificial neural networks (supervised and unsupervised) ICT 619 15

Decision Trees § Visual representation of a reasoning process § Particularly suitable for solving classification problems § Consists of internal nodes, leaf nodes and edges Fig. A sample decision tree for catalogue mailing (Ganti et al. 1999). ICT 619 16

Decision Trees (cont’d) § Each leaf node is labelled with a class label § The class label decided by the class of the records that ended up in that leaf during training § A leaf node may also contain a value depending upon the average of the values of such records Fig. A sample decision tree for catalogue mailing (Ganti et al. 1999). Group A contains any self-employed person aged <=40 and earning a salary of more than $50, 000. ICT 619 17

Decision Trees (cont’d) § Each edge originating from an internal node is labelled with a splitting predicate involving that node’s splitting attribute § The splitting predicate forces any record to take a unique path from the root to exactly one leaf node. Fig. A sample decision tree for catalogue mailing (Ganti et al. 1999). Group A contains any self-employed person aged less than 41 and earning a salary of more than $50, 000. ICT 619 18

How decision trees work § Each record with N attributes is a point in an Ndimensional record space § Each branch in a decision tree is a test on a single variable that splits the space into two or more regions § With each successive test and split, the resulting regions get more and more segregated with increasing homogeneity among the records § Ultimately, the leaf nodes will contain the purest batch of records ICT 619 19

How decision trees work For example, in the example decision tree, any selfemployed person aged <= 40 and earning a salary of more than $50, 000 will be classified as belonging to group A. ICT 619 20

How decision trees work (cont’d) Overfitting in decision trees § A decision tree that correctly classifies every single record § Such a tree is unlikely to generalise to new data sets § To prevent overfitting, test data set are used to prune decision trees once it has been built using the training data set. ICT 619 21

How decision trees are built Recursive partitioning § An iterative process of splitting the training data into partitions (regions of record space) § Initially, all records are in a training set consisting of pre -classified records § An algorithm splits up the data, using every possible binary split on every field of the records § The best split is defined as one that creates partitions where a single class predominates ICT 619 22

How decision trees are built (cont’d) Recursive partitioning (cont’d) § The most important task in building a decision tree is to decide which of the attributes (independent fields in a record) gives the best split § The measure used to evaluate a potential splitter is the reduction in diversity (or increase in purity) § The best split has the largest reduction in diversity § One measure of diversity is the Gini index: 2 p 1 * (1 – p 2) ICT 619 23

How decision trees are built (cont’d) Recursive partitioning (cont’d) § The splitting process is applied to each of the new parts and so on until no more useful splits can be found § A node becomes a leaf node when no split can be found that significantly decreases the diversity Pruning § The full decision tree needs to be pruned to improve its performance § Pruning is done by removing leaves and branches (edges leading to leaves) that fail to generalise § There a number of pruning methods Eg, a tree is pruned back to the subtree that minimises error on the test set. ICT 619 24

How decision trees are built (cont’d) Different types of decision trees § Types depend upon § § § the number of splits allowed at each level how these splits are chosen when the tree is built how the tree is pruned to prevent overfitting § More broadly, decision trees can be grouped as: § Classification trees (leaves represent classes) § Regression trees (leaves represent a numeric value) ICT 619 25

Algorithms for building decision trees § Most notable are - CHAID, C 4. 5/C 5. 0 and CART § Data mining software tools allow approximation of any of these algorithms by providing choice of § splitting criteria and pruning strategies § control parameters such as maximum tree depth ICT 619 26

Application of decision trees § Useful when the data mining task is classification of records or prediction of outcomes § Also chosen to generate understandable rules, which can be explained and translated into SQL or a natural language § For example, IF age < 41 AND income < $50, 000 AND employment = self THEN belongs to group A ICT 619 27

Automatic Cluster Detection § Aims to discover structure in a complex data set as a whole in order to carve it up into simpler groups § Examples of clustering - finding products that should be grouped together in a catalogue, or - identifying groups of customers with similar tastes in music § Many methods for finding clusters in data, a prominent one is K-means clustering ICT 619 28

K-means clustering § Available in a wide variety of commercial data mining tools § Divides the data set into a predetermined number, k, of clusters § Initial clusters centred at random points (seeds) in the record space ICT 619 29

K-means clustering (cont’d) § Records are assigned to the clusters through an iterative process § In the first step, k data points are selected to be the seeds § Each seed is an embryonic cluster with only one element § In the second step, each record is assigned to the cluster whose centroid is nearest to that record § This forms the new clusters with new intercluster boundaries. ICT 619 30

K-means clustering (cont’d) § The centroid of a cluster of records calculated by taking average of each field for all the records in that cluster § Euclidean distance most commonly used for measuring distance by data mining software. § Distance between two points P(x 1, x 2, . . , xn) and Q(y 1, y 2, . . , yn) in n-dimensional space is ((x 1 -y 1)2 + (x 2 -y 2)2 +. . + (xn-yn)2). ICT 619 31

K-means clustering (cont’d) § In the k-means method, the original choice of the value of k determines the number of clusters that will be found § Unless advanced knowledge is available on the likely number of clusters, value of k is determined by trial-and -error § Best results are obtained when k matches the underlying structure of the data. ICT 619 32

Interpreting clusters § Automatic clustering is undirected data mining § - We look for something useful without having to know what we are looking for § Both an advantage and possible disadvantage! § The most frequently used approaches interpreting clusters are § Building a decision tree with the cluster labels as target variables, and using it to derive rules explaining how to assign new records to the correct cluster § Using visualisation to see how the clusters are affected by changes in input variables. § Examining the differences in the distributions of variables from cluster to cluster, one variable at a time. ICT 619 33

Application of clusters Clustering is used § When natural groupings are suspected, Eg, groups representing customers or products that have a lot in common with each other § When there are many competing patterns in the data making it hard to spot any single pattern § Creating clusters reduces the complexity within clusters so that other data mining techniques are more likely to succeed ICT 619 34

Artificial Neural Networks § Main generic application of artificial neural networks is pattern recognition or classification § Estimation and prediction can be viewed as variants of classification § The best ANN model for performing classification is the backpropagation network (or the multilayer perceptron) § The ANN model particularly suited for clustering is the Kohonen net or the self-organising map (SOM) ICT 619 35

Artificial Neural Networks (cont’d) § SOM learning algorithms are unsupervised § Clusters are represented in a SOM by groups of adjacent neurons in output layer § SOM reduces dimensionality from N to 2 § SOM can serve as a clustering tool as well as visualisation tool for high-dimensional data § SOMs claimed to be often more effective than k-means for complex shaped clusters ICT 619 36

Application of neural nets § Artificial neural networks can produce very good results § But require extensive data preparation involving normalisation and conversion of categorical values to numeric values § Do not work well when there are many hundreds or thousands of input features - long training phases § Difficult to understand because they represent complex non-linear models § Unlike decision trees, do not produce rules readily. § A good choice for most classification and prediction tasks when the results are more important than understanding how the model works ICT 619 37

How to Apply Data Mining Four ways of utilising data mining expertise in business: 1. By purchasing readymade scores (such as on credit worthiness for a loan applicant) from outside vendors. 2. By purchasing software that embodies data mining expertise designed for a particular application such as credit approval, fraud detection or churn prevention 3. By hiring outside consultants to perform data mining for special projects 4. By developing own data mining skills within the business organization Purchasing scores is quick and easy, but the intelligence limited to single score values ICT 619 38

Purchasing Software Two possibilities: § Software may be an actual model Eg, in the form of a set of rules for decision support, or a fully-trained neural network applied to a particular domain § Software may embody knowledge of the process of building models for a particular domain in the form of a model-creation wizard or template Purchased models work well if the products, customers, and market conditions match those used to develop the model ICT 619 39

Tasks for the data mining model builder § Model building software automate the process of creating candidate models and selecting the ones that perform best § Significant tasks left for the user: § Choosing a suitable business problem to be addressed by data mining. § Identifying and collecting data that is likely to contain the information needed to answer the business question. § Pre-processing the data so that the data mining tool can make use of it § Transforming the database so that the input variables needed by the model are available § Designing a plan of action based on the model and implementing it in the marketplace § Measuring results of the actions and feeding them back into the database for future mining. ICT 619 40

Hiring Outside Experts § Recommended approach if § Organization in early stages of integrating data mining in its business § Data mining activity is to be an one-off process § Not if § it is to be an ongoing process, eg, data mining for customer relationship management § Outside expertise for data mining is likely to be available in three possible places ICT 619 41

Hiring Outside Experts (cont’d) Outside expertise for data mining is likely to be available in three possible places: 1. From a data mining software vendor 2. Data mining centres Usually collaborations between universities and private companies Eg, Monash Data Mining Centre 3. Consulting companies Consulting company chosen should have had experience specifically in the area of interest to the organisation ICT 619 42

Developing In-house Expertise for data mining § Applies particularly to companies which have many products and customers § Should be a core competency of all large scale businesses ICT 619 43

Data Mining Development Methodology § Best practice yet to emerge (Hirji 2001) § A proposed a five-stage model (Cabena 1998): 1. 2. 3. 4. 5. Business objective determination Data preparation Data mining Results analysis Knowledge assimilation ICT 619 44

Data Mining Development Methodology (cont’d) Cabena’s five-stage model: § Business objective determination § Clearly identifying the business problem to be mined § Data preparation § Data selection, preprocessing and transformation § Data mining § Algorithm selection and execution § Results analysis § Has anything new or interesting been found § Knowledge assimilation § Formulate ways of exploiting the new information extracted ICT 619 45

A Case Study (Hirji 2001) § Involved a large fast food outlet § Brought out some deficiencies of the above methodology § A new set of stages for data mining development and use proposed: 1. Business objective determination 2. Data preparation 3. Data audit 4. Interactive data mining and results analysis 5. Back end data mining 6. Results synthesis and presentation ICT 619 46

A Case Study (Hirji 2001) (cont’d) § Case study used IBM’s Intelligent Miner for Data on AIX as the data mining tool § Took 20 actual days of effort across the 6 stages above § Back end data mining involves data enrichment and additional data mining algorithm execution by the data mining specialist § Distribution of time required § 45% taken up by stages 4, 5 and 6 § 30% required by the data preparation stage (70% predicted in the earlier model) § Use of a data warehouse saved time needed for selecting, cleaning, transforming, coding, and loading the data ICT 619 47

A Case Study (Hirji 2001) (cont’d) Interactive data mining and results analysis stage § Linking data mining results with business strategy and using application software such as spreadsheets to perform sensitivity analysis of results obtained § Aims to demonstrate how data mining results support business strategy ICT 619 48

REFERENCES § Berry, M. , & Linoff, G. Mastering Data Mining, Wiley Computer Publishing, New York 2000. § Cabena, P. , Hadjinian, P. , Stadler, R. , Verhees, J. , and Zanasi, A. Discovering Data Mining: From Concept to Implementation. Prentice Hall, Englewood Cliffs, NJ 1998. § Dhar, V. , & Stein, R. , ”Deriving Rules from Data” in Seven Methods for Transforming Corporate Data into Business Intelligence. , Prentice Hall 1997, pp. 167 -189, 251 -258. § Ganti, V. , Gehrke, J. , & Ramakrishnan, R. Mining Very Large Databases, IEEE Computer, Vol. 32 No. 8, August 1999, pp. 38 -45. § Hirji, K. , Exploring Data Mining Implementation, Communications of the ACM, Vol. 44, No. 7, July 2001, pp. 87 -93. § Web site on Data Mining and Web Mining http: //www. kdnuggets. com/software/suites. html ICT 619 49