Understanding and presenting your findings.
Present the basics to tell a story. Do not present advanced statistics that only cause confusion. Today is about creating business intelligence.
Solve problems. If you want a raise, a promotion, or more funding for your group, there is only one way in the long term: add value. Create business intelligence.
The Presentation
The presentation is a very important part of a project. Often it can be the most important part.
- A good presentation should support the findings, not just mention them.
- The supporting statistics and graphs within the presentation can either help people understand or confuse them.
- Management will often rely on the presentation to understand the findings from data mining.
- Management needs to trust the findings. If the findings are presented poorly, it is difficult to trust them.
- A poor presentation can even cause projects to fail. Management will not implement what it does not trust or understand.
Unfortunately, many statisticians and computer scientists are lacking in this critical area.
- They tend to merely look at the results and the numbers in the computer output.
- This makes many data analysis projects less successful than they should be.
- A poor presentation and explanation often leaves management unclear on how to understand and proceed with the findings from the project.
GLM Example
T-Log Data: How can we use this information to understand the different configurations?
- Goal: compare different types of checkout counter styles and cash registers in terms of speed, using transaction log (T-log) data.
- Partial T-log data:
  - Configuration of the checkout counter. There are 4 types: 2 different shapes and 2 different cash register types.
  - nitems = number of items purchased during the transaction
  - tender = 0 if cash is used, 1 if credit is used
  - massist = 1 if a manager assists, 0 otherwise
  - timer 1 = time the first item is scanned
  - timer 2 = time the last item is scanned
  - timer 3 = time the transaction is completed
(Figure: a snapshot of the data.)
T-Log Data: It is necessary to create new variables for modeling; we cannot use the data as is.
- We want to fit a general linear model to investigate speed in terms of configuration for the different shapes and register types.
- An estimate of the time of a transaction could be a new variable equal to timer 3 - timer 1.
- What about configuration? We really want two variables: one for the shape of the counter and another for the register type.
(Figure: a snapshot of the data.)
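The derived variables described on this slide can be sketched in Python. This is a minimal illustration, not the analysis code behind the deck. The field names (`config`, `timer1`, `timer2`, `timer3`), the shape mapping, the scan/payment split, and the sample row are all assumptions; only the register split (type 1 for configurations 1 and 3, type 2 for configurations 2 and 4) is taken from a later slide.

```python
from dataclasses import dataclass

# Register split as stated later in the deck; the shape split is an assumption.
CONFIG_TO_REGISTER = {1: 1, 2: 2, 3: 1, 4: 2}
CONFIG_TO_SHAPE = {1: 1, 2: 1, 3: 2, 4: 2}  # assumed, not confirmed by the deck


@dataclass
class Transaction:
    """One T-log row; field names are hypothetical."""
    config: int     # checkout configuration, 1 to 4
    nitems: int     # number of items purchased
    tender: int     # 0 = cash, 1 = credit
    massist: int    # 1 = manager assist, 0 otherwise
    timer1: float   # time first item is scanned (seconds)
    timer2: float   # time last item is scanned (seconds)
    timer3: float   # time transaction is completed (seconds)


def derive(tx: Transaction) -> dict:
    """Build the modeling variables from one raw T-log row."""
    return {
        "total_time": tx.timer3 - tx.timer1,  # the estimate named on the slide
        "scan_time": tx.timer2 - tx.timer1,   # assumed: scan-and-bag portion
        "pay_time": tx.timer3 - tx.timer2,    # assumed: payment portion
        "shape": CONFIG_TO_SHAPE[tx.config],
        "register": CONFIG_TO_REGISTER[tx.config],
    }


# Made-up example row, for illustration only.
example = derive(Transaction(config=2, nitems=10, tender=1, massist=0,
                             timer1=0.0, timer2=40.0, timer3=55.0))
print(example)
```

Splitting configuration into two variables lets the model estimate a shape effect and a register effect separately, rather than one effect per configuration.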
The General Linear Model (GLM). Do Not Show in presentation! Note: This could be done better, but that is for another day.
From this we can see the difference of approximately 0.9 seconds per item caused by the two shapes.
There is no difference for cash: all configurations have a beta of about 10 seconds.
We can see a difference of about 5 seconds between the two register types for credit.
Now how to present the results
Which Cash Register and Counter Design Are Best? Comparing different types of checkout counter styles and cash registers using transaction log (T-log) data.
Main Objective
- To understand the differences among the checkout counters.
- There are 4 different configurations:
  - Two different shapes of counters
  - Two different types of cash registers
First, the High-Level Findings (where they go depends on your style; I like them at the end, but I have seen it done both ways)
- Checkout counter shape and cash register type both have an impact on the speed/time of a transaction.
  - These findings were statistically significant.
- Using various statistical techniques, we found that configuration types 1 and 2 were best.
- Although configuration type 2 was faster on average than type 1, we could not statistically prove that type 2 is faster than type 1 in general.
  - There was a large difference in average time between Type 2 and the other types for manager assists, but we could not rule out that it was just random chance.
  - We found that Type 2 was faster than all other types, including Type 1, when credit was used.
Remember the file layout (very important!): there are 4 types, from 2 different shapes and 2 different cash register types.
The Next Few Slides Will Highlight the Differences Among The Checkout Configurations
The Total Transaction Time: Final Time Minus the Time the First Item Was Scanned. On average, Configuration 2 is the fastest and is 3 seconds faster per transaction than Configuration 1.
Understanding Manager Assistance for the Cashier and the Configurations: There is a huge time difference between a transaction in which a manager has to assist the cashier due to issues with the cash register and a typical unassisted transaction.
Understanding Manager Assists: Configuration 1 has the lowest percentage of manager assists, but even after considering 5,000 transactions for each configuration, it is possible that the difference between the configurations is mere random chance.
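The phrase "mere random chance" can be made concrete with a significance test. The sketch below compares the manager-assist rates of two configurations with a two-proportion z-test. The deck does not say which test was used, so this is one reasonable choice and not the author's method, and the assist counts are made up for illustration (only the 5,000 transactions per configuration comes from the slide).

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both groups share one assist rate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


# Hypothetical counts: 100 vs. 115 manager assists out of 5,000 transactions each.
z, p_value = two_proportion_z_test(100, 5000, 115, 5000)
print(f"z = {z:.2f}, p = {p_value:.2f}")
if p_value > 0.05:
    print("The difference could plausibly be random chance.")
```

With four configurations, a chi-square test across all of them would be the more natural tool; the two-group version is shown only because it fits in a few lines.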
Time Is a Function of the Number of Items Purchased: As expected, there is a positive linear relationship.
There Is Definitely a Difference Resulting From Shape and Cash Register Type. Time per item (scan and bag): the counter shape helps by about 0.9 seconds per item. (Chart labels: Shape 1, Shape 2.)
Another Way of Looking at the Time Per Item: Not nearly as nice as a graph, in my opinion. The counter shape helps by about 0.9 seconds per item. (Labels: Shape 1, Shape 2.)
There Is Definitely a Difference Resulting From Shape and Cash Register Type. Average time to make payment (Register Type 1: Configurations 1 and 3; Register Type 2: Configurations 2 and 4): Type 2 is better by approximately 5 seconds when credit cards are used. For cash there is no difference.
Looking at Total Transaction Time Without Manager Assists: Configurations 1 and 2 are best when looking at cash transactions. Configuration 2 is the best when looking at credit cards only. Thus configuration 2 is best in terms of overall speed.
Conclusions/Recommendations
- Focus should be on reducing the need for manager assistance.
  - A major cause of wasted time is when a manager needs to assist the cashier: approximately an additional 9 minutes is spent.
- Configuration Types 1 and 2 perform the best.
- Given that Type 2 performs better than Type 1 when credit is used, and since we expect the use of credit to expand in Thailand, we recommend Type 2.
  - For a day with 2,000 transactions and an average savings of 3 seconds per transaction, the total time saved is 6,000 seconds, or 100 minutes of labor per day. For a day with 12,000 transactions, the savings can reach 600 minutes, or 10 hours of labor per day.
Note: for a company looking to downsize (eliminate cashiers), this would be useful information.
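The labor-savings arithmetic on this slide is easy to reproduce, which is useful when a client asks "what about our store volume?". A minimal sketch; the function name is hypothetical, and the 3 seconds per transaction and the two daily volumes are the figures from the slide.

```python
def daily_labor_savings(transactions_per_day: int,
                        seconds_saved_per_transaction: float = 3.0) -> dict:
    """Convert a per-transaction time saving into daily labor time saved."""
    total_seconds = transactions_per_day * seconds_saved_per_transaction
    return {
        "seconds": total_seconds,
        "minutes": total_seconds / 60,
        "hours": total_seconds / 3600,
    }


for volume in (2_000, 12_000):
    saved = daily_labor_savings(volume)
    print(f"{volume:>6} transactions/day: "
          f"{saved['minutes']:.0f} minutes ({saved['hours']:.1f} hours) saved")
```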
Presenting a logistic regression model
The Presentation
- A key to understanding is presentation. How do we view our results?
  - Visualization and presentation are very important.
- It is important to know your audience.
  - Your audience determines how you will present what you learn from the logistic regression model.
  - Senior management in a business is not interested in a theoretical data mining discussion. They are interested in how your fraud detection model will help the company.
  - A fellow statistician would need less visualization, as they already understand the model, but in my opinion a nice presentation of results can only help.
- We will next cover how to look at the variables that enter into your model.
  - This is very important for gaining trust in your work.
How Do We View the Independent Variables in the Model?
- It is important to interpret each variable in the model and then look at the variable individually against the dependent variable.
  - Often a variable viewed within the model has the opposite relationship with the dependent variable from the one it has when looked at separately.
  - This can result from multicollinearity.
  - Multicollinearity will not be covered.
- Often when creating a model, it is good to think about the variables that enter into the model and why they are entered. You may be asked to explain why you chose to keep a certain variable and use it in the model.
  - One way to investigate an independent variable's relationship with the dependent variable is in the same way as when investigating the model.
Sample Partial Presentation of a Fraud Detection Model: Included is only an explanation of the variables in the model and the model validation.
Most Important Factors For Detecting Fraud
Number of Inquiries for Credit in the Past 6 Months: This slide shows that people with more inquiries (applications) for credit are more likely to be victims of fraud. Perhaps some of the inquiries for credit were made by someone attempting to commit fraud and not by the actual individual.
Number of Inquiries for Credit in the Past 6 Months: This slide shows the same information as the previous slide. It is more informative, but many people will think the previous slide is better and easier to understand. Know your audience (who you present to)!
Percent Match and Mismatch With Database on Driver License Number: People who are committing fraud are more likely to write a driver license number on the application that differs from the one in your database.
Percent Match and Mismatch With Database on Zip Code: People who are committing fraud are more likely to write a zip code on the application that differs from the one in your database.
Average Age of Applicant: Younger people are more often victims of fraud.
Gender of Applicant: Females are more often victims of fraud.
Gender of Applicant: Again, there is more than one way to present the same thing. Know your audience (who you present to)!
And More Graphs: Those simple graphs would be produced for all variables in the model.
Understanding the Fraud Detection Model Performance: This model has a KS of 25.82. By refusing the bottom 10% of applicants, you can reduce fraud by 32% (25,532/80,000). By refusing the bottom 10%, you would have 32 good loans to one fraud, compared with 24 good loans to one fraud before.
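For readers who want to see where numbers like these come from, here is a minimal sketch. It assumes the KS statistic is the usual maximum gap, in percentage points, between the cumulative score distributions of good applicants and frauds, and that the 32% figure means 25,532 of 80,000 fraud cases fall in the bottom 10% of scores. The function name and the tiny score lists are made up for illustration; they are not the model's data.

```python
def ks_statistic(good_scores: list[float], bad_scores: list[float]) -> float:
    """Maximum gap between the two cumulative score distributions, in percent."""
    thresholds = sorted(set(good_scores) | set(bad_scores))
    max_gap = 0.0
    for t in thresholds:
        # Share of each group scoring at or below the threshold.
        cum_good = sum(s <= t for s in good_scores) / len(good_scores)
        cum_bad = sum(s <= t for s in bad_scores) / len(bad_scores)
        max_gap = max(max_gap, abs(cum_bad - cum_good))
    return 100 * max_gap


# Made-up scores: frauds tend to score lower than good applicants.
good = [0.6, 0.7, 0.8, 0.9]
bad = [0.1, 0.2, 0.7, 0.8]
print(f"KS = {ks_statistic(good, bad):.1f}")

# Fraud share in the refused group, using the figures quoted on the slide.
fraud_reduction_pct = 100 * 25_532 / 80_000
print(f"Fraud reduction: {fraud_reduction_pct:.0f}%")
```

For a management audience the second number is the one to present; the KS value belongs in the backup slides.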
Customer Profiling and Customer Value: A sample marketing project in which students were to rank customers according to revenue and risk.
Two Main Objectives
- To understand your customers
- To understand the value of your customers
Both help in making marketing strategies.
First, Who Are Your Customers? We looked at all 15,045 customers to understand who they are.
Gender: More Women than Men
Plan Type: Approximately 45% are in plan type 5. Plan type 5 is the most popular plan type without a doubt.
Minutes Used: Plan Type and Average Minutes Used
Plan Type and Mean Minutes Used: Only in plan type 5 do the customers use less than the minimum. In this plan type, customers give you "free" money. The total "free" money is $62,942 for December 2005, about 0.2% of total revenue ($34,282,000).
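The share quoted on this slide can be checked directly from the two totals. A minimal sketch using only the figures on the slide; the variable names are hypothetical.

```python
free_money = 62_942          # unused-minimum revenue for December 2005
total_revenue = 34_282_000   # total revenue for the same month

share_pct = 100 * free_money / total_revenue
print(f"'Free' money is {share_pct:.2f}% of revenue")
```

The exact value is a little under 0.2%, so the rounded figure on the slide holds. Checking such numbers before presenting them is cheap insurance for your credibility.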
Plan Type and Mean Minutes Used: This slide is terrible! It has the same information as the last slide but without highlighting what the reader should learn from it. This is to show the importance of making your point clear in a presentation.
Present Payment Status: Although most people are paying you and are current at the present time, 3% are in default.
Worst Payment Status: Fewer than 25% have paid on time every time during the past 12 months. During the past 12 months, a little more than 50% of your customers have been 30 days overdue in payment at least once, and a little more than 18% have been 60 days overdue at least once.
Present and Worst Ever Status: Most customers (approximately 66%) are late at some point but ultimately pay. These people will most likely pay.
You Should Do Many More Graphs … Also, the graphs should be made much prettier.
Ranking Your Customers According to Their Value to You: First we will discuss what value is, and then we will discuss one way to rank your customers in terms of value.
What Is Value?
Value is a combination of two things:
1. Profit
   - The more profitable the customer, the more valuable he or she is to your company.
2. Risk
   - Unfortunately, some of your customers are not paying you (approximately 3%).
   - This is lost money. People using their cell phone but not paying you are really a negative value, as they cost you money.
   - For this reason, we feel risk is an important factor in understanding customer value.
Profit
- Due to the sensitive nature of profit margins, we could not use actual profit.
- Thus, to understand your most profitable customers, we used revenue.
- Revenue was calculated using plan type and minutes used. (Should give more details.)
Revenue: We created Five Categories For Revenue
Risk
- To understand risk, we looked at all 12 months of the payment history data.
- Ultimately we decided to use the worst-ever payment status over the past 12 months as a proxy for risk. The higher the value, the higher the risk.
- Thus risk has 5 categories, with values ranging from 0 to 4: the value 0 is the least risky and 4 is the most risky. A value of 4 actually means the person is in default.
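The risk proxy described on this slide is simply the maximum payment status over the 12 months. A minimal sketch, assuming each customer's history is a list of 12 monthly status codes from 0 to 4; the function name and the sample histories are hypothetical, and the meaning of codes 1 to 3 is not spelled out in the deck.

```python
def risk_score(monthly_status: list[int]) -> int:
    """Worst-ever payment status over 12 months (0 = least risky, 4 = default)."""
    if len(monthly_status) != 12:
        raise ValueError("expected 12 monthly status codes")
    if any(code not in range(5) for code in monthly_status):
        raise ValueError("status codes must be between 0 and 4")
    return max(monthly_status)


# Made-up payment histories, for illustration only.
always_on_time = [0] * 12
late_twice = [0, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0]
print(risk_score(always_on_time))  # 0
print(risk_score(late_twice))      # 2
```

Using the maximum treats one bad month the same as many bad months, which is part of why this is a very basic concept of risk.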
Risk: This is a very basic concept of risk; it should incorporate present status as well. (Chart annotations: "These are the least risky"; "Each level indicates an additional increase in risk.")
Understanding Value as a Function of Risk and Revenue: Percent of Customers
- Gold: your best customers. They pay (low risk), and they use a lot of phone time. They make up 52.50% of your customers.
- Green: your 2nd best customers. They pay (low risk), and they don't use much phone time. They make up 21.67% of your customers.
- Grey: the 3rd group. They are grey because they are in a grey area and are risky.
- Yellow: the 4th group; yellow for caution.
- Red: the 5th group; red for stop.
Understanding Value as a Function of Risk and Revenue: Percent of Revenue
- Your best customers, gold, make up approximately 59.33% of your revenue.
- Your 2nd best customers, green, make up approximately 14.84% of your revenue.
- Your two top groups make up 74.17% of your total revenue.
Value and Phone Usage
Value And Gender
Recommendations
- You want to keep your highest-value customers happy.
- Consider creating an additional phone plan, since more than 40% of your customers are in phone plan type 5.
  - This recommendation is in part a result of finding customers who do not even use the minimum minutes for plan type 5.
- Most of your customers pay late at least once in 12 months.
  - Consider charging late fees to increase revenue.
  - Adding up all the late months over the 12 months for all customers who are not in default gives approximately 27,478 late-fee charges per year. If you charge a late fee of 10 baht per month, you could make approximately an additional 274,780 baht per year.
Imagine a project costing 500,00 baht leading to an extra 274,780 baht/year in addition to doing the requested work. You have proven your worth and more as a consultant. Statements such as the one above should be checked with the client before making them. Are they already using late fees?
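The late-fee estimate is a one-line calculation, and putting it in a small function makes it easy to rerun if the client suggests a different fee. A minimal sketch; the function name is hypothetical, and the 27,478 charges and the 10 baht fee are the figures from the slide.

```python
def extra_late_fee_revenue(late_charges_per_year: int, fee_baht: float) -> float:
    """Potential yearly revenue from charging a flat fee per late month."""
    return late_charges_per_year * fee_baht


extra = extra_late_fee_revenue(27_478, 10)
print(f"Potential extra revenue: {extra:,.0f} baht per year")
```

This assumes every late month would actually be billed and paid, which is another reason to check the statement with the client first.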
Simple is better in presenting
Keep It Simple
- Don't present what you don't understand.
- People want to understand what you are presenting.
- Do you buy products you don't understand just because the salesman says you should? No.
- Leverage what you learn from the advanced statistics and create the supporting materials to convey the story: business intelligence.
Thanks. For basic questions: dryver@gmail.com
- Slides: 69