Our Secret Data Mart This and other misadventures

  • Slides: 99
Download presentation
Our Secret Data Mart: This and other misadventures in my ten years as a

Our Secret Data Mart: This and other misadventures in my ten years as a business analyst Presentation to the NEODWSIG 3/23/2006 by Michael Mina http: //michaelmina. info Copyright © 2006 Michael Mina

Agenda • • My two bios The goal of this presentation Why "misadventures"? Lessons

Agenda • • My two bios The goal of this presentation Why "misadventures"? Lessons learned Some tools I use The story of our secret data mart Parting advice http: //michaelmina. info Copyright © 2006 Michael Mina 2

My real world bio • Worked as a business analyst since 1995 – Medical

My real world bio • Worked as a business analyst since 1995 – Medical Mutual – Antares Management Solutions – Cleveland State University – National City • Medical Mutual – Completed ad hoc data requests for many different departments. – Developed and programmed the disease management outcomes reporting process. Recognized by CEO for efforts resulting in NCQA accreditation. – Coordinated and prioritized development of reporting, system modifications and resolution of reporting problems for the Underwriting division. Recognized by VP for helping save $400 K annually. http: //michaelmina. info Copyright © 2006 Michael Mina 3

My real world bio (cont'd) • Antares Management Solutions – Working in IT, I

My real world bio (cont'd) • Antares Management Solutions – Working in IT, I developed reporting from multiple data warehouses for business units and partner corporations – Developed web-enabled GUI front-end metadata repository/data dictionary. – Developed and taught courses in business intelligence tools and the metadata repository for business units and partner corporations. – Contributed to data warehouse redesign project. Conceived and developed Quality Assurance methodology for maintenance of dimension tables. – Assisted in the development of data marts and reporting for several departments. http: //michaelmina. info Copyright © 2006 Michael Mina 4

My real world bio (cont'd) • Cleveland State University – Managed staff and developed

My real world bio (cont'd) • Cleveland State University – Managed staff and developed analyses, reporting processes and databases for various departments – Managed the Institutional Research component of the twoyear People. Soft 8 upgrade project. Recognized by both President and CIO for outstanding contributions to the project. – Chaired Data Stewardship Committee (25 members) to manage University data and ensure that the University possesses the accurate data it requires. • National City – Currently working on various data warehouse-related projects, including data warehouse metrics. http: //michaelmina. info Copyright © 2006 Michael Mina 5

My real world bio (cont'd) • Founder of GCPCUG Data Warehousing SIG (met monthly

My real world bio (cont'd) • Founder of GCPCUG Data Warehousing SIG (met monthly 1/2000 - 4/2002) • Developed and currently teach "Business Intelligence & Data Warehousing" continuing education course at CSU http: //michaelmina. info Copyright © 2006 Michael Mina 6

My Bizarro world bio • I'm a citizen of Afghanistan, with Hispanic and Japanese

My Bizarro world bio • I'm a citizen of Afghanistan, with Hispanic and Japanese ancestry. I'm eligible for the Medicare Rx benefit. • My wife is also Hispanic (but not from Afghanistan. ) She is interested in dating others, and has a newborn that I do not know about. • That's what some companies think--My family demographic data needs serious revision. • (For more about Bizarro world, see "This am Bizarro page" at http: //theages. superman. ws/ Encyclopaedia/bizarro. php) http: //michaelmina. info Copyright © 2006 Michael Mina 7

My Bizarro world bio (cont'd) • My point? Enterprises vary greatly in the degree

My Bizarro world bio (cont'd) • My point? Enterprises vary greatly in the degree to which they pursue data quality. • How important is data quality? And to whom does it matter? • Some departments actually consider data quality a threat, regardless of what they say (more on that later. …) http: //michaelmina. info Copyright © 2006 Michael Mina 8

The goal of this presentation • Share some of my experiences and the experiences

The goal of this presentation • Share some of my experiences and the experiences of others (with their permission) for your edification • Focus your attention on matters I believe are not addressed frequently enough • Help you survive and thrive in this line of business http: //michaelmina. info Copyright © 2006 Michael Mina 9

Why "misadventures"? • "Accentuate the positive" or learn from mistakes? • In his classic

Why "misadventures"? • "Accentuate the positive" or learn from mistakes? • In his classic "The Data Warehouse Lifecycle Toolkit", Kimball clearly expects almost everything to go well. • He's a highly paid consultant, so it's no surprise he gets the cooperation he needs. • I had NO experiences that ran as smoothly as his. http: //michaelmina. info Copyright © 2006 Michael Mina 10

Why "misadventures"? (cont'd) • Since becoming a business analyst in 1995, I've reported to

Why "misadventures"? (cont'd) • Since becoming a business analyst in 1995, I've reported to 8 different managers at 4 different companies. • I've seen more missteps than successfully completed data warehouse-related projects. – Projects running past deadline – Chameleon-like project requirements – Broken promises, false accusations – Other unorthodox, unprofessional behaviors http: //michaelmina. info Copyright © 2006 Michael Mina 11

Why "misadventures"? (cont'd) • As a part-time career consultant for eight years, people have

Why "misadventures"? (cont'd) • As a part-time career consultant for eight years, people have sought my advice for dealing with a variety of unpleasant situations. • Experience is the best teacher and bad experience can be the best of the best--especially if it's someone else's. http: //michaelmina. info Copyright © 2006 Michael Mina 12

Lessons I've learned as a Business Analyst • I want to share lessons that

Lessons I've learned as a Business Analyst • I want to share lessons that are data analysis-related and career-related • The latter are important: Do not ignore your own best interests. • I've seen – People hired on Monday who had their department downsized on Friday – People told their department survived a downsizing, just before someone was terminated (not for cause) • So with that in mind, here are some… http: //michaelmina. info Copyright © 2006 Michael Mina 13

Lessons I've learned as a Business Analyst… http: //michaelmina. info Copyright © 2006 Michael

Lessons I've learned as a Business Analyst… http: //michaelmina. info Copyright © 2006 Michael Mina

People define "Data Warehouse" differently • Some people refer to the location of physical

People define "Data Warehouse" differently • Some people refer to the location of physical servers as the DW • Some people call questionably modeled data sets that are used for reporting purposes a DW • Some even call a TNF copy of a TNF production database a DW • Analyst impact: – Make sure you understand which definition is being used – This is especially important during your interviews with people unaware of these multiple definitions. – Don't bother "correcting" people, its counterproductive http: //michaelmina. info Copyright © 2006 Michael Mina 15

True dimensional models are a rarity • With one possible exception, every DW data

True dimensional models are a rarity • With one possible exception, every DW data model I've encountered is TNF, or somewhere between TNF and dimensional, but never purely dimensional • Examples: – TNF portions of DWs – Every date dimension table I've seen uses date as the key rather than a surrogate key • Many decisionmakers cannot be sold on the value of dimensional modeling – One Senior VP claimed he did not want "his" data warehoused. He wanted reporting from the operational system - it was "closer to the source" – Fact: The data governance group was controlled by his rival. Was it ignorance, politics, or both? http: //michaelmina. info Copyright © 2006 Michael Mina 16

True dimensional models are a rarity (cont'd) • Analyst impact: – Modeling is often

True dimensional models are a rarity (cont'd) • Analyst impact: – Modeling is often easier without a dimension model requirement – The opportunities to fully develop dimensional modeling skills may be limited http: //michaelmina. info Copyright © 2006 Michael Mina 17

Companies love MSRE ("misery") • Multi-Source Reporting Environments more common than DWs – One

Companies love MSRE ("misery") • Multi-Source Reporting Environments more common than DWs – One version of the truth often unworkable • Too expensive and time consuming • Deployment of revenue generating systems (and the need for information therefrom) will not wait for the data to be warehoused – More common than DW "one version of the truth": • Source systems as system of record • Several witnesses (data sources) testify, and their "testimony" is adjudicated. Hence, MSRE. http: //michaelmina. info Copyright © 2006 Michael Mina 18

Companies love MSRE (cont'd) • Analyst impact: – More data sources to understand –

Companies love MSRE (cont'd) • Analyst impact: – More data sources to understand – Newer analysts need more mentoring, even when there is effective metadata management – Opportunity to become very valuable to employer by understanding these sources http: //michaelmina. info Copyright © 2006 Michael Mina 19

Data quality can be a threat • Do NOT assume the desire for data

Data quality can be a threat • Do NOT assume the desire for data quality is universal • "Jim" found out that many dimension tables were missing values that were needed to allow analysts to join them to the fact tables. • He took it upon himself to develop a process to systematically detect values in the fact tables that were missing from the dimension tables. It generated 60+ pages (single spaced) of missing codes and descriptions. • Jim proudly showed his manager, who looked as if he had seen a ghost. He thanked Jim, but the system was not implemented. http: //michaelmina. info Copyright © 2006 Michael Mina 20

Data quality can be a threat (cont'd) • The problem: – Jim later learned

Data quality can be a threat (cont'd) • The problem: – Jim later learned that his department created the problem, and that not too many users were aware of it. – There was no glory to be had in cleaning up one's own mess, especially when few people knew about the mistake in the first place. OOPS! • Analyst impact: – Be more politically astute – Weigh the risks of acting independently. In this case, Jim had no regrets. Any guesses why? http: //michaelmina. info Copyright © 2006 Michael Mina 21

Cost-effectiveness can be career-ineffective • "Dave" worked for a state government plagued by significant

Cost-effectiveness can be career-ineffective • "Dave" worked for a state government plagued by significant budget constraints. The directive from on high was to cut costs. • Dave's boss wanted to have her annual department meeting at an expensive location off site. Dave reminded her of the cost cutting directive. It was not appreciated. • When his department was tasked with developing an interactive PDF document containing standard reports from a data mart, Dave's staff put together a very low cost, scalable prototype using Access, Excel and pdf 995 (freeware). http: //michaelmina. info Copyright © 2006 Michael Mina 22

Cost-effectiveness can be career-ineffective (cont'd) • His boss wound up having the system developed

Cost-effectiveness can be career-ineffective (cont'd) • His boss wound up having the system developed using SQL server, running on a $30 K server, and paying $10 K to graphic designers for the interactive PDF. It was completed 18 months after the prototype. • The prototype was about 80% as functional as the final version. • The problem: – Dave's boss believed that one of her staff would gain more job satisfaction using SQL Server. – Also, she needed help from a peer who had purchased a $30 K server that was underutilized. – And, her peer needed to justify his purchase. And so, a friendship was born… http: //michaelmina. info Copyright © 2006 Michael Mina 23

Cost-effectiveness can be career-ineffective (cont'd) • Analyst impact: – Understand differences between the public

Cost-effectiveness can be career-ineffective (cont'd) • Analyst impact: – Understand differences between the public sector and private sectors. Some people just like to spend taxpayer money, and you can't stop them. – The 80/20 rule works both ways: if you are strapped for cash, sometimes it is better to settle for 80% of the functionality at 20% of the cost. – What's cost-effective for your employer may not be careereffective for you. You must decide which is a higher priority for you. In most cases, those who try to work cost-effectively are appreciated. http: //michaelmina. info Copyright © 2006 Michael Mina 24

Requirements management = danger management • Requirements management is the sine qua non of

Requirements management = danger management • Requirements management is the sine qua non of business analysis. It is challenging at best, dangerous at worst. • Challenging: – Determining who has asthma when that information is not explicitly captured in the claims data. – Determining who is an employee • Cleveland State University could not provide one single answer because they were required to report to multiple agencies that had multiple specifications. • There was a Dept of Education definition of Employee • There was an Ohio Civil Rights Commission definition as well. http: //michaelmina. info Copyright © 2006 Michael Mina 25

Requirements management = danger management (cont'd) • Challenging (cont'd) – Suppose I wanted a

Requirements management = danger management (cont'd) • Challenging (cont'd) – Suppose I wanted a list of all Cleveland customers. Do I want: • Customers that live in Cleveland? • Customers that live in the Cleveland sales region? • Customers that live in the Cleveland administrative region? • Customers that are assigned to the Cleveland sales region, regardless of where they actually live? • You get the idea… – Large scale requirements management (e. g. , using IBM Rational Requisite. Pro. ) http: //michaelmina. info Copyright © 2006 Michael Mina 26

Requirements management = danger management (cont'd) • Dangerous #1: Variable requirements - fixed deadline

Requirements management = danger management (cont'd) • Dangerous #1: Variable requirements - fixed deadline – Consider Rob's case: – Rob's manager kept reversing himself on his data requests • "Paint this wagon red. " • "You know, this wagon would look better blue. " • "Why is this wagon blue? Everyone knows wagons should be red. " – On one occasion, Rob's boss significantly changed project requirements the day before the deadline. At 5: 00 pm on the day of the deadline (he usually left at 5: 00), he wrote Rob an e -mail criticizing him for not having the project complete. – Rob went to talk to his boss, but he had gone for the day. – People on Rob's staff even provided proof that Rob's boss reversed himself on requirements. http: //michaelmina. info Copyright © 2006 Michael Mina 27

Requirements management = danger management (cont'd) • The problem: – Rob reported to someone

Requirements management = danger management (cont'd) • The problem: – Rob reported to someone who had a reality-allergy. – Rob was eventually let go, officially because of downsizing. • The solution: – Rob documented his case to his boss in writing, along with supporting documentation. His boss toned down the criticism, but the situation did not revert to normalcy. – Rob also saw an attorney, only to learn that as a result of special circumstances, and through no fault of his own, he didn't have a case. – Rob was eventually able to convince his boss to write him an outstanding letter of recommendation. http: //michaelmina. info Copyright © 2006 Michael Mina 28

Requirements management = danger management (cont'd) • Analyst impact: – You must be able

Requirements management = danger management (cont'd) • Analyst impact: – You must be able to clearly articulate your concerns about requirements, especially to those who don't understand the complexities of your work. – You must document your concerns in writing, preferably in nonrepudiable form (e. g. , e-mail), at minimum in a personal log. – You must keep excellent records of project requirements, changes in requirements, preferably in nonrepudiable form. http: //michaelmina. info Copyright © 2006 Michael Mina 29

Requirements management = danger management (cont'd) • Analyst impact (cont'd): – Even better: publish

Requirements management = danger management (cont'd) • Analyst impact (cont'd): – Even better: publish all changes to the requirements, and all issues related to the requirements, and ensure that stakeholders receive this information. This prevents any pretense to ignorance. – These must be records in your personal possession. General access records are fine as a supplement to personal records. – You cannot be too attached to your job, or your employer. You too may suddenly find yourself reporting to realityaverse management. – Pay as much attention to managing your career as to doing your job. Rob did, and found a better job. http: //michaelmina. info Copyright © 2006 Michael Mina 30

Requirements management = danger management (cont'd) • Dangerous #2: Determining that requirements have been

Requirements management = danger management (cont'd) • Dangerous #2: Determining that requirements have been met – Simple? NO! – Who has final say as to whether or not a requirement has been met? Does he/she play political games? – Often need to negotiate to agree that stated requirements have been met • The problem: – Bonuses for some managers can depend upon completion of certain projects by certain dates. This may result in political pressure to declare that requirements were met. – Also, missing a deadline almost always impacts one negatively. http: //michaelmina. info Copyright © 2006 Michael Mina 31

Requirements management = danger management (cont'd) • Analyst impact: – Be aware that compromises

Requirements management = danger management (cont'd) • Analyst impact: – Be aware that compromises that often occur. These include • Restating the requirements until the work that has been done meets the revised requirements (often accompanied by pretending that these were what the requirements should have been all along. ) • Loosely interpreting the requirements (like penumbras, emanations and such) • Note: I am indicating neither approval nor disapproval for these. – Do not compromise on compliance reporting unless you are willing to go to prison to benefit other people's careers. And DON’T think you’re immune to such pressure, a friend of mine was not. He did the right thing by refusing to cooperate with senior management. (Note: he was eventually promoted into another department. ) http: //michaelmina. info Copyright © 2006 Michael Mina 32

"One-time-only" means "several-times-at-least" • I have yet to develop a one-time-only process one time

"One-time-only" means "several-times-at-least" • I have yet to develop a one-time-only process one time only! • Even if the results are due once, the requirements are often developed interatively, and that can mean rework. • I have never regretted building a repeatable process. • One example: – "Chris" was assigned a mentor that had started a particular analytical project manually rather than programmatically. – He was tasked by his manager to assume responsibility for that analysis and automate it. However, his mentor felt that continuing the analysis manually was the best course of action, even though Chris and his mentor had the same manager. http: //michaelmina. info Copyright © 2006 Michael Mina 33

"One-time-only" means "several-times-at-least" (cont'd) • One example (cont'd) – When Chris met with his

"One-time-only" means "several-times-at-least" (cont'd) • One example (cont'd) – When Chris met with his mentor to discuss his progress on the project, his mentor could not accept that he was to automate the process. She insisted that, had she continued working on the project, it would have been done by then. – Even though Chris followed his manager's instructions, his mentor's negative attitude toward his progress threatened to damage his career. • The reality: – The requirements for the analysis changed significantly one day. Had the process been done manually, the rework costs would have been high. – Two days later, the requirements changed significantly, including twice in the same morning. http: //michaelmina. info Copyright © 2006 Michael Mina 34

"One-time-only" means "several-times-at-least" (cont'd) • The problem: – Did Chris' mentor want the opportunity

"One-time-only" means "several-times-at-least" (cont'd) • The problem: – Did Chris' mentor want the opportunity to complete the "onetime-only" analysis? – Did he/she want to micromanage the project? – Was he/she jealous that Chris was able to automate this process when the mentor could not? – Who knows? http: //michaelmina. info Copyright © 2006 Michael Mina 35

"One-time-only" means "several-times-at-least" (cont'd) • Analyst impact: – Everyone likes to be right. •

"One-time-only" means "several-times-at-least" (cont'd) • Analyst impact: – Everyone likes to be right. • Understand that some people like to be right, so they learn when they're wrong and adjust (I hope that's you. ) • Other people like to be right, and think they are always right. You need to document their incorrectness. – Carefully document: • The requirements given to you • Your decisions regarding implementation • Any shortcomings of the alternatives to repeatibility. http: //michaelmina. info Copyright © 2006 Michael Mina 36

"One-time-only" means "several-times-at-least" (cont'd) • Analyst impact (cont'd) – Business rules are often developed

"One-time-only" means "several-times-at-least" (cont'd) • Analyst impact (cont'd) – Business rules are often developed iteratively. Common sense suggests that a process to implement those rules also be developed and executed iteratively. – Build repeatability into your processes. If you expect some opposition to this, then do not advertise your plans. – Repeatable processes often make for better discussion in job interviews than "one-time-only" projects. http: //michaelmina. info Copyright © 2006 Michael Mina 37

Understanding ETL process(es) is important • To become a more effective business analyst, it

Understanding ETL process(es) is important • To become a more effective business analyst, it is not enough to be familiar with the data in the warehouse. You must also understand: – How the data is loaded – The systems from which data is extracted – Into which warehouse tables that data is loaded – Which data is not loaded and why • This information is often not captured at a sufficient level of detail in metadata repositories • ETL breaks or fixes can cause strange differences in reports from one month to the next. These may not be detected until after the warehouse is loaded. http: //michaelmina. info Copyright © 2006 Michael Mina 38

Understanding ETL process(es) is important (cont'd) • Analyst impact: – More information to understand

Understanding ETL process(es) is important (cont'd) • Analyst impact: – More information to understand – Newer analysts need more mentoring, even when there is effective metadata management – Opportunity to become very valuable to employer by understanding these processes http: //michaelmina. info Copyright © 2006 Michael Mina 39

Beware of IT-focused decisionmaking • We must not be IT-bashers, but: – People who

Beware of IT-focused decisionmaking • We must not be IT-bashers, but: – People who understand technology better than business should not make business decisions, and – People who understand business better than technology should not make technology decisions. • There is a subculture within IT, less dominant than before, that I call "business-agnostic" - they don't really understand their employer's business, and some of them don't want to. • They tend to be technophilic. Some are actually businessphobic. When they have significant decisionmaking authority, they promote suboptimal outcomes. http: //michaelmina. info Copyright © 2006 Michael Mina 40

Beware of IT-focused decisionmaking (cont'd) • Examples of IT-focus to the detriment of businessfocus:

Beware of IT-focused decisionmaking (cont'd) • Examples of IT-focus to the detriment of businessfocus: • Example 1 – A trainer who makes it known that she enjoyed working as a trainer, "except for the people" (i. e. , the students. ) • Example 2 – IT tells its SQL trainer (me) not to teach business analysts about the LEFT JOIN and the RIGHT JOIN because their queries would adversely affect system performance. http: //michaelmina. info Copyright © 2006 Michael Mina 41

Beware of IT-focused decisionmaking (cont'd) • The problem: – The dimension tables are incomplete,

Beware of IT-focused decisionmaking (cont'd) • The problem: – The dimension tables are incomplete, and IT is at fault. A friend of mine ran a report that was short $12 million as a result. – IT must either accept responsibility for the incomplete dimension tables, or accept that business analysts must work around it somehow. http: //michaelmina. info Copyright © 2006 Michael Mina 42

Beware of IT-focused decisionmaking (cont'd) • Analyst impact: – It is NOT acceptable to

Beware of IT-focused decisionmaking (cont'd) • Analyst impact: – It is NOT acceptable to hide from business analysts the fact that using the INNER JOIN could cause their reports to be millions of dollars out of balance. – Weigh the cost of additional CPU time vs. the cost of allowing senior management to receive incorrect information - this is a no-brainer. – I explained that I would not comply for the above reasons, and because compliance would damage my reputation. – Your reputation is a personal asset. You should not allow an employer to squander it any more than you would allow them to raid your savings account. http: //michaelmina. info Copyright © 2006 Michael Mina 43

Beware of IT-focused decisionmaking (cont'd) • Example 3 – IT and business analysts are

Beware of IT-focused decisionmaking (cont'd) • Example 3 – IT and business analysts are displeased with the level of effort required to perform complex analyses. Consequently, IT seeks a replacement for the company standard BI tool. • The problem: – The warehouse's data model was antiquated and no longer supported the complexity of the evolving analytical environment. – Replacement of the BI tool was contraindicated. The solution must involve data model remediation. – The quick fix - changing BI tools - would have been a waste of time and money. http: //michaelmina. info Copyright © 2006 Michael Mina 44

Beware of IT-focused decisionmaking (cont'd) • Example 4 – IT tells business analysts to

Beware of IT-focused decisionmaking (cont'd) • Example 4 – IT tells business analysts to stop using an uppercase conversion function in the WHERE clause in SQL because it makes queries perform poorly. • The problem: – Many fields on which analysts must query are mixed case. – IT did not bother contacting analysts before issuing this "ruling". • Analyst impact: – Learn to identify business-agnosticism and make plans to deal with it in a manner that allows IT to save face. – Gradually bring IT decisionmakers to the understanding that business-agnostic decisions undermine their reputation and authority. http: //michaelmina. info Copyright © 2006 Michael Mina 45

Network intelligently • The problem: – The career advice often given regarding networking is

Network intelligently • The problem: – The career advice often given regarding networking is not sophisticated enough. – "Network, network!" is not a strategy. Volume cannot always compensate for lack of a definite plan. • Analyst impact: – Like any other undertaking, you should determine how much effort you are willing to give, what you want to take, and how to effectively accomplish both. – Why do you (audience) think I (presenter) am here? http: //michaelmina. info Copyright © 2006 Michael Mina 46

Network intelligently (cont'd) • Analyst impact (cont'd): – Primary goals in networking should include

Network intelligently (cont'd) • Analyst impact (cont'd): – Primary goals in networking should include • Building skills you otherwise might not get to develop. • Meeting people likely to help you with your career objectives. • The NEODWSIG meets both of these criteria – Seek opportunities to work with people in other departments and/or companies to achieve primary goals. – Volunteer at places that will help you accomplish those same goals. – If your job does not afford you intelligent networking opportunities, consider finding another. http: //michaelmina. info Copyright © 2006 Michael Mina 47

Don't become "VP of the Weather" • Avoid like the plague getting in situations

Don't become "VP of the Weather" • Avoid like the plague getting in situations where your responsibility greatly exceeds your authority to execute. • I call this being "Vice President of the Weather" - you are held responsible for actions you cannot control. • Often stems from decisionmakers not being held to account for their own unreasonable behavior. http: //michaelmina. info Copyright © 2006 Michael Mina 48

Don't become "VP of the Weather“ (cont’d) • For example: "Tim" was accused by

Don't become "VP of the Weather“ (cont’d) • For example: "Tim" was accused by a "Sam, " a Vice President of IT, of making a bad decision in product selection. • In front of a large group, Sam demanded to know why Tim made the bad selection. • Tim pulled out an e-mail from Sam proving that Sam made the product selection, and Tim answered "because you told me to. " • Note to e-mail users: Don't let this happen to you! http: //michaelmina. info Copyright © 2006 Michael Mina 49

Don't become "VP of the Weather" (cont'd) • The problem: – Sam was my

Don't become "VP of the Weather" (cont'd) • The problem: – Sam was my boss' boss, and he made me Vice President of the Weather on two occasions. • Occasion 1 – My manager and I were made responsible for completely understanding the data of a client. • The problem: – The client did not allow us to talk freely to their subject matter experts. – We were not permitted to study the client's data freely, nor were we allocated the time to do so. • The solution: – You think there is one? Fortunately, matters never came to a head. http: //michaelmina. info Copyright © 2006 Michael Mina 50

Don't become "VP of the Weather" (cont'd) • Occasion 2 – Sam tasks me

Don't become "VP of the Weather" (cont'd) • Occasion 2 – Sam tasks me with developing a large set of reports from a managed care system for a client. • The problem: – I was forbidden to talk with ANYONE working for the client. – I could not ask questions about the deliverables, or the source data. • The solution: – I used the Force. – Seriously, I had experience developing reports like this from a similarly structured database. With the judicious use of assumptions, I was able to deliver. http: //michaelmina. info Copyright © 2006 Michael Mina 51

Don't become "VP of the Weather" (cont'd) • Analyst impact – Always gather information

Don't become "VP of the Weather" (cont'd) • Analyst impact – Always gather information to document the unreasonableness of the request or assignment in the event that things go wrong. – The more times you pull a rabbit out of a hat, the more likely you will be looked upon as a wizard. This is usually bad because you will be relied upon to cover the incompetence of others. – Determine when failure IS an option. Properly managed, limited failure can deter unreasonable requests in the future. – Realistically, you should be responsible for • Your own actions • The actions of those who report to you • The actions of those you've misled (e. g. , by giving bad advice, false information, etc. ) http: //michaelmina. info Copyright © 2006 Michael Mina 52

Some tools I use as a Business Analyst… http: //michaelmina. info Copyright © 2006

Some tools I use as a Business Analyst… http: //michaelmina. info Copyright © 2006 Michael Mina

Tools • • • Source-to-target data maps Data flow charts Ad hoc validation Informal

Tools • • • Source-to-target data maps Data flow charts Ad hoc validation Informal metadata Creative cross joins – Date dimension table – Truth tables • Using Excel for query construction • Complex case catalog http: //michaelmina. info Copyright © 2006 Michael Mina 54

Source-to-target data maps • A table (e. g. , in Excel) showing columns needed

Source-to-target data maps • A table (e. g. , in Excel) showing columns needed for your analysis, and the tables from which you plan to source them. • Best used when your analysis sources data from many tables. http: //michaelmina. info Copyright © 2006 Michael Mina 55

Source-to-target data maps (cont'd) • This an example: http: //michaelmina. info Copyright © 2006

Source-to-target data maps (cont'd) • This an example: http: //michaelmina. info Copyright © 2006 Michael Mina 56

Source-to-target data maps (cont'd) • Advantages: – Complex data analyses could take much longer

Source-to-target data maps (cont'd) • Advantages: – Complex data analyses could take much longer without this tool. – Helps analysts quickly identify columns and tables already being used by the process, as well as best sources for additional columns. – Excel allows filtering on the columns, allowing for more focused review of the data. – Can serve as an addendum to a requirements document • Disadvantages: – Requires some additional effort to develop and maintain http: //michaelmina. info Copyright © 2006 Michael Mina 57

Data flow charts • Not the same as data flow diagrams • Similar to

Data flow charts • Not the same as data flow diagrams • Similar to both flow charts and UML activity diagrams, but not exactly like either http: //michaelmina. info Copyright © 2006 Michael Mina 58

Data flow charts (cont'd) Data sources are listed by name. qnnn = query tnnn

Data flow charts (cont'd) Data sources are listed by name. qnnn = query tnnn = intermediate table Queries are executed in numerical order. Data sources and intermediate tables point to queries that source data from them. Queries point to intermediate tables they create, update, etc. http: //michaelmina. info Copyright © 2006 Michael Mina 59

Data flow charts (cont'd) • Advantages: – Facilitates making inevitable changes to your process

Data flow charts (cont'd) • Advantages: – Facilitates making inevitable changes to your process – Gives a graphical overview of your process – Helps illustrate where changes in your process will have their impacts • Disadvantages: – Symbology not well developed (e. g. , is it a create table, update table, or append table? ) – Some additional time is required to develop and update the data flow chart. http: //michaelmina. info Copyright © 2006 Michael Mina 60

Ad hoc validation • Run an audit process parallel to your primary process. •

Ad hoc validation • Run an audit process parallel to your primary process. • Goal: systematically perform reasonableness checks at each point in your process. • At a minimum, create a table listing – The name of each table in your process, and – Its row count. http: //michaelmina. info Copyright © 2006 Michael Mina 61

Ad hoc validation (cont’d) • Advantages: – Helps identify where improper logic was used.

Ad hoc validation (cont’d) • Advantages: – Helps identify where improper logic was used. – Helps identify points in your process about which you are likely to be questioned by the stakeholders. – For example: Your row count drops after an inner join. Is that a desired or undesired result? • Disadvantages: – Requires some additional effort – Requires additional CPU time http: //michaelmina. info Copyright © 2006 Michael Mina 62

Informal metadata • Formal metadata – Often found in metadata repositories – Physical characteristics

Informal metadata • Formal metadata – Often found in metadata repositories – Physical characteristics of tables, columns, etc. – Definitions of tables, columns, etc. • Informal metadata – Often found in the memories of subject matter experts – May not be easily stored and retrieved using traditional metamodels http: //michaelmina. info Copyright © 2006 Michael Mina 63

Informal metadata (cont'd) • Certain types of metadata are better attached to topics than

Informal metadata (cont'd) • Certain types of metadata are better attached to topics than to tables or columns. • Consider this situation: – Suppose we want to determine if a customer account is open, and there is a closed_date field in table A, the table that is the main source of account information. – One would think that if closed_date were null, then the account would be open. But suppose that is the case only for accounts processed on system X. – Accounts processed on system Y, however, are always missing their closed date on table A for various reasons. – To determine if accounts processed on system Y are open, you must verify that, in another table B, the field open_status = 'Y' for the account in question. http: //michaelmina. info Copyright © 2006 Michael Mina 64

Informal metadata (cont'd) • Question: To which columns or tables should this very valuable

Informal metadata (cont'd) • Question: To which columns or tables should this very valuable metadata (i. e. , how to identify an open account) be attached? – To closed_date? – To open_status? – To table A? – To table B? • Answer: To something else entirely – This is an example of what I call topical metadata, the topic being "open customer account. " • Note: the term "topical metadata" is used frequently in the context of the semantic web, and rarely, if ever, in the context of relational databases. Nevertheless, I believe it is appropriate here. http: //michaelmina. info Copyright © 2006 Michael Mina 65

And Now, It's Time For… http: //michaelmina. info Copyright © 2006 Michael Mina

And Now, It's Time For… http: //michaelmina. info Copyright © 2006 Michael Mina

The Informal Metadata Game Show! http: //michaelmina. info Copyright © 2006 Michael Mina

The Informal Metadata Game Show! http: //michaelmina. info Copyright © 2006 Michael Mina

The Informal Metadata Game Show • Which weighs more? A pound of feathers or

The Informal Metadata Game Show • Which weighs more? A pound of feathers or a pound of gold? http: //michaelmina. info Copyright © 2006 Michael Mina 68

Answer: A pound of feathers weighs more • Metadata is key to this riddle

Answer: A pound of feathers weighs more • Metadata is key to this riddle because "a pound" is not "a pound" • Of course, a quantity of feathers that weighs the same as a quantity of gold will take up more volume than the gold, but this is not a matter of density. http: //michaelmina. info Copyright © 2006 Michael Mina 69

Answer: A pound of feathers weighs more (cont'd) • 1 "pound" of feathers =

Answer: A pound of feathers weighs more (cont'd) • 1 "pound" of feathers = 1 avoirdupois pound – This is equal to 7, 000 grains – See http: //www. britannica. com/eb/article-9011455 • 1 "pound" of gold = 1 Troy pound – One Troy pound is 5, 760 of the same grains used in the definition of avoirdupois pound – See http: //www. 24 carat. co. uk/weightsframe. html • 1 pound of feathers weighs more than 1 pound of gold because "pound" has different meanings in each context. http: //michaelmina. info Copyright © 2006 Michael Mina 70

Informal metadata (cont'd) • A column called "weight" may thus have a different meaning

Informal metadata (cont'd) • A column called "weight" may thus have a different meaning for a particular row if another column called item_desc has the value "gold“ in that row. http: //michaelmina. info Copyright © 2006 Michael Mina 71

Creative cross joins • What is a cross join (aka Cartesian product)? – Suppose

Creative cross joins • What is a cross join (aka Cartesian product)? – Suppose Table 1 has r 1 rows and c 1 columns – Suppose Table 2 has r 2 rows and c 2 columns – Table 1 cross join Table 2 is the combination of every row of Table 1 with every row of Table 2. – Table 1 cross join Table 2 has (r 1 x r 2) rows and (c 1 + c 2) columns • The problem is not so much the c 1 + c 2 as the r 1 x r 2. • I tell my students that cross joins are like snake venom: often deadly, but they can be used to make very powerful medicine in the right hands. http: //michaelmina. info Copyright © 2006 Michael Mina 72

Creative cross joins (cont'd) • Cross join applications – Date dimension table – Truth

Creative cross joins (cont'd) • Cross join applications – Date dimension table – Truth tables – Any application where you need to exhaust all possible combinations of N quantities http: //michaelmina. info Copyright © 2006 Michael Mina 73

Date dimension table • Cross joins can be used to help create a date

Date dimension table • Cross joins can be used to help create a date dimension table for your data warehouse, or any other database. • This is an example done using Microsoft Access Create (manually or otherwise) one table listing the digits 0 through 9. http: //michaelmina. info Copyright © 2006 Michael Mina 74

Date dimension table (cont'd) • Write a cross join query to create a Cartesian

Date dimension table (cont'd) • Write a cross join query to create a Cartesian product of the table with itself four times. • Our SELECT statement will use the Cartesian product to create the integers 0 through 10, 000. • The Date. Add function will add these integers to a starting date (in this case, 1/1/1990). • Different databases may use a different function to add integers to dates. • Save these results in a table. http: //michaelmina. info Copyright © 2006 Michael Mina 75

Date dimension table (cont'd) SELECT 1000*A. Digit+100*B. Digit+10*C. Digit+D. Digit AS Date. Key, Date.

Date dimension table (cont'd) SELECT 1000*A. Digit+100*B. Digit+10*C. Digit+D. Digit AS Date. Key, Date. Add("d", 1000*A. Digit+100*B. Digit+10*C. Digit+D. Digit , #01/01/1990#) AS Date. Value INTO Date. Values FROM Digits AS A, Digits AS B, Digits AS C, Digits AS D ORDER BY 1; http: //michaelmina. info Copyright © 2006 Michael Mina 76

Date dimension table (cont'd) • Then, create a date dimension table using the previous

Date dimension table (cont'd) • Then, create a date dimension table using the previous table. • Use the row functions provided by your database to add the information you need. • Remember to create a primary key on the Time dimension table. http: //michaelmina. info Copyright © 2006 Michael Mina 77

Date dimension table (cont'd) SELECT Date. Key, Date. Value, Day(Date. Value) AS Day, Month(Date.

Date dimension table (cont'd) SELECT Date. Key, Date. Value, Day(Date. Value) AS Day, Month(Date. Value) AS Month, Int((Month(Date. Value)+2)/3) AS Quarter, Year(Date. Value) AS Year INTO Time. Dimension FROM Date. Values http: //michaelmina. info Copyright © 2006 Michael Mina 78

Truth tables • Truth tables can be used to help determine and validate business

Truth tables • Truth tables can be used to help determine and validate business rules. • They are created by – Identifying a set of inputs – exhausting all combinations of True and False for those inputs – evaluating a business rule for each combination • This is an example done using Microsoft Access. http: //michaelmina. info Copyright © 2006 Michael Mina 79

Truth tables (cont'd) • Create a two-column, two-row table like so: • Cross join

Truth tables (cont'd) • Create a two-column, two-row table like so: • Cross join the tables (SQL on next slide) http: //michaelmina. info Copyright © 2006 Michael Mina 80

Truth tables (cont'd) SELECT tbl. True. False. status_boolean, tbl. True. False_1. status_boolean, tbl. True.

Truth tables (cont'd) SELECT tbl. True. False. status_boolean, tbl. True. False_1. status_boolean, tbl. True. False_2. status_boolean, tbl. True. False_3. status_boolean FROM tbl. True. False, tbl. True. False AS tbl. True. False_1, tbl. True. False AS tbl. True. False_2, tbl. True. False AS tbl. True. False_3 ORDER BY tbl. True. False. status_boolean, tbl. True. False_1. status_boolean, tbl. True. False_2. status_boolean, tbl. True. False_3. status_boolean; http: //michaelmina. info Copyright © 2006 Michael Mina 81

Truth tables (cont'd) • This is the output http: //michaelmina. info Copyright © 2006

Truth tables (cont'd) • This is the output http: //michaelmina. info Copyright © 2006 Michael Mina 82

Truth tables (cont'd) • Select the “Analyze It with Microsoft Excel” option http: //michaelmina.

Truth tables (cont'd) • Select the “Analyze It with Microsoft Excel” option http: //michaelmina. info Copyright © 2006 Michael Mina 83

Truth tables (cont'd) • Add your business rules and evaluate them to make sure

Truth tables (cont'd) • Add your business rules and evaluate them to make sure they correspond to reality. http: //michaelmina. info Copyright © 2006 Michael Mina 84

Truth tables (cont'd) • Advantages: – Useful when you must persuade others of the

Truth tables (cont'd) • Advantages: – Useful when you must persuade others of the validity of decision rules – Useful in identifying “holes” in decision logic • Disadvantages: – Requires effort to develop http: //michaelmina. info Copyright © 2006 Michael Mina 85

Using Excel for query construction • Text-related functions in Excel can be used to

Using Excel for query construction • Text-related functions in Excel can be used to write SQL. • Advantages: – Useful when you have complex and/or volatile decision rules – Useful when you have a set of queries you would like to have coded consistently • Disadvantages: – Time consuming http: //michaelmina. info Copyright © 2006 Michael Mina 86

Using Excel for query construction (cont'd) Input some basic information in this section http:

Using Excel for query construction (cont'd) Input some basic information in this section http: //michaelmina. info Copyright © 2006 Michael Mina 87

Using Excel for query construction (cont'd) This is the main workspace. Query parameters are

Using Excel for query construction (cont'd) This is the main workspace. Query parameters are outlined. http: //michaelmina. info Copyright © 2006 Michael Mina 88

Using Excel for query construction (cont'd) • This is the first query, ready to

Using Excel for query construction (cont'd) • This is the first query, ready to be copied into Access. http: //michaelmina. info Copyright © 2006 Michael Mina 89

Using Excel for query construction (cont'd) • This is the second query, ready to

Using Excel for query construction (cont'd) • This is the second query, ready to be copied into Access. http: //michaelmina. info Copyright © 2006 Michael Mina 90

Complex case catalog • Create a catalog of complex cases you've worked on over

Complex case catalog • Create a catalog of complex cases you've worked on over the course of your career • Business problem statements, specifications, personal notes, queries, code samples, etc. • Advantages: – Provides a portable knowledgebase to which you can refer regardless of your employer. – Facilitates resume and interview preparation. – Can provide ideas for writing articles. • Disadvantages: – Requires effort to maintain – Legal issues may be associated with keeping copies http: //michaelmina. info Copyright © 2006 Michael Mina 91

The story of our secret data mart… http: //michaelmina. info Copyright © 2006 Michael

The story of our secret data mart… http: //michaelmina. info Copyright © 2006 Michael Mina

The story of our secret data mart • A coworker friend and I built

The story of our secret data mart • A coworker friend and I built a data mart without official authorization. • Background: – I was developing the first disease management outcomes reporting process for my employer. • The problem: – I was running up against deadlines and system resource issues. – My manager offered next to no support. In fact, she was getting angry with me for system problems I could not control (that dreaded Vice President of the Weather thing. ) http: //michaelmina. info Copyright © 2006 Michael Mina 93

The story of our secret data mart (cont'd) • The solution – I spoke

The story of our secret data mart (cont'd) • The solution – I spoke with a good friend in IT about the issue, and we decided to build a data mart. – We knew that formal project authorization would not be forthcoming, so we worked on it behind the scenes. • Another problem: – We had to account for our time. • The solution – I was "working with IT to help improve system performance" for my project – My friend's time was spent on generic "performance improvement" http: //michaelmina. info Copyright © 2006 Michael Mina 94

The story of our secret data mart (cont'd) • Another problem: – How do

The story of our secret data mart (cont'd) • Another problem: – How do we suddenly reveal the existence of a data mart? • The solution – Evolutionary terminology – From "IT thinks they need to create a summarized data set for us. " – To "IT is summarizing data for us to try to improve performance. " – To "I'm working with IT to decide which columns from which tables we need to summarize. " – To "It's really like a data mart" – To "we used our data mart" – Our managers either didn't notice, or pretended not to notice. http: //michaelmina. info Copyright © 2006 Michael Mina 95

The story of our secret data mart (cont'd) • We got the work done

The story of our secret data mart (cont'd) • We got the work done by deadline. After that, people seemed to pretend that building a data mart was the plan all along. It's amazing how they love a winner. • We planned additional data marts for additional diseases, but it was not to be…. • How did it end? – Shortly thereafter, my friend was promoted into another department. – Sometime after that, I was promoted into my friend's former department. – Within one year, my former manager was demoted for unclear reasons. http: //michaelmina. info Copyright © 2006 Michael Mina 96

The story of our secret data mart (cont'd) • Analyst impact: – Consider very

The story of our secret data mart (cont'd) • Analyst impact: – Consider very carefully how willing you are to bend the rules. – If your manager wants you to do a job with one hand tied behind your back, you need to ask why. We risked damaging our careers to help managers who lacked foresight accomplish their goals. – I would not advise this, nor do it again. http: //michaelmina. info Copyright © 2006 Michael Mina 97

Parting advice • Deepen and expand your skill set. It increases – Your marketability

Parting advice • Deepen and expand your skill set. It increases – Your marketability – Your authority within your organization • Seek opportunities to achieve, and to understand the impact of those achievements • Develop a career portfolio • Network effectively • "As you do good work for your employer, whoever it may be, always ask 'what's in it for me? ' " You owe it to yourself, your spouse/ family/ causes you support • Think twice before building data marts, or anything else, in secret. I lucked out once. I wouldn’t do it again. http: //michaelmina. info Copyright © 2006 Michael Mina 98

Thank you! http: //michaelmina. info Copyright © 2006 Michael Mina

Thank you! http: //michaelmina. info Copyright © 2006 Michael Mina