Multidimensional Detective
Alfred Inselberg, Multidimensional Graphs Ltd, Tel Aviv University, Israel
Presented by Yimeng Dou, 04-24-2002, ydou@ics.uci.edu

Parallel Coordinates • We can use parallel coordinates to model relations among multiple variables, which turns our problem into a 2-D pattern recognition problem. • The approach is very useful for visual data mining. • Two examples: VLSI chip production and a model of a country’s economy. • The model can be used for trade-off analyses, discovering sensitivities, approximate optimization, monitoring, and decision support.
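To make the 2-D reduction concrete, here is a minimal sketch of how one N-dimensional point becomes a polygonal line across N parallel axes. The function name `to_polyline` and the per-axis min/max normalization are assumptions made for illustration; the slides do not prescribe a particular scaling.

```python
def to_polyline(point, lows, highs):
    """Map an N-dimensional point to N (x, y) vertices.

    Axis i is drawn as a vertical line at x = i; the value on that axis is
    rescaled to [0, 1] using the (assumed) per-axis range lows[i]..highs[i].
    """
    vertices = []
    for i, (value, lo, hi) in enumerate(zip(point, lows, highs)):
        # A degenerate axis (lo == hi) is placed at mid-height.
        y = 0.5 if hi == lo else (value - lo) / (hi - lo)
        vertices.append((i, y))
    return vertices


# Hypothetical 3-variable record: one vertex per axis.
line = to_polyline([1, 5, 10], lows=[0, 0, 0], highs=[2, 10, 10])
```

Each record costs one vertex per variable, which matches the O(N) representational complexity claimed for the method.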

Goals of The Program • No loss of information. • Low representational complexity, O(N), where N is the number of dimensions. • Works for any N. • Treats every variable uniformly. • Objects remain recognizable under transformations (rotation, translation, scaling, etc.). • Conveys information on the properties of the N-dimensional object easily and intuitively. • Is based on rigorous mathematical and algorithmic results.

In order to discover patterns from a large data set… • We must use parallel coordinates effectively, with proper geometrical understanding and queries (hence the notion of “Multidimensional Detective”). • Instead of mimicking the experience derived from standard displays, a good model should exploit the special strengths of the methodology and avoid its weaknesses. • This task is similar to accurately cutting complicated portions of an N-dimensional watermelon. The cutting tools should be well chosen and intuitive.

The VLSI Chip Problem • Understand Figure 1: the full real data set, with 473 batches and 16 process parameters (X1 through X16). • X1: yield (the percentage of useful chips produced in the batch). • X2: quality (speed performance). • X3 through X12: 10 different types of defects. Zero defects appear at the top of the axis. • X13 through X16: physical parameters. • The author does not specify how to locate high yield or high quality on the axes. Judging from hints in his later description, I think high values appear on top.

Objective • Raise the yield (X1) while maintaining high quality (X2). This is a multiobjective optimization problem. • It was believed that the presence of defects hindered high yield and quality. • So the goal is to achieve zero defects. • (But is that really the case? Let’s see.)

Observations From Figure 2 • Figure 2 isolates the batches with the highest X1 and X2. Also notice the two clusters in X15. • It does not include some batches with high X3 values (nearly zero defects), which casts doubt on the goal of “achieve zero defects”. Is it the right aim? • To answer this question, we construct Figure 3, which includes the batches with zero defects in at least 9 categories (they are very close to the aim of zero defects). Do they have high yield and quality?
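The query behind Figure 3 can be sketched as a simple filter. This is only an illustration: the dictionary layout, the function name `near_zero_defect`, and the toy batches are hypothetical, and raw defect counts are assumed (0 meaning no defect), not the inverted axis positions used in the plots.

```python
# X3..X12 are the ten defect types described on the earlier slide.
DEFECT_AXES = [f"X{i}" for i in range(3, 13)]


def near_zero_defect(batches, min_zero_categories=9):
    """Keep batches with zero defects in at least `min_zero_categories` types."""
    selected = []
    for batch in batches:
        zero_count = sum(1 for axis in DEFECT_AXES if batch[axis] == 0)
        if zero_count >= min_zero_categories:
            selected.append(batch)
    return selected


# Hypothetical toy batches, not the real 473-batch data set.
clean = {axis: 0 for axis in DEFECT_AXES}
toy_batches = [
    {"id": "a", **clean},                    # zero in all 10 categories
    {"id": "b", **clean, "X6": 2},           # zero in 9 categories
    {"id": "c", **clean, "X3": 1, "X6": 2},  # zero in only 8 categories
]
selected_ids = [b["id"] for b in near_zero_defect(toy_batches)]
```

The selected batches would then be highlighted on the X1 and X2 axes to check whether they actually have high yield and quality.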

Figure 3: Our Assumption Is Challenged • The nine batches have poor yields and low quality. • Here is another visual cue: X6. The process is much more sensitive to variations in X6 than in the other defects. • Treat X6 differently: select the batches with zero X6 defects. The very best batch is included (as shown in Figure 4).

Figure 5 and Figure 6: Test The Assumption • Figure 5 shows the batches that do not have zeros for X3 and X6. • Figure 6 shows the cluster of batches with top yields (notice the gap in X1 between them and the remaining batches, as seen in Figure 1). • The finding: small amounts of X3 and X6 type defects are essential for high yield and quality. • In addition, going back to Fig. 2, we can see the relationship of X15 with X1 and X2.

Our Conclusion For VLSI Chip Problem • Small ranges of X3 and X6 close to (but not equal to) zero, together with the lower range of X15, provide necessary conditions for high yield and quality. • Fig. 9 shows the result of constraining only X1 and the resulting gap in X15. • Fig. 10 shows that constraining only X2 does not produce a gap in X15.
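The "constrain one axis, look for a gap on another" check from Fig. 9 and Fig. 10 can be approximated numerically. The sketch below is an assumption about how one might do it, not the author's procedure: the function names, the dictionary layout, and the toy values are all hypothetical.

```python
def largest_gap(values):
    """Return the (lower, upper) pair of adjacent sorted values that are farthest apart."""
    ordered = sorted(values)
    if len(ordered) < 2:
        return None
    return max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])


def gap_after_constraint(batches, constrained_axis, lo, hi, inspected_axis):
    """Keep batches with lo <= constrained_axis <= hi, then find the widest gap on inspected_axis."""
    kept = [b[inspected_axis] for b in batches if lo <= b[constrained_axis] <= hi]
    return largest_gap(kept)


# Hypothetical toy values, not taken from the real data set.
toy_batches = [
    {"X1": 90, "X15": 10},
    {"X1": 85, "X15": 12},
    {"X1": 95, "X15": 30},
    {"X1": 40, "X15": 20},  # excluded by the X1 constraint below
]
gap = gap_after_constraint(toy_batches, "X1", 80, 100, "X15")
```

Some widest gap always exists, so whether it is large enough to count as a real split in X15 remains a visual judgment.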

Other Insights and The Lesson We Learned From VLSI Example • Fig. 11 shows that, except for two batches, all the others have very high X2. We isolate these two batches in Fig. 12 and find that their high yield but lower quality may be due to the ranges of X6, X13, X14, and X15. • This suggests that we can further partition the multivariate problem into sub-problems pertaining to individual objectives.

The Economic Model Example • This example illustrates how to use the interior point algorithm with the model to do trade-off analyses, understand the impact of constraints, and in some cases do optimizations. • Interior point algorithm: it finds a point that is interior to a region and satisfies all the constraints simultaneously. In this case, such a point represents a feasible economic policy for a country. • The construction is done interactively by sequentially choosing values of the variables (Fig. 13).
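The sequential construction can be sketched as a loop that asks, for each variable in turn, which range is still available given the values already chosen. This is a simplified illustration under a strong assumption: that each available range can be computed by a function of the earlier choices. The names `choose_point`, `TOY_RANGES`, and `midpoint`, and the two-variable toy region, are hypothetical and are not the economic model from the slides.

```python
def choose_point(range_fns, pick):
    """Build a feasible point one variable at a time.

    range_fns[i](chosen) returns the (lo, hi) range still available for
    variable i given the list of values chosen so far (an assumed interface).
    pick(i, lo, hi) stands in for the user's interactive choice.
    """
    chosen = []
    for i, range_fn in enumerate(range_fns):
        lo, hi = range_fn(chosen)
        if lo > hi:
            raise ValueError(f"no feasible value left for variable {i}")
        chosen.append(pick(i, lo, hi))
    return chosen


# Toy region: 0 <= x0 <= 10, x1 >= 0, x0 + x1 <= 10.
TOY_RANGES = [
    lambda chosen: (0, 10),
    lambda chosen: (0, 10 - chosen[0]),
]


def midpoint(i, lo, hi):
    """Always pick the middle of the available range."""
    return (lo + hi) / 2


policy = choose_point(TOY_RANGES, midpoint)
```

Choosing a larger x0 shrinks the range left for x1, which is the kind of trade-off the interactive display makes visible.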

Result of Choosing The First Variable • Once a value of the first variable (Agriculture output) is chosen, the dimensionality of the region is reduced by one. We can see the relationship between Agriculture and Fishing (low ranges correspond to each other). • So it is possible to find a policy that favors Agriculture without favoring Fishing, and vice versa. • Mining and Fishing: the lower lines of Fishing in Fig. 13 reveal the competition between them.

Neighborhood • Fig. 15 shows a 20-dimensional model. The intermediate curves provide useful insights. • Note the steep strips at X13, X14, and X15. These three are the critical variables, where the point is bumping against the boundary.

Boundary Point and Exterior Point • Boundary point: the polygonal line is tangent to any one of the intermediate curves. • Exterior point: the polygonal line crosses any of the intermediate curves. • An exterior point lets us see the first variable for which the construction failed and what is needed to make corrections. • By changing variables interactively, we can discover sensitive regions and other patterns.
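A numeric analogue of the tangent/crossing test is sketched below. It replaces the intermediate curves with per-variable available ranges, which is an assumption made for illustration; the names `classify` and `TOY_RANGES`, the tolerance, and the toy region are hypothetical.

```python
def classify(point, range_fns, tol=1e-9):
    """Classify a point as 'interior', 'boundary', or 'exterior'.

    range_fns[i](prefix) returns the (lo, hi) range available for variable i
    given the earlier coordinates (an assumed interface). For an exterior
    point, the index of the first failing variable is returned as well.
    """
    status = "interior"
    for i, (value, range_fn) in enumerate(zip(point, range_fns)):
        lo, hi = range_fn(point[:i])
        if value < lo - tol or value > hi + tol:
            return "exterior", i  # first variable for which the construction fails
        if abs(value - lo) <= tol or abs(value - hi) <= tol:
            status = "boundary"   # touching an end of the available range
    return status, None


# Toy region: 0 <= x0 <= 10, x1 >= 0, x0 + x1 <= 10.
TOY_RANGES = [
    lambda prefix: (0, 10),
    lambda prefix: (0, 10 - prefix[0]),
]

inside = classify([5, 2.5], TOY_RANGES)
on_edge = classify([5, 5], TOY_RANGES)
outside = classify([5, 6], TOY_RANGES)
```

The returned index for an exterior point plays the role described on the slide: it names the first variable that needs correcting.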

Before We Come To Conclusion • Is this model merely a model, or is it used (with the “intuitive” functionalities and high interactivity) in any software products? • Is this model accurate enough? • Is this technique sufficient for reaching conclusions about a problem when the data set is very large? • How does one become a skillful detective? Can software substitute for people?

Conclusion • Each multivariate data set and problem has its own “personality”, so it requires substantial variations in the discovery scenarios and calls for considerable ingenuity (a characteristic of a detective). • An effort to automate the exploration process is under way. It will have a number of new features, such as intelligent agents that learn from gathered experience.