Variable Clustering


Variable clustering finds groups of variables that are as correlated as possible among themselves and as uncorrelated as possible with variables in other clusters. The basic algorithm is binary and divisive. All variables start in one cluster. A principal components analysis is done on the variables in the cluster.

If the second eigenvalue is greater than a specified threshold (in other words, there is more than one dominant dimension), then the cluster is split. The PC scores are then rotated obliquely so that the variables can be split into two groups. This process is repeated for the two child clusters, and so on, until the second eigenvalue of every cluster drops below the threshold.
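The split decision described above can be sketched outside SAS to make the arithmetic explicit. The following is a minimal Python illustration, not PROC VARCLUS's implementation: it only computes the second eigenvalue of a cluster's correlation matrix and compares it with a threshold. The oblique rotation and the reassignment of variables are left out, the function names are hypothetical, and the example data are invented. The default threshold of 0.7 mirrors the maxeigen=.7 setting used later in these slides.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length, non-constant numeric columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        norm = math.sqrt(sum((x - mean) ** 2 for x in col))
        scaled.append([(x - mean) / norm for x in col])
    return [[sum(a * b for a, b in zip(u, v)) for v in scaled] for u in scaled]

def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Angle (at most 45 degrees in size) that zeroes a[p][q].
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def should_split(columns, maxeigen=0.7):
    """True when the cluster's second eigenvalue exceeds the threshold."""
    eigenvalues = symmetric_eigenvalues(correlation_matrix(columns))
    return len(eigenvalues) > 1 and eigenvalues[1] > maxeigen

# Invented data: x1 and x2 move together, x3 and x4 move together,
# and the two pairs are uncorrelated with each other.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, 3, 2, 5, 4, 7, 6, 8]
x3 = [1, -1, -1, 1, 1, -1, -1, 1]
x4 = [2, -1, -1, 1, 1, -1, -1, 2]

print(should_split([x1, x2, x3, x4]))  # all four variables in one cluster
print(should_split([x1, x2]))          # one of the child clusters
```

With all four variables in one cluster there are two dominant dimensions, so the second eigenvalue is large. Within a single pair the second eigenvalue is small, which is the stopping condition the slide describes.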

The VARCLUS Procedure

    PROC VARCLUS DATA=SAS-data-set <options>;
      VAR variables;
    RUN;

Variable Clustering, the develop data set

Mortgage Balance, Number of Checks, Credit Card Balance, Checking Deposits, Teller Visits, Age

Where we are

    proc contents data=d.imputed;
    run;

Get names of numeric input variables from dictionary.columns

    /* Get the names of the numeric variables into a macro variable.
       Note that leaving out brclus5 gives us 4 indicator variables. */
    proc sql;
      describe table dictionary.columns;
      select name into :inputs separated by " "
        from dictionary.columns
        where memname="IMPUTED" and libname="D"
          and name ^= "Ins" and name ^= "brclus5"
          and type="num";
    quit;
    %put &inputs;

Variable Clustering

    proc varclus data=d.imputed maxeigen=.7 hi short;
      var &inputs;
      title "Variable Clustering of Imputed Data Set";
    run;
    title;

Use ODS to save the ClusterQuality and RSquare tables as data sets.

    ods output clusterquality=summary rsquare=clusters;
    proc varclus data=d.imputed maxeigen=.7 short hi;
      var &inputs;
    run;

    proc print data=summary;
    run;

Numerous possibilities for summarizing clusters.

  • Principal Components
  • Pick one variable:
      • Based on subject matter
      • Statistics
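One common statistical rule for picking a single variable per cluster is the 1-R² ratio that PROC VARCLUS reports: (1 - R² with own cluster) / (1 - R² with next closest cluster), where smaller is better. The slides do not say which statistic was used, so the Python sketch below is only one plausible reading. The function names are hypothetical, and the variable names and R² values are invented for illustration.

```python
def one_minus_rsq_ratio(own_rsq, next_rsq):
    """(1 - R^2 own cluster) / (1 - R^2 next closest cluster); assumes next_rsq < 1."""
    return (1.0 - own_rsq) / (1.0 - next_rsq)

def pick_representatives(rows):
    """Return {cluster: variable}, keeping the lowest 1-R^2 ratio in each cluster.

    Each row is (cluster, variable, own_rsq, next_rsq).
    """
    best = {}
    for cluster, variable, own_rsq, next_rsq in rows:
        ratio = one_minus_rsq_ratio(own_rsq, next_rsq)
        if cluster not in best or ratio < best[cluster][1]:
            best[cluster] = (variable, ratio)
    return {cluster: variable for cluster, (variable, _) in best.items()}

# Invented rows: (cluster, variable, R^2 with own cluster, R^2 with next closest).
example_rows = [
    ("Cluster 1", "var_a", 0.90, 0.10),
    ("Cluster 1", "var_b", 0.60, 0.20),
    ("Cluster 2", "var_c", 0.75, 0.30),
    ("Cluster 2", "var_d", 0.80, 0.05),
]

print(pick_representatives(example_rows))
```

A variable with a low ratio is well explained by its own cluster and poorly explained by the neighbouring one, which is why it is a reasonable stand-in for the cluster. Subject-matter knowledge can still override the choice.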

    proc print data=clusters;
      where numberofclusters=39;
    run;

One variable per cluster

    /* Pick one variable per cluster for the first 10.
       The others are clusters of one variable. */
    %let reduced=
      MIPhone MICCBal Dep MM ILS MTGBal Income POS CD IRA
      brclus1 Sav NSF Age SavBal LOCBal NSFAmt Inv MIHMVal CRScore
      MIAcctAg InvBal DirDep CCPurc SDB CashBk AcctAge InArea ATMAmt DDABal
      DDA brclus2 CC HMOwn DepAmt Phone ATM LORes brclus4;