Parallel chi-square test
Usman Roshan

Chi-square test
• The chi-square test is a popular feature selection method when we have categorical data and classification labels (as opposed to regression targets)
• In a feature selection context we apply the chi-square test to each feature and rank the features by their chi-square values (or p-values)
• A parallel solution calculates the chi-square values for all features at the same time, as opposed to one at a time in a serial solution
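
The ranking step can be sketched in plain Python. This is a minimal illustration, not the author's code: the names `chi_square` and `rank_features` and the toy data are hypothetical, and the statistic is assumed to be the standard Pearson form, the sum of (observed - expected)^2 / expected over the cells of the label-by-category table. Each feature is scored independently of the others, which is what makes a parallel version possible.

```python
from collections import Counter


def chi_square(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the labels."""
    n = len(labels)
    observed = Counter(zip(labels, feature))   # counts per (label, category) cell
    label_totals = Counter(labels)             # row totals
    category_totals = Counter(feature)         # column totals
    stat = 0.0
    for lab, lab_count in label_totals.items():
        for cat, cat_count in category_totals.items():
            expected = lab_count * cat_count / n   # P(L=lab) * P(F=cat) * n
            stat += (observed[(lab, cat)] - expected) ** 2 / expected
    return stat


def rank_features(columns, labels):
    """Return (feature index, statistic) pairs, largest statistic first."""
    scores = [(j, chi_square(col, labels)) for j, col in enumerate(columns)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data (made up for illustration): feature 0 tracks the label, feature 1 does not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    ["A", "A", "A", "B", "B", "B", "B", "A"],
    ["A", "B", "A", "B", "A", "B", "A", "B"],
]
ranking = rank_features(columns, labels)
```

Here `ranking` lists feature 0 first, since its categories line up with the labels while feature 1 is evenly split.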

Chi-square test
• We have two random variables:
  – Label (L): 0 or 1
  – Feature (F): Categorical
• Null hypothesis: the two variables are independent of each other (unrelated)
• Under independence
  – P(L, F) = P(L)P(F)
  – P(L=0) = (c1+c2)/n
  – P(F=A) = (c1+c3)/n
• Expected values
  – E(X1) = P(L=0)P(F=A)n
• We can calculate the chi-square statistic for a given feature and, from it, a p-value: the probability of seeing a statistic at least this large if the feature were independent of the label. Features with very small p-values deviate significantly from the independence assumption and are therefore considered important.

Contingency table (n = c1+c2+c3+c4)

           Feature=A                   Feature=B
Label=0    Observed=c1, Expected=X1    Observed=c2, Expected=X2
Label=1    Observed=c3, Expected=X3    Observed=c4, Expected=X4
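
As a worked example of the quantities on this slide, the sketch below computes the expected counts, the statistic, and the p-value for a single 2x2 table. The function name `chi_square_2x2` and the counts are hypothetical, and the statistic is assumed to be the usual Pearson sum of (observed - expected)^2 / expected over the four cells. The p-value uses the closed form for one degree of freedom, which applies to a 2x2 table; the sketch also assumes no row or column total is zero.

```python
import math


def chi_square_2x2(c1, c2, c3, c4):
    """Chi-square statistic and p-value for the 2x2 table [[c1, c2], [c3, c4]].

    Rows are Label=0 and Label=1; columns are Feature=A and Feature=B.
    """
    n = c1 + c2 + c3 + c4
    p_l0 = (c1 + c2) / n          # P(L=0)
    p_fa = (c1 + c3) / n          # P(F=A)
    # Expected counts under independence, e.g. X1 = P(L=0) P(F=A) n
    x1 = p_l0 * p_fa * n
    x2 = p_l0 * (1 - p_fa) * n
    x3 = (1 - p_l0) * p_fa * n
    x4 = (1 - p_l0) * (1 - p_fa) * n
    stat = sum((c - x) ** 2 / x for c, x in ((c1, x1), (c2, x2), (c3, x3), (c4, x4)))
    # With 1 degree of freedom the upper-tail chi-square probability
    # equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Made-up counts for illustration: every expected count is 20.
stat, p_value = chi_square_2x2(30, 10, 10, 30)
```

With these counts each observed cell differs from its expected value by 10, so the statistic is 4 x (100/20) = 20 and the p-value is very small.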

Parallel GPU implementation of chi-square test in CUDA
• The key here is to organize the data to enable coalesced memory access
• We define a kernel function that computes the chi-square value for a given feature
• CUDA automatically distributes the kernel's threads across the GPU cores so that many features are processed simultaneously
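
One way to picture the layout and the per-feature kernel is the plain-Python emulation below. It is not CUDA code and not the author's implementation: `chi2_kernel`, the flat sample-by-sample array, and the 0/1 encoding of the feature categories (A=0, B=1) are all assumptions made for illustration. Element `s * n_features + tid` holds sample `s` of feature `tid`, so on a GPU consecutive threads would read consecutive addresses at each step of the loop, which is the pattern that coalesces. The final loop over `tid` stands in for the kernel launch, where these calls would run concurrently.

```python
def chi2_kernel(tid, data, labels, n_samples, n_features, out):
    """Work done by one emulated thread: chi-square for feature `tid`."""
    if tid >= n_features:          # guard, as in a real kernel with spare threads
        return
    counts = [0, 0, 0, 0]          # c1..c4: (L=0,F=0), (L=0,F=1), (L=1,F=0), (L=1,F=1)
    for s in range(n_samples):
        value = data[s * n_features + tid]   # neighbouring tids touch neighbouring elements
        counts[2 * labels[s] + value] += 1
    n = n_samples
    row = (counts[0] + counts[1], counts[2] + counts[3])   # label totals
    col = (counts[0] + counts[2], counts[1] + counts[3])   # category totals
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:       # skip empty rows/columns to avoid dividing by zero
                stat += (counts[2 * i + j] - expected) ** 2 / expected
    out[tid] = stat


# Toy data (made up): 4 samples x 3 features, stored sample by sample.
labels = [0, 0, 1, 1]
data = [0, 0, 1,
        0, 1, 1,
        1, 0, 1,
        1, 1, 1]
n_samples, n_features = 4, 3
out = [0.0] * n_features
for tid in range(n_features):      # stands in for the parallel kernel launch
    chi2_kernel(tid, data, labels, n_samples, n_features, out)
```

Each emulated thread writes only to its own slot of `out`, so no synchronization between features is needed; a real kernel would follow the same one-thread-per-feature structure.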