Study of dimensionality reduction and its impact on performance and results quality in parallel text processing

Marcin Pietroń, Maciej Wielgosz, Rafał Frączek, Paweł Russek, Kazimierz Wiatr

ACK Cyfronet AGH, ul. Nawojki 11, Cracow, Poland
University of Science and Technology, al. Mickiewicza 30, Cracow, Poland

INTRODUCTION

Dimensionality reduction is one of the most popular methods of decreasing the complexity of data processing. Algorithms such as Principal Component Analysis, Singular Value Decomposition, and Random Projection are widely used in big data analysis. The problem presented here is part of the authors' research on designing a system capable of comparing numerous text documents in a reasonable time. This article examines the relation between the number of dimensions and the quality achieved in the document clustering process. The paper also concentrates on increasing the effectiveness of the presented algorithms by using multi-core processors and GPU hardware platforms.

ARCHITECTURE AND RESULTS

CONCLUSION AND FUTURE WORK

The results show that reducing dimensionality does not significantly worsen the quality of document clustering. This, however, depends on the profile of the data. Further research should focus on running tests on a wider spectrum of test data and on incorporating other algorithms into the system, e.g. Random Projection. The most challenging aspect of the research is formulating a relationship between the profile of the data, the number of singular values, and computational precision; this will be a subject of further research.
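
The trade-off between the number of dimensions and result quality can be illustrated with a small experiment. The sketch below is not the authors' system: it is a minimal, standard-library-only illustration of Random Projection (one of the algorithms named in the text), applied to a made-up three-document corpus. All names in it (term_vector, random_projection_matrix, project, cosine, similarity_error, DOCUMENTS) are hypothetical and chosen for this example. It measures how much the pairwise cosine similarity between documents changes after projecting their term vectors down to k dimensions.

```python
import math
import random


def term_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]


def random_projection_matrix(n_terms, k, seed=0):
    """Build an n_terms x k Gaussian matrix scaled by 1/sqrt(k)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)]
            for _ in range(n_terms)]


def project(vector, matrix):
    """Map a term vector into the k-dimensional space (vector times matrix)."""
    k = len(matrix[0])
    return [sum(vector[i] * matrix[i][j] for i in range(len(vector)))
            for j in range(k)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy corpus, made up for illustration only (assumption, not the authors' data).
DOCUMENTS = [
    "gpu kernels speed up matrix decomposition",
    "matrix decomposition on gpu hardware",
    "clustering text documents by topic",
]
VOCABULARY = sorted({word for doc in DOCUMENTS for word in doc.split()})
VECTORS = [term_vector(doc, VOCABULARY) for doc in DOCUMENTS]


def similarity_error(k, seed=0):
    """Largest absolute change in pairwise cosine similarity after
    projecting every document vector to k dimensions."""
    matrix = random_projection_matrix(len(VOCABULARY), k, seed)
    reduced = [project(v, matrix) for v in VECTORS]
    worst = 0.0
    for i in range(len(VECTORS)):
        for j in range(i + 1, len(VECTORS)):
            original = cosine(VECTORS[i], VECTORS[j])
            approximate = cosine(reduced[i], reduced[j])
            worst = max(worst, abs(original - approximate))
    return worst


if __name__ == "__main__":
    # Print the distortion for a few target dimensionalities.
    for k in (2, 4, 8):
        print(k, round(similarity_error(k), 3))
```

On a corpus this small the numbers vary strongly with the random seed, so the output should be read only as a demonstration of the measurement, not as evidence about any particular data set. A realistic study would use a large document collection, a clustering quality metric, and a parallel implementation, as the text describes.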