Vector and Matrix Operations Evaluated with UserDefined Functions
- Slides: 22
Vector and Matrix Operations Evaluated with User-Defined Functions Data Warehousing
Outline • • Research overview Motivation SQL limitations UDFs Scalar and aggregate UDFs Experiments Conclusions
Research overview: Relational database systems • Query optimization: query rewriting, linear recursive queries, indexing • Integration of statistical and machine learning algorithms: linear models, statistical tests, decision trees, Bayesian classifiers • OLAP: cubes, multi-level aggregations, pivoting, compression • Database cleaning: detecting and repairing errors • Locking in a parallel DBMS
Research overview: Machine learning • • Clustering: K-means, EM, streams Dimensionality reduction: PCA and Max Llh FA Regression: linear, logistic, non-linear models Bayesian classifiers: Naïve, Nets, MDA Association rules for prediction Medical data analysis Other topics: image mining, neural nets
Motivation • Perform statistical and machine learning analysis inside a DBMS • Users extract data sets to specialized statistical or mathematical tools • SQL is the standard language • A DBMS is a complex software system • Focus: DBMS architecture and scalability
SQL limitations • • • Relational model Limited array support Run-time overhead Set-oriented Not an imperative programming language
UDF advantages • • Programmed in C Called in a SELECT statement API provided by DBMS User does not need modifying DBMS internal code
UDFs limitations • • • In general, simple data types OS and DBMS architecture dependent Limited stack memory space Heap memory only for aggregate UDFs No I/O capability Null handling Parallel execution Compile-time memory allocation Not uniform across different DBMSs
Alternatives to UDFs • Inside DBMS – Internal C code in the DBMS – SQL code generation: arithmetic expressions or SQL existing aggregations • Outside DBMS – Statistical or machine learning packages – Statistical libraries called in C++/Java
Scalar UDFs • • Vectorial sum Dot product Distance argmin
Scalar UDFs application • Regression equations – Linear – Binary logistic • Distance computation in clustering or nearest neighbor classifiers • Multivariate pdfs
Aggregate UDFs • Best for functions that require a table scan on an entire table • Partition induced on data set with groups • Sufficient statistics
Aggregate UDFs for linear models • Descriptive models • Predictive models • Statistical tests
Sufficient statistics: SQL and C
Aggregate UDF optimizations • • Arrays Diagonal/triangular/full matrix Parameter passing on the stack Full parallel execution
Experiments • Compare SQL and UDFs • SQL standard aggregations on pivoted version of input table • Teradata Parallel DBMS
UDF: sum, argmin • Much faster than standard SQL aggregation • Slightly slower than arithmetic expressions
UDF: sufficient statistics alternatives in SQL • Simulate matrices with one entry per row with pivoted table • Aggregation list with all matrix on one row • UDF
UDF aggregate • One std aggregation significantly slower • UDF is faster than arithmetic expressions
UDF run-time execution • Scalar UDF bottleneck: disk I/O • Aggregate UDF bottleneck: memory
UDF scalability • Linear in n • Linear in d for scalar UDF • almost “flat” in d for aggregate UDF
Conclusions • • • Extend DBMS with mathematical operators UDF called in SQL queries Group-by capability Great performance Several UDF limitations can be solved
- How to solve evaluating functions
- Evaluating functions and operations on functions
- Ready to be evaluated pqeb
- Small business subcontracting is only evaluated when
- Small business subcontracting is only evaluated
- Evaluating channel member performance
- A form requesting the purchase of merchandise.
- Directed line segment definition geometry
- Vector unitario formula
- Vector resolution examples
- Position vector r
- Matriks varian kovarian
- Vector overlay operations
- Vector functions and space curves
- Matrix vector multiplication by mapreduce
- General vector
- Matrix vector multiplication mapreduce
- Unary matrix
- Matrix vector multiplication by mapreduce
- Diagonal matrix
- Skew hermitian matrix
- Dissimilarity matrix in data mining
- Hermitian matrix and skew hermitian matrix