- Slides: 26
A Bayesian Approach to Online Performance Modeling for Database Appliances using Gaussian Models Presenter: 郭爽 Date: 2014-04-25
• ICAC 2011 • Muhammad Bilal Sheikh • Cheriton School of Computer Science, University of Waterloo, Canada
Introduction • Accurately predicting response times of DBMS queries is necessary for meeting service level agreements (SLAs) and maintaining peak performance of DBMS • two challenges: ① a database workload typically consists of many concurrently running queries and an accurate model needs to capture their interactions
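Introduction ② in dynamic cloud computing environments, workload, data, and physical resources can change frequently, on the fly. • Shortcomings of previous models: ① they require extensive domain knowledge and target a specific DBMS ② they do not consider how concurrently running queries affect one another ③ they cannot be updated online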
Goal • build efficient and highly accurate online query response time models for database appliances • take into account the interactions among concurrently running queries • adapt dynamically and robustly in the face of changes in the workload, database, or physical resource allocation • without the need for additional sampling experiments.
The function f(·) can be based on the distribution of queries in the mix, or just on the total number of queries in the mix (l). • build two separate models: 1) response time model: a model to predict query response times for each query type of interest 2) configuration model: a model to predict the response time model's parameters for different configurations.
Generating Training Workloads • A simple sampling approach that guarantees relatively good coverage of the space of possible query mixes will provide a good starting point for an offline trained model.
• Uniform Sampling: for each query type, sample uniformly from the range [0, M] to form the workload mix (M is the multi-programming limit (MPL) of the DBMS and is specified by the DBA).
• Workload Characterization-Based Sampling: look at the query mix from two perspectives: the overall load and the query types contributing to that load
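The uniform sampling step can be sketched as below. This is a minimal illustration, not the authors' code: the function name `sample_uniform_mixes`, the `seed` parameter, and the choice to sample each type independently without capping the total at M are all assumptions.

```python
import random


def sample_uniform_mixes(num_query_types, mpl, num_mixes, seed=0):
    """Draw training query mixes.

    Each mix is a list with one entry per query type; the entry is the
    number of concurrent queries of that type, drawn uniformly from
    the integers 0..mpl (inclusive). Assumption: types are sampled
    independently and the total is not capped.
    """
    rng = random.Random(seed)
    return [
        [rng.randint(0, mpl) for _ in range(num_query_types)]
        for _ in range(num_mixes)
    ]


if __name__ == "__main__":
    for mix in sample_uniform_mixes(num_query_types=3, mpl=10, num_mixes=5):
        print(mix, "total load:", sum(mix))
```

Each sampled mix would then be run on the DBMS to record the response time per query type, which gives the offline training set.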
Gaussian Response Time Models Linear Gaussian Models Non-Linear Gaussian Process Models
Linear Gaussian Models 1. Linear Load Model (LLM) 2. Linear Query Mix Model (LQMM)
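As an illustration of a linear Gaussian response time model, the sketch below fits t = a + b·l + ε with Gaussian noise ε, where l is the total number of queries in the mix. The slide's own formulas are not preserved, so this functional form, the least-squares fit, and the names `fit_linear_load_model` and `predict` are assumptions.

```python
def fit_linear_load_model(loads, response_times):
    """Fit t = intercept + slope * load by ordinary least squares.

    Returns (intercept, slope, noise_var), where noise_var is the mean
    squared residual, i.e. the maximum-likelihood estimate of the
    Gaussian noise variance. Requires at least two distinct loads.
    """
    n = len(loads)
    mean_l = sum(loads) / n
    mean_t = sum(response_times) / n
    sxx = sum((l - mean_l) ** 2 for l in loads)
    sxy = sum((l - mean_l) * (t - mean_t)
              for l, t in zip(loads, response_times))
    slope = sxy / sxx
    intercept = mean_t - slope * mean_l
    residuals = [t - (intercept + slope * l)
                 for l, t in zip(loads, response_times)]
    noise_var = sum(r * r for r in residuals) / n
    return intercept, slope, noise_var


def predict(model, load):
    """Predicted mean response time at the given total load."""
    intercept, slope, _ = model
    return intercept + slope * load


if __name__ == "__main__":
    model = fit_linear_load_model([1, 2, 3, 4], [5.0, 8.0, 11.0, 14.0])
    print(model, predict(model, 6))
```

A mix-based variant would use one coefficient per query type instead of a single coefficient on the total load.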
Non-Linear Gaussian Process Models • Query performance varies in a complex, non-linear way with the query mix, the hardware, and the DBMS configuration. For example, if a query involves a join, the behavior of this join varies significantly and non-linearly depending on whether the data fits in memory or needs to be read from disk. • Because non-linear parametric models are difficult to build, the paper uses non-parametric models: Gaussian Processes (GP). A Gaussian process is a collection of random variables, any finite number of which follow a joint Gaussian distribution. 1. Gaussian Process Load Model (GPLM): variant of LLM 2. Gaussian Process Mix Model (GPMM): variant of LQMM 3. Gaussian Process Mix + Load Model (GPMLM) 4. Gaussian Process Configuration Model (GPCM)
• Bayesian Inference with Gaussian Processes In Bayesian inference the probability of a hypothesis (posterior probability) depends on the likelihood of the hypothesis (based on observed data) and the prior belief (prior probability).
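Kernel Functions: 1. Squared exponential function (SE) 2. Rational quadratic function (RQ), each with its own parameters. After choosing the mean function and the kernel function, the next step is to use the training samples to fit the parameters (hyperparameters) of the mean and kernel functions with the conjugate gradient method.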
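The two kernels and the resulting GP prediction can be sketched as follows. The slide's formulas are not preserved, so the code uses the standard textbook forms of the SE and RQ kernels; the parameter names (`length_scale`, `signal_var`, `alpha`, `noise_var`), the zero mean function, and the helper `solve` are assumptions. Hyperparameters are fixed here rather than fitted.

```python
import math


def se_kernel(x, y, length_scale=1.0, signal_var=1.0):
    """Squared exponential kernel (standard form)."""
    r2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return signal_var * math.exp(-r2 / (2.0 * length_scale ** 2))


def rq_kernel(x, y, length_scale=1.0, signal_var=1.0, alpha=1.0):
    """Rational quadratic kernel (standard form)."""
    r2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return signal_var * (1.0 + r2 / (2.0 * alpha * length_scale ** 2)) ** (-alpha)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_predict_mean(X, y, x_new, kernel, noise_var=1e-6):
    """Posterior mean of a zero-mean GP at x_new: k_*^T (K + noise I)^-1 y."""
    n = len(X)
    K = [[kernel(X[i], X[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    weights = solve(K, y)
    return sum(kernel(x_new, X[i]) * weights[i] for i in range(n))


if __name__ == "__main__":
    mixes = [[0.0], [1.0], [2.0]]          # toy one-dimensional "mixes"
    times = [1.0, 2.0, 4.0]                # toy response times
    print(gp_predict_mean(mixes, times, [1.5], se_kernel))
    print(gp_predict_mean(mixes, times, [1.5], rq_kernel))
```

With a small noise variance the posterior mean nearly interpolates the training points, which is a quick sanity check on the kernel matrix and the solver.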
Online Model Adaptation 1. Adding/Removing A Sample (Rank-1 Updates) If we have an offline trained model for n samples and want to add one sample, we need to recompute … (time complexity …)
2. Data Replacement Policy To update the models of different query types online: ① maintain a set of recently observed response times for each query type ② replace old samples in this set with newly observed samples and cap the number of samples maintained for each query type at C ③ while the number of samples is below C, add new samples via rank-1 updates until the number of samples equals C ④ after that, replace old samples with new samples and use the updated set for prediction
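The two adaptation steps above can be sketched together. The slides do not preserve which quantity is recomputed, so this sketch assumes it is the inverse of the kernel matrix and updates it in O(n²) per added or removed sample using the block-matrix (Schur complement) identities. The names `add_sample`, `remove_sample`, and `OnlineWindow`, the oldest-first replacement order, and the `noise_var` term are all assumptions.

```python
from collections import deque


def add_sample(K_inv, k_new, kappa):
    """Inverse of [[K, k_new], [k_new^T, kappa]] computed from K_inv."""
    n = len(K_inv)
    v = [sum(K_inv[i][j] * k_new[j] for j in range(n)) for i in range(n)]
    s = kappa - sum(k_new[i] * v[i] for i in range(n))  # Schur complement
    grown = [[K_inv[i][j] + v[i] * v[j] / s for j in range(n)] + [-v[i] / s]
             for i in range(n)]
    grown.append([-v[j] / s for j in range(n)] + [1.0 / s])
    return grown


def remove_sample(K_inv, idx):
    """Inverse of K with row and column idx deleted, computed from K_inv."""
    d = K_inv[idx][idx]
    keep = [i for i in range(len(K_inv)) if i != idx]
    return [[K_inv[i][j] - K_inv[i][idx] * K_inv[idx][j] / d for j in keep]
            for i in keep]


class OnlineWindow:
    """Keeps at most `cap` (cap >= 1) samples for one query type.

    Assumption: the oldest sample is the one replaced.
    """

    def __init__(self, kernel, cap, noise_var=1e-6):
        self.kernel = kernel
        self.cap = cap
        self.noise_var = noise_var
        self.samples = deque()  # (mix, response_time), oldest first
        self.K_inv = []         # inverse kernel matrix over self.samples

    def observe(self, mix, response_time):
        if len(self.samples) == self.cap:
            # Window is full: drop the oldest sample before adding.
            self.samples.popleft()
            self.K_inv = remove_sample(self.K_inv, 0)
        k_new = [self.kernel(mix, m) for m, _ in self.samples]
        kappa = self.kernel(mix, mix) + self.noise_var
        self.K_inv = add_sample(self.K_inv, k_new, kappa)
        self.samples.append((mix, response_time))
```

While fewer than C samples are held, `observe` only grows the inverse; once the cap is reached, each new observation is one removal followed by one addition, so the cost per update stays quadratic in C.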
Experimental Evaluation Experimental setup: model accuracy is measured using the average percentage error (APE)
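A small sketch of the accuracy metric follows. The slides do not give the exact formula, so this assumes APE is the mean of the absolute relative errors, expressed as a percentage; the function name is hypothetical.

```python
def average_percentage_error(predicted, actual):
    """Mean of |predicted - actual| / actual, as a percentage.

    Assumes all actual response times are strictly positive.
    """
    errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    print(average_percentage_error([110.0, 180.0], [100.0, 200.0]))
```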
• Effect of Buffer Pool Size