A Bayesian Approach to Online Performance Modeling for Database Appliances using Gaussian Models
Presenter: 郭爽 (Guo Shuang)
Date: 2014.4.25

• ICAC 2011 • Muhammad Bilal Sheikh • Cheriton School of Computer Science, University of Waterloo, Canada

Introduction
• Accurately predicting the response times of DBMS queries is necessary for meeting service level agreements (SLAs) and maintaining peak DBMS performance.
• Two challenges:
① A database workload typically consists of many concurrently running queries, and an accurate model needs to capture their interactions.

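Introduction
② In dynamic cloud computing environments, the workload, data, and physical resources can change frequently, on the fly.
• Shortcomings of previous models:
① They require extensive domain knowledge and target a specific DBMS.
② They ignore how concurrently running queries affect one another.
③ They cannot be updated online.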

Goal
• Build efficient and highly accurate online query response time models for database appliances.
• Take into account the interactions among concurrently running queries.
• Adapt dynamically and robustly in the face of changes in the workload, database, or physical resource allocation.
• Do so without the need for additional sampling experiments.

Solution Overview

• The function f(·) can be based on the distribution of queries in the mix, or just on the total number of queries in the mix (l).
• Build two separate models:
1) Response time model: predicts query response times for each query type of interest.
2) Configuration model: predicts the response time model’s parameters for different configurations.
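As a rough illustration of the two possible inputs to f(·), the sketch below turns a list of running queries into either a per-type count vector or the single total l. The query type names and the helper functions `mix_features` and `load_feature` are hypothetical and chosen only for this example.

```python
from collections import Counter

QUERY_TYPES = ["Q1", "Q2", "Q3"]  # hypothetical query types of interest


def mix_features(running_queries):
    """Per-type counts: the distribution of queries in the mix."""
    counts = Counter(running_queries)
    return [counts.get(t, 0) for t in QUERY_TYPES]


def load_feature(running_queries):
    """Total number of queries in the mix (l)."""
    return len(running_queries)


mix = ["Q1", "Q3", "Q1", "Q2", "Q1"]
print(mix_features(mix))
print(load_feature(mix))
```

The count vector keeps the information about which query types are running together, while the total l discards it in exchange for a one-dimensional input.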

Generating Training Workloads
• A simple sampling approach that guarantees relatively good coverage of the space of possible query mixes provides a good starting point for an offline-trained model.

• Uniform Sampling: for each query type, sample uniformly from the range [0, M] to form the workload mix, where M is the multi-programming limit (MPL) of the DBMS and is specified by the DBA.
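A minimal sketch of uniform sampling as described above, assuming three query types and an MPL of 10. Both values, the fixed seed, and the function name `sample_uniform_mix` are illustrative assumptions.

```python
import random


def sample_uniform_mix(num_types, mpl, rng):
    """Draw one query mix: an independent uniform count in [0, mpl] per query type."""
    return [rng.randint(0, mpl) for _ in range(num_types)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
M = 10                  # assumed multi-programming limit (MPL)
training_mixes = [sample_uniform_mix(3, M, rng) for _ in range(5)]
for mix in training_mixes:
    print(mix)
```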

• Workload Characterization Based Sampling: look at the query mix from two perspectives, the overall load and the query types contributing to that load.
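The slide does not spell out the sampling procedure, so the following is only one possible reading of the two perspectives: first draw the overall load, then decide which query types make up that load. The function name, the seed, and the choice of uniform draws at both steps are assumptions.

```python
import random


def sample_mix_by_load(num_types, mpl, rng):
    """Pick the overall load first, then the types that contribute to it."""
    load = rng.randint(1, mpl)  # perspective 1: overall load
    mix = [0] * num_types
    for _ in range(load):       # perspective 2: which types make up the load
        mix[rng.randrange(num_types)] += 1
    return mix


rng = random.Random(1)
mix = sample_mix_by_load(num_types=3, mpl=10, rng=rng)
print(mix, sum(mix))
```

Unlike independent per-type draws, this variant never produces a mix whose total exceeds the MPL.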

Gaussian Response Time Models
• Linear Gaussian Models
• Non-Linear Gaussian Process Models

Linear Gaussian Models
1. Linear Load Model (LLM)

2. Linear Query Mix Model (LQMM)
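The model equation is not shown here, so the sketch below assumes LQMM predicts response time as a linear function of the per-type query counts plus Gaussian noise, in which case least squares gives the maximum-likelihood weights. The helper names and the toy data are made up for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear_mix_model(mixes, times):
    """Least-squares weights [w0, w1, ...] for time = w0 + sum(w_i * count_i)."""
    rows = [[1.0] + [float(c) for c in mix] for mix in mixes]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(d)]
    return solve(xtx, xty)  # normal equations: (X^T X) w = X^T y


def predict(weights, mix):
    """Predicted response time for one query mix."""
    return weights[0] + sum(w * c for w, c in zip(weights[1:], mix))


# Toy data generated from time = 1 + 2*n1 + 0.5*n2 (made-up numbers).
mixes = [[0, 0], [1, 0], [0, 1], [2, 3], [4, 1]]
times = [1.0, 3.0, 1.5, 6.5, 9.5]
w = fit_linear_mix_model(mixes, times)
print([round(v, 6) for v in w])
print(round(predict(w, [3, 2]), 6))
```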

Non-Linear Gaussian Process Models
• Query performance varies in a complex, non-linear way with the query mix, the hardware, and the DBMS configuration. For example, if a query involves a join, the behavior of this join varies significantly and non-linearly depending on whether the data fits in memory or needs to be read from disk.
• Because non-linear parametric models are hard to construct, the paper uses non-parametric models: Gaussian Processes (GP).
• A Gaussian process is a collection of random variables, any finite number of which follow a joint Gaussian distribution.
1. Gaussian Process Load Model (GPLM): variant of LLM
2. Gaussian Process Mix Model (GPMM): variant of LQMM
3. Gaussian Process Mix + Load Model (GPMLM)
4. Gaussian Process Configuration Model (GPCM)

• Bayesian Inference with Gaussian Processes: in Bayesian inference, the probability of a hypothesis (the posterior probability) depends on the likelihood of the hypothesis (based on observed data) and the prior belief (the prior probability).
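In symbols, this is Bayes' rule. The notation is chosen here for illustration, with h the hypothesis and D the observed data:

```latex
p(h \mid D) \;=\; \frac{p(D \mid h)\, p(h)}{p(D)} \;\propto\; p(D \mid h)\, p(h)
```

Here p(h | D) is the posterior, p(D | h) the likelihood, and p(h) the prior.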

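Kernel Functions:
1. Squared exponential function (SE) with parameters [formula]
2. Rational quadratic function (RQ) with parameters [formula]
After choosing the mean function and the kernel function, the next step is to use the training samples and the formula [formula] with the conjugate gradient method to solve for the parameters (hyperparameters) of the mean function and the kernel function.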
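For reference, here is a sketch of the two kernels in their commonly used textbook forms. The parameter symbols on the slide are not available, so the names `signal_var`, `length_scale`, and `alpha` are assumptions, as is the use of query-mix count vectors as inputs.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two input vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def se_kernel(x, y, signal_var=1.0, length_scale=1.0):
    """Squared exponential: signal_var * exp(-d^2 / (2 * length_scale^2))."""
    return signal_var * math.exp(-sq_dist(x, y) / (2.0 * length_scale ** 2))


def rq_kernel(x, y, signal_var=1.0, length_scale=1.0, alpha=1.0):
    """Rational quadratic: signal_var * (1 + d^2 / (2 * alpha * length_scale^2)) ** -alpha."""
    scaled = sq_dist(x, y) / (2.0 * alpha * length_scale ** 2)
    return signal_var * (1.0 + scaled) ** (-alpha)


mix_a, mix_b = [3, 1, 1], [2, 1, 2]
print(se_kernel(mix_a, mix_b), rq_kernel(mix_a, mix_b))
```

Both kernels return their maximum value when the two inputs coincide, and the RQ kernel approaches the SE kernel as `alpha` grows large.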

Online Model Adaptation
1. Adding/Removing a Sample (Rank-1 Updates)
If we have an offline-trained model for n samples and want to add one sample, we need to recompute [formula] (time complexity: [formula]).
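To make the idea of an incremental update concrete, the sketch below assumes the quantity being maintained is the inverse of the symmetric kernel matrix, which is an assumption rather than something stated here. Under that assumption, a standard block-inverse identity extends an n x n inverse to (n+1) x (n+1) in O(n^2) instead of inverting from scratch. The function name and the toy matrix are illustrative.

```python
def add_sample_to_inverse(k_inv, k_vec, k_self):
    """Extend the inverse of a symmetric n x n kernel matrix K to the inverse of
    [[K, k_vec], [k_vec^T, k_self]] in O(n^2), using the block-inverse identity."""
    n = len(k_inv)
    v = [sum(k_inv[i][j] * k_vec[j] for j in range(n)) for i in range(n)]  # K^-1 k
    s = k_self - sum(k_vec[i] * v[i] for i in range(n))  # Schur complement (scalar)
    new_inv = [[k_inv[i][j] + v[i] * v[j] / s for j in range(n)] + [-v[i] / s]
               for i in range(n)]
    new_inv.append([-v[j] / s for j in range(n)] + [1.0 / s])
    return new_inv


# Toy check: K = [[2, 1], [1, 2]] has inverse [[2/3, -1/3], [-1/3, 2/3]].
k_inv = [[2 / 3, -1 / 3], [-1 / 3, 2 / 3]]
new_inv = add_sample_to_inverse(k_inv, k_vec=[0.5, 0.5], k_self=2.0)

# Multiply back against the extended matrix; the result should be close to the identity.
k_new = [[2.0, 1.0, 0.5], [1.0, 2.0, 0.5], [0.5, 0.5, 2.0]]
product = [[sum(k_new[i][m] * new_inv[m][j] for m in range(3)) for j in range(3)]
           for i in range(3)]
for row in product:
    print([round(x, 6) for x in row])
```

Removing a sample can be handled with the same identity run in reverse, which is why both operations are grouped as rank-1 updates.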

2. Data Replacement Policy
To update the models of different query types online:
① Maintain a set of recently observed response times for each query type.
② Replace old samples in this set with newly observed samples, and cap the number of samples maintained for each query type at C.
③ If the number of samples is below C, add new samples via rank-1 updates until the number of samples reaches C.
④ Once the cap is reached, replace old samples with new ones and use the updated set for prediction.
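The bookkeeping behind these steps can be sketched as a capped window of samples per query type. The class name `RecentSamples`, the oldest-first eviction order, and the sample layout `(mix, response_time)` are assumptions made for this example; the model update itself is left out.

```python
from collections import defaultdict, deque


class RecentSamples:
    """Keep at most `cap` most recent (mix, response_time) samples per query type."""

    def __init__(self, cap):
        self.cap = cap
        self.samples = defaultdict(lambda: deque(maxlen=cap))

    def observe(self, query_type, mix, response_time):
        """Record a new sample; return the evicted sample, or None if below the cap."""
        window = self.samples[query_type]
        evicted = window[0] if len(window) == self.cap else None
        window.append((mix, response_time))  # deque drops the oldest entry when full
        return evicted


store = RecentSamples(cap=3)
for t in range(5):
    store.observe("Q1", [t, 1], 10.0 + t)
print(list(store.samples["Q1"]))
```

A return value of `None` corresponds to step ③ (the set is still growing), while a returned sample corresponds to step ④ (an old sample was replaced).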

Experimental Evaluation
• Experimental setup: model accuracy is measured using the average percentage error (APE).
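A small sketch of the metric, assuming the usual definition of APE as the mean absolute error relative to the observed value, expressed as a percentage. The response times below are made-up numbers.

```python
def average_percentage_error(predicted, actual):
    """Mean of |predicted - actual| / actual, expressed as a percentage."""
    errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return 100.0 * sum(errors) / len(errors)


ape = average_percentage_error([11.0, 18.0, 30.0], [10.0, 20.0, 30.0])
print(ape)
```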

• Effect of Buffer Pool Size