Database Benchmarking Miseries Myths and Misconceptions Bert Scalzo
Database Benchmarking: Miseries, Myths and Misconceptions Bert Scalzo, Ph. D. & Oracle ACE Website: www. bertscalzo. com Email: work – bert@bertscalzo. com personal – bertscalzo 2@gmail. com PASS July 25 th, 2018
Books by Author New benchmarking book Q 4 2018
Why Do We Need Benchmarking? Management asks DBA, “What if we? ”: • Virtualize our databases • Consolidate our databases • Move to open source database • Buy new or upgrade database servers • Move some or all databases to the cloud – What cloud provider – What size VM image – What type of storage (disks vs. NVMe) 3
Most DB Benchmarking Efforts Fail Miserably
Why DB Benchmarking Fails • Insufficient time allocated – 1 -3 weeks under-estimated – 1 -2 months necessary (or more) • Wrong skill sets assigned – DBA – i. e. strong DB knowledge – Good OS, network & storage skills • Insufficient hardware/software • Wrong setup / configuration – Often out-of-box defaults (wrong!) • Lack of benchmarking knowledge – Almost no one reads the specs!!! – Not sure what to measure and why – No experience interpreting results
DB Benchmarking Requires Preparation
DB Benchmarking is NOT Easy! No tool can think for you or do your job – no big red easy button 8
DB Benchmarking Requires Tools & Expert Database Monitor Expert – must know database, OS, hardware, software, tuning, & have read the benchmark specs Database Benchmarking Tool: stress the database & make it sweat – that’s all they do (no built-in smarts) Expert decides what treadmill inclination, what speed, what duration, what to monitor, how to diagnose, how to treat, etc. … ?
DB Benchmarking – Common, Simple Failures Wrong person trying to do the heavy lifting Right person trying to do tests without right monitoring tools Not familiar with benchmark specs and/or how well tool adheres to it
Know the Industry Standard Benchmarks • TPC-C older OLTP benchmark basic “order entry” type app • TPC-H basic data warehousing 22 queries – star schema=NO • TPC-E newer OLTP benchmark simulates online brokerage firm • TPC-DS newer data warehousing 99 queries – star schema=YES • TPC-DI data integration/ETL brokerage firm into DW • TPC-VMSvirtualized database standard DB tests on VM’s
Know Thy Test If you don’t know these, how can you choose good benchmark tool or DB parameters ? ? ? http: //tpc. org/tpcc/spec/tpcc_current. pdf
Understand the DB Design – TPC-C
Understand the DB Design – TPC-H
Create Data Model if Unsure (TPC-H)
Know What’s Allowed
Understand the DB Design – TPC-E
Understand the DB Design – TPC-DS – Part 1
Understand the DB Design – TPC-DS – Part 2
Free Benchmarking Tools • Hammer. DB (formerly Hammer. ORA) – Includes 2 benchmarks: TPC-C and TPC-H – http: //www. hammerdb. com • Swingbench – Includes 6 benchmarks: Order. Entry, Sales. History, TPC-DS Like, JSON, Calling. Circle and Stress. Test – http: //dominicgiles. com/swingbench. html • Benchmark Factory (freeware) – Inlcudes 3 benchmarks: TPC-C, TPC-E and TPC-H – https: //www. toadworld. com/m/freeware/555 – NOTE: only commercial version on Quest website 20
Top-10 Benchmarking Misconceptions
Shown on next page
SQL Server on Linux or Windows The ext file systems normally records when a file was last modified and last accessed. We don’t need to know access time for Oracle files as background programs open and access the files until shutdown. chattr +A file_name Actually, this can be done on Windows server as well! HKEY_LOCAL_MACHINESy stemCurrent. Control. SetControl File. System chattr –R +A directory_name Ntfs. Disable. Last. Access. Update=1 Edit /etc/fstab /dev/sda 6 /dev/sda 1 /dev/cdrom /dev/md 0 /dev/fd 0 none /dev/sda 5 / /boot /mnt/cdrom /u 01 /mnt/floppy /proc /dev/pts swap ext 2 iso 9660 ext 2 auto proc devpts swap defaults, noatime noauto, owner, ro defaults, noatime noauto, owner defaults, noatime gid=5, mode=620 defaults, noatime 1 1 0 0 0 0 1 2 0 0 0 0
For SQL Server not just memory allocated, but file locations, auto growth settings, CPU counts/utilization for VM’s, and highly recommend not use RAID-5 or 6
Remember, treadmill just to make database sweat
Does this really tell the whole picture, it looks better by maybe 25% ? ? ?
Wow – the difference is more like 533%
New book coming Q 4 2018 Database Benchmarking and Stress Testing (APress publishing)
Read Full Disclosure Reports for Similar Setups
Look Around – Find Closest Match Possible
Appendix B Will Show DDL for Optimal Design Very common technique to load very large data sets into ETL or staging tables, and then to do create table as select (CTAS) to populate the benchmark tables
Metrics – TPS VS. Avg. Response Time • Transactions per Second – Gets far too much attention – Meaningless to most users – Sort of like automobile RPM’s (how fast internal engine is working – not how fast car is moving or how soon we’ll arrive) – Misconception that TPS equates to IOPS (IO Operations per Second) – ignores database memory caching and logging • Average Response Time – Gets far too less attention – Meaningful to most users – Sort of like MPH (or KPH) (how fast car actually is moving – so infers how soon we’ll arrive or amount of fuel we’ll use) – When examined in conjunction with TPS, then a generally observable and clear pattern often emerges (next slide) …
Benchmark Pattern – True Point of Saturation Notice the line characteristics between roughly 750 and 850 concurrent users – for current configuration and optimization benchmark results interesting 7000 6000 5000 4000 Looking for inflection point where TPS is still increasing or just decreasing and close to max where average response time is below customer defined SLA 3000 2000 1000 0 100 200 300 400 TPS 500 600 700 800 900 1000 Avg Resp Time Common mistake to simply attempt maximize TPS – remember TPS is not IOPS
- Slides: 36