Multimedia Workloads versus SPEC Benchmarks Christopher Martinez, Mythri Pinnamaneni, and Eugene John University of Texas – San Antonio
Outline: Motivation; Multimedia Workloads; Cycles Per Instruction; Branch Prediction; Cache Performance; Conclusion
Motivation: Common workloads for home users now center on entertainment, and entertainment performance is the selling point for home systems. Many media benchmarks exist, but can SPEC benchmarks give some insight into entertainment applications?
Objective: Understand the performance characteristics of multimedia workloads and compare them against SPEC CPU 2000.
Multimedia Workloads: Codecs used include MP3, AAC, MPEG-2 (DVD), Windows Media (DVD, HD), and MPEG-4. Both multimedia playback (decoding) and creation (encoding) are examined.
Multimedia Workloads: Decoding: MP3/AAC with iTunes, Winamp, and RealPlayer; video with Windows Media Player. Encoding: MP3 with iTunes, Windows Media Player, and RealPlayer; AAC with iTunes and RealPlayer; video with Windows Encoder.
Multimedia Workloads: MP3 and AAC files used a bitrate of 128 kbps. Video files used the applications' presets. The video was a TV capture of a football game. Audio encoding was done on Beethoven's Symphonie Pastorale. Audio playback was done on “Boulevard of Broken Dreams” by Green Day.
Performance is based on common measurements: cycles per instruction (CPI), uops per instruction, branch prediction, and cache hit rate. The on-chip performance counters of the Pentium 4 processor are used, captured with VTune.
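As a rough illustration of how these measurements follow from raw counter totals, the sketch below derives each ratio from event counts. The field names (`cycles`, `instructions`, `uops`, and so on) are hypothetical labels, not VTune or Pentium 4 event names, and the sample counts are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw event totals for one run (hypothetical field names)."""
    cycles: int
    instructions: int
    uops: int
    branches: int
    mispredicted_branches: int
    cache_accesses: int
    cache_misses: int


def derive_metrics(s: CounterSample) -> dict:
    """Turn raw counts into the ratios used in this talk."""
    return {
        "cpi": s.cycles / s.instructions,
        "uops_per_instr": s.uops / s.instructions,
        "branch_fraction": s.branches / s.instructions,
        "prediction_rate": 1 - s.mispredicted_branches / s.branches,
        "cache_hit_rate": 1 - s.cache_misses / s.cache_accesses,
    }


# Made-up counts, chosen only to exercise the arithmetic.
sample = CounterSample(
    cycles=2_000_000, instructions=1_000_000, uops=1_500_000,
    branches=100_000, mispredicted_branches=5_000,
    cache_accesses=400_000, cache_misses=20_000,
)
metrics = derive_metrics(sample)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```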
CPI: Our tests were performed on a Pentium 4, which is capable of executing 6 micro-operations (uops) per cycle. Audio decoding CPI: 1.85 to 3.55. Audio encoding CPI: 1.40 to 2.11. Video decoding CPI: 1.96 to 2.56. Video encoding CPI: 1.82 and 2.08. Integer SPEC 2000 CPI: 1.16 to 8.54. Floating-point SPEC 2000 CPI: 4.72 to 8.31.
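To make the range comparison concrete, the sketch below takes the CPI ranges listed above and checks which media ranges fall inside the SPEC integer range. The dictionary layout and the helper names `width` and `contains` are illustrative choices.

```python
# CPI ranges (low, high) as listed on the CPI slide.
media_cpi = {
    "audio decoding": (1.85, 3.55),
    "audio encoding": (1.40, 2.11),
    "video decoding": (1.96, 2.56),
    "video encoding": (1.82, 2.08),
}
spec_int_cpi = (1.16, 8.54)
spec_fp_cpi = (4.72, 8.31)


def width(r):
    """Size of a (low, high) range."""
    return r[1] - r[0]


def contains(outer, inner):
    """True if the inner range lies entirely within the outer range."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


inside_spec_int = {name: contains(spec_int_cpi, r) for name, r in media_cpi.items()}
widest_media = max(width(r) for r in media_cpi.values())

for name, r in media_cpi.items():
    print(f"{name}: width {width(r):.2f}, inside SPEC int range: {inside_spec_int[name]}")
print(f"SPEC int width {width(spec_int_cpi):.2f}, SPEC fp width {width(spec_fp_cpi):.2f}")
```

With these figures every media range sits inside the SPEC integer range, while none of them overlaps the SPEC floating-point range.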
uops (per instruction): Audio decoding: 1.38 to 1.71. Audio encoding: 1.30 to 1.41. Video decoding: 1.28 to 1.43. Video encoding: 1.29 to 1.31. SPEC 2000 integer: 1.29 to 2.11. SPEC 2000 floating point: 1.32 to 2.48.
Branch Prediction: SPEC benchmarks have a larger percentage of branch instructions than media applications. Audio decoding: 12% branch instructions. Audio encoding: 7%. Video decoding and encoding: 8%. SPEC: 13% to 20%.
Branch Prediction: Media and SPEC benchmarks both exhibit high branch prediction rates, 94% and higher in most cases. For media applications there is a high correlation between misprediction and CPI.
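One way to quantify that correlation is a Pearson coefficient over per-application pairs of mispredictions per instruction and CPI. The sketch below does this for the six audio decoders, pairing the values reported in the CPI and branch prediction data slides of this deck. Using Pearson's r as the measure is an assumption, since the slides do not say how the correlation was computed.

```python
from math import sqrt

# (mispredicts per instruction, CPI) for audio decoding.
audio_decode = {
    "Winamp MP3": (0.0065, 3.11),
    "Real MP3":   (0.0080, 3.55),
    "iTunes MP3": (0.0025, 1.85),
    "Winamp AAC": (0.0053, 2.43),
    "Real AAC":   (0.0060, 2.82),
    "iTunes AAC": (0.0024, 1.98),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


mispredicts = [m for m, _ in audio_decode.values()]
cpis = [c for _, c in audio_decode.values()]
r = pearson(mispredicts, cpis)
print(f"Pearson r (audio decoding): {r:.2f}")
```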
Cache Performance: The Pentium 4 processor has a two-level cache: a 16 KB 1st level and a 1 MB 2nd level. Multimedia deals with data in a linear fashion, since audio and video must be played in order, and this sequential data should allow for high hit rates. Since the SPEC benchmarks cover a wide range of applications, not all of them will resemble the media hit rates.
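The effect of sequential access on hit rate can be illustrated with a toy cache model. The sketch below simulates a 16 KB direct-mapped cache with 64-byte lines and 4-byte reads. The direct mapping, line size, and access size are simplifying assumptions for the example, not a description of the Pentium 4's actual cache organization.

```python
import random

CACHE_BYTES = 16 * 1024   # 1st level size from the slide
LINE_BYTES = 64           # assumed line size
NUM_LINES = CACHE_BYTES // LINE_BYTES


def hit_rate(addresses):
    """Hit rate of a direct-mapped cache over a sequence of byte addresses."""
    tags = [None] * NUM_LINES
    hits = 0
    total = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        index = line % NUM_LINES
        if tags[index] == line:
            hits += 1
        else:
            tags[index] = line  # miss: fill the line
        total += 1
    return hits / total


N = 1 << 16  # number of 4-byte reads

# Streaming pattern: consecutive 4-byte words, as in in-order playback.
sequential = hit_rate(range(0, 4 * N, 4))

# Scattered pattern: random word addresses in a 64 MB region (fixed seed).
rng = random.Random(0)
scattered = hit_rate(rng.randrange(0, 64 * 1024 * 1024, 4) for _ in range(N))

print(f"sequential: {sequential:.4f}, scattered: {scattered:.4f}")
```

In this model a pure stream misses once per line, so 15 of every 16 reads hit, while the scattered pattern almost never finds its line resident.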
1st Level Cache Performance: Multimedia applications had 1st level cache hit rates of 93% and higher. Half of the SPEC benchmarks had similar 1st level hit rates; the remainder performed considerably worse.
2nd Level Cache Performance: For all multimedia applications the 2nd level cache had a hit rate of 99.8% or greater. Only 5 of the 14 SPEC benchmarks had similar 2nd level hit rates; most of the remaining SPEC benchmarks had 98% or higher, but 2 had 86%.
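To see what these hit rates imply together, the sketch below estimates the fraction of data accesses that miss both cache levels. It assumes the 2nd level hit rate is measured only over accesses that missed the 1st level, which the slides do not state, and it plugs in the lower bounds quoted for multimedia (93% and 99.8%).

```python
def miss_both_levels(l1_hit_rate, l2_hit_rate):
    """Fraction of accesses that miss L1 and then miss L2.

    Assumes l2_hit_rate is a local rate, i.e. measured over L1 misses only.
    """
    return (1 - l1_hit_rate) * (1 - l2_hit_rate)


# Lower bounds reported for the multimedia workloads.
media_worst_case = miss_both_levels(0.93, 0.998)
print(f"accesses reaching memory: {media_worst_case:.5%}")
```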
Conclusion: Audio and video have similar ranges in CPI, uops per instruction, and uops per cycle. SPEC programs exhibit performance characteristics over a much larger range than media, i.e., the SPEC suites are very diverse.
Conclusion: Both audio and video are comparable to SPEC in 2nd level cache performance. Half of the SPEC benchmarks resemble audio and video in 1st level cache performance. SPEC benchmarks can give some insight into the performance of media applications.
CPI
iTunes MP3/AAC Decode: 1.85 / 1.98
WMV DVD/HD Decode: 1.96 / 2.14
Video Encode Pass 1/Pass 2: 2.02 / 1.82
RealPlayer MP3 Encode: 2.02
iTunes MP3 Encode: 2.07
gcc / crafty / parser: 1.81 / 1.86
bzip2: 2.06
WMP MP3 / Real AAC Encode: 1.66 / 1.71
iTunes AAC Encode: 1.40
gzip / vortex / gap: 1.52 / 1.32 / 1.40
CPI
Winamp MP3 Decode: 3.11
Real MP3 Decode: 3.55
vpr: 3.17
twolf: 3.36
Winamp AAC Decode: 2.43
MPEG-2: 2.38
MPEG-4: 2.59
Real AAC Decode: 2.82
eon: 2.53
uops/instr
MP3 Decode, RealPlayer & iTunes: 1.54
AAC Decode, RealPlayer / iTunes: 1.57 / 1.61
vortex: 1.60
parser: 1.52
gap: 1.53
twolf: 1.56
uops
Encode MP3, WMP / Real / iTunes: 1.49 / 1.38 / 1.41
Encode AAC, Real / iTunes: 1.38 / 1.30
Winamp AAC Decode: 1.38
MPEG-2 / MPEG-4 / WMV DVD: 1.43 / 1.37 / 1.28
WMV HD / Pass 1 / Pass 2: 1.31 / 1.29
gzip / mcf / vpr: 1.35 / 1.29 / 1.46
art / crafty / perlbmk: 1.32 / 1.31 / 1.48
bzip2: 1.42
uops: Besides comparing the number of uops, one can also look at the cycles needed to complete each uop.
Workload            Cycle/uop   CPI
iTunes AAC Encode   1.08        1.40
gcc                 1.05        1.81
gzip                1.13        1.52
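The two views are linked by a simple identity: CPI equals uops per instruction multiplied by cycles per uop. The sketch below checks it for two of the workloads above, taking the uops-per-instruction figures for iTunes AAC encode (1.30) and gzip (1.35) from the uops data elsewhere in this deck. The helper name `cpi_from_uops` is illustrative.

```python
def cpi_from_uops(uops_per_instr, cycles_per_uop):
    """CPI = (uops / instruction) * (cycles / uop)."""
    return uops_per_instr * cycles_per_uop


# (uops per instruction, cycles per uop, reported CPI)
rows = {
    "iTunes AAC Encode": (1.30, 1.08, 1.40),
    "gzip": (1.35, 1.13, 1.52),
}

for name, (upi, cyc, reported) in rows.items():
    estimate = cpi_from_uops(upi, cyc)
    print(f"{name}: estimated CPI {estimate:.3f}, reported {reported:.2f}")
```

The small differences between estimated and reported values are consistent with rounding in the published figures.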
uops (cycles per uop)
Decode Real MP3 / AAC: 2.30 / 1.80
Decode Winamp MP3 / AAC: 1.80 / 1.76
vpr / twolf: 2.17 / 2.15
Decode iTunes AAC / MP3: 1.23 / 1.20
parser / eon: 1.22 / 1.20
Pass 1 / Pass 2: 1.59 / 1.42
bzip2: 1.44
Encode MP3 Real / iTunes: 1.47 / 1.47
Branch Prediction: Audio Decoding
Application   % of branches   Prediction rate (%)   Mispredict/Instr
Winamp MP3    12.8            94.92                 0.0065
Real MP3      9.41            91.50                 0.0080
iTunes MP3    11.76           97.84                 0.0025
Winamp AAC    16.85           96.88                 0.0053
Real AAC      13.02           95.26                 0.0060
iTunes AAC    12.81           98.16                 0.0024
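The three columns are related: mispredictions per instruction should equal the branch fraction multiplied by the misprediction rate. The sketch below applies that to the audio decoding figures above as a consistency check, assuming the prediction rate is measured per branch instruction.

```python
def mispredicts_per_instr(branch_pct, prediction_rate_pct):
    """(branches / instruction) * (mispredicts / branch), inputs in percent."""
    return (branch_pct / 100) * (1 - prediction_rate_pct / 100)


# (% of branches, prediction rate %, reported mispredicts per instruction)
audio_decoding = {
    "Winamp MP3": (12.8, 94.92, 0.0065),
    "Real MP3":   (9.41, 91.50, 0.0080),
    "iTunes MP3": (11.76, 97.84, 0.0025),
    "Winamp AAC": (16.85, 96.88, 0.0053),
    "Real AAC":   (13.02, 95.26, 0.0060),
    "iTunes AAC": (12.81, 98.16, 0.0024),
}

for name, (br, pr, reported) in audio_decoding.items():
    derived = mispredicts_per_instr(br, pr)
    print(f"{name}: derived {derived:.4f}, reported {reported:.4f}")
```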
Branch Prediction: Audio Encoding
Application   % of branches   Prediction rate (%)   Mispredict/Instr
WMP MP3       9.08            96.96                 0.0028
Real MP3      10.42           95.86                 0.0043
iTunes MP3    0.53            94.87                 0.0055
Real AAC      7.74            94.52                 0.0043
iTunes AAC    7.68            95.36                 0.0035
Branch Prediction: Video
Application     % of branches   Prediction rate (%)   Mispredict/Instr
MPEG-2 (DVD)    8.91            92.93                 0.0063
MPEG-4          8.28            96.76                 0.0027
WMV DVD         5.12            95.86                 0.0021
WMV HD          9.89            96.30                 0.0018
Pass 1          6.31            94.69                 0.0033
WMV HD Pass 2   9.28            95.46                 0.0042
Branch Prediction
SPEC benchmark   % of branches   Prediction rate (%)   Mispredict/Instr
gcc              21.84           96.91                 0.0067
gzip             19.10           94.89                 0.0097
mcf              24.25           95.78                 0.0102
vortex           21.22           99.75                 0.0005
vpr              16.57           92.86                 0.0118
art              14.21           99.21                 0.0011
equake           11.00           98.21                 0.0020
parser           20.80           96.65                 0.0074
crafty           15.76           94.20                 0.0091
eon              13.45           97.12                 0.0039
gap              17.51           98.57                 0.0025
perlbmk          21.18           98.56                 0.0031
bzip2            14.83           94.35                 0.0084
twolf            16.48           88.39                 0.0019
Branch Prediction: The high correlation between branch prediction and CPI can give insight into potential improvements. When new CPU enhancements show an improvement in SPEC, a similar or higher gain will be observed in multimedia applications.