What Choices Make A Killer Video Processor Architecture?


Jonah Probell, jonah@ultradatacorp.com
Ultra Data Corp, www.ultradatacorp.com
© Copyright 2004 Jonah Probell

Outline
• Overview of Ultra Data UD3000
• Software programmability
• Parallelism
  – VLIW
  – SIMD
  – Multiprocessing
• Appropriate use of on- and off-chip memory
  – Optimal organization of data structures in DRAM
• Deterministic performance
  – 5-port regfile
  – 2-port on-chip memory
  – DMA controller instead of caches

Nobody’s Video Decoder Chip
Block diagram labels: SDRAM; SDRAM controller; host / audio processor; video postprocessing; optics sled interface; I2C, SATA, DVD optical timers; peripheral bus bridge; high-speed interconnect; Video Decode Processor; SATA & I2C busses; peripheral bus; audio output (I2S / SPDIF / raw); video output (S-video / raw 24-bit RGB or 8/16-bit YCrCb); audio / video DACs

The Ultra Data UD3000
Block diagram labels: 2-port DMEM … 2-port DMEM; FIFO … FIFO; Test & Set; System Bus Bridge; Crossbar Switch Fabric; Outer Loop Processor 0 and Outer Loop Processor 1 (instruction extensions); Inner Loop Processor 0, Inner Loop Processor 1, Inner Loop Processor 2; Smart 2-D DMA Controller

H.264 Main Profile Decode
Diagram labels:
• Processors: OLP 0, OLP 1, ILP 0, ILP 1, ILP 2, DMA ctrl
• Tasks: CAVLC, CABAC, load prediction source, store block, interpolation, inverse transform, apply deltas, deblocking thresholds, deblocking filter

The Inner Loop Processor
Control Unit
• 32-bit RISC
• Program Counter
• Loads & Stores
Vector Unit
• 64-bit SIMD data
• Multiply Acc
• Data packing
Diagram labels: Switch Fabric; IMEM; 3-port 32 Regfile; Data Aligner; 5-port 64 Regfile

Video Codec Standards
Timeline figure, 1984 to 2004:
• ITU-T standards: H.261, H.263
• ITU-T / MPEG joint standards: H.262 / MPEG-2, H.264 / MPEG-4 Part 10 AVC
• MPEG standards: MPEG-1, MPEG-4
• On2 Technologies standards: VP3, VP4, VP5, VP6
• DivX Networks standard: DivX
• Microsoft standard: Windows Media Video

VLIW Parallelism
Diagram labels:
• Functional blocks: data memory, program sequencer, regfile, ALU (+ - x, & | !, >> <<)
• Operations: load, multiply, shift, store, add, branch
• Program panels: VLIW DSP program, sequential DSP program
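The contrast between the sequential and VLIW program panels can be sketched as a toy bundle packer. This is only an illustration under stated assumptions: the function `schedule_vliw`, the unit names (`mem`, `mul`, `shift`, `alu`, `seq`), the single-cycle latency, and the dependencies between the six operations are all hypothetical and are not taken from the UD3000.

```python
def schedule_vliw(ops, deps):
    """Greedily pack operations into VLIW bundles.

    ops:  list of (name, unit) pairs in program order.
    deps: dict mapping an op name to the names of earlier ops it waits for.
    Returns a list of bundles; each bundle maps a unit to the op it issues.
    """
    cycle_of = {}   # op name -> cycle in which it issues
    bundles = []    # bundles[c] is the instruction word issued in cycle c
    for name, unit in ops:
        # Assumed single-cycle latency: issue one cycle after every producer.
        cycle = max((cycle_of[d] + 1 for d in deps.get(name, ())), default=0)
        # Skip cycles in which this functional unit is already taken.
        while cycle < len(bundles) and unit in bundles[cycle]:
            cycle += 1
        while len(bundles) <= cycle:
            bundles.append({})
        bundles[cycle][unit] = name
        cycle_of[name] = cycle
    return bundles


# Hypothetical loop body built from the six operations named on the slide.
OPS = [("load", "mem"), ("multiply", "mul"), ("shift", "shift"),
       ("store", "mem"), ("add", "alu"), ("branch", "seq")]
# Assumed dependencies: a load/multiply/shift/store chain plus add/branch.
DEPS = {"multiply": {"load"}, "shift": {"multiply"},
        "store": {"shift"}, "branch": {"add"}}

BUNDLES = schedule_vliw(OPS, DEPS)
```

With these assumed dependencies the six operations fit in four bundles, where a sequential program would issue six instructions.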

SIMD Parallelism
Figure labels: frame of macroblocks; macroblock of pixels; 8x8 block; 4x4 block of pixels
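To make the idea of operating on a row of pixels at once concrete, here is a minimal sketch that emulates a lane-wise add of eight 8-bit pixels held in one 64-bit word, matching the 64-bit SIMD data width mentioned for the vector unit. The function names and the wraparound behavior are assumptions for illustration; real video hardware would typically offer saturating variants, and nothing here describes the actual UD3000 instruction set.

```python
MASK64 = (1 << 64) - 1
LOW7 = 0x7F7F7F7F7F7F7F7F    # low seven bits of every 8-bit lane
HIGH1 = 0x8080808080808080   # top bit of every 8-bit lane


def pack_pixels(pixels):
    """Pack eight 8-bit values into one 64-bit word, pixel 0 in the low byte."""
    if len(pixels) != 8:
        raise ValueError("expected exactly eight pixels")
    word = 0
    for i, p in enumerate(pixels):
        word |= (p & 0xFF) << (8 * i)
    return word


def unpack_pixels(word):
    """Split a 64-bit word back into eight 8-bit values."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def packed_add_u8(a, b):
    """Add eight 8-bit lanes at once, wrapping modulo 256 in each lane."""
    # Adding only the low seven bits keeps every carry inside its own lane.
    low = (a & LOW7) + (b & LOW7)
    # The top bit of each lane is then fixed up with a carry-less XOR.
    return (low ^ ((a ^ b) & HIGH1)) & MASK64


A = pack_pixels([250, 1, 2, 3, 4, 5, 6, 255])
B = pack_pixels([10, 1, 1, 1, 1, 1, 1, 1])
SUM = unpack_pixels(packed_add_u8(A, B))
```

One word-wide operation replaces eight scalar additions, which is the kind of saving SIMD offers across a 4x4 or 8x8 block.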

Multiprocessor Parallelism
• Symmetric parallel multiprocessing video codec system
• Pipelined multiprocessing video codec system
Figure labels: motion estimation, prediction, transform & compression, deblocking; CPU 0, CPU 1, CPU 2; software, hardware

Data Bandwidths
Figure labels: bitstream source; video chip; SDRAM (temporary data storage); display device

DRAM Optimal Data Ordering
DRAM: 1 KB rows
• Frame mapped to DRAM rows as a C-style two-dimensional array
• Frame mapped to DRAM rows as square groups
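The difference between the two mappings can be sketched by counting how many DRAM rows a rectangular fetch touches. The 1 KB row size comes from the slide; everything else is an assumption for illustration: one byte per pixel, a 32 x 32 byte square group that fills exactly one row, a frame width that is a multiple of 32, and the function names themselves.

```python
ROW_BYTES = 1024   # DRAM row size stated on the slide
TILE = 32          # assumed square group: 32 x 32 bytes = one 1 KB row


def linear_address(x, y, width):
    """C-style two-dimensional array: one frame line after another."""
    return y * width + x


def tiled_address(x, y, width):
    """Square groups: each 32 x 32 tile occupies one contiguous DRAM row."""
    tiles_per_line = width // TILE   # assumes width is a multiple of TILE
    tile_index = (y // TILE) * tiles_per_line + (x // TILE)
    return tile_index * ROW_BYTES + (y % TILE) * TILE + (x % TILE)


def dram_rows_touched(address_fn, x0, y0, w, h, width):
    """Count distinct DRAM rows covered by a w x h block at (x0, y0)."""
    return len({address_fn(x, y, width) // ROW_BYTES
                for y in range(y0, y0 + h)
                for x in range(x0, x0 + w)})


WIDTH = 1920  # hypothetical frame width in bytes
LINEAR_ROWS = dram_rows_touched(linear_address, 40, 40, 16, 16, WIDTH)
TILED_ROWS = dram_rows_touched(tiled_address, 40, 40, 16, 16, WIDTH)
```

Under these assumptions a 16 x 16 block at (40, 40) touches 16 DRAM rows in the linear layout and 1 row in the tiled layout; a block that straddles tile boundaries touches at most 4. Fewer rows per fetch means fewer row activations, which is the motivation for the square grouping.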

Deterministic Performance

The Inner Loop Processor
Control Unit
• 32-bit RISC
• Program Counter
• Loads & Stores
Vector Unit
• 64-bit SIMD data
• Multiply Acc
• Data packing
Diagram labels: Switch Fabric; IMEM; 3-port 32 Regfile; Data Aligner; 5-port 64 Regfile

The Ultra Data UD3000
Block diagram labels: 2-port DMEM … 2-port DMEM; FIFO … FIFO; Test & Set; System Bus Bridge; Crossbar Switch Fabric; Outer Loop Processor 0 and Outer Loop Processor 1 (instruction extensions); Inner Loop Processor 0, Inner Loop Processor 1, Inner Loop Processor 2; Smart 2-D DMA Controller

A Killer Video Processor Architecture
• Software programmability
• Parallelism
  – VLIW
  – SIMD
  – Multiprocessing
• Appropriate use of on- and off-chip memory
  – Optimal organization of data structures in DRAM
• Deterministic performance
  – 5-port regfile
  – 2-port on-chip memory
  – DMA controller instead of caches

Acknowledgements
This presentation is © Copyright 2004 Jonah Probell. ALL RIGHTS RESERVED.
Certain information for this document was derived from publicly available documents of Ultra Data Corp., UB Video Inc., On2 Technologies Inc., and Wikipedia.
All trademarks mentioned in this document are property of their respective owners and are hereby acknowledged.
Jonah Probell, jonah@ultradatacorp.com, (781) 209-0886