AMD R700 Series Processors


AMD R700 Series History
• The AMD 700 chipset series is also known as the AMD 7-Series chipsets
• A set of chipsets designed by ATI for AMD Phenom processors
• General availability (GA): late 2007 through the end of 2008

CPU versus GPU
CPU
• Typically uses basic load instructions for data loads
• Processes instructions one at a time
• Located on the motherboard
GPU
• Typically uses texture-fetch and vertex-fetch instructions for data loads
• Processes hundreds of instructions simultaneously
• Typically located on an I/O card attached to the bus

AMD R700 Series Processor
– Design philosophy and rationale of the AMD R700
– How it relates to the good design policies studied in class

AMD R700 Instructions: Control-flow
A program consists of two sections: control flow and clause.
• Control-flow instructions can initiate execution of the following:
  – ALU (by referring to an appropriate clause)
  – Texture-fetch
  – Vertex-fetch
• A clause is a homogeneous group of instructions of one of these types:
  – ALU
  – Texture-fetch
  – Vertex-fetch
  – Local data share
  – Memory read
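The two-section program structure can be pictured with a small model. This is only an illustrative sketch, not the hardware's actual encoding: the class names, the `clause_index` field, and the instruction names are all hypothetical, and only three clause types are modeled.

```python
from dataclasses import dataclass
from enum import Enum


class ClauseType(Enum):
    ALU = "ALU"
    TEXTURE_FETCH = "texture-fetch"
    VERTEX_FETCH = "vertex-fetch"


@dataclass
class Clause:
    kind: ClauseType
    instructions: list  # placeholder instruction names, all of one kind


@dataclass
class ControlFlowInstruction:
    initiates: ClauseType
    clause_index: int  # hypothetical: which clause this instruction refers to


def run(control_flow, clauses):
    """Walk the control-flow section and record the order in which
    clause instructions are issued."""
    trace = []
    for cf in control_flow:
        clause = clauses[cf.clause_index]
        if clause.kind is not cf.initiates:
            raise ValueError("control-flow instruction refers to a clause of a different type")
        trace.extend((clause.kind.value, name) for name in clause.instructions)
    return trace


clauses = [
    Clause(ClauseType.VERTEX_FETCH, ["fetch_v0"]),
    Clause(ClauseType.ALU, ["mul", "add"]),
]
control_flow = [
    ControlFlowInstruction(ClauseType.VERTEX_FETCH, 0),
    ControlFlowInstruction(ClauseType.ALU, 1),
]
trace = run(control_flow, clauses)
```

The control-flow section decides what runs and in which order; the clauses hold the work itself.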

AMD R700 Registers
• 128 general-purpose registers
  – 128 bits wide
  – Organized as four 32-bit values
• 512 constant registers
  – 128 bits wide
  – Organized as four 32-bit values
• Address register

AMD R700 Registers
• Loop index
  – Initialized by software
  – Incremented by hardware on each iteration of a loop
• Integer constant register
  – 96 bits wide (3 x 32)
  – GPU has read access
  – Main CPU has write access
  – Specified in the CF_CONST field of the CF_DWORD1 microcode format for the current LOOP* instruction

AMD R700 Addressing Modes
• Absolute
• Loop-index-relative
• Relative addressing
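The slide only names the three modes. The sketch below shows one plausible reading of them, under stated assumptions: that "relative" means offset by the address register, and that "loop-index-relative" means offset by the loop index. The function name and parameters are hypothetical.

```python
def resolve_register(index, mode, address_register=0, loop_index=0):
    """Return the effective register index for one operand.

    Assumed semantics (not taken from the slides):
      absolute            -> the index is used as-is
      relative            -> the index is offset by the address register
      loop-index-relative -> the index is offset by the current loop index
    """
    if mode == "absolute":
        return index
    if mode == "relative":
        return index + address_register
    if mode == "loop-index-relative":
        return index + loop_index
    raise ValueError(f"unknown addressing mode: {mode}")


# Same operand index, three different effective registers.
absolute = resolve_register(5, "absolute", address_register=2, loop_index=3)
relative = resolve_register(5, "relative", address_register=2, loop_index=3)
looped = resolve_register(5, "loop-index-relative", address_register=2, loop_index=3)
```

Under this reading, the loop-index-relative mode lets one instruction touch a different register on each iteration, since the hardware increments the loop index.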

AMD R700 Operands
• 3 source operands and 1 destination operand, all of which support absolute addressing, so each can be accessed relative to address zero
• Float
• Double
• Half
• Signed/unsigned integer

AMD R700 Operation Repertoire
Arithmetic operations on built-in integer, floating-point scalar, and vector data types.
• Add
• Subtract
• Multiply
• Divide
• Basic Linear Algebra Subprograms (BLAS)
• Linear Algebra Package (LAPACK)
• Fast Fourier Transform
• Transcendental math functions
• Random number generator routines
• Stream-processing backend for load balancing of computations between the CPU and the stream processor

AMD R700 Features
Instructions operate on 32-bit or 64-bit IEEE floating-point values and signed/unsigned integers.
• Instruction set
  – Control-flow
  – ALU clause
  – Vertex-fetch
  – Texture-fetch
  – Memory read
  – Data-share read/write

AMD R700 Instructions: Memory Read
• Software initiated with the VTX or VTX_TC instructions
• Fetches data from one of three types of buffers:
  – Scratch
  – Reduction
  – Scatter (general read/write)
• Memory read instructions can be intermixed within a clause of as many as 16 memory read instructions. They cannot share a clause with texture-fetch, vertex-fetch, or local data share instructions.
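The two clause rules on this slide (at most 16 memory reads, and no mixing with texture-fetch, vertex-fetch, or local data share instructions) can be written as a small checker. This is a minimal sketch: the function name and the string labels for instruction kinds are hypothetical, not real mnemonics.

```python
MAX_MEMORY_READS = 16
FORBIDDEN_IN_MEMORY_READ_CLAUSE = {"texture-fetch", "vertex-fetch", "local-data-share"}


def check_memory_read_clause(kinds):
    """Return a list of rule violations for a clause given as a list of
    instruction-kind labels; an empty list means the clause is acceptable."""
    problems = []
    reads = sum(1 for kind in kinds if kind == "memory-read")
    if reads > MAX_MEMORY_READS:
        problems.append(f"{reads} memory reads exceeds the limit of {MAX_MEMORY_READS}")
    mixed = sorted(set(kinds) & FORBIDDEN_IN_MEMORY_READ_CLAUSE)
    if mixed:
        problems.append("clause mixes memory reads with: " + ", ".join(mixed))
    return problems


ok = check_memory_read_clause(["memory-read"] * 16)
too_long = check_memory_read_clause(["memory-read"] * 17)
mixed = check_memory_read_clause(["memory-read", "texture-fetch"])
```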

AMD R700 Instructions: Data-Share Read/Write
• Software initiated with the TEX control-flow instructions
• Within the clause, the local data share (LDS) uses common instruction encodings:
  – MEM_DSR: reads
  – MEM_DSW: writes
An LDS clause contains instructions that are issued sequentially. When a write is followed by a read, all of the write data is posted before the read executes, so a clause can reuse the same location repeatedly to exchange data.
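The write-then-read ordering can be illustrated with a toy model of one LDS clause. This is a sketch under assumptions: the tuple format, the `lds_size` parameter, and the function name are hypothetical, and only the sequential-issue behavior described above is modeled.

```python
def run_lds_clause(instructions, lds_size=8):
    """Execute MEM_DSW / MEM_DSR instructions strictly in order and
    return the values seen by each read.

    Each instruction is ("MEM_DSW", address, value) or ("MEM_DSR", address).
    """
    lds = [0] * lds_size
    reads = []
    for op, address, *value in instructions:
        if op == "MEM_DSW":
            lds[address] = value[0]  # the write is posted before any later read
        elif op == "MEM_DSR":
            reads.append(lds[address])
        else:
            raise ValueError(f"not a data-share instruction: {op}")
    return reads


# One location reused twice to exchange two different values.
reads = run_lds_clause([
    ("MEM_DSW", 3, 10),
    ("MEM_DSR", 3),
    ("MEM_DSW", 3, 20),
    ("MEM_DSR", 3),
])
```

Each read returns the value of the write immediately before it, which is what allows a single location to be reused within the clause.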

AMD R700 Instructions: Vertex-fetch
• Software initiated with the VTX or VTX_TC instruction
• Fetches vertices from the vertex buffer based on a GPR address
• At most eight instructions long
[Figure callout: relative byte offset of the word in memory]

AMD R700 Instructions: Texture-fetch
• Software initiated with the TEX instruction
• Consists of instructions that look up texture elements, known as texels, based on a GPR address or constant-fetch operations
• At most eight instructions long
[Figure callout: relative byte offset of the word in memory]

AMD R700 ALU Instructions
ALU instructions are organized as pairs of 32-bit doublewords.
• OP2 instruction: the ALU_INST field uses a seven-bit opcode, with the high three bits set to 000b.
• OP3 instruction: at least one of the three high bits of the ALU_INST field has a nonzero value.
[Figure callouts: byte offset of the doublewords; choice of 2 or 3 source operands]
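The OP2/OP3 distinction comes down to testing the three high bits of the ALU_INST field. The sketch below assumes a 10-bit field (a seven-bit opcode plus the three high bits), which is an inference from the slide's wording rather than a documented width; `alu_format` and `field_width` are hypothetical names.

```python
def alu_format(alu_inst, field_width=10):
    """Classify an ALU_INST field value as "OP2" or "OP3".

    field_width is an assumption: seven opcode bits plus three high bits.
    """
    if not 0 <= alu_inst < (1 << field_width):
        raise ValueError("ALU_INST value does not fit in the assumed field width")
    high_three = (alu_inst >> (field_width - 3)) & 0b111
    return "OP2" if high_three == 0 else "OP3"


two_source = alu_format(0b0001111111)    # high three bits are 000
three_source = alu_format(0b0010000000)  # one high bit is set
```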

AMD R700 ALU Instructions
The processor contains multiple sets of five scalar ALUs. Four of the five are called ALU.[X, Y, Z, W] and perform scalar operations on as many as three 32-bit data elements.
[Figure: a 128-bit register containing four 32-bit elements in little-endian order, labeled from the most-significant element to the least-significant element]
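The 128-bit layout of four 32-bit elements can be reproduced with Python's `struct` module. This sketch assumes that X is the least-significant element and W the most-significant; the slide's figure labels the two ends but the element names are an assumption here, as are the function names.

```python
import struct


def pack_register(x, y, z, w):
    """Pack four 32-bit floats into 16 bytes, little-endian.
    Assumption: X occupies the least-significant 32 bits, W the most-significant."""
    return struct.pack("<4f", x, y, z, w)


def unpack_register(raw):
    """Recover the four 32-bit elements (X, Y, Z, W) from 16 bytes."""
    return struct.unpack("<4f", raw)


raw = pack_register(1.0, 2.0, 3.0, 4.0)
elements = unpack_register(raw)
# Viewed as a single 128-bit integer, the low 32 bits hold X.
low_word = int.from_bytes(raw, "little") & 0xFFFFFFFF
```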

AMD R700 Procedure Calls
[Figure: bit layout of the control-flow microcode words with field callouts. Bit positions shown: 31, 29:23, 22, 21, 20, 19, 18:13, 12:10, 9:8, 7:3, 2:0.]
Field callouts recoverable from the figure:
• Barrier: synchronization flag governing whether the instruction can run in parallel with prior instructions
• Instruction: the control-flow opcode, e.g. CF_INST_JUMP executes a jump statement
• End of Program
• Whole_Quad_Mode (WQM) and Valid_Pixel_Mode (VPM): mutually exclusive (only one of the two is set to 1); they control execution when pixels are invalid or inactive
• MSB of the Count field
• Call count: amount by which to increment the call nesting counter when executing a call statement; the call is skipped if the nesting depth + CALL_COUNT > 32; range 0-31
• COUNT: number of instruction slots to execute in the clause (values 1-16)
• Condition: specifies how to evaluate the condition test for each pixel
• Control-flow constant to use for flow-control statements
• Pop Count
• Address: offsets +4 and +0 are relative to the byte address specified in the host-written PGM_START_* register; texture and vertex clauses must start on 16-byte-aligned addresses
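The call-nesting rule in the figure (a call is skipped when the nesting depth plus CALL_COUNT exceeds 32, with CALL_COUNT in the range 0-31) can be sketched as follows. The function name and return convention are hypothetical; only the comparison itself comes from the slide.

```python
MAX_CALL_NESTING = 32


def try_call(nesting_depth, call_count):
    """Return (new_depth, executed) for a call statement.

    The call is skipped, leaving the depth unchanged, when
    nesting_depth + call_count would exceed the nesting limit.
    """
    if not 0 <= call_count <= 31:
        raise ValueError("CALL_COUNT must be in the range 0-31")
    new_depth = nesting_depth + call_count
    if new_depth > MAX_CALL_NESTING:
        return nesting_depth, False
    return new_depth, True


first = try_call(0, 1)      # depth grows to 1
at_limit = try_call(31, 1)  # depth 32 is still allowed
skipped = try_call(31, 2)   # 33 > 32, so the call is skipped
```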

AMD R700: CISC or RISC
• CISC characteristics:
  – Number of operands per instruction
  – Complex set of operations in the ISA
  – Instructions work out of both on-chip and off-chip memory
• RISC characteristics:
  – Large number of registers
  – Separate instructions for load/store and data processing

Design Policies: The Good and the Bad
1. Simplicity favors regularity
   The R700 series specializes in processing graphics instructions quickly and in parallel.
2. Smaller is faster
   Not so good: it's all about trade-offs.
3. Make the common case fast
   The R700 series processes graphics efficiently at high speeds.
4. Good design demands good compromises
   The R700 trades error handling for high speed.

Conclusion
Pros
• Multiple parallel stream processing units (SPUs)
• Each single-instruction, multiple-data (SIMD) pipeline maintains a separate interface to memory
• Speed
Cons
• Cost
• R700 programs do not support:
  – Exceptions
  – Interrupts
  – Errors
  – Any event that can interrupt pipeline operations
• Size of the circuit board

Conclusion
• The AMD R700 GPU is a specialized processing unit
• Depending on the application or use, the trade-offs can be worth it
