CSCE 212 Computer Architecture Lecture 08 Aggregate Data


Topics
  • SSH/X
  • Arrays
  • Structs, Unions

February 15, 2018

Today
  • Secure remote logins: Secure SHell (SSH), covered in separate slides
  • Arrays
      - One-dimensional
      - Multi-dimensional (nested)
      - Multi-level
  • Structures
      - Allocation
      - Access
      - Alignment
  • Floating-point code

CSCE 212H Spring 2018

Counting things
  • Rule of Sum: if there are m choices for one action and n choices for another action, and the two actions cannot be done at the same time, then there are m + n ways to choose one of these actions.
  • Rule of Product: if there are m ways of doing something and n ways of doing another thing after that, then there are m * n ways to perform both of these actions.
  • Example: choose the bits b_{n-1} b_{n-2} ... b_1 b_0
      - Two choices for each bit, so there are 2 * 2 * ... * 2 = 2^n different bit strings of length n

Source: https://brilliant.org/wiki/rule-of-sum-and-rule-of-product-problem-solving/
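The rule of product for bit strings can be sketched in a few lines of C. This is only an illustration: the function name count_bit_strings is hypothetical, and the sketch assumes n is below 64 so the result fits in a 64-bit integer.

```c
#include <stdint.h>

/* Number of distinct bit strings of length n, by the rule of product:
   two choices for each bit, multiplied together n times.
   Assumption: n < 64, so the count fits in uint64_t. */
uint64_t count_bit_strings(unsigned n)
{
    uint64_t count = 1;
    for (unsigned i = 0; i < n; i++)
        count *= 2;              /* two choices for bit i */
    return count;
}
```

For example, count_bit_strings(8) gives 256, the number of distinct values of one byte.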

Array Allocation
  • Basic principle: T A[L];
      - Array of data type T and length L
      - Contiguously allocated region of L * sizeof(T) bytes in memory

    Declaration         Element addresses (array starts at x)    End
    char string[12];    x, x+1, ..., x+11                        x+12
    int val[5];         x, x+4, x+8, x+12, x+16                  x+20
    double a[3];        x, x+8, x+16                             x+24
    char *p[3];         x, x+8, x+16                             x+24
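The allocation rule can be checked directly with sizeof. The sketch below reuses the four declarations from the slide; ARRAY_LEN and val_offset are hypothetical helper names added for illustration, and the concrete byte counts depend on the platform's sizes for int, double, and pointers.

```c
#include <stddef.h>

char   string[12];
int    val[5];
double a[3];
char  *p[3];

/* Hypothetical helper: number of elements L in an array object. */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Byte offset of val[i] from the start of val, measured from real addresses.
   With contiguous allocation this is i * sizeof(int). */
size_t val_offset(size_t i)
{
    return (size_t)((char *)&val[i] - (char *)&val[0]);
}
```

On a typical x86-64 system sizeof(val) is 20 and sizeof(p) is 24, matching the table, but only the relation sizeof(A) == L * sizeof(T) is guaranteed.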

Array Access
  • Basic principle: T A[L];
      - Array of data type T and length L
      - Identifier A can be used as a pointer to array element 0; its type is T*

    int val[5] holds 1, 5, 2, 1, 3 at addresses x, x+4, x+8, x+12, x+16 (ends at x+20)

    Reference    Type     Value
    val[4]       int      3
    val          int *    x
    val+1        int *    x+4
    &val[2]      int *    x+8
    val[5]       int      ??
    *(val+1)     int      5
    val + i      int *    x + 4*i
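A short C sketch of the same equivalences, using the array contents from the slide. The function names are hypothetical. Note that val[5] is deliberately not read here: it lies past the end of the array, which is why the table lists its value as unknown.

```c
#include <stddef.h>

int val[5] = { 1, 5, 2, 1, 3 };

/* val[i] written out as pointer arithmetic: *(val + i). */
int element_via_pointer(size_t i)
{
    return *(val + i);
}

/* &val[i] and val + i denote the same address; returns 1 when they match. */
int same_address(size_t i)
{
    return &val[i] == val + i;
}
```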

Array Example
    #define ZLEN 5
    typedef int zip_dig[ZLEN];

    zip_dig cmu = { 1, 5, 2, 1, 3 };
    zip_dig mit = { 0, 2, 1, 3, 9 };
    zip_dig ucb = { 9, 4, 7, 2, 0 };

    Array    Start    Elements (4 bytes each)    End
    cmu      16       1 5 2 1 3                  36
    mit      36       0 2 1 3 9                  56
    ucb      56       9 4 7 2 0                  76

  • Declaration "zip_dig cmu" is equivalent to "int cmu[5]"
  • The example arrays were allocated in successive 20-byte blocks
      - Not guaranteed to happen in general

Array Accessing Example
    zip_dig cmu holds 1, 5, 2, 1, 3 at addresses 16, 20, 24, 28, 32 (ends at 36)

    int get_digit(zip_dig z, int digit)
    {
        return z[digit];
    }

  x86-64:
    # %rdi = z
    # %rsi = digit
    movl (%rdi,%rsi,4), %eax    # z[digit]

  • Register %rdi contains the starting address of the array
  • Register %rsi contains the array index
  • Desired digit is at %rdi + 4*%rsi
  • Use memory reference (%rdi,%rsi,4)

Array Loop Example
    void zincr(zip_dig z)
    {
        size_t i;
        for (i = 0; i < ZLEN; i++)
            z[i]++;
    }

    # %rdi = z
        movl  $0, %eax              # i = 0
        jmp   .L3                   # goto middle
    .L4:                            # loop:
        addl  $1, (%rdi,%rax,4)     # z[i]++
        addq  $1, %rax              # i++
    .L3:                            # middle
        cmpq  $4, %rax              # i:4
        jbe   .L4                   # if <=, goto loop
        rep; ret

Multidimensional (Nested) Arrays
  • Declaration: T A[R][C];
      - 2D array of data type T
      - R rows, C columns
      - Type T element requires K bytes

        A[0][0]      ...    A[0][C-1]
          ...                 ...
        A[R-1][0]    ...    A[R-1][C-1]

  • Array size
      - R * C * K bytes
  • Arrangement
      - Row-major ordering

    int A[R][C];

        A[0][0] ... A[0][C-1] | A[1][0] ... A[1][C-1] | ... | A[R-1][0] ... A[R-1][C-1]
        (4*R*C bytes in total)

Nested Array Example
    #define PCOUNT 4
    zip_dig pgh[PCOUNT] =
        {{1, 5, 2, 0, 6},
         {1, 5, 2, 1, 3},
         {1, 5, 2, 1, 7},
         {1, 5, 2, 2, 1}};

    Row       Start    Elements     End
    pgh[0]    76       1 5 2 0 6    96
    pgh[1]    96       1 5 2 1 3    116
    pgh[2]    116      1 5 2 1 7    136
    pgh[3]    136      1 5 2 2 1    156

  • "zip_dig pgh[4]" is equivalent to "int pgh[4][5]"
      - Variable pgh: array of 4 elements, allocated contiguously
      - Each element is an array of 5 ints, allocated contiguously
  • "Row-major" ordering of all elements in memory

Nested Array Row Access
  • Row vectors
      - A[i] is an array of C elements
      - Each element of type T requires K bytes
      - Starting address: A + i * (C * K)

    int A[R][C];

    Row       Elements                       Starting address
    A[0]      A[0][0] ... A[0][C-1]          A
    A[i]      A[i][0] ... A[i][C-1]          A + (i*C*4)
    A[R-1]    A[R-1][0] ... A[R-1][C-1]      A + ((R-1)*C*4)

Nested Array Element Access
  • Array elements
      - A[i][j] is an element of type T, which requires K bytes
      - Address: A + i * (C * K) + j * K = A + (i * C + j) * K

    int A[R][C];

    Item                Address
    Row A[0]            A
    Element A[i][j]     A + (i*C*4) + (j*4)
    Row A[R-1]          A + ((R-1)*C*4)
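The element-address formula can be compared against the addresses the compiler actually uses. In this sketch the dimensions R = 4 and C = 5 and both function names are assumptions chosen for illustration; K is taken as sizeof(int).

```c
#include <stddef.h>

#define R 4
#define C 5

int A[R][C];

/* Byte offset of A[i][j] from the start of A, using the slide's formula
   (i * C + j) * K with K = sizeof(int). */
size_t formula_offset(size_t i, size_t j)
{
    return (i * C + j) * sizeof(int);
}

/* The same offset, measured from the real address of the element. */
size_t actual_offset(size_t i, size_t j)
{
    return (size_t)((char *)&A[i][j] - (char *)A);
}
```

The two functions agree for every valid (i, j), which is what row-major ordering means in practice.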

Multi-Level Array Example
    zip_dig cmu = { 1, 5, 2, 1, 3 };
    zip_dig mit = { 0, 2, 1, 3, 9 };
    zip_dig ucb = { 9, 4, 7, 2, 0 };

    #define UCOUNT 3
    int *univ[UCOUNT] = {mit, cmu, ucb};

  • Variable univ denotes an array of 3 elements
  • Each element is a pointer
      - 8 bytes
  • Each pointer points to an array of ints

    univ entry    Address    Stored pointer
    univ[0]       160        36 (mit)
    univ[1]       168        16 (cmu)
    univ[2]       176        56 (ucb)

    Array    Start    Elements     End
    cmu      16       1 5 2 1 3    36
    mit      36       0 2 1 3 9    56
    ucb      56       9 4 7 2 0    76

Element Access in Multi-Level Array
    int get_univ_digit(size_t index, size_t digit)
    {
        return univ[index][digit];
    }

    salq  $2, %rsi                # 4*digit
    addq  univ(,%rdi,8), %rsi     # p = univ[index] + 4*digit
    movl  (%rsi), %eax            # return *p
    ret

  • Computation
      - Element access: Mem[Mem[univ+8*index]+4*digit]
      - Must do two memory reads
          · First get the pointer to the row array
          · Then access the element within that array

Array Element Accesses

  Nested array:
    int get_pgh_digit(size_t index, size_t digit)
    {
        return pgh[index][digit];
    }

  Multi-level array:
    int get_univ_digit(size_t index, size_t digit)
    {
        return univ[index][digit];
    }

  The accesses look similar in C, but the address computations are very different:
    Nested:         Mem[pgh+20*index+4*digit]
    Multi-level:    Mem[Mem[univ+8*index]+4*digit]
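The difference between the two address computations can be made explicit in C. The sketch below reuses the slide's data; the two *_explicit function names are hypothetical, and the 8-byte stride in the comments assumes 64-bit pointers as on x86-64.

```c
#include <stddef.h>

#define ZLEN 5
typedef int zip_dig[ZLEN];

zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };

/* Nested array: 4 rows of 5 ints in one contiguous block. */
zip_dig pgh[4] = {{1, 5, 2, 0, 6}, {1, 5, 2, 1, 3},
                  {1, 5, 2, 1, 7}, {1, 5, 2, 2, 1}};

/* Multi-level array: 3 pointers, each to a separately allocated row. */
int *univ[3] = { mit, cmu, ucb };

/* Nested access: one address computation, then one memory read. */
int pgh_digit_explicit(size_t index, size_t digit)
{
    const char *base = (const char *)pgh;
    return *(const int *)(base + sizeof(zip_dig) * index + sizeof(int) * digit);
}

/* Multi-level access: read the row pointer first, then the element. */
int univ_digit_explicit(size_t index, size_t digit)
{
    int *row = univ[index];      /* first read:  Mem[univ + 8*index] */
    return *(row + digit);       /* second read: Mem[row + 4*digit]  */
}
```

Both functions return the same values as pgh[index][digit] and univ[index][digit]; they only spell out the work the compiler does for each form.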

N x N Matrix Code
  • Fixed dimensions
      - Know value of N at compile time

    #define N 16
    typedef int fix_matrix[N][N];
    /* Get element a[i][j] */
    int fix_ele(fix_matrix a, size_t i, size_t j)
    {
        return a[i][j];
    }

  • Variable dimensions, explicit indexing
      - Traditional way to implement dynamic arrays

    #define IDX(n, i, j) ((i)*(n)+(j))
    /* Get element a[i][j] */
    int vec_ele(size_t n, int *a, size_t i, size_t j)
    {
        return a[IDX(n, i, j)];
    }

  • Variable dimensions, implicit indexing
      - Now supported by gcc

    /* Get element a[i][j] */
    int var_ele(size_t n, int a[n][n], size_t i, size_t j)
    {
        return a[i][j];
    }

16 x 16 Matrix Access
  • Array elements
      - Address: A + i * (C * K) + j * K
      - C = 16, K = 4

    /* Get element a[i][j] */
    int fix_ele(fix_matrix a, size_t i, size_t j)
    {
        return a[i][j];
    }

    # a in %rdi, i in %rsi, j in %rdx
    salq  $6, %rsi                # 64*i
    addq  %rsi, %rdi              # a + 64*i
    movl  (%rdi,%rdx,4), %eax     # M[a + 64*i + 4*j]
    ret

n x n Matrix Access
  • Array elements
      - Address: A + i * (C * K) + j * K
      - C = n, K = 4
      - Must perform integer multiplication

    /* Get element a[i][j] */
    int var_ele(size_t n, int a[n][n], size_t i, size_t j)
    {
        return a[i][j];
    }

    # n in %rdi, a in %rsi, i in %rdx, j in %rcx
    imulq %rdx, %rdi              # n*i
    leaq  (%rsi,%rdi,4), %rax     # a + 4*n*i
    movl  (%rax,%rcx,4), %eax     # a + 4*n*i + 4*j
    ret

Structure Representation
    struct rec {
        int a[4];
        size_t i;
        struct rec *next;
    };

    Field    Start offset (from r)    End offset
    a        0                        16
    i        16                       24
    next     24                       32

  • Structure represented as a block of memory
      - Big enough to hold all of the fields
  • Fields ordered according to declaration
      - Even if another ordering could yield a more compact representation
  • Compiler determines overall size and positions of fields
      - Machine-level program has no understanding of the structures in the source code
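The offsets the compiler chose can be inspected with offsetof from <stddef.h>. This is a sketch: struct rec_layout and get_rec_layout are hypothetical names, and the values 0, 16, 24, and 32 shown on the slide are what a typical x86-64 compiler produces, not something the C language guarantees.

```c
#include <stddef.h>

struct rec {
    int a[4];
    size_t i;
    struct rec *next;
};

/* Hypothetical record of where each field of struct rec was placed. */
struct rec_layout {
    size_t a, i, next, size;
};

/* Collects the compile-time offsets and total size of struct rec. */
struct rec_layout get_rec_layout(void)
{
    struct rec_layout l = {
        offsetof(struct rec, a),
        offsetof(struct rec, i),
        offsetof(struct rec, next),
        sizeof(struct rec)
    };
    return l;
}
```

Whatever the platform, the fields appear in declaration order, so the offsets returned are strictly increasing.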

Generating Pointer to Structure Member
    struct rec {
        int a[4];
        size_t i;
        struct rec *next;
    };

    Layout: a at offset 0, i at offset 16, next at offset 24, total size 32; element a[idx] is at r + 4*idx

  • Generating pointer to array element
      - Offset of each structure member determined at compile time
      - Compute as r + 4*idx

    int *get_ap(struct rec *r, size_t idx)
    {
        return &r->a[idx];
    }

    # r in %rdi, idx in %rsi
    leaq  (%rdi,%rsi,4), %rax
    ret

Following Linked List
    struct rec {
        int a[4];
        int i;
        struct rec *next;
    };

    Layout: a at offset 0, i at offset 16, next at offset 24, total size 32

  • C code

    void set_val(struct rec *r, int val)
    {
        while (r) {
            int i = r->i;
            r->a[i] = val;      /* element i */
            r = r->next;
        }
    }

    Register    Value
    %rdi        r
    %rsi        val

    .L11:                             # loop:
        movslq 16(%rdi), %rax         #   i = M[r+16]
        movl   %esi, (%rdi,%rax,4)    #   M[r+4*i] = val
        movq   24(%rdi), %rdi         #   r = M[r+24]
        testq  %rdi, %rdi             #   Test r
        jne    .L11                   #   if != 0 goto loop

Structures & Alignment
    struct S1 {
        char c;
        int i[2];
        double v;
    } *p;

  • Unaligned data

    Field    Start    End
    c        p        p+1
    i[0]     p+1      p+5
    i[1]     p+5      p+9
    v        p+9      p+17

  • Aligned data
      - Primitive data type requires K bytes
      - Address must be a multiple of K

    Field        Start    End     Note
    c            p+0      p+1
    (padding)    p+1      p+4     3 bytes
    i[0]         p+4      p+8     start is a multiple of 4
    i[1]         p+8      p+12
    (padding)    p+12     p+16    4 bytes
    v            p+16     p+24    start is a multiple of 8

    The structure itself starts at a multiple of 8.

Alignment Principles
  • Aligned data
      - Primitive data type requires K bytes
      - Address must be a multiple of K
      - Required on some machines; advised on x86-64
  • Motivation for aligning data
      - Memory is accessed in (aligned) chunks of 4 or 8 bytes (system dependent)
          · Inefficient to load or store a datum that spans quad-word boundaries
          · Virtual memory is trickier when a datum spans 2 pages
  • Compiler
      - Inserts gaps in the structure to ensure correct alignment of fields

Specific Cases of Alignment (x86-64)
  • 1 byte: char, ...
      - no restrictions on address
  • 2 bytes: short, ...
      - lowest 1 bit of address must be 0 (binary)
  • 4 bytes: int, float, ...
      - lowest 2 bits of address must be 00 (binary)
  • 8 bytes: double, long, char *, ...
      - lowest 3 bits of address must be 000 (binary)
  • 16 bytes: long double (GCC on Linux)
      - lowest 4 bits of address must be 0000 (binary)
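The low-bit rule above translates into a one-line mask test. In this sketch is_aligned is a hypothetical helper name, and the test is only meaningful when k is a power of two, as all the alignments on this slide are.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if address p is a multiple of k.
   Assumption: k is a power of two, so "multiple of k" is the same as
   "the low log2(k) bits of the address are all zero". */
int is_aligned(const void *p, size_t k)
{
    return ((uintptr_t)p & (k - 1)) == 0;
}
```

For instance, address 24 passes the test for k = 8 (binary 11000 ends in 000), while address 20 fails it (binary 10100) but still passes for k = 4.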

Satisfying Alignment with Structures
    struct S1 {
        char c;
        int i[2];
        double v;
    } *p;

  • Within structure:
      - Must satisfy each element's alignment requirement
  • Overall structure placement
      - Each structure has alignment requirement K
          · K = largest alignment of any element
      - Initial address and structure length must be multiples of K
  • Example:
      - K = 8, due to the double element

    Field        Start    End     Note
    c            p+0      p+1     p+0 is a multiple of 8
    (padding)    p+1      p+4     3 bytes
    i[0]         p+4      p+8     start is a multiple of 4
    i[1]         p+8      p+12
    (padding)    p+12     p+16    4 bytes
    v            p+16     p+24    start is a multiple of 8
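The padding described above can be measured with offsetof and the C11 _Alignof operator. The helper names in this sketch are hypothetical. The 3-byte and 4-byte gaps from the slide are what x86-64 compilers typically produce; on a platform where double needs only 4-byte alignment the second gap would differ.

```c
#include <stddef.h>

struct S1 {
    char c;
    int i[2];
    double v;
};

/* Padding bytes inserted between the end of c and the start of i. */
size_t s1_gap_after_c(void)
{
    return offsetof(struct S1, i) - (offsetof(struct S1, c) + sizeof(char));
}

/* Padding bytes inserted between the end of i and the start of v. */
size_t s1_gap_after_i(void)
{
    return offsetof(struct S1, v) - (offsetof(struct S1, i) + sizeof(int[2]));
}

/* Nonzero when each field offset is a multiple of that field's alignment
   and the total size is a multiple of the structure's alignment K. */
int s1_is_aligned(void)
{
    return offsetof(struct S1, i) % _Alignof(int) == 0
        && offsetof(struct S1, v) % _Alignof(double) == 0
        && sizeof(struct S1) % _Alignof(struct S1) == 0;
}
```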

Meeting Overall Alignment Requirement
  • For largest alignment requirement K
  • Overall structure must be a multiple of K

    struct S2 {
        double v;
        int i[2];
        char c;
    } *p;

    Field        Start    End     Note
    v            p+0      p+8
    i[0]         p+8      p+12
    i[1]         p+12     p+16
    c            p+16     p+17
    (padding)    p+17     p+24    7 bytes

    Start (p+0) and end (p+24) are multiples of K = 8.

Arrays of Structures
    struct S2 {
        double v;
        int i[2];
        char c;
    } a[10];

  • Overall structure length is a multiple of K
  • Satisfy alignment requirement for every element

    Element    Start
    a[0]       a+0
    a[1]       a+24
    a[2]       a+48
    ...        (a[3] starts at a+72)

    Inside a[1]:
    Field        Start    End
    v            a+24     a+32
    i[0]         a+32     a+36
    i[1]         a+36     a+40
    c            a+40     a+41
    (padding)    a+41     a+48    7 bytes

Accessing Array Elements
    struct S3 {
        short i;
        float v;
        short j;
    } a[10];

  • Compute array offset 12*idx
      - sizeof(S3), including alignment spacers
  • Element j is at offset 8 within the structure
  • Assembler gives offset a+8
      - Resolved during linking

    Element a[idx] starts at a+12*idx:
    Field        Start            Size
    i            a+12*idx         2 bytes
    (padding)    a+12*idx+2       2 bytes
    v            a+12*idx+4       4 bytes
    j            a+12*idx+8       2 bytes
    (padding)    a+12*idx+10      2 bytes

    short get_j(int idx)
    {
        return a[idx].j;
    }

    # %rdi = idx
    leaq   (%rdi,%rdi,2), %rax     # 3*idx
    movzwl a+8(,%rax,4), %eax
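The address arithmetic for a[idx].j can be written out and compared with what the compiler does. The function names here are hypothetical; the constants 12 and 8 from the slide correspond to sizeof(struct S3) and offsetof(struct S3, j) on a typical x86-64 system, so the sketch uses those expressions rather than hard-coded numbers.

```c
#include <stddef.h>

struct S3 {
    short i;
    float v;
    short j;
};

struct S3 a[10];

/* Byte offset of a[idx].j from the start of a, using the slide's recipe:
   sizeof(struct S3) * idx  +  offset of j within the structure. */
size_t j_offset_formula(size_t idx)
{
    return sizeof(struct S3) * idx + offsetof(struct S3, j);
}

/* The same offset, measured from the real address of the field. */
size_t j_offset_actual(size_t idx)
{
    return (size_t)((char *)&a[idx].j - (char *)a);
}
```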

Saving Space
  • Put large data types first

    struct S4 {
        char c;
        int i;
        char d;
    } *p;

    struct S5 {
        int i;
        char c;
        char d;
    } *p;

  • Effect (K=4)

    S4:  c | 3 bytes padding | i | d | 3 bytes padding
    S5:  i | c | d | 2 bytes padding
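The saving can be confirmed by computing the padding in each layout. The helper names in this sketch are hypothetical; the sizes 12 and 8 implied by the diagram assume a 4-byte int with 4-byte alignment, as on x86-64.

```c
#include <stddef.h>

struct S4 { char c; int i; char d; };   /* small, large, small */
struct S5 { int i; char c; char d; };   /* large type first    */

/* Total padding in a structure = its size minus the sum of its field sizes. */
size_t s4_padding(void)
{
    return sizeof(struct S4) - (sizeof(char) + sizeof(int) + sizeof(char));
}

size_t s5_padding(void)
{
    return sizeof(struct S5) - (sizeof(int) + sizeof(char) + sizeof(char));
}
```

Both structures hold the same fields, so any difference in sizeof comes entirely from padding.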

Today
  • Arrays
      - One-dimensional
      - Multi-dimensional (nested)
      - Multi-level
  • Structures
      - Allocation
      - Access
      - Alignment
  • Floating point

Background
  • History
      - x87 FP
          · Legacy, very ugly
      - SSE FP
          · Supported by Shark machines
          · Special case use of vector instructions
      - AVX FP
          · Newest version
          · Similar to SSE
          · Documented in book

Programming with SSE3
  • XMM registers
      - 16 total, each 16 bytes
      - Each register can hold:
          · 16 single-byte integers
          · 8 16-bit integers
          · 4 32-bit integers
          · 4 single-precision floats
          · 2 double-precision floats
          · 1 single-precision float
          · 1 double-precision float

Scalar & SIMD Operations
  • Scalar operations, single precision
        addss %xmm0, %xmm1      (one addition: %xmm0 + %xmm1)
  • SIMD operations, single precision
        addps %xmm0, %xmm1      (several additions in parallel, one per element of %xmm0 and %xmm1)
  • Scalar operations, double precision
        addsd %xmm0, %xmm1      (one addition: %xmm0 + %xmm1)

FP Basics
  • Arguments passed in %xmm0, %xmm1, ...
  • Result returned in %xmm0
  • All XMM registers are caller-saved

    float fadd(float x, float y)
    {
        return x + y;
    }

    # x in %xmm0, y in %xmm1
    addss %xmm1, %xmm0
    ret

    double dadd(double x, double y)
    {
        return x + y;
    }

    # x in %xmm0, y in %xmm1
    addsd %xmm1, %xmm0
    ret

FP Memory Referencing
  • Integer (and pointer) arguments passed in regular registers
  • FP values passed in XMM registers
  • Different mov instructions to move between XMM registers, and between memory and XMM registers

    double dincr(double *p, double v)
    {
        double x = *p;
        *p = x + v;
        return x;
    }

    # p in %rdi, v in %xmm0
    movapd %xmm0, %xmm1       # Copy v
    movsd  (%rdi), %xmm0      # x = *p
    addsd  %xmm0, %xmm1       # t = x + v
    movsd  %xmm1, (%rdi)      # *p = t
    ret

Other Aspects of FP Code
  • Lots of instructions
      - Different operations, different formats, ...
  • Floating-point comparisons
      - Instructions ucomiss and ucomisd
      - Set condition codes CF, ZF, and PF
  • Using constant values
      - Set the XMM0 register to 0 with instruction xorpd %xmm0, %xmm0
      - Others loaded from memory
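One reason floating-point comparison needs an extra condition code is the unordered case: when either operand is NaN, none of less-than, equal, or greater-than holds, and ucomiss/ucomisd report this through PF. The C sketch below shows the same behavior at the source level using the standard isunordered macro from <math.h>; the wrapper names are hypothetical, and the sketch assumes the default IEEE 754 semantics (no fast-math style compiler options).

```c
#include <math.h>

/* Returns 1 if x and y are unordered, i.e. at least one of them is NaN. */
int is_unordered(double x, double y)
{
    return isunordered(x, y) != 0;
}

/* x == x is false only when x is NaN, because NaN compares unequal
   to everything, including itself. */
int equals_itself(double x)
{
    return x == x;
}
```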

Summary
  • Arrays
      - Elements packed into a contiguous region of memory
      - Use index arithmetic to locate individual elements
  • Structures
      - Elements packed into a single region of memory
      - Access using offsets determined by the compiler
      - May require internal and external padding to ensure alignment
  • Combinations
      - Can nest structure and array code arbitrarily
  • Floating point
      - Data held and operated on in XMM registers