Assembly Language x 86 Programming Supplement Advanced ASM
Assembly Language x 86 Programming Supplement: Advanced ASM Topics Slide show prepared by Zuoliu Ding at Fullerton College, 09/2020 (c) Zuoliu DING, 2020. All rights reserved. You may modify and copy this slide show for your personal use, or for use in the classroom, as long as this copyright statement, the author's name, and the title are not changed.
Chapter Overview • SIMD Technology • Boot Loader and BIOS Interrupt Zuoliu Ding, Assembly Language Programming 2
SIMD Technology • MMX and SSE Technology • https: //en. wikipedia. org/wiki/SIMD • https: //en. wikipedia. org/wiki/Streaming_SIMD_Extensions • Examples: • Find minimum code sample • Array sum with packed addition From “Modern X 86 Assembly Language Programming_ 32 -bit, 64 -bit, SSE, and AVX” Daniel Kusswurm Apress 3
Evolution of X 86 -SSE after MMX -1 Zuoliu Ding, Assembly Language Programming. 4
Evolution of X 86 -SSE after MMX - 2 Zuoliu Ding, Assembly Language Programming. 5
MMX Register Set MMX technology adds eight 64 -bit registers to the core x 86 -32 platform From “Modern X 86 Assembly Language Programming_ 32 -bit, 64 -bit, SSE, and AVX” Daniel Kusswurm Apress 6
MMX Technology – Data Type An MMX can be used to perform SIMD operations using eight 8 -bit integers, four 16 -bit integers, or two 32 -bit integers, and one QWORD From “Modern X 86 Assembly Language Programming_ 32 -bit, 64 -bit, SSE, and AVX” Daniel Kusswurm Apress 7
Find. Array. Min. Original PROC uses ECX, p. Array: DWORD, n: DWORD ; Find the array Minimun in an array by p. Array ; Returns: the array Minimum in AL ; Initialize mov ecx, n mov edx, p. Array mov eax, 0 ffh L 1: cmp AL, [edx] JB L 2 mov AL, [edx] L 2: inc edx LOOP L 1 ret Find. Array. Min. Original ENDP ; ecx = 'n' ; edx = pointer to From “Modern X 86 Assembly Language Programming_ 32 -bit, 64 -bit, SSE, and AVX” Daniel Kusswurm Apress 8
SIMD Instruction Examples From “Modern X 86 Assembly Language Programming_ 32 -bit, 64 -bit, SSE, and AVX” Daniel Kusswurm Apress 9
Find. Array. Min. Advanced 1 ; Initialize mov ecx, n ; ecx = 'n' shr ecx, 4 ; ecx = number of 16 -byte blocks mov edx, p. Array ; edx = pointer to 'x' movq mm 4, Start. Min. Val ; qword 0 ffffffffh movq mm 6, mm 4 ; mm 6: mm 4 current min values ; Scan array for min values @@: movq mm 0, [edx] ; mm 0 = packed 8 bytes movq mm 1, [edx+8] ; mm 1 = packed 8 bytes pminub mm 6, mm 0 ; mm 6 = updated min values pminub mm 4, mm 1 ; mm 4 = updated min values add edx, 16 ; set edx to next 16 byte block dec ecx jnz @B ; jump if more data remains From “Modern X 86 Assembly Language Programming_ 32 -bit, 64 -bit, SSE, and AVX” Daniel Kusswurm Apress 10
Find. Array. Min. Advanced 2 ; Determine final minimum value pminub mm 6, mm 4 ; mm 6[63: 0] = final 8 min vals pshufw mm 0, mm 6, 00001110 b ; mm 0[31: 0] = mm 6[63: 32] pminub mm 6, mm 0 ; mm 6[31: 0] = final 4 min vals pshufw mm 0, mm 6, 00000001 b ; mm 0[15: 0] = mm 6[31: 16] pminub mm 6, mm 0 ; mm 6[15: 0] = final 2 min vals pextrw eax, mm 6, 0 cmp al, ah jbe Done mov al, ah ; ax = final 2 min vals ; jump if al <= ah ; al = final min value From “Modern X 86 Assembly Language Programming_ 32 -bit, 64 -bit, SSE, and AVX” Daniel Kusswurm Apress 11
Find. Array. Min. Advanced 3 From “Modern X 86 Assembly Language Programming_ 32 -bit, 64 -bit, SSE, and AVX” Daniel Kusswurm Apress 12
Test and Benchmark • Test to verify the minimum: • Compare ASM original with SIMD advanced: From “Modern X 86 Assembly Language Programming_ 32 -bit, 64 -bit, SSE, and AVX” Daniel Kusswurm Apress 13
SIMD Instruction Examples From “Modern X 86 Assembly Language Programming_ 32 -bit, 64 -bit, SSE, and AVX” Daniel Kusswurm Apress 14
Packed Addition and Promotion From “Modern X 86 Assembly Language Programming_ 32 -bit, 64 -bit, SSE, and AVX” Daniel Kusswurm Apress 15
SIMD Array Sum ; Promote array values from bytes to words, then sum words movq mm 2, mm 0 movq mm 3, mm 1 punpcklbw mm 0, mm 7 ; mm 0 = 4 words punpcklbw mm 1, mm 7 ; mm 1 = 4 words punpckhbw mm 2, mm 7 ; mm 2 = 4 words punpckhbw mm 3, mm 7 ; mm 3 = 4 words paddw mm 0, mm 2 paddw mm 1, mm 3 paddw mm 0, mm 1 ; mm 0 = pack sums (4 words) ; Promote packed sums to DWs, update DW sums in mm 5: mm 4 movq mm 1, mm 0 punpcklwd mm 0, mm 7 ; mm 0 = packed sums (2 dwords) punpckhwd mm 1, mm 7 ; mm 1 = packed sums (2 dwords) paddd mm 4, mm 0 paddd mm 5, mm 1 ; mm 5: mm 4 = packed sums (4 dwords) From “Modern X 86 Assembly Language Programming_ 32 -bit, 64 -bit, SSE, and AVX” Daniel Kusswurm Apress 16
Array Sum Benchmark See more: Using SIMD to Optimize x 86 Assembly Code in Array Sum Example From “Modern X 86 Assembly Language Programming_ 32 -bit, 64 -bit, SSE, and AVX” Daniel Kusswurm Apress 17
Boot Loader and BIOS Interrupt • MS DOS Real-Address mode • Segmented Memory • What is Boot Loader? • “Hello, World!” boot loader • Writing a Tiny x 86 Bootloader • Emulator • QEMU the FAST! processor emulator • Bochs x 86 PC emulator From https: //www. joe-bergeron. com and https: //blog. ghaiklor. com 18
Real-Address mode • 1 MB RAM maximum addressable • Application programs can access any area of memory • Single tasking • Supported by MS-DOS operating system Irvine, Kip R. Assembly Language for x 86 Processors 7/e, 2015. 19
Segmented Memory linear addresses Segmented memory addressing: absolute (linear) address is a combination of a 16 -bit segment value added to a 16 -bit offset Irvine, Kip R. Assembly Language for x 86 Processors 7/e, 2015. one segment 20
Calculating Linear Addresses • Given a segment address, multiply it by 16 (add a hexadecimal zero), and add it to the offset • Example: convert 08 F 1: 0100 to a linear address Adjusted Segment value: 0 8 F 1 0 Add the offset: 0 1 0 0 Linear address: 0 9 0 1 0 Irvine, Kip R. Assembly Language for x 86 Processors 7/e, 2015. 21
Your turn. . . What linear address corresponds to the segment/offset address 028 F: 0030? 028 F 0 + 0030 = 02920 Always use hexadecimal notation for addresses. Irvine, Kip R. Assembly Language for x 86 Processors 7/e, 2015. 22
Your turn. . . What segment addresses correspond to the linear address 28 F 30 h? Many different segment-offset addresses can produce the linear address 28 F 30 h. For example: 28 F 0: 0030, 28 F 3: 0000, 28 B 0: 0430, . . . Irvine, Kip R. Assembly Language for x 86 Processors 7/e, 2015. 23
What is Boot Loader? • Boot loader • a program that loads an operating system. It is loaded into operating memory from persistent storage, such as a hard drive or whatever else • BIOS (Basic Input/Output System) • non-volatile firmware used to perform hardware initialization during the booting process (power-on startup), and to provide runtime services for operating systems From https: //www. joe-bergeron. com and https: //blog. ghaiklor. com 24
POST Procedure • POST stands for Power-On-Self-Test From https: //www. joe-bergeron. com and https: //blog. ghaiklor. com 25
How BIOS recognize a device bootable? • Magic numbers • 55 h and AAh, or 85 d and 170 d • Located in bytes 511 and 512 in bootable device • Specific memory address • 0000 h: 7 C 00 h • When BIOS finds such a boot sector, it loads it into memory at a specific address From https: //www. joe-bergeron. com and https: //blog. ghaiklor. com 26
Magic numbers - Boot Signature times 510 - ($ - $$) db 0 dw 0 x. AA 55 • $ means current address and $$ means the first address of the current section. • $-$$ gives you the offset from start to address of the currently executed instruction. • Subtract that value from 510 to get the offset from the address of the currently executed instruction to the 510 th byte. • The times directive will now pad that number of bytes up to 510 th byte with zeros. From https: //www. joe-bergeron. com and https: //blog. ghaiklor. com 27
Sector The most common physical sector size for hard disks is 512 bytes • https: //en. wikipedia. org/wiki/Cylinder-head-sector From https: //www. joe-bergeron. com and https: //blog. ghaiklor. com 28
Boot. asm org 0 x 7 C 00 bits 16 start: cli mov si, msg mov ah, 0 x 0 E. loop lodsb or al, al jz halt int 0 x 10 jmp. loop halt: hlt msg: db "Hello, World!", 0 ; ; Magic numbers times 510 - ($ - $$) db 0 dw 0 x. AA 55 From https: //www. joe-bergeron. com and https: //blog. ghaiklor. com ; ; ; ; Disable the interrupts SI now points to message Indicate BIOS to print Loads SI into AL and ++ Checks if string end Jump to halt if end else int char printing ; CPU command to halt ; message to print 29
INT Instruction • The INT instruction executes a software interrupt. • The code that handles the interrupt is called an interrupt handler. • Syntax: INT number (number = 0. . FFh) The Interrupt Vector Table (IVT) holds a 32 -bit segmentoffset address for each possible interrupt handler. Interrupt Service Routine (ISR) is another name for interrupt handler. Irvine, Kip R. Assembly Language for x 86 Processors 7/e, 2015. 30
Interrupt 10 h/AH=0 Eh VIDEO - TELETYPE OUTPUT AH = 0 Eh AL = character to write BH = page number BL = foreground color (graphics modes only) See http: //www. ctyme. com/intr/int. htm From https: //www. joe-bergeron. com and https: //blog. ghaiklor. com 31
MS-DOS Memory Map • ROM BIOS The ROM BIOS, at memory locations F 0000 to FFFFF, is an important part of the computer’s operating system. • It contains system diagnostic and configuration software, as well as low-level input–output procedures used by application programs. https: //en. wikipedia. org/wiki/Interrupt_vector_table Irvine, Kip R. Assembly Language for x 86 Processors 7/e, 2015. 32
Interrupt Vectoring Process Irvine, Kip R. Assembly Language for x 86 Processors 7/e, 2015. 33
“Hello World” Bochs in Windows floppya: 1_44=boot. bin, status=inserted boot: a From https: //www. joe-bergeron. com and https: //blog. ghaiklor. com 34
“Hello World” QEMU in Ubuntu From https: //www. joe-bergeron. com and https: //blog. ghaiklor. com 35
Another Bochs Example in Windows From https: //www. joe-bergeron. com and https: //blog. ghaiklor. com 36
Another QEMU Example in Ubuntu From https: //www. joe-bergeron. com and https: //blog. ghaiklor. com 37
- Slides: 37