Direct Cache Access for High Bandwidth Network I/O

Authors: Ram Huggahalli, Ravi Iyer, Scott Tetrick
Presented by Cao Zhang

Introduction
- Recent I/O technologies such as PCI Express and 10 Gb Ethernet enable unprecedented levels of I/O bandwidth in mainstream platforms.
- Memory latency alone can prevent processors from keeping up with 10 Gb inbound network I/O traffic.
- The paper proposes Direct Cache Access (DCA), a platform-wide method that delivers inbound I/O data directly into processor caches.
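A rough timing budget helps make the memory-latency point concrete. The sketch below computes how often a 64-byte cache line of payload arrives at 10 Gb/s, and what throughput results if each line costs one non-overlapped memory miss. The 100 ns miss latency is a hypothetical figure chosen only for illustration, it is not taken from the paper, and Ethernet framing overhead is ignored.

```python
# Back-of-the-envelope budget for inbound traffic at line rate.
LINK_RATE_BPS = 10e9           # 10 Gb Ethernet, as on the slide
CACHE_LINE_BYTES = 64          # line size used in the evaluation slides
ASSUMED_MEM_LATENCY_NS = 100   # ASSUMPTION: illustrative miss latency, not from the paper


def line_arrival_interval_ns(link_rate_bps: float, line_bytes: int) -> float:
    """Time between successive cache lines of payload at full link rate."""
    return line_bytes * 8 / link_rate_bps * 1e9


def serialized_miss_throughput_gbps(latency_ns: float, line_bytes: int) -> float:
    """Throughput if every line costs one memory miss and misses never overlap."""
    return line_bytes * 8 / latency_ns  # bits per nanosecond equals Gb/s


interval_ns = line_arrival_interval_ns(LINK_RATE_BPS, CACHE_LINE_BYTES)
ceiling_gbps = serialized_miss_throughput_gbps(ASSUMED_MEM_LATENCY_NS, CACHE_LINE_BYTES)

print(f"one cache line arrives every {interval_ns:.1f} ns")
print(f"serialized-miss ceiling: {ceiling_gbps:.2f} Gb/s")
```

Under these assumptions a new line arrives roughly every 51 ns, so a single stream of serialized misses at 100 ns each would sustain only about half the link rate. Real processors overlap misses, so this is an upper-level intuition rather than a measurement.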

Basic Processor, Memory and I/O Interaction

I/O Interactions From A Cache Perspective

TCP/IP Example

Memory Access Profiles of Benchmarks
- NTTTCP
- SPECweb99
- TPC-W
- TPC-C

Evaluation Methodology
- CASPER cache simulation fed by memory access traces
- Simulated 4 MB, 8 MB, and 16 MB caches, all configured with a 64-byte line size and 8-way set associativity
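To illustrate what a trace-driven simulation of these configurations involves, here is a minimal sketch of a set-associative cache model. It is not CASPER; the class name, the LRU replacement policy, and the toy trace are assumptions made for illustration. Only the cache sizes, the 64-byte line, and the 8-way associativity come from the slide.

```python
from collections import OrderedDict

LINE_BYTES = 64   # from the slide
WAYS = 8          # from the slide


def num_sets(cache_bytes: int, line_bytes: int = LINE_BYTES, ways: int = WAYS) -> int:
    """Number of sets = capacity / (line size * associativity)."""
    return cache_bytes // (line_bytes * ways)


class SetAssocCache:
    """Hypothetical trace-driven model; LRU replacement is an assumption."""

    def __init__(self, cache_bytes: int, line_bytes: int = LINE_BYTES, ways: int = WAYS):
        self.line_bytes = line_bytes
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets(cache_bytes, line_bytes, ways))]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Look up one byte address; return True on a hit."""
        line = addr // self.line_bytes
        index = line % len(self.sets)
        tag = line // len(self.sets)
        ways = self.sets[index]
        if tag in ways:
            ways.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        if len(ways) >= self.ways:
            ways.popitem(last=False)   # evict the least recently used tag
        ways[tag] = None
        self.misses += 1
        return False


MB = 1 << 20
set_counts = {size: num_sets(size * MB) for size in (4, 8, 16)}
print(set_counts)

# Toy trace (made up): two lines, then three re-references.
cache = SetAssocCache(4 * MB)
for addr in [0, 64, 0, 32, 64]:
    cache.access(addr)
print(cache.hits, cache.misses)
```

With these parameters the 4 MB, 8 MB, and 16 MB caches have 8192, 16384, and 32768 sets respectively, and the toy trace yields 3 hits and 2 misses.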

Usage of I/O Writes by Processor

Distance between I/O Writes and Processor Reads

Traffic Profiles

System Considerations
- Cache selection within a hierarchy: the last-level cache is preferred
- Cache replacement policy: limit I/O-related allocations to one cache way
- System interconnect protocol:
  1. Write-update can be useful
  2. Prefetch hint
- Identifying the target processor: NUMA memory affinity and connection-based affinity
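The replacement-policy point can be sketched for a single cache set. In this hypothetical model, fills caused by I/O writes may only allocate into one designated way, while processor fills use LRU across all ways. The class name, the choice of way 0, and the LRU policy for processor fills are assumptions for illustration; the slide only states that I/O-related allocations are limited to one way.

```python
class WayLimitedSet:
    """One 8-way cache set in which I/O fills are confined to a single way."""

    IO_WAY = 0  # ASSUMPTION: which way is reserved for I/O fills is arbitrary here

    def __init__(self, ways: int = 8):
        self.tags = [None] * ways
        self.last_used = [0] * ways   # 0 means never used
        self.clock = 0

    def _touch(self, way: int) -> None:
        self.clock += 1
        self.last_used[way] = self.clock

    def access(self, tag: int, from_io: bool) -> bool:
        """Return True on a hit; on a miss, allocate according to the requester."""
        if tag in self.tags:
            self._touch(self.tags.index(tag))
            return True
        if from_io:
            victim = self.IO_WAY      # I/O fills never displace the other ways
        else:
            # Processor fills take the least recently used way (empty ways first).
            victim = min(range(len(self.tags)), key=lambda w: self.last_used[w])
        self.tags[victim] = tag
        self._touch(victim)
        return False


cache_set = WayLimitedSet()
cpu_tags = list(range(8))
for t in cpu_tags:                    # processor working set fills all 8 ways
    cache_set.access(t, from_io=False)
for t in range(1000, 1100):           # a burst of 100 inbound I/O lines
    cache_set.access(t, from_io=True)

survivors = [t for t in cpu_tags if t in cache_set.tags]
print(len(survivors), "of", len(cpu_tags), "processor lines survive the I/O burst")
```

In this sketch seven of the eight processor lines remain resident after the burst, which is the intent of the guideline: streaming I/O data cannot flush the processor's working set out of the cache.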

Summary of Architectural Guidelines

- Conclusions
- References
- Questions?