OnChip Logic Minimization Roman Lysecky Frank Vahid Department

  • Slides: 18
Download presentation
On-Chip Logic Minimization Roman Lysecky & Frank Vahid* Department of Computer Science and Engineering

On-Chip Logic Minimization Roman Lysecky & Frank Vahid* Department of Computer Science and Engineering University of California, Riverside *Also with the Center for Embedded Computer Systems, UC Irvine This work was supported in part by the National Science Foundation, the Semiconductor Research Corporation, and a Department of Education GAANN fellowship

Introduction • Boolean logic minimization typically used during logic synthesis • Two-level logic minimization

Introduction • Boolean logic minimization typically used during logic synthesis • Two-level logic minimization can be considered as a general optimization technique • Many applications can benefit from using logic minimization dynamically – IP routing table reduction – Access control list (ACL) reduction 2

On-Chip Minimization Applications (IP Routing Table Reduction) 138. 23. 16. 9 Incoming IP packet

On-Chip Minimization Applications (IP Routing Table Reduction) 138. 23. 16. 9 Incoming IP packet 138. 23. 16. 9 Destination IP address Prefix 138. 23. 16. x Next hop Port 7 138. 23. x. x Port 5 125. x. x. x Port 3 Port 7 Lookup Destination IP in Routing Table Choose Longest Prefix Match 3

On-Chip Minimization Applications (IP Routing Table Reduction) • Longest Prefix Match – Ternary CAM

On-Chip Minimization Applications (IP Routing Table Reduction) • Longest Prefix Match – Ternary CAM (Mc. Auley & Francis, 1993) • • • Store 0, 1, * Store IP address as TCAM entry Store prefix length using the TCAM entries mask Fast Smaller hardware resources than binary CAM Very large power consumption – How can we reduce hardware resources and power consumption? 4

On-Chip Minimization Applications (IP Routing Table Reduction) • Mask Extension (Liu, 2002) – Uses

On-Chip Minimization Applications (IP Routing Table Reduction) • Mask Extension (Liu, 2002) – Uses two-level logic minimization – Performing minimization for each update too slow – Incremental update • Existing minimized set becomes don’t care set • New route becomes single entry in on set • Achieves an average of 50 updates/second – Not considering communication, though Original TCAM Entries # IP Addr. Mask Next Hop P 1 10011100 11111100 7 P 2 10001100 11111100 7 Logic Minimization TCAM Entries after Mask Extension # IP Addr. P 1&P 2 10001100 Mask Next Hop 11101100 7 5

On-Chip Minimization Applications (Access Control List Reduction) • Access Control List (ACL) – Used

On-Chip Minimization Applications (Access Control List Reduction) • Access Control List (ACL) – Used to restrict IP traffic through network routers – ACL size can range anywhere from 300 (UCR CS&E Dept. ) to 10, 000 (AOL) – Common use is to block a particular protocol or port number to avoid attacks such as Denial of Service attacks • ACL Minimization – Similar approach as used for IP routing table reduction – However, order of the list must be preserved ACL Input Format Type Protocol In IP In Port Out IP Out Port Action 6

Introduction (Off-chip Logic Minimization) Router Transmit Data to Server Proc. I$ • Slow due

Introduction (Off-chip Logic Minimization) Router Transmit Data to Server Proc. I$ • Slow due to communication • Sensitive to server failures • Security issues MEM Execute Minimizer D$ Server Network Router Chip Transmit Result to Router 7

Introduction (On-chip Logic Minimization) Router Initialize Minimizer I$ Proc. D$ Execute Minimizer Indicate Completion

Introduction (On-chip Logic Minimization) Router Initialize Minimizer I$ Proc. D$ Execute Minimizer Indicate Completion ARM 7 MEM DMA Mem. On-chip Minimizer Network Router Chip 8

On-Chip Logic Minimization Requirements • On-chip Logic Minimization Requirements – Data Memory Resources •

On-Chip Logic Minimization Requirements • On-chip Logic Minimization Requirements – Data Memory Resources • On-chip minimization algorithms must be very memory conscious – Instruction Memory Resources • On-chip minimization algorithm must incorporate simplified approaches that result in acceptable designs – Execution time • Limited data and instruction memory will likely lead to longer execution times • Must still remain reasonable – Quality of results • Must be capable of producing solution relatively close to optimal Focus on developing an on-chip logic minimization tool that produces acceptable results with reasonable increases in execution time while using limited memory resources. 9

ROCM • ROCM – Riverside On-Chip Minimizer – Two-level minimization tool – Utilized a

ROCM • ROCM – Riverside On-Chip Minimizer – Two-level minimization tool – Utilized a combination of approaches from Espresso-II (Brayton, et al. 1984) and Presto (Svoboda & White, 1979) Optimize(F, D) { Order. Cubes(F) for i=1 to |F| { c = Fi (c', W) = Iterative. Expand(F, D, c) F = (F c') - W } } Iterative. Expand(F, D, c) { W = {} c' = c for i=1 to |c| { c' = Expand(c', i) (val, W') = Valid. Expansion(F, D, c') if val = true W = W W' else Revert(c', i) } return (c', W) } 10

ROCM Results – Quality (Full Routing Table Reduction) Only 2% larger than optimal on

ROCM Results – Quality (Full Routing Table Reduction) Only 2% larger than optimal on average 11

ROCM Results – Performance (Incremental Update Execution Time) ROCM executing on a 40 MHz

ROCM Results – Performance (Incremental Update Execution Time) ROCM executing on a 40 MHz ARM 7 requires less than 1 second On a 500 MHz On a 40 MHz ARM 7 Sun Ultra 60 12

ROCM Results – Memory (Code Size and Data Memory Usage) Small code size of

ROCM Results – Memory (Code Size and Data Memory Usage) Small code size of only 22 kilobytes Code Size Average data memory usage of only 1 megabyte Data Memory Usage 13

ROCM Results – Quality (Access Control List Reduction) Only 2% larger than optimal on

ROCM Results – Quality (Access Control List Reduction) Only 2% larger than optimal on average 14

Customizing ROCM • ROCM Customization – Beneficial to optimize an algorithm for a particular

Customizing ROCM • ROCM Customization – Beneficial to optimize an algorithm for a particular application – Customize ROCM’s data structures and algorithms for a particular input size • Require less memory • Reduce dynamic memory allocation • Improve performance – Created ROCM-32 customized for IP routing table reduction 15

Customized ROCM-32 Results (IP Routing Table Reduction) 37% reduction in execution time vs. ROCM

Customized ROCM-32 Results (IP Routing Table Reduction) 37% reduction in execution time vs. ROCM 11% reduction in data memory usage vs. ROCM 16

Conclusions • Presented Riverside On-Chip Minimizer (ROCM) • Feasible to execute logic minimization on

Conclusions • Presented Riverside On-Chip Minimizer (ROCM) • Feasible to execute logic minimization on chip – Can be executed on an embedded 40 MHz ARM 7 in seconds for real networking problem sizes – Requires small code size (22 kilobytes) – Requires small data memory (1 megabyte) • Produces good results – On average only 2% larger than exact minimization • Shown usefulness for networking applications 17

Future Work • More Applications – May appear now that on-chip minimization is feasible

Future Work • More Applications – May appear now that on-chip minimization is feasible • Dynamic HW/SW Partitioning – Dynamically partition executing binary to on-chip configurable logic – Logic minimization is used during the logic synthesis stage – Initial work on dynamic HW/SW partitioning presented at DAC 2003 yesterday in session 15 18