Design of a Reconfigurable Hardware For Efficient Implementation of Secret Key and Public Key Cryptography
Presentation Outline
- Introduction & Motivation
- Related Work
- Design Methodology
- Design Description
- Algorithm Implementations
- Comparison with Other Work
- Programming Paradigm
- Conclusion/Work in Progress
Motivating Factors
- Need for high speed cryptography
- Need for algorithm independence
- Need for more secure implementations
- Need for implementing both symmetric and asymmetric key encryption
Need for High Speed Implementations
- Software implementations cannot provide real time rates
- Hardware implementations essential for
  - IPsec endpoints
  - SSL servers
  - VPN at rates exceeding ATM
- Algorithm implementation must be able to sustain the network bandwidth
Need for Algorithm Independence
- IPsec
  - Cipher algorithm specified in the Security Association (SA)
- SSL transactions
  - Algorithm negotiable for both key exchange & encryption
- Need for both secret key and public key encryption
  - Session establishment: large number of transactions
  - Dedicated hardware not cheap!
Hardware Implementation Benefits
- More secure implementations
- Implementing both algorithms in hardware removes the bottleneck associated with slow computations in key establishment
- A single hardware implementation supporting both algorithms reduces the cost of separate hardware
Advantages of Reconfigurable Hardware Implementations
- Algorithm agility
- Algorithm upload/modification
- Architecture efficiency/throughput
- Cost efficiency
Comparison of Different Approaches
FPGAs?
- Post-fabrication customization
- Low-cost design cycle
- Fast turnaround time
- Potential for parallelism
  - Instruction-level: multiple operations
  - Data-level: multiple blocks of data
  - Task-level: parallel tasks (e.g. secret key)
FPGA: The Basics
- General purpose logic elements (LUTs)
- Very flexible interconnect
- Basically fine grained to support both data paths and random logic
FPGA: Disadvantages
- Too flexible, which leads to inefficiencies
- Too fine grained, which again leads to inefficiencies
- Block ciphers are primarily data-flow oriented, so they end up implemented using a large number of small elements
- Ciphers have a well-defined data flow, so general purpose interconnect ends up being slow and overkill in terms of area
FPGA vs. Specialized Reconfigurable Logic
- Coarse grained vs. fine grained
- Specialized interconnect vs. generic interconnect
- Reduced reconfiguration times
- End result
  - Faster performance with reduced area while maintaining enough flexibility to support the application domain
Issues in Reconfigurable Hardware Designs
- How much of what to support?
  - How many functional units?
  - What kinds of functional units?
  - How much support for random logic?
  - How much interconnect flexibility to allow?
- Programming/CAD tools
  - What kind of programming model to target?
  - How to design efficient automated tools?
Custom Reconfigurable Hardware Design: What's Involved?
- Looking for commonalities/overlaps as well as disjoint elements
  - Identify crucial components
  - Utilize potential overlap or partial reuse
  - Generic enough but fast components
  - Minimizing the differences in component types
- Balancing the resources
  - Upper bounds/lower bounds
  - Logic units vs. memory blocks
  - Determining exact number of each type of unit
- Make the common case fast: IMPORTANT ALWAYS!
Related Work
- Cavium Networks' SSL & IPsec Protocol Aware Security Processor
- USC Mark II's Advanced Cryptographic Engine for IPsec
- Worcester Polytechnic Institute's COBRA Architecture
SSL/IPsec Security Processor
- Support for both public key and secret key encryption
- Not reconfigurable
- Dedicated hardware blocks for each operation
Advanced Cryptographic Engine (ACE)
- Designed to implement the flexible cipher needs of IPsec
- Only supports block ciphers
- Support for any algorithm through a library of general purpose FPGA implementations
COBRA Architecture
- Custom reconfigurable hardware for block ciphers
- Each RCE is a macro block supporting various component operations
- Configured using VLIW instructions
Design Methodology
- Literature survey
  - Block cipher implementations
  - Public key cipher implementations
  - Identifying essential components of efficient implementations
- Iterative development of architecture
- Validation by mapping several representative algorithms
- Identification of programming methodology
Categorizing Implementation Requirements
- Essential step to handle the design complexity
  - Logic requirements
  - Interconnection requirements
  - Memory (RAM/ROM) requirements
- Area and performance are directly affected by these
Prioritizing Support
- Ordered by importance and then by relative hardware complexity
  - AES (Rijndael)
  - DES
  - Modular Exponentiation (RSA)
  - Serpent
  - Twofish
  - RC6, MARS, and others
Block Ciphers: Key Elements
- Bitwise XOR, AND, OR
- Addition or subtraction modulo 2^n
- Shift or rotation by a constant number of bits
- Data-dependent rotation by a variable number of bits
- Multiplication modulo the table entry value
- Multiplication in the Galois field specified by the table entry value
- Inversion modulo the table entry value
- Look-up-table substitution
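A few of the primitives listed above can be sketched in software to make the hardware requirements concrete. This is only an illustration, not the deck's design: the 32-bit word size, the function names, and the choice of the AES field polynomial 0x11B for the Galois-field example are assumptions made here.

```python
MASK32 = 0xFFFFFFFF  # assumed word size: 32 bits


def add_mod32(a, b):
    """Addition modulo 2^32 (carry out of bit 31 is discarded)."""
    return (a + b) & MASK32


def rotl32(x, r):
    """Rotate a 32-bit word left by r bits; r is reduced modulo 32."""
    r &= 31
    return ((x << r) | (x >> (32 - r))) & MASK32


def data_dependent_rotl32(x, y):
    """Data-dependent rotation: the low 5 bits of y select the amount."""
    return rotl32(x, y & 31)


def gf256_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8), reducing by `poly`.

    The default 0x11B is the AES polynomial (an assumption for this
    example; other ciphers use other field polynomials).
    """
    result = 0
    while b:
        if b & 1:
            result ^= a      # add (XOR) the current multiple of a
        a <<= 1              # multiply a by x
        if a & 0x100:
            a ^= poly        # reduce when the degree reaches 8
        b >>= 1
    return result
```

For example, `gf256_mul(0x57, 0x83)` gives `0xC1`, the worked multiplication from the AES specification. Note that every primitive except table look-up reduces to XOR, AND, OR, addition, and bit movement.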
Block Cipher: Core Operations
Modular Multiplication and Exponentiation
- Modular exponentiation implemented with the multiply and square algorithm
- Montgomery multiplication algorithm is the most popular for modular multiplication
- Various approaches for implementation
  - Systolic array
  - Word based
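The combination of multiply-and-square exponentiation with Montgomery multiplication can be sketched as follows. This is a software illustration under stated assumptions, not the hardware implementation: it scans the exponent from the most significant bit, picks R = 2^k with k equal to the bit length of the modulus, requires an odd modulus, and uses Python 3.8+ for the modular inverse via `pow(n, -1, r)`. All function names are hypothetical.

```python
def mont_mul(a, b, n, k, n_prime):
    """Montgomery product a * b * R^-1 mod n, with R = 2^k and a, b < n."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask   # m = t * n' mod R
    u = (t + m * n) >> k                # exact division by R
    return u - n if u >= n else u       # single conditional subtraction


def mod_exp(base, exponent, n):
    """base ** exponent mod n using square-and-multiply in Montgomery form."""
    if n % 2 == 0 or n < 3:
        raise ValueError("this sketch assumes an odd modulus n >= 3")
    k = n.bit_length()                  # assumed choice: R = 2^k > n
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r      # n * n' == -1 (mod R)
    r2 = (r * r) % n                    # used to enter Montgomery form
    x = mont_mul(base % n, r2, n, k, n_prime)   # base * R mod n
    acc = mont_mul(1, r2, n, k, n_prime)        # 1 * R mod n
    for bit in bin(exponent)[2:]:               # most significant bit first
        acc = mont_mul(acc, acc, n, k, n_prime)     # square
        if bit == "1":
            acc = mont_mul(acc, x, n, k, n_prime)   # multiply
    return mont_mul(acc, 1, n, k, n_prime)      # leave Montgomery form
```

The inner loop contains only multiplications, additions, and shifts by k bits, with no trial division, which is why the Montgomery form maps well onto adder-based hardware.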
ME & MM
- ME primarily requires fast adders
- CSA-based implementation is the most common
- The highest-throughput implementation used a redundant representation with carry-save adders for computing partial results
- The same implementation style was thus selected for ME
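The redundant (carry-save) representation mentioned above can be illustrated with a minimal word-level sketch. The running total is kept as a (sum, carry) pair so that each new operand costs only bitwise logic, and a single carry-propagate addition is done at the very end. The function names are illustrative assumptions.

```python
def csa(x, y, z):
    """Carry-save add: reduce three operands to a (sum, carry) pair.

    The invariant is x + y + z == s + c; no carry ripples between bits.
    """
    s = x ^ y ^ z                                # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1       # per-bit carry, weight 2
    return s, c


def add_many(operands):
    """Sum a list of non-negative integers using carry-save accumulation."""
    s, c = 0, 0
    for v in operands:
        s, c = csa(s, c, v)     # partial result stays in redundant form
    return s + c                # one carry-propagate addition at the end
```

Because the delay of `csa` does not depend on the operand width, the partial results of a long multiplication can be accumulated at a rate set by a single full-adder delay.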
Our Design: Key Insight
- Each CSA cell (a full adder) is made up of 2 half adders with 1 OR gate
- Each half adder is itself 1 XOR & 1 AND
- Add some configurability to the basic CSA
- Result: a fast basic element with support for most of the primitive operations
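The observation above can be sketched at gate level. The full adder below is the standard two-half-adder construction; the `configurable_cell` function and its mode names are purely hypothetical, intended only to show how tapping the internal XOR, AND, and OR gates could expose the bitwise primitives. The actual cell configuration in the design may differ.

```python
def half_adder(a, b):
    """One XOR and one AND gate: returns (sum, carry)."""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """Two half adders plus one OR gate: returns (sum, carry_out)."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, cin)
    return s2, c1 | c2


def configurable_cell(a, b, cin, mode):
    """Hypothetical configurable cell reusing the full adder's gates.

    Mode names are assumptions for illustration only.
    """
    s1, c1 = half_adder(a, b)
    if mode == "xor":
        return s1               # output of the first XOR gate
    if mode == "and":
        return c1               # output of the first AND gate
    if mode == "or":
        return s1 | c1          # a | b == (a ^ b) | (a & b), via the OR gate
    if mode == "add":
        s2, c2 = half_adder(s1, cin)
        return s2, c1 | c2      # normal full-adder behaviour
    raise ValueError(f"unknown mode: {mode}")
```

Since Python's bitwise operators act on whole integers, the same functions also model a row of cells operating on a word, for instance `configurable_cell(0b1100, 0b1010, 0, "xor")`.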
So What Else Is Needed?
- Shifts between rounds of addition (for modular exponentiation)
- Support for fixed-length shifts, rotates & arbitrary permutes of 32-bit operands (for symmetric key)
- Solution: a Permutation Unit!
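One way to see why a single permutation unit covers shifts, rotates, and arbitrary permutes is to model it as a table that names the source bit for every output bit. This is a behavioural sketch only; the table encoding (with `None` meaning a constant 0) and the helper names are assumptions, not the unit's actual configuration format.

```python
def permute32(x, src):
    """Apply a 32-entry routing table to a 32-bit word.

    src[i] is the index of the input bit that drives output bit i,
    or None to force output bit i to 0.
    """
    out = 0
    for i, s in enumerate(src):
        if s is not None and (x >> s) & 1:
            out |= 1 << i
    return out


def rotl_table(r):
    """Routing table for a fixed left rotation by r bits."""
    return [(i - r) % 32 for i in range(32)]


def shl_table(r):
    """Routing table for a fixed left shift by r bits (zero fill)."""
    return [i - r if i >= r else None for i in range(32)]


# An arbitrary permutation is just another table, e.g. bit reversal.
BIT_REVERSE = [31 - i for i in range(32)]
```

With this view, `permute32(x, rotl_table(8))` and `permute32(x, BIT_REVERSE)` differ only in configuration data, not in hardware, which matches the idea of one unit serving both the symmetric-key and public-key cases.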
Structure of Proposed Design
- Final design arrived at by iterative refinement
- Hierarchical design
  - Cell
  - Block/Cluster
  - Groups
  - Top of hierarchy
The Cell
The Block/Cluster
Group
Interconnects In a Group
Overall Structure
Random Logic Support