Performance evaluation of network security processor architectures: combining simulation with analytical estimation
Yung Chia Lin, Chung Wen Huang, Jenq Kuen Lee, Ting Huang
National Tsing Hua University, Taiwan
Network Security Processor Spec.
• Applications: IPsec, SSL, VPN, (3G, WLAN), etc.
• Functionalities:
  – Public-key cryptography: RSA, ECC, DSA
  – Private-key cryptography: AES, (DES, RC4, Kasumi)
  – True random number generator
  – Message authentication: SHA-1, MD5
• Target technology: 0.25 µm to 0.18 µm
• Clock rate: 200 MHz or higher (internal)
• 32-bit data and instruction word
• Throughput: 10 Gbps (OC-192)
• Power: 1 to 10 mW/MHz at 3 V (LP to HP)
• Die size: 50 mm²
• On-chip bus: AMBA
System Architecture
• The NSP serves as a coprocessor to the packet processor (network processor)
[Figure: a server whose ARM host, NSP, memory, and network adaptor share the system bus, serving a client over SSL (web browser) and SSH (ftp, telnet)]
System Considerations
• Accelerate the cryptographic processing
  – High-performance crypto engine
• Reduce host processor intervention during packet transfers and key setups
  – Descriptor-based DMA interface
Programmer’s Model
• Scenario
  1. The host processor queues up any number of packets or key setups.
  2. The host processor passes the NSP a descriptor pointer to the data structure (linked list) containing the packets or key setups.
  3. The NSP processes all the packets or key setups as specified.
  4. The NSP reports status via an interrupt when it finishes processing.
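The four-step scenario can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the NSP's actual descriptor format: the `Descriptor` fields (`kind`, `payload`, `next`), the handler table, and the `interrupt` callback are hypothetical stand-ins for hardware structures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Descriptor:
    # Hypothetical descriptor layout: job type, data, and a link to the next entry.
    kind: str                          # "packet" or "key_setup"
    payload: bytes
    next: Optional["Descriptor"] = None


def host_queue(items: List[Tuple[str, bytes]]) -> Optional[Descriptor]:
    """Steps 1-2: the host chains any number of jobs into a linked list
    and hands the NSP only the head pointer."""
    head: Optional[Descriptor] = None
    for kind, payload in reversed(items):
        head = Descriptor(kind, payload, head)
    return head


def nsp_process(head: Optional[Descriptor],
                handlers: Dict[str, Callable[[bytes], object]],
                interrupt: Callable[[List[object]], None]) -> None:
    """Steps 3-4: the NSP walks the whole chain without host involvement,
    then reports all results with a single interrupt."""
    status: List[object] = []
    node = head
    while node is not None:
        status.append(handlers[node.kind](node.payload))
        node = node.next
    interrupt(status)


# Usage with dummy handlers standing in for the crypto engines.
received: List[List[object]] = []
chain = host_queue([("key_setup", b"k" * 16), ("packet", b"abc"), ("packet", b"hello")])
nsp_process(chain,
            {"key_setup": lambda p: "key loaded", "packet": lambda p: len(p)},
            received.append)
print(received)  # [['key loaded', 3, 5]]
```

The design point the sketch makes concrete: the host's cost is one pointer hand-off plus one interrupt, no matter how many jobs are chained.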
NSP Architecture
[Figure: NSP block diagram with the SKEM, MACM, PKEM, and RNGM modules, an AMBA AHB (arbiter, decoder, muxes), a main controller, an I/O interface (DMA, sequencer) to the system bus, a test interface, and a power control/management unit]
Design Exploration and Performance Evaluation
Performance Evaluation
• Performance evaluation: measuring the utilization and throughput of resources
• Techniques for performance evaluation:
  – Analytical
    • Fast evaluation
    • Less accurate
    • Coarse level of detail; oversimplification
    • Complex to model at the system level with fine-grained details of each component
  – Simulation
    • Slow evaluation
    • More accurate
    • Can model finer levels of detail
    • Flexible
Performance Evaluation
• Level of abstraction and trade-offs
  – Performance analysis can take place at different levels of detail, depending on the trade-off made between evaluation speed and accuracy.
Our Approach
• A combination of analytical (probabilistic and statistical) and simulation-based techniques
[Figure: a coarse-grained path in which a performance benchfile (statistical data) drives the analytical performance estimation toolkit, and a fine-grained path in which DMA descriptors drive the performance simulator]
The System of Security Processor Design
• We use SystemC as the design language for the simulation kernel
Security Processor Simulation
• The security processor module contains the DMA controller, public-key algorithm modules, and AES algorithm modules.
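The SystemC kernel itself is not reproduced here. As a much smaller stand-in, the Python sketch below shows the kind of event-driven contention such a simulator resolves: each channel alternates between a static phase and one request to a module, and each module serves queued requests first-come first-served. The exponential timing, the one-request-per-cycle structure, and the `simulate` parameters are assumptions for illustration; the sketch does not model the DMA controller or bus arbitration.

```python
import heapq
import random
from collections import deque
from typing import List, Tuple


def simulate(n_channels: int, think_mean: float,
             stations: List[Tuple[float, float]], horizon: float,
             seed: int = 1) -> List[float]:
    """Return each station's utilization over [0, horizon].

    stations: (visit_prob, mean_service_time) per module; visit_probs sum to 1.
    think_mean: mean length of a channel's static phase between requests.
    """
    rng = random.Random(seed)
    weights = [p for p, _ in stations]
    n_st = len(stations)
    waiting = [deque() for _ in range(n_st)]   # FCFS queue per module
    busy = [False] * n_st
    busy_since = [0.0] * n_st
    busy_time = [0.0] * n_st
    events: list = []                          # heap of (time, seq, kind, channel, station)
    seq = 0

    def push(t: float, kind: str, ch: int, k: int) -> None:
        nonlocal seq
        heapq.heappush(events, (t, seq, kind, ch, k))
        seq += 1                               # unique tie-breaker for equal times

    def start_static(t: float, ch: int) -> None:
        # Channel works outside the modules, then requests a randomly chosen one.
        k = rng.choices(range(n_st), weights=weights)[0]
        push(t + rng.expovariate(1.0 / think_mean), "request", ch, k)

    def start_service(t: float, ch: int, k: int) -> None:
        push(t + rng.expovariate(1.0 / stations[k][1]), "done", ch, k)

    for ch in range(n_channels):
        start_static(0.0, ch)

    while events:
        t, _, kind, ch, k = heapq.heappop(events)
        if t > horizon:
            break
        if kind == "request":
            if busy[k]:
                waiting[k].append(ch)
            else:
                busy[k], busy_since[k] = True, t
                start_service(t, ch, k)
        else:  # "done": the channel is released back to its static phase
            start_static(t, ch)
            if waiting[k]:
                start_service(t, waiting[k].popleft(), k)
            else:
                busy[k] = False
                busy_time[k] += t - busy_since[k]

    for k in range(n_st):
        if busy[k]:
            busy_time[k] += horizon - busy_since[k]
    return [b / horizon for b in busy_time]


print([round(u, 2) for u in simulate(2, 1.0, [(1.0, 1.0)], 50_000.0)])
```

For two channels and one module with unit mean static and service times, the printed utilization should land near 0.8, the closed-form machine-repairman value for that case; small cases like this are a cheap way to sanity-check a simulator before trusting it on larger configurations.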
Analytical Model Construction
• K. Hwang and F. Briggs, Computer Architecture and Parallel Processing, McGraw-Hill, 1984.
• Daya Atapattu and Dennis Gannon, Building analytical models into an interactive performance prediction tool, Proceedings of ACM Supercomputing ’89, November 1989, pp. 512–530.
• Francois Bodin, Daniel Windheiser, William Jalby, Daya Atapattu, Mannho Lee, and Dennis Gannon, Performance evaluation and prediction for parallel algorithms on the BBN GP1000, Proceedings of the 4th ACM International Conference on Supercomputing, June 1990, pp. 401–403.
Simple Security Processor Model
[Figure: channels 1 to n on the system bus connect over internal buses 1 to m to multiple AES and RSA modules]
From the View of a Channel: Encryption/Decryption Operation Decomposition
• Operations cannot overlap in a channel
• Flow equilibrium holds in the steady state of all phases
• T1 and T5 are treated as known factors
[Figure: timeline of one operation: channel request, system bus crossing (+ memory access), internal bus crossing, operation, internal bus crossing, system bus crossing (+ memory access), channel release; intervals marked T1 to T5]
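Because operations cannot overlap within a channel, one operation occupies its channel for the sum of its phase times, which bounds what a single channel can deliver. The Python sketch below shows that arithmetic. The phase names follow the slide's decomposition between channel request and channel release, but the durations and the 128-bit block size are hypothetical placeholders, and no particular mapping of phases to the T1 to T5 labels is assumed.

```python
from typing import Dict

# Hypothetical phase durations (microseconds) between channel request and
# channel release; values are placeholders chosen for easy arithmetic.
PHASES_US: Dict[str, float] = {
    "system_bus_in": 0.25,     # system bus crossing + memory access
    "internal_bus_in": 0.25,
    "operation": 1.0,
    "internal_bus_out": 0.25,
    "system_bus_out": 0.25,    # system bus crossing + memory access
}


def channel_occupancy_us(phases_us: Dict[str, float]) -> float:
    # Operations cannot overlap in a channel, so the phases run back to back.
    return sum(phases_us.values())


def channel_throughput_mbps(block_bits: int, phases_us: Dict[str, float]) -> float:
    # One block per occupancy period; bits per microsecond equals Mbps.
    return block_bits / channel_occupancy_us(phases_us)


def aggregate_bound_mbps(n_channels: int, block_bits: int,
                         phases_us: Dict[str, float]) -> float:
    # Optimistic upper bound: assumes channels never contend for buses or modules.
    return n_channels * channel_throughput_mbps(block_bits, phases_us)


print(channel_occupancy_us(PHASES_US))               # 2.0
print(channel_throughput_mbps(128, PHASES_US))       # 64.0
print(aggregate_bound_mbps(10, 128, PHASES_US))      # 640.0
```

Contention for the shared buses and modules pushes real throughput below this bound; that waiting is what the Z_k terms of the analytical model account for.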
Defined Variables in the Analytical Model
• Define R to be the average fraction of the time that a channel is not waiting for a module or bus service to be completed
• Let Z_k be the fraction of the time each channel spends waiting for a request to module or bus k
• Let Ms_k be the service rate of module or bus k
• Let P_k be the probability that each channel makes a request to module k
• Let Mr_k be the number of requests to module k per static_time
• Define static_time to be the total number of operation cycles outside the crypto processor plus the channel setup latency
Associated Equations in the Analytical Model
• Assume that all channels have uncorrelated activities
• Uniform AES and RSA service rates
• Uniform channel assignment and bus assignment
[Equations (1) to (4)]
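Equations (1) to (4) are not available in this text, so the sketch below is not the authors' model. It swaps in a standard technique built on the same ingredients: exact Mean Value Analysis (MVA) of a closed queueing network, in which each channel spends a static phase of mean `static_time` and then requests module or bus k with probability P_k, served at rate Ms_k. The extra assumptions are loud ones: single-server FCFS stations, exponential service, uncorrelated channels, and each module instance treated as its own station (so two RSA modules under uniform assignment become two stations, each with half of P_RSA). The returned `static_fraction` is only loosely analogous to R.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Station:
    name: str
    p: float    # P_k: probability a channel's request goes to this station
    ms: float   # Ms_k: service rate (requests per unit time)


def mva(stations: List[Station], n_channels: int,
        static_time: float) -> Tuple[float, Dict[str, float], float]:
    """Exact single-class MVA with a delay (static) stage.

    Returns (request cycles per unit time, per-station utilization,
    fraction of time a channel spends in its static phase).
    """
    if n_channels < 1:
        raise ValueError("need at least one channel")
    queue = {s.name: 0.0 for s in stations}   # mean number at each station
    x = 0.0
    for n in range(1, n_channels + 1):
        # Residence per cycle: own service plus the mean queue found on arrival,
        # which (arrival theorem) is the queue of the network with n - 1 channels.
        resid = {s.name: (s.p / s.ms) * (1.0 + queue[s.name]) for s in stations}
        x = n / (static_time + sum(resid.values()))   # Little's law over a full cycle
        queue = {name: x * r for name, r in resid.items()}
    util = {s.name: x * s.p / s.ms for s in stations}
    static_fraction = x * static_time / n_channels
    return x, util, static_fraction


x, util, r = mva([Station("module", p=1.0, ms=1.0)], n_channels=2, static_time=1.0)
print(round(x, 3), round(util["module"], 3), round(r, 3))  # 0.8 0.8 0.4
```

MVA is used here because it needs only per-station demands (P_k / Ms_k), the static time, and the channel count, which are the quantities the slides define, and it runs in time linear in the number of channels.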
Comparison with Simulation
• 133 MHz, 32-bit system bus and internal bus
• 1 internal bus, 10 channels, 2 RSA modules
• RSA: 0.0317 Mbps/MHz; AES: 7.06 Mbps/MHz
• AES request rate is 3.2 Gbps; RSA is about 1% of AES
• Requests issued over a 1000-microsecond window
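A back-of-the-envelope check on these parameters is sketched below in Python. It assumes that the Mbps/MHz figures scale linearly with clock frequency and that a 32-bit bus moves at most one word per cycle; both are assumptions that ignore arbitration and protocol overhead.

```python
import math


def module_capacity_mbps(rate_mbps_per_mhz: float, clock_mhz: float) -> float:
    # Assumes engine throughput is proportional to its clock.
    return rate_mbps_per_mhz * clock_mhz


def modules_needed(offered_mbps: float, rate_mbps_per_mhz: float, clock_mhz: float) -> int:
    # Smallest module count whose combined capacity covers the offered load.
    return math.ceil(offered_mbps / module_capacity_mbps(rate_mbps_per_mhz, clock_mhz))


def bus_peak_gbps(width_bits: int, clock_mhz: float) -> float:
    # Raw peak of one word per cycle; real bus transfers add overhead.
    return width_bits * clock_mhz / 1000.0


aes = module_capacity_mbps(7.06, 133)     # about 939 Mbps per AES module
rsa = module_capacity_mbps(0.0317, 133)   # about 4.2 Mbps per RSA module
print(round(aes, 1), round(rsa, 2), modules_needed(3200, 7.06, 133), bus_peak_gbps(32, 133))
# 939.0 4.22 4 4.256
```

Under these assumptions a 3.2 Gbps AES request stream needs at least four AES modules at 133 MHz, before any bus or channel contention is considered.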
Performance Evaluation by Analytical Toolkit
• Measured workload: 3.084 Gbps over 10 minutes, 0.95% of it RSA
• 6 channels, 1 AES module, 5 RSA modules
• AES and RSA engine clocks initialized to 100 MHz
Performance Prediction from Analytical Model
• Base configuration: 3 AES, 2 Triple-DES, 2 RSA, 1 MD5-HMAC modules, 6 channels, at 66 MHz
• Initial workload: 512 Mbps AES, 256 Mbps Triple-DES, 5.12 Mbps RSA, 20.48 Mbps MD5
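One simple way to turn such a configuration into a prediction is a bottleneck bound: compute each module class's utilization from the offered load, then find how far the workload can grow before the busiest class reaches a chosen limit. The Python sketch below takes the module counts and initial workload from the slide, but the per-module capacities and the 0.8 utilization limit are hypothetical placeholders, and this open-model bound is a simpler substitute for, not a reproduction of, the authors' analytical model.

```python
from typing import Dict, Tuple

# Module counts and initial workload (Mbps) from the slide's base configuration.
MODULES: Dict[str, int] = {"AES": 3, "TripleDES": 2, "RSA": 2, "MD5": 1}
WORKLOAD_MBPS: Dict[str, float] = {"AES": 512.0, "TripleDES": 256.0, "RSA": 5.12, "MD5": 20.48}

# Hypothetical per-module capacities at 66 MHz; placeholders, not slide data.
CAP_MBPS: Dict[str, float] = {"AES": 400.0, "TripleDES": 200.0, "RSA": 5.0, "MD5": 100.0}


def utilizations(load: Dict[str, float], counts: Dict[str, int],
                 cap: Dict[str, float]) -> Dict[str, float]:
    # Utilization law U = offered load / capacity, with each class's requests
    # spread evenly over its modules (channel and bus contention ignored).
    return {c: load[c] / (counts[c] * cap[c]) for c in load}


def max_scale(load: Dict[str, float], counts: Dict[str, int],
              cap: Dict[str, float], limit: float = 1.0) -> Tuple[float, str]:
    # Largest uniform multiplier on the workload before the busiest class reaches `limit`.
    u = utilizations(load, counts, cap)
    bottleneck = max(u, key=u.__getitem__)
    return limit / u[bottleneck], bottleneck


scale, bottleneck = max_scale(WORKLOAD_MBPS, MODULES, CAP_MBPS, limit=0.8)
print(bottleneck, round(scale, 3))  # TripleDES 1.25
```

Because the bound ignores queueing at the channels and buses, it is optimistic; it is most useful for spotting which module class to add first.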
Future Work
• More realistic network security processor design
  – Triple-DES
  – MD5, SHA
  – Random number generator
• More precise analytical model
  – Parallel operations in a channel
  – Correlated operations between channels
• Use performance prediction to control task scheduling for low power