
Performance evaluation of network security processor architectures: combining simulation with analytical estimation
Yung Chia Lin, Chung Wen Huang, Jenq Kuen Lee, Ting Huang
National Tsing Hua University, Taiwan

Network Security Processor Spec.
• Applications: IPsec, SSL, VPN, (3G, WLAN), etc.
• Functionalities:
  – Public key cryptography: RSA, ECC, DSA
  – Private key cryptography: AES, (DES, RC4, Kasumi)
  – True random number generator
  – Message authentication: SHA-1, MD5
• Target technology: 0.25 µm to 0.18 µm
• Clock rate: 200 MHz or higher (internal)
• 32-bit data and instruction words
• Throughput: 10 Gbps (OC-192)
• Power: 1 to 10 mW/MHz at 3 V (LP to HP)
• Die size: 50 mm²
• On-chip bus: AMBA

System Architecture
• Serves as a coprocessor to the packet processor (network processor)
[Figure: server with ARM host, NSP, memory, and network adaptor on the system bus, communicating with a client over SSL (web browser) and SSH (ftp, telnet)]

System Consideration
• Accelerate cryptographic processing
  – High-performance crypto-engine
• Reduce host processor intervention in packet transfers and key setups
  – Descriptor-based DMA interface

Programmer’s Model
• Scenario
  1. The host processor queues up any number of packets or key setups.
  2. The host processor passes the NSP a descriptor pointer to the data structure (linked list) containing the packets or key setups.
  3. The NSP processes all the packets or key setups as specified.
  4. The NSP reports status via an interrupt when it finishes processing.
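The four-step scenario can be sketched in a few lines. This is a hypothetical illustration, not the NSP's actual descriptor format: the `Descriptor` fields, `build_chain`, `nsp_process_chain`, and the byte-reversing "engine" are all made-up stand-ins. It shows the key property of the descriptor-based interface: the host hands over one pointer, the NSP walks the whole linked list on its own, and the host hears back only once, through an interrupt.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Descriptor:
    """One queued job in the host-built linked list (hypothetical fields)."""
    kind: str                              # e.g. "packet" or "key_setup"
    payload: bytes
    next: Optional["Descriptor"] = None    # link to the next queued job
    status: str = "pending"
    result: bytes = b""


def build_chain(jobs):
    """Host side, step 1: queue any number of (kind, payload) jobs as a linked list."""
    head = None
    for kind, payload in reversed(jobs):
        head = Descriptor(kind, payload, head)
    return head


def nsp_process_chain(head: Optional[Descriptor],
                      engine: Callable[[Descriptor], bytes],
                      raise_interrupt: Callable[[int], None]) -> int:
    """NSP side, steps 2-4: start from the pointer the host passed, process every
    entry without further host involvement, then report once via an interrupt."""
    count = 0
    node = head
    while node is not None:
        node.result = engine(node)
        node.status = "done"
        count += 1
        node = node.next
    raise_interrupt(count)
    return count


if __name__ == "__main__":
    head = build_chain([("key_setup", b"key"), ("packet", b"hello")])
    # Placeholder engine: reverses bytes instead of doing real cryptography.
    nsp_process_chain(head, lambda d: d.payload[::-1],
                      lambda n: print(f"interrupt: {n} jobs done"))
```

The single interrupt at the end is what keeps host intervention low: the per-job cost on the host is only building the list node.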

NSP Architecture
[Figure: NSP block diagram: SKEM, MACM, PKEM, and RNGM modules on an AMBA AHB (arbiter, decoder, muxes); main controller; I/O interface unit (DMA, sequencer) to the system bus; test interface; power management control]

Design Exploration and Performance Evaluation

Performance Evaluation
• Performance evaluation: measuring the utilization and throughput of resources
• Techniques for performance evaluation:
  – Analytical
    • Fast evaluation
    • Less accurate
    • Coarse level of detail, oversimplification
    • Complex to model at the system level with fine details of each component
  – Simulation
    • Slow evaluation
    • More accurate
    • Can model finer levels of detail
    • Flexible

Performance Evaluation
• Level of abstraction and trade-offs
  – Performance analysis can take place at different levels of detail, depending on the trade-off chosen between evaluation speed and accuracy.

Our Approach
• A combination of analytical (probabilistic and statistical) and simulation-based techniques
[Figure: coarse-grained level: performance benchfile (statistical data) and analytical performance estimation toolkit; fine-grained level: DMA descriptors and performance simulator]

The System of Security Processor Design
• We use SystemC as the design language of the simulation kernel.

Security Processor Simulation
• The security processor module contains the DMA controller, public key algorithm modules, and AES algorithm modules.

Analytical Model Construction
• K. Hwang and F. Briggs, Computer Architecture and Parallel Processing, McGraw-Hill, 1984.
• Daya Atapattu and Dennis Gannon, Building analytical models into an interactive performance prediction tool, Proceedings of ACM Supercomputing '89, November 1989, pp. 512–530.
• Francois Bodin, Daniel Windheiser, William Jalby, Daya Atapattu, Mannho Lee, and Dennis Gannon, Performance evaluation and prediction for parallel algorithms on the BBN GP1000, Proceedings of the 4th ACM International Conference on Supercomputing, June 1990, pp. 401–403.

Simple Security Processor Model
[Figure: channels 1…n connect the system bus to internal buses 1…m, which serve the AES and RSA modules]

From the View of a Channel: Encryption/Decryption Operation Decomposition
• Operations cannot overlap within a channel
• Flow equilibrium holds in the steady state of all phases
• T1 and T5 are treated as known factors
[Figure: timeline of one operation: channel request; T1 system bus crossing (+ memory access); T2 internal bus crossing; T3 operation; T4 internal bus crossing; T5 system bus crossing (+ memory access); channel release]
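Because operations cannot overlap in a channel, one operation occupies its channel for T1 + T2 + T3 + T4 + T5, and the per-channel throughput follows directly. The sketch below is a hypothetical illustration: T1 and T5 are passed in as known values (as on the slide), while T2/T4 assume one bus word per cycle with no arbitration wait and T3 assumes a fixed engine rate in bits per second. The function names are made up for this example.

```python
import math


def bus_crossing_s(nbytes: int, width_bits: int, clock_hz: float) -> float:
    """Time to move nbytes across a bus, assuming one word per cycle and no contention."""
    words = math.ceil(nbytes * 8 / width_bits)
    return words / clock_hz


def channel_phases_s(nbytes, t1, t5, bus_width_bits, bus_clock_hz, engine_bps):
    """Return [T1, T2, T3, T4, T5] for one encryption/decryption on one channel.

    T1 and T5 (system bus crossing plus memory access) are given inputs;
    T2/T4 (internal bus crossings) and T3 (the operation) are estimated.
    """
    t2 = bus_crossing_s(nbytes, bus_width_bits, bus_clock_hz)   # data into the engine
    t3 = nbytes * 8 / engine_bps                                # crypto operation
    t4 = bus_crossing_s(nbytes, bus_width_bits, bus_clock_hz)   # result back out
    return [t1, t2, t3, t4, t5]


def channel_throughput_bps(nbytes: int, phases) -> float:
    """Phases run back to back in a channel, so their times simply add."""
    return nbytes * 8 / sum(phases)


if __name__ == "__main__":
    # Hypothetical numbers: 1500-byte packet, T1 = T5 = 2 us, 32-bit internal bus
    # at 133 MHz, AES engine at 7.06 Mbps/MHz x 133 MHz.
    phases = channel_phases_s(1500, 2e-6, 2e-6, 32, 133e6, 7.06 * 133e6)
    print(f"per-channel throughput: {channel_throughput_bps(1500, phases) / 1e6:.0f} Mbps")
```

Contention on the buses and modules is deliberately left out here; that waiting time is what the analytical model's Z_k terms account for.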

Defined Variables in the Analytical Model
• R: the average fraction of time that a channel is not waiting for a module or bus service to complete
• Z_k: the fraction of time each channel spends waiting for a request to module or bus k
• Ms_k: the service rate of module or bus k
• P_k: the probability that each channel makes a request to module k
• Mr_k: the request rate to module k, i.e., the number of requests to module k divided by static_time
• static_time: the total operation cycles outside the crypto processor plus the channel setup latency

Associated Equations in the Analytical Model
• Assume that all channels have uncorrelated activities
• Uniform AES and RSA service rates
• Uniform channel assignment and bus assignment
[Equations (1)–(4)]
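As a hedged illustration only (this is not the authors' equations (1)–(4)), the variables above can be tied together with one standard flow-equilibrium formulation. If a channel waits on at most one module or bus at a time, then R + Σ_k Z_k = 1. The sketch further assumes that a non-waiting channel issues requests to resource k at rate Mr_k, that each resource behaves as an M/M/1 queue with service rate Ms_k, and that Little's law gives Z_k = R·Mr_k·W_k with W_k = 1/(Ms_k − N·R·Mr_k) for N channels. Substituting yields R = 1/(1 + Σ_k Mr_k·W_k(R)), which the hypothetical `solve_flow_equilibrium` solves by bisection.

```python
def solve_flow_equilibrium(n_channels, mr, ms, tol=1e-12):
    """Return (R, Z) for n_channels sharing resources with request rates mr[k]
    and service rates ms[k].

    Assumed model (an M/M/1 + Little's law sketch, not the slides' equations):
      R + sum(Z) = 1,  Z[k] = R * mr[k] * W[k],  W[k] = 1 / (ms[k] - N * R * mr[k])
    """
    if len(mr) != len(ms):
        raise ValueError("mr and ms must have the same length")

    def g(r):
        # Right-hand side of R = 1 / (1 + sum_k mr[k] * W[k](R)).
        total = 0.0
        for rate, service in zip(mr, ms):
            if rate == 0:
                continue
            slack = service - n_channels * r * rate
            if slack <= 0:
                return 0.0  # resource saturated: unbounded waiting time
            total += rate / slack
        return 1.0 / (1.0 + total)

    # r - g(r) is increasing in r, negative at 0 and non-negative at 1,
    # so bisection on [0, 1] converges to the unique fixed point.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - g(mid) < 0:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    z = [r * rate / (service - n_channels * r * rate) if rate else 0.0
         for rate, service in zip(mr, ms)]
    return r, z


if __name__ == "__main__":
    # Hypothetical rates for two resources (e.g. an AES module and an RSA module).
    r, z = solve_flow_equilibrium(10, [0.5, 0.05], [10.0, 2.0])
    print(f"R = {r:.4f}, Z = {[round(x, 4) for x in z]}")
```

Adding channels raises the arrival rate at every resource, so R falls; the bisection form is used because a plain fixed-point iteration on a decreasing right-hand side can oscillate.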

Comparison with Simulation
• 133 MHz, 32-bit system bus and internal bus
• 1 internal bus, 10 channels, 2 RSA modules
• RSA 0.0317 Mbps/MHz, AES 7.06 Mbps/MHz
• AES request rate is 3.2 Gbps; RSA is about 1% of AES
• Request data collected over 1000 microseconds
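A back-of-the-envelope capacity check helps put this configuration in context. The sketch assumes, which the slide does not state, that the AES and RSA engines run at the 133 MHz bus clock, and that the bus moves one 32-bit word per cycle with no arbitration overhead. The helper names are hypothetical.

```python
import math


def engine_capacity_mbps(mbps_per_mhz: float, clock_mhz: float) -> float:
    """Throughput of one engine from its per-MHz rate."""
    return mbps_per_mhz * clock_mhz


def modules_needed(demand_mbps: float, per_module_mbps: float) -> int:
    """Smallest module count whose combined capacity covers the demand."""
    return math.ceil(demand_mbps / per_module_mbps)


def bus_peak_mbps(width_bits: int, clock_mhz: float) -> float:
    """Peak bus bandwidth, assuming one transfer per cycle and no arbitration overhead."""
    return width_bits * clock_mhz


CLOCK_MHZ = 133.0                           # assumption: engines run at the bus clock
AES_DEMAND_MBPS = 3200.0                    # 3.2 Gbps AES request rate
RSA_DEMAND_MBPS = 0.01 * AES_DEMAND_MBPS    # RSA is about 1% of AES

aes_cap = engine_capacity_mbps(7.06, CLOCK_MHZ)
rsa_cap = engine_capacity_mbps(0.0317, CLOCK_MHZ)
bus_cap = bus_peak_mbps(32, CLOCK_MHZ)

if __name__ == "__main__":
    print(f"AES: {aes_cap:.1f} Mbps/module, "
          f"{modules_needed(AES_DEMAND_MBPS, aes_cap)} modules needed")
    print(f"RSA: {rsa_cap:.2f} Mbps/module, "
          f"{modules_needed(RSA_DEMAND_MBPS, rsa_cap)} modules needed")
    print(f"32-bit bus peak: {bus_cap:.0f} Mbps")
```

Under these assumptions, the 2 RSA modules would cover only about 8.4 Mbps of the roughly 32 Mbps RSA demand, and because each AES byte crosses the internal bus twice (T2 and T4), the AES traffic alone would ask about 6.4 Gbps of a single internal bus whose peak is about 4.3 Gbps.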

Performance Evaluation by Analytical Toolkit
• Measured workload: 3.084 Gbps over 10 min, 0.95% of which is RSA
• 6 channels, 1 AES module, 5 RSA modules
• AES and RSA engine speeds initialized to 100 MHz
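Since the engine speeds here are only initial values, a natural first question is what engine clock each module type would need to keep up with the measured workload. The sketch below answers that by simple division. It assumes, which this slide does not state, that the per-MHz rates from the comparison setup (AES 7.06 Mbps/MHz, RSA 0.0317 Mbps/MHz) also apply here, that load spreads evenly over modules of one type, and that bus and channel limits are ignored. `min_engine_clock_mhz` is a hypothetical helper.

```python
def min_engine_clock_mhz(demand_mbps: float, mbps_per_mhz: float, n_modules: int) -> float:
    """Lowest engine clock at which n_modules of one type cover the demand,
    assuming throughput scales linearly with clock and load is evenly spread."""
    return demand_mbps / (mbps_per_mhz * n_modules)


TOTAL_MBPS = 3084.0      # measured workload: 3.084 Gbps
RSA_SHARE = 0.0095       # 0.95% of the workload is RSA

rsa_demand = TOTAL_MBPS * RSA_SHARE
aes_demand = TOTAL_MBPS - rsa_demand

aes_clk = min_engine_clock_mhz(aes_demand, 7.06, 1)    # 1 AES module
rsa_clk = min_engine_clock_mhz(rsa_demand, 0.0317, 5)  # 5 RSA modules

if __name__ == "__main__":
    print(f"AES needs about {aes_clk:.0f} MHz; RSA needs about {rsa_clk:.0f} MHz "
          f"(initial setting: 100 MHz)")
```

Under these assumptions both engine types would need well above the initial 100 MHz, which is the kind of gap the toolkit is meant to expose before committing to a configuration.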

Performance Prediction from Analytical Model
• Base configuration: 3 AES, 2 Triple-DES, 2 RSA, 1 MD5-HMAC, 6 channels, at 66 MHz
• Initial workload: 512 Mbps AES, 256 Mbps Triple-DES, 5.12 Mbps RSA, 20.48 Mbps MD5

Future Work
• More realistic network security processor design
  – Triple-DES
  – MD5, SHA
  – Random number generator
• More precise analytical model
  – Parallel operations within a channel
  – Correlated operations between channels
• Use performance prediction to control task scheduling for low power