Improving Cryptographic Architectures by Adopting Efficient Adders in their Modular Multiplication Hardware

Adnan Gutub, Hassan Tahhan
Computer Engineering Department, King Fahd University of Petroleum & Minerals

Outline: M.M., Interleaving, Montgomery, High-Radix, Comparison, Improvement, Adders, CLA, CSK, Comparison, Conclusion

Modular Multiplication Operation

In many public-key encryption schemes (e.g., RSA, ElGamal, and ECC), modular multiplication is a basic arithmetic operation that is heavily used.

Modular multiplication: C = A * B mod M, where A, B < M.

[Figure: modular multiplier (M.M.) block with inputs A, B, M and output C; label: Secure System]

Straightforward method: multiplication followed by modulus division. With very large operand sizes, this is too expensive.

Interleaving Multiplication and Reduction

In 1983, Blakley: P_i = 2 P_{i-1} + b_i A + q M

    P = 0
    for i = n-1 to 0 {
        P = 2 * P
        if ( P >= M ) P = P - M
        if ( b_i = 1 ) {
            P = P + A
            if ( P >= M ) P = P - M
        }
    }

The literature contains several proposals to solve the magnitude comparison problem.

Koc's implementation is based on carry-save adders. Partial products are represented as sum-carry pairs. The 5 MSBs of the pair are tested for sign estimation.
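The interleaved loop on this slide can be sketched in software as follows. This is a minimal illustration with ordinary Python integers, not a model of the carry-save hardware; the function name `interleaved_modmul` is an assumption made for the example, and the operand length n is taken to be the bit length of M.

```python
def interleaved_modmul(a: int, b: int, m: int) -> int:
    """Bit-serial interleaved modular multiplication (MSB-first scan of b).

    Sketch only: assumes 0 <= a, b < m, so b fits in n = m.bit_length() bits.
    """
    assert 0 <= a < m and 0 <= b < m
    n = m.bit_length()
    p = 0
    for i in range(n - 1, -1, -1):
        p = 2 * p                 # shift the partial product
        if p >= m:                # magnitude comparison, then reduce
            p -= m
        if (b >> i) & 1:          # add A when bit b_i is set
            p += a
            if p >= m:            # second comparison and reduction
                p -= m
    return p


if __name__ == "__main__":
    print(interleaved_modmul(7, 9, 13), (7 * 9) % 13)
```

Each iteration needs up to two full magnitude comparisons against M, which is the cost that the sign-estimation proposals aim to avoid.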

Montgomery's Method

In 1985, Montgomery: P_i = (P_{i-1} + b_i A + q M) / 2

    P' = 0
    for i = 0 to n-1 {
        P' = P' + a'_i * B'
        if ( p'_0 = 1 ) P' = P' + M
        P' = P' / 2
    }
    if ( P' >= M ) P' = P' - M

No full magnitude comparison is required. The correction step can be easily removed.

However, pre- and post-calculations are needed to obtain the required result.

As in the interleaving method, implementations based on carry-save adders are the most effective solutions.
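A software sketch of the loop above, together with the pre- and post-calculations the slide mentions, might look like this. The names `mont_mul` and `modmul_via_montgomery` are hypothetical, R = 2^n with n the bit length of M is an assumed choice, and the domain conversions are done here by reusing the same routine with the constant R^2 mod M.

```python
def mont_mul(a: int, b: int, m: int, n: int) -> int:
    """Return a * b * 2^(-n) mod m for odd m, with 0 <= a, b < m < 2^n."""
    p = 0
    for i in range(n):
        p += ((a >> i) & 1) * b   # add B when bit a_i is set
        if p & 1:                 # make P even by adding the odd modulus
            p += m
        p >>= 1                   # exact division by 2
    if p >= m:                    # single final correction
        p -= m
    return p


def modmul_via_montgomery(a: int, b: int, m: int) -> int:
    """Compute a * b mod m through the Montgomery domain (sketch)."""
    n = m.bit_length()
    r2 = pow(2, 2 * n, m)                 # pre-calculation: R^2 mod m
    a_bar = mont_mul(a, r2, m, n)         # a * R mod m
    b_bar = mont_mul(b, r2, m, n)         # b * R mod m
    p_bar = mont_mul(a_bar, b_bar, m, n)  # a * b * R mod m
    return mont_mul(p_bar, 1, m, n)       # post-calculation: leave the domain


if __name__ == "__main__":
    print(modmul_via_montgomery(7, 9, 13), (7 * 9) % 13)
```

The conversions cost extra multiplications, so the approach pays off when many multiplications are chained, as in modular exponentiation.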

High-Radix Method

Speeds up the modular multiplier by requiring fewer cycles. Area and time will increase.

The reduction step becomes the crucial operation. As the radix increases, it becomes more complex.

Walter shows that there is a direct trade-off between the required space and the overall computation time. The AT factor is independent of the choice of the radix. The factor is expected to improve for radices that are not much larger than radix-2.
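To illustrate why the reduction step grows with the radix, here is a hedged sketch of a radix-2^k Montgomery multiplication. It is a generic textbook formulation, not necessarily the design analyzed in the talk; the function name `mont_mul_radix` and the parameter `k` are assumptions. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
def mont_mul_radix(a: int, b: int, m: int, n: int, k: int) -> int:
    """Return a * b * 2^(-k*d) mod m, where d = ceil(n / k) digit iterations.

    Sketch only: assumes odd m and 0 <= a, b < m < 2^n.
    """
    mask = (1 << k) - 1
    # Pre-computed constant -m^(-1) mod 2^k, needed to pick the quotient digit.
    m_neg_inv = (-pow(m, -1, 1 << k)) & mask
    digits = -(-n // k)                          # ceil(n / k) iterations
    p = 0
    for i in range(digits):
        a_i = (a >> (k * i)) & mask              # next k-bit digit of a
        p += a_i * b
        q = ((p & mask) * m_neg_inv) & mask      # k-bit quotient digit
        p = (p + q * m) >> k                     # exact division by 2^k
    if p >= m:
        p -= m
    return p


if __name__ == "__main__":
    # With k = 2 a 4-bit modulus needs 2 iterations instead of 4.
    print(mont_mul_radix(7, 9, 13, 4, 2))
```

With k = 1 the quotient digit is just the least significant bit of the partial product; for larger k it requires a k-bit multiplication per iteration, which is where the added area and cycle time come from.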

Comparison Between [6] and [18]

Description | Koc [6] | Montgomery [18]
Equation | (S, C) = 2S + 2C + a_i B + q M, with q ∈ {1, 0, -1} | (S, C) = S + C + a_i B, then (S, C) = (S + C + s_0 M) / 2
Hardware | [figure] | [figure]
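The Montgomery equations in the table can be traced with a small carry-save model. This is a sketch under assumptions: `csa` and `mont_mul_carry_save` are illustrative names, Python's unbounded integers stand in for fixed-width registers, and the loop runs n iterations with a final subtraction rather than the iteration count of the actual hardware.

```python
def csa(x: int, y: int, z: int):
    """3:2 carry-save adder: returns (s, c) with s + c == x + y + z."""
    s = x ^ y ^ z                                # bitwise sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1       # carries, shifted into position
    return s, c


def mont_mul_carry_save(a: int, b: int, m: int, n: int) -> int:
    """Return a * b * 2^(-n) mod m for odd m, keeping P as a sum-carry pair."""
    s, c = 0, 0
    for i in range(n):
        s, c = csa(s, c, ((a >> i) & 1) * b)     # (S, C) = S + C + a_i B
        s, c = csa(s, c, (s & 1) * m)            # add M when s_0 = 1
        s, c = s >> 1, c >> 1                    # both LSBs are 0 here, so exact
    p = s + c                                    # the one full-length addition
    if p >= m:
        p -= m
    return p


if __name__ == "__main__":
    print(mont_mul_carry_save(7, 9, 13, 4))
```

Because the carry word always has a zero least significant bit, s_0 alone decides whether M is added, so no carry-propagating addition is needed inside the loop.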

Comparison Between [6] and [18]

Algorithmic Analysis | Koc [6] | Montgomery [18]
Pre-calculations | The two's complement of the modulus needs to be computed | Transformation of operands into Montgomery's domain
Inter-calculations | n + 3 iterations | n + 2 iterations
Post-calculations | There is a correction step in addition to the final summation of the sum-carry pair | Summation of the sum-carry pair; the result needs to be transformed back to the ordinary domain
Restrictions | If M is represented using n bits, then |M| ≥ 2^(n-1) | GCD(M, 2) = 1

Comparison Between [6] and [18]

Hardware Analysis | Koc [6] | Montgomery [18]
Logic | Two (n+4)-bit carry-save adders plus 5-bit carry-lookahead logic | Two n-bit carry-save adders
Registers | 6 | 5

Synthesis Analysis | Koc [6] | Montgomery [18]
Clock period | 6.468 ns | 6.342 ns

Improvements on [6]

Pipelining: Due to data dependency, pipelining will not improve throughput. However, the pipeline can be used to compute two separate operations simultaneously.

Improvements on [6]

Parallelism: The correction step at the end of the algorithm increases the algorithm complexity. At the hardware level, the correction step can be implemented using two options. By computing the two possible results in parallel, time will be saved.

Binary Adders

The last stage in both algorithms performs a full-length addition on the carry-sum pair, which can be done in hardware with binary adders.

Statistics showed that 72% of the instructions perform additions in the data path of a prototypical RISC machine.

The carry-lookahead adder and the carry-skip adder were compared in terms of time, area, and power.

Carry-Lookahead Adder

The total delay of the carry-lookahead adder is O(log n).

There is a penalty paid for this gain: the area increases. Carry-lookahead adders require O(n log n) area.
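The logarithmic delay can be seen in a small bit-level model. The sketch below uses a Kogge-Stone style parallel-prefix network, which is one common way to realize carry lookahead and not necessarily the structure synthesized in this work; the function name `cla_add` is an assumption. It returns the sum and the number of prefix levels, which grows as log2(n).

```python
def cla_add(x: int, y: int, n: int):
    """Add two n-bit numbers with generate/propagate prefix logic.

    Returns (sum, levels), where levels is the number of prefix stages.
    """
    g = [(x >> i) & (y >> i) & 1 for i in range(n)]    # generate bits
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]  # propagate bits
    G, P = g[:], p[:]
    levels, d = 0, 1
    while d < n:
        # Every bit position combines with the group d positions below it.
        G_new = [G[i] | (P[i] & G[i - d]) if i >= d else G[i] for i in range(n)]
        P_new = [P[i] & P[i - d] if i >= d else P[i] for i in range(n)]
        G, P = G_new, P_new
        d *= 2
        levels += 1
    carries = [0] + G            # carry into bit i is the group generate of bits 0..i-1
    total = sum((p[i] ^ carries[i]) << i for i in range(n)) + (carries[n] << n)
    return total, levels


if __name__ == "__main__":
    print(cla_add(0xFFFFFFFF, 1, 32))
```

Each of the log2(n) levels contains on the order of n combining cells, which matches the O(n log n) area quoted above.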

Carry-Skip Adder

The carry-skip adder has a simple and regular structure that requires an area on the order of O(n), which is hardly larger than the area required by the ripple-carry adder.

The time complexity of the carry-skip adder is bounded between O(n^(1/2)) and O(log n). An equal-block-size one-level carry-skip adder will have a time complexity of O(n^(1/2)). However, a more optimized multi-level carry-skip adder will have a time complexity of O(log n).
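The square-root behavior of the one-level design can be illustrated with a simplified unit-delay model. This is an assumption-laden sketch, not the timing model used in this work: it charges one delay unit per ripple stage and per skip, and the names `carry_skip_delay` and `best_block_size` are hypothetical.

```python
import math


def carry_skip_delay(n: int, k: int) -> int:
    """Worst-case carry delay of a one-level carry-skip adder (unit-delay model).

    The carry ripples through the first block, skips the middle blocks,
    and ripples through the last block.
    """
    blocks = math.ceil(n / k)
    if blocks == 1:
        return k                      # a single block is a plain ripple adder
    return 2 * k + (blocks - 2)       # ripple + skips + ripple


def best_block_size(n: int) -> int:
    """Block size that minimizes the modeled delay for an n-bit adder."""
    return min(range(1, n + 1), key=lambda k: carry_skip_delay(n, k))


if __name__ == "__main__":
    for n in (32, 64, 128, 256):
        k = best_block_size(n)
        print(n, k, carry_skip_delay(n, k), round(math.sqrt(n / 2), 2))
```

In this model the delay is roughly 2k + n/k, which is minimized near k = sqrt(n/2) and therefore grows in proportion to the square root of n.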

CLA versus CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Conclusion

This work studied the modular multiplication problem over large operand sizes. Based on a survey, two implementations of modular multiplication algorithms were modeled using VHDL and synthesized. A time-area analysis of both implementations showed that Koc's implementation has the potential to be an effective solution in terms of time and hardware requirements. This implementation was improved further.

Carry-save adders give the maximum speedup in computing the partial products. However, a full-length addition on the sum-carry pair needs to be carried out at the last iteration through a dedicated binary adder.

Two binary adders were studied: the CLA and the CSK. Although the two adders can be of comparable speed, the CSK requires a smaller area and consumes much less power than the CLA.

The End

Thank you