FPGA Implementation of Whirlpool and FSB Hash Algorithms

FPGA Implementation of Whirlpool and FSB Hash Algorithms • 6.375 Final Presentation • Jeff Simpson, Jingwen Ouyang, Kyle Fritz

Outline • Overview • Test Harness • Hash Algorithms – Whirlpool – FSB • Closing Remarks

What is a Hash? • A hash is a fingerprint of sorts: a small, fixed-size value that can be used to identify a larger data set. • Hashes have many uses – Verifying that a data set is correct – Indexing databases – Cryptographic functions

SHA-3 Competition • The National Institute of Standards and Technology (NIST) is holding a competition to select the successor to the SHA-2 hashing algorithm. • Over 50 algorithms have been submitted for consideration. • NIST will make the final decision, but the community is performing analysis and making recommendations.

Project Goals • Implement hash algorithms on the Altera DE2-70 FPGA board – Whirlpool hash – FSB hash (SHA-3 candidate, uses Whirlpool) • The process and results of implementing the SHA-3 candidate will serve as an analysis of the algorithm.

Test Harness • Provide a layer of abstraction • Simplify memory access • Provide FPGA interface • Provide simple and fast end-to-end testing

Hash Abstraction • Interface methods – Put Length – Put Word – Get Hash – Get Table Lookup – Put Table Lookup Response • The hash does not need to know anything about memory organization, addressing, or interface. • The test harness does not need to know anything about the hash function.
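The five methods above can be modeled in software to show how the harness and the hash stay decoupled. This is a minimal sketch, not the project's Bluespec interface: the Python method names, the 32-bit word width, and the `XorToyHash` core (which is not a real hash) are all assumptions made for illustration.

```python
from typing import List, Optional, Protocol


class HashCore(Protocol):
    """Hypothetical Python rendering of the five interface methods."""

    def put_length(self, bits: int) -> None: ...
    def put_word(self, word: int) -> None: ...
    def get_hash(self) -> Optional[int]: ...
    def get_table_lookup(self) -> Optional[int]: ...
    def put_table_lookup_response(self, data: int) -> None: ...


class XorToyHash:
    """Stand-in core: XORs one table entry per input word. NOT a real hash."""

    WORD_BITS = 32  # assumed word width

    def __init__(self) -> None:
        self.remaining_bits = 0
        self.acc = 0
        self.pending: List[int] = []  # table addresses not yet requested
        self.outstanding = 0          # requests sent, response not yet received

    def put_length(self, bits: int) -> None:
        self.remaining_bits = bits
        self.acc = 0

    def put_word(self, word: int) -> None:
        self.pending.append(word & 0xFF)
        self.remaining_bits -= self.WORD_BITS

    def get_table_lookup(self) -> Optional[int]:
        if not self.pending:
            return None
        self.outstanding += 1
        return self.pending.pop(0)

    def put_table_lookup_response(self, data: int) -> None:
        self.acc ^= data
        self.outstanding -= 1

    def get_hash(self) -> Optional[int]:
        done = (self.remaining_bits <= 0 and not self.pending
                and self.outstanding == 0)
        return self.acc if done else None


def run_harness(core: HashCore, words: List[int], table: List[int]) -> Optional[int]:
    """The harness owns the message and the table; the core never sees addresses."""
    core.put_length(32 * len(words))
    for word in words:
        core.put_word(word)
        addr = core.get_table_lookup()
        while addr is not None:  # service every lookup the core asks for
            core.put_table_lookup_response(table[addr])
            addr = core.get_table_lookup()
    return core.get_hash()
```

Any core that offers the same five methods can be dropped into `run_harness` unchanged, which is the point of the abstraction.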

Memory • 0400000:040105F – NIOS (4 KB) • 0410000:0417FFF – Input Message (32 KB) • 0440000:0447FFF – Hash Memory (32 KB) • 1000000:17FFFFF – Lookup Tables (8 MB, Flash)

On FPGA • An Intel HEX file is generated from the test-case data for loading onto the FPGA. • An Altera flash image is generated from the lookup table. • The NIOS signals the hash to start, then reads the result from memory when the hash has completed.

In Simulation • A Verilog VMH file is generated from the test-case data and the lookup table. • The hash is commanded to start automatically. • The result is displayed (saved to the output log file).

Message Input VMH Format • @0002 // Message size in bits (64) • @0004 // Data address • @0005 // Result address • @400000 // Lookup table data (simulation only)

Testing • A suite of test cases is used for automated testing. • Reference hashes are automatically generated and compared to the simulation results. • FPGA results can be automatically compared in the same fashion. • A NIOS-based message generator is used to test message inputs larger than 32 KB.

Typical Hash Structure • Preprocessing • Compression (F) • Finalization • [Animated diagram: input words pass from Preprocessing into the compression function F, which updates a running hash value one block at a time; Finalization outputs the result.]
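The three-stage structure can be sketched in a few lines. Everything here is a toy under stated assumptions: the 4-byte block size, the zero padding, and the mixing function `compress` are placeholders chosen for readability and have no cryptographic value.

```python
from typing import List


def preprocess(message: bytes, block_size: int = 4) -> List[bytes]:
    """Split the message into fixed-size blocks, zero-padding the last one."""
    padded = message + b"\x00" * (-len(message) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


def compress(state: int, block: bytes) -> int:
    """Toy F: fold one block into a 32-bit running state (not cryptographic)."""
    return ((state * 31) ^ int.from_bytes(block, "big")) & 0xFFFFFFFF


def finalize(state: int) -> str:
    """Turn the final state into the output digest (here: 8 hex digits)."""
    return format(state, "08x")


def toy_hash(message: bytes, iv: int = 0) -> str:
    state = iv
    for block in preprocess(message):
        state = compress(state, block)  # output of F feeds the next call to F
    return finalize(state)
```

Whirlpool and FSB both fit this skeleton; they differ in block size, padding rule, the function used for F, and what finalization does.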

Whirlpool Introduction • A stand-alone hash function based on a substantially modified Advanced Encryption Standard (AES) • Given a message less than 2^256 bits in length, it returns a 512-bit message digest. • Whirlpool is not a SHA-3 candidate • Will never be patented; free for public use • No Bluespec implementations exist

Whirlpool Preprocessing • Input: An input message to be hashed (any size) • Padded input: – A = {message, 1, 0, 0, 0, …, 0, 0, 0} (512N + 256 bits) – B = message length (256 bits) – Padded input = {A, B} (512N + 512 bits) • Output: The padded input split into message blocks (512 bits each) • Padded layout: message bits, 1, zeroes, length
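The padding rule above can be written out for byte-aligned messages. This is a sketch, not the project's code: the function name is hypothetical, and it assumes the "1" bit is the most significant bit of the byte after the message and that the length field is big-endian.

```python
from typing import List

BLOCK_BYTES = 64   # 512-bit message block
LENGTH_BYTES = 32  # 256-bit length field B


def whirlpool_pad(message: bytes) -> List[bytes]:
    """Pad a byte-aligned message and split it into 512-bit blocks."""
    bit_length = 8 * len(message)
    # A = {message, 1, 0, ..., 0}: the single 1 bit is the top bit of 0x80.
    a = message + b"\x80"
    # Zero-fill until A is 256 bits past a multiple of 512 bits.
    a += b"\x00" * ((LENGTH_BYTES - len(a)) % BLOCK_BYTES)
    # B = message length in bits, 256 bits wide.
    b = bit_length.to_bytes(LENGTH_BYTES, "big")
    padded = a + b
    return [padded[i:i + BLOCK_BYTES] for i in range(0, len(padded), BLOCK_BYTES)]
```

A 3-byte message fits in one block, while a 32-byte message needs two, because the 1 bit pushes A past the 256-bit boundary.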

Whirlpool Preprocessor • [Diagram: input words shifted into the message block] • Input words are shifted into the message block one bit at a time until any of the following events: – Message block is full: It is sent and a new one is started. – Input word is finished: The next one is loaded. – Message is complete: The block is padded with a 1 and the message length (in bits) before being sent. • Because these events happen independently, the preprocessor does not depend on message size, message block size or input word size. • It requires very little logic but is rather slow: it needs at least 1 cycle per bit.
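A software model of the bit-serial approach shows why the three sizes are independent parameters. This is a sketch under assumptions: the function name is hypothetical, words are consumed most significant bit first, and each loop iteration stands for one hardware cycle.

```python
from typing import Iterator, List


def bit_serial_blocks(words: List[int], msg_bits: int, word_bits: int = 32,
                      block_bits: int = 512, len_bits: int = 256) -> Iterator[int]:
    """Yield padded message blocks as integers, shifting in one bit per step."""

    def message_bits() -> Iterator[int]:
        taken = 0
        for word in words:                          # "input word is finished"
            for i in range(word_bits - 1, -1, -1):  # MSB first (assumption)
                if taken == msg_bits:               # "message is complete"
                    return
                yield (word >> i) & 1
                taken += 1

    def padded_bits() -> Iterator[int]:
        yield from message_bits()
        yield 1                                     # the single 1 bit
        count = msg_bits + 1
        while (count + len_bits) % block_bits != 0:
            yield 0                                 # zero fill
            count += 1
        for i in range(len_bits - 1, -1, -1):       # message length in bits
            yield (msg_bits >> i) & 1

    block, filled = 0, 0
    for bit in padded_bits():
        block = (block << 1) | bit
        filled += 1
        if filled == block_bits:                    # "message block is full"
            yield block
            block, filled = 0, 0
```

Changing `word_bits`, `block_bits`, or `len_bits` requires no other change, which mirrors the independence claimed on the slide.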

Whirlpool Compression • Inputs: – Current hash from previous iteration (vector of 64 x 8 bits) – Message block (512 bits) • Output: – Intermediate hash (vector of 64 x 8 bits)

Whirlpool Compression • Block diagram: init, processBuffer, finalize – init: • takes in message blocks and resets internal states – processBuffer: • computes internal state from an internal block cipher – finalize: • newHash = currentHash ^ input message ^ state • newHash is sent out as the result when there are no more input message blocks
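The finalize step can be sketched as follows. The XOR of the three values is taken from the slide; using the current hash as the cipher key follows the published Whirlpool design. The cipher itself is replaced by `toy_block_cipher`, a hypothetical stand-in that is NOT Whirlpool's internal cipher.

```python
def toy_block_cipher(key: bytes, block: bytes) -> bytes:
    """Hypothetical placeholder for the internal block cipher (not the real one)."""
    return bytes(((b + k) & 0xFF) ^ 0x5A for b, k in zip(block, key))


def xor_bytes(*parts: bytes) -> bytes:
    """Byte-wise XOR of equally sized byte strings."""
    out = bytearray(len(parts[0]))
    for part in parts:
        for i, value in enumerate(part):
            out[i] ^= value
    return bytes(out)


def compress(current_hash: bytes, message_block: bytes,
             cipher=toy_block_cipher) -> bytes:
    """One compression step on a 64-byte hash and a 64-byte message block."""
    state = cipher(current_hash, message_block)            # processBuffer
    return xor_bytes(current_hash, message_block, state)   # finalize
```

Passing the cipher as a parameter keeps the chaining logic testable on its own, separately from the round function.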

Whirlpool Compression • Internal block cipher in processBuffer: – The original version used a randomly generated S-box, which lacks internal structure and is hard to implement efficiently in hardware – The current version uses a structured S-box, whose regular patterns suit hardware implementation

Whirlpool Implementation • Do one branch at a time – Reuses hardware – Saves logic – Takes longer • 10 rounds of iteration – A big for-loop takes a lot of logic and lengthens the critical path – Use a counter to break it into multiple cycles

Whirlpool Implementation • Use registers with ready bits instead of FIFOs • Put the S-box lookup table in SRAM – One table lookup per cycle • Concatenate vectors to avoid multi-layered MUX – C = {c3, c2, c1, c0} – C3 = C[15:12], C2 = C[11:8], C1 = C[7:4], C0 = C[3:0] – Example: C2[1] = C[9]
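The index arithmetic behind the concatenation trick is small enough to check directly. This sketch assumes 4-bit elements as in the slide's example; the function names are illustrative.

```python
from typing import List


def concat(vectors: List[int], width: int = 4) -> int:
    """Pack vectors[0] = c0 ... vectors[3] = c3 into C = {c3, c2, c1, c0}."""
    c = 0
    for i, value in enumerate(vectors):
        c |= (value & ((1 << width) - 1)) << (width * i)
    return c


def flat_index(element: int, bit: int, width: int = 4) -> int:
    """Bit `bit` of element `element` sits at this position in C."""
    return width * element + bit


def get_bit(value: int, index: int) -> int:
    return (value >> index) & 1
```

With one flat vector, selecting C2[1] is a single index into C[9] rather than a choice of element followed by a choice of bit.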

Whirlpool Finalization • Functionality: – Unwrap the intermediate hash from its vector form to a bit string as final output (8 bit x 64 vector => 512 bit string) • No separate finalization module – Done at the end of the compression module

Whirlpool Result • Successfully simulated and verified with the Bluespec compiler • Successfully put onto the FPGA and verified • Noticeable trade-offs between speed and area – We chose area over speed

Fast Syndrome-Based hash function • FSB is a family of hash functions submitted to the SHA-3 competition. • Maintains a large internal state. • Requires a large lookup table. • Simple design, simple operations. • Proof of reduction to known hard problems. • Authors are French.

FSB Preprocessing • Message blocks of 1240 bits. • Filled first with bits from the message. • After the last message input, a single 1 bit is appended. • Padded with zeroes. • Last 64 bits contain the message length in bits. • Padded layout: message bits, 1, zeroes, length
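Because 1240 bits is exactly 155 bytes, the padding rule can be sketched on byte-aligned messages. The function name is hypothetical, and the bit order of the appended 1 and the big-endian length field are assumptions not stated on the slide.

```python
from typing import List

BLOCK_BYTES = 155  # 1240-bit message block
LENGTH_BYTES = 8   # 64-bit length field


def fsb_pad(message: bytes) -> List[bytes]:
    """Pad a byte-aligned message and split it into 1240-bit blocks."""
    bit_length = 8 * len(message)
    padded = message + b"\x80"  # single 1 bit, then zeroes
    # Zero-fill so that exactly 64 bits remain free in the last block.
    padded += b"\x00" * ((BLOCK_BYTES - LENGTH_BYTES - len(padded)) % BLOCK_BYTES)
    padded += bit_length.to_bytes(LENGTH_BYTES, "big")
    return [padded[i:i + BLOCK_BYTES] for i in range(0, len(padded), BLOCK_BYTES)]
```

A message of 146 bytes still fits in one block; one more byte forces a second block that holds only padding and the length.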

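FSB Compression • [Block diagram. Recoverable labels: 1240-bit message input; 8-bit and 5-bit fields; simple math with constants producing 21-bit values; divide (/) and modulo (%) units; memory holding the π vectors (1987 bits x 1024); shifter (>>); 1987-bit intermediate values; 1984-bit result.]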

FSB Compression • Implementation follows the specification closely. • Single-cycle division and modulo component. • Multiple-cycle shifter. • Memory interface for loading pi vectors.
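The roles of the division, modulo, and shifter components can be illustrated with a rough model. This is a sketch under several assumptions that the slides do not confirm: each index is split by division and modulo with the vector width 1987, the quotient selects a pi vector, the remainder is a cyclic right shift, shifted vectors are XORed together, and the low 1984 bits are kept. Consult the FSB specification for the exact rules.

```python
from typing import List

P_BITS = 1987    # width of one pi vector (from the slide)
OUT_BITS = 1984  # width of the result (from the slide)


def rotate_right(vec: int, amount: int, width: int = P_BITS) -> int:
    """Cyclic right shift of a `width`-bit value."""
    amount %= width
    mask = (1 << width) - 1
    return ((vec >> amount) | (vec << (width - amount))) & mask


def accumulate(indices: List[int], pi_vectors: List[int],
               width: int = P_BITS, out_bits: int = OUT_BITS) -> int:
    """XOR together one shifted pi vector per index (assumed combination rule)."""
    acc = 0
    for index in indices:
        vector_number, shift = divmod(index, width)  # the "/" and "%" units
        acc ^= rotate_right(pi_vectors[vector_number], shift, width)
    return acc & ((1 << out_bits) - 1)  # which bits are kept is an assumption
```

In hardware the shift amount can be as large as 1986, which is consistent with the shifter taking multiple cycles while the division and modulo finish in one.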

FSB Finalization • 1984 bits → Whirlpool → 512 bits • Breaks up the 1984 bits into a stream of 32-bit input words for Whirlpool.
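Splitting the state is simple arithmetic: 1984 / 32 gives 62 words. The sketch below assumes the most significant word is sent first, which the slide does not specify; the function name is illustrative.

```python
from typing import List


def split_words(state: int, state_bits: int = 1984, word_bits: int = 32) -> List[int]:
    """Break a state_bits-wide integer into word_bits-wide words, MSB first."""
    if state_bits % word_bits != 0:
        raise ValueError("state width must be a multiple of the word width")
    count = state_bits // word_bits
    mask = (1 << word_bits) - 1
    return [(state >> (word_bits * (count - 1 - i))) & mask for i in range(count)]
```

The resulting words can be fed to the Whirlpool core one at a time, with the message length set to 1984 bits.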

Closing Remarks • FSB is not ideal for hardware. – Large lookup table. – Large internal state. – Simple operations on large values. • Generalized code can be reused for other hash functions.