FPGA Super Hash Processor Design
General Overview and Theory
The goal of this project is to implement a FPGA core that can compute the SHA1, SHA256, and MD5 hash result of any given message. This design is implemented and tested for an Intel Aria II FPGA using Quartus Prime and Modalism.
What is a hash and why do we need them?
Any function that takes arbitrary data of any size as an input and outputs new data of a fixed size can be considered a hash function and its results are considered hashes. Hashes can be used to verify the integrity of data, efficient data look ups, cryptographic algorithms, and the handling of large data sets.
Overview of the Hashing Algorithms
The SHA1, SHA256, and MD5 hash algorithms are three of the most popular hash functions in use today. Although the SHA1 and MD5 algorithms are older and can exploited these days, they are still used in many legacy and non-security critical systems. The SHA256 algorithm is a more modern and secure and thus it is used in many systems that require security such as the Bitcoin blockchain.
Although the computation of these three algorithms are different, they all require the message to be preprocessed to create a input using the guidelines below:
1. The processed input must be an integer multiple of 512 bits (512, 1024, 2048, etc.)
2. The last 64 bits of proceed input must be equal to the size of the message (64-bit binary representation)
3. After the message there is a mandatory padding bit of value 1 plus padding bits of all zeros to make the input a multiple of 512 bits. If the message + padding bit of 1 + size is already a multiple of 512 bits, no zeros are needed.
Examples:
1. If the given message is 204 bits long, the processed input will look as follows:
Input[0:203] = message
Input[204] = 1
Input [205:447] = 0
Input[448:511] = size of message in a 64-bit binary representation
2. If the given message is 564 bits long, the processed input will look as follows:
Input[0:563] = message
Input[564] = 1
Input[565:958] = 0
Input[959:1023] = size of message in a 64-bit binary representation
After the message is processed, it is split into individual 512-bit blocks. Then each block sequentially goes through rounds of computation which gives our result, also known as the digest.
Block Sizes [bits] | Digest Size [bits] | Computation Rounds Per Block | |
MD5 | 512 | 128 | 64 |
SHA1 | 512 | 160 | 80 |
SHA256 | 512 | 256 | 64 |
Basic Core and Testbench Layout
The Super Hash processor module takes the following inputs and outputs:
INPUTS:
Opcode: The desired hash operation (MD5, SHA256, SHA 1)
Message Address: Location of the message in memory
Size: Size of the message
Output Address: Location where the digest must be written in memory
Memory Read Data: Data read from memory
Clock: System clock for module
Reset: When set low, module is reset
OUTPUTS:
Memory Clock: Clock for memory (synced with system clock)
Memory Write Enable: Enable pin to make memory writeable
Done: Output pin which is set high when module has computed digest
Implementation:
This project is implemented using a state machine design with the following eight states.
STATE 1: SETUP
This is the initial starting state and the state the machine resets to if the reset pin is set low. In this state we initialize our hash constants (h1, h2,…) and fill the first processed block with zeros before it is filled with its actual values. It’s important to do this now because each 512 bit block is processed sequentially and individually. The block is represented by 16 32bit registers ( w[0], w[1], …. w[15]) which total 512 bits. By always initially filling the 16 registers with zeros, we can just re write the registers we need and save the hassle of figuring out how many zeros are needed for padding purposes. Once this is done we switch over the IIDLE state.
STATE 2: IDLE
Here we determine set the current_block variable to the number 512 bit blocks we will need given the size of the message. We also set the registers variable to the total number of 32 bit words our message uses, set our counter variable to zero, set our limit variable to the number of registers we will need to read for this current block (max of 16), and set the mem_addr output pin to the address of the first word in the message. Once this is all done we move to the reading state.
STATE 3 READ:
Here we simply keep reading words from memory, convert them to big Endian, and store them into our w[] registers. Each time the counter and mem_addr are incremented. Once the counter is greater than the limit, we reset the counter, update the registers variable with the remaining number of registers, and move to the PAD state.
STATE 4 PAD:
Here we go through a set of conditional statements which determine if and where padding is required. Since we initially fill the block with 0s, all we are concerned about is the “1 bit” of padding and the addition of the “size pad” to the last 64 bits of the final block. Finally we update the limit variable for the next block (we do this now and not in the previous state because it depends on the registers variable and that was set the previous cycle) and set our hash function variables (a,b,c,d…) before moving to the HASH state.
STATE 5 HASH:
This is the state where all the number crunching for the hash function is done. Once the computation is done, hashing variables are updates, and we move to the POST_HASH sate.
STATE 6 POST_HASH:
In this state we update the current_block variable, set our counter variable to zero, and set our block to all zeros again. We then determine the next stage of action. If there are more words to be read, we point the mem_addr output pin to the correct place and switch to the READ state. If there are no more words to be read but we require a block of padding we move to the PAD state. If all the blocks have been hashed, we move to the OUTPUT state.
STATE 7 OUTPUT:
Once the hash results have all been computed we begin outputting the result by setting the mem_addr pin the output address and writing our results. When finished we move to the DONE state.
STATE 8 DONE:
The done pin is set high and we reset back to the SETUP state.
Results
MD5 ONLY | SHA1 ONLY | SHA256 ONLY | ALL THREE | |
FMAX [Mhz] | 98.3 | 147.36 | 114.56 | 95.74 |
ALUTs | 2211 | 2307 | 2406 | 3962 |
LOGIC REGISTERS | 954 | 1044 | 1208 | 1236 |
AREA | 3165 | 3351 | 3614 | 5198 |
CYCLES (505 byte message) | 754 | 899 | 758 | 2411 |
DELAY [microseconds] | 7.6703967 | 6.10070575 | 6.616620112 | 25.18279 |
AREA*DELAY [miliseconds * area] | 24.276806 | 20.443465 | 23.91246508 | 130.9001 |
The results above were achieved using the "performance high effort" compiler option on a Windows 10 64-Bit machine running Quartus Prime Lite Edition v18. Overall the system achieved a good Fmax and the area was quite low making this a compact design. However there is still much more room for improvement. The largest gains can be realized by breaking down the computation into smaller functions and implementing pipelining to eliminate superficial data depencides. The second version of this core will focus on those two things.
System Verilog Code:
module super_hash_processor(input logic clk, reset_n, start,
input logic [1:0] opcode,
input logic [31:0] message_addr, size, output_addr,
output logic done, mem_clk, mem_we,
output logic [15:0] mem_addr,
output logic [31:0] mem_write_data,
input logic [31:0] mem_read_data);
enum logic [3:0] {SETUP=4'b0000, IDLE=4'b0001, READ=4'b0010, PAD=4'b0011, HASH=4'b0100, POST_HASH=4'b0101, OUTPUT=4'b0110, DONE=4'b1110} state;
logic [31:0] counter;
//logic [31:0] temp_address; //uncomment if doing more than one hash
logic [31:0] current_block;
logic [31:0] limit;
logic [31:0] registers;
logic [31:0] w[0:15];
logic [31:0] a, b, c, d, e, f, fsha1, g, h, k, temp;
logic [31:0] h0, h1, h2, h3, h4, h5, h6, h7;
logic [31:0] a1, b1, c1, d1, e1;
assign mem_clk = clk;
// right rotation
function logic [31:0] rightrotate(input logic [31:0] x,
input logic [7:0] r);
begin
rightrotate = (x >> r) | (x << (32-r));
end
endfunction
// SHA256 K constants
parameter int sha256_k[0:63] = '{
32'h428a2f98, 32'h71374491, 32'hb5c0fbcf, 32'he9b5dba5, 32'h3956c25b, 32'h59f111f1, 32'h923f82a4, 32'hab1c5ed5,
32'hd807aa98, 32'h12835b01, 32'h243185be, 32'h550c7dc3, 32'h72be5d74, 32'h80deb1fe, 32'h9bdc06a7, 32'hc19bf174,
32'he49b69c1, 32'hefbe4786, 32'h0fc19dc6, 32'h240ca1cc, 32'h2de92c6f, 32'h4a7484aa, 32'h5cb0a9dc, 32'h76f988da,
32'h983e5152, 32'ha831c66d, 32'hb00327c8, 32'hbf597fc7, 32'hc6e00bf3, 32'hd5a79147, 32'h06ca6351, 32'h14292967,
32'h27b70a85, 32'h2e1b2138, 32'h4d2c6dfc, 32'h53380d13, 32'h650a7354, 32'h766a0abb, 32'h81c2c92e, 32'h92722c85,
32'ha2bfe8a1, 32'ha81a664b, 32'hc24b8b70, 32'hc76c51a3, 32'hd192e819, 32'hd6990624, 32'hf40e3585, 32'h106aa070,
32'h19a4c116, 32'h1e376c08, 32'h2748774c, 32'h34b0bcb5, 32'h391c0cb3, 32'h4ed8aa4a, 32'h5b9cca4f, 32'h682e6ff3,
32'h748f82ee, 32'h78a5636f, 32'h84c87814, 32'h8cc70208, 32'h90befffa, 32'ha4506ceb, 32'hbef9a3f7, 32'hc67178f2
};
// SHA256 hash round
function logic [255:0] sha256_op(input logic [31:0] a, b, c, d, e, f, g, h, w,
input logic [7:0] t);
logic [31:0] S1, S0, ch, maj, t1, t2; // internal signals
begin
S1 = rightrotate(e, 6) ^ rightrotate(e, 11) ^ rightrotate(e, 25);
ch = (e & f) ^ ((~e) & g);
t1 = h + S1 + ch + sha256_k[t] + w;
S0 = rightrotate(a, 2) ^ rightrotate(a, 13) ^ rightrotate(a, 22);
maj = (a & b) ^ (a & c) ^ (b & c);
t2 = S0 + maj;
sha256_op = {t1 + t2, a, b, c, d + t1, e, f, g};
end
endfunction
// MD5 S constants
parameter byte S[0:63] = '{
8'd7, 8'd12, 8'd17, 8'd22, 8'd7, 8'd12, 8'd17, 8'd22, 8'd7, 8'd12, 8'd17, 8'd22, 8'd7, 8'd12, 8'd17, 8'd22,
8'd5, 8'd9, 8'd14, 8'd20, 8'd5, 8'd9, 8'd14, 8'd20, 8'd5, 8'd9, 8'd14, 8'd20, 8'd5, 8'd9, 8'd14, 8'd20,
8'd4, 8'd11, 8'd16, 8'd23, 8'd4, 8'd11, 8'd16, 8'd23, 8'd4, 8'd11, 8'd16, 8'd23, 8'd4, 8'd11, 8'd16, 8'd23,
8'd6, 8'd10, 8'd15, 8'd21, 8'd6, 8'd10, 8'd15, 8'd21, 8'd6, 8'd10, 8'd15, 8'd21, 8'd6, 8'd10, 8'd15, 8'd21
};
// MD5 K constants
parameter int md5_k[0:63] = '{
32'hd76aa478, 32'he8c7b756, 32'h242070db, 32'hc1bdceee,
32'hf57c0faf, 32'h4787c62a, 32'ha8304613, 32'hfd469501,
32'h698098d8, 32'h8b44f7af, 32'hffff5bb1, 32'h895cd7be,
32'h6b901122, 32'hfd987193, 32'ha679438e, 32'h49b40821,
32'hf61e2562, 32'hc040b340, 32'h265e5a51, 32'he9b6c7aa,
32'hd62f105d, 32'h02441453, 32'hd8a1e681, 32'he7d3fbc8,
32'h21e1cde6, 32'hc33707d6, 32'hf4d50d87, 32'h455a14ed,
32'ha9e3e905, 32'hfcefa3f8, 32'h676f02d9, 32'h8d2a4c8a,
32'hfffa3942, 32'h8771f681, 32'h6d9d6122, 32'hfde5380c,
32'ha4beea44, 32'h4bdecfa9, 32'hf6bb4b60, 32'hbebfbc70,
32'h289b7ec6, 32'heaa127fa, 32'hd4ef3085, 32'h04881d05,
32'hd9d4d039, 32'he6db99e5, 32'h1fa27cf8, 32'hc4ac5665,
32'hf4292244, 32'h432aff97, 32'hab9423a7, 32'hfc93a039,
32'h655b59c3, 32'h8f0ccc92, 32'hffeff47d, 32'h85845dd1,
32'h6fa87e4f, 32'hfe2ce6e0, 32'ha3014314, 32'h4e0811a1,
32'hf7537e82, 32'hbd3af235, 32'h2ad7d2bb, 32'heb86d391
};
// MD5 g
function logic[3:0] md5_g(input logic [7:0] t);
begin
if (t <= 15)
md5_g = t;
else if (t <= 31)
md5_g = (5*t + 1) % 16;
else if (t <= 47)
md5_g = (3*t + 5) % 16;
else
md5_g = (7*t) % 16;
end
endfunction
// MD5 f
function logic[31:0] md5_f(input logic [7:0] t);
begin
if (t <= 15)
md5_f = (b & c) | ((~b) & d);
else if (t <= 31)
md5_f = (d & b) | ((~d) & c);
else if (t <= 47)
md5_f = b ^ c ^ d;
else
md5_f = c ^ (b | (~d));
end
endfunction
// MD5 hash round
function logic[127:0] md5_op(input logic [31:0] a, b, c, d, w,
input logic [7:0] t);
logic [31:0] t1, t2; // internal signals
begin
//debug
/*
$display("w == %x\n",w);
$display("--------------------------\n");
*/
t1 = a + md5_f(t) + md5_k[t] + w;
t2 = b + ((t1 << S[t])|(t1 >> (32-S[t])));
md5_op = {d, t2, b, c};
end
endfunction
//sha1 hash
function logic [159:0] hash_op(input logic [31:0] a, b, c, d, e, w, input logic [31:0] t);
if(t<=19)begin
k = 32'h5A827999;
fsha1 = (b & c) | ( (~b) & d);
end else
if(t <=39)begin
k = 32'h6ED9EBA1;
fsha1 = b ^ c ^ d;
end else
if(t<=59)begin
k = 32'h8F1BBCDC;
fsha1 = (b & c) | (b & d) | (c & d);
end else
begin
k = 32'hCA62C1D6;
fsha1 = b ^ c ^ d;
end
//debug
/*
$display("a == %x\n",a);
$display("b == %x\n",b);
$display("c == %x\n",c);
$display("d == %x\n",d);
$display("e == %x\n",e);
$display("f == %x\n",f);
$display("w == %x\n",w);
*/
temp = {a[26:0],a[31:27]} + fsha1 + e + k + w;
//debug
/*
$display("temp == %x\n",temp);
$display("-------------------------------");
*/
e = d;
d = c;
c = {b[1:0],b[31:2]};
b = a;
a = temp;
hash_op = {a, b, c, d, e};
endfunction
// convert from little-endian to big-endian
function logic [31:0] changeEndian(input logic [31:0] value);
changeEndian = {value[7:0], value[15:8], value[23:16], value[31:24]};
endfunction
//appending pading "1"
function logic [31:0] magic(input logic [31:0] value);
begin
if(size%4 == 1)begin
magic = ((value & 32'hFF000000) | 32'h00800000);
end
if(size%4 == 2) begin
magic = ((value & 32'hFFFF0000) | 32'h00008000);
end
if(size%4 == 3) begin
magic = ((value & 32'hFFFFFF00) | 32'h00000080);
end
end
endfunction
//determine number of blocks
function logic [31:0] determine_num_blocks(input logic [31:0] size);
determine_num_blocks = ((((size)+8)/64)+1);
endfunction
always_ff @(posedge clk, negedge reset_n)
begin
if (!reset_n) begin
state <= SETUP;
end else
casex (state)
SETUP: begin
w[0]<=0; //set block to zeros to avoid padding 0s
w[1]<=0;
w[2]<=0;
w[3]<=0;
w[4]<=0;
w[5]<=0;
w[6]<=0;
w[7]<=0;
w[8]<=0;
w[9]<=0;
w[10]<=0;
w[11]<=0;
w[12]<=0;
w[13]<=0;
w[14]<=0;
w[15]<=0;
if(opcode == 2'b01 || opcode == 2'b00) begin // set constants depending on operation
h0 <= 32'h67452301;
h1 <= 32'hEFCDAB89;
h2 <= 32'h98BADCFE;
h3 <= 32'h10325476;
h4 <= 32'hC3D2E1F0;
end
if(opcode == 2'b10) begin
h0 <= 32'h6a09e667;
h1 <= 32'hbb67ae85;
h2 <= 32'h3c6ef372;
h3 <= 32'ha54ff53a;
h4 <= 32'h510e527f;
h5 <= 32'h9b05688c;
h6 <= 32'h1f83d9ab;
h7 <= 32'h5be0cd19;
end
//temp_addr <= output_addr; //uncomment if doing more than one hash
state <= IDLE;
done <= 0;
end
IDLE: // start
if(start) begin // READ first word
current_block<=determine_num_blocks(size);
registers <= size/4 + size[0];
if(size >= 64) begin //if message is more than 512 bits
limit <= 16;
end
if (size < 64) begin //if message is less than 512 bits
limit <= size/4 + size[0];
end
counter <= 0;
state <= READ;
mem_addr <= message_addr - 1; //account for reading dealys
end
READ: begin
mem_addr <= mem_addr + 1;
w[counter-2] <= changeEndian(mem_read_data); //-2 to account for delays
counter <= counter + 1;
state <= READ;
if (counter>limit)begin
counter <= 0;
registers <= registers - limit;
state <= PAD;
end
end
PAD: begin
if(current_block == 1)begin
w[15] <= (size << 3); //append size
if(size % 4 == 0)begin
w[0] <= 32'h80000000; //full block of padding
end
if(limit < 16) begin
if(size % 4 > 0)begin //if unfinished word need to magic
w[limit-1] <= magic(w[limit-1]);
end
if(size % 4 == 0)begin //if finished word need to make w[limit] = 1
w[limit] <= 32'h80000000;
end
end
end
if(current_block == 2 && size % 4 > 0 && registers < 1)begin //unfinished last word
w[limit-1] <= magic(w[limit-1]);
end
if(registers<16)begin //set limit for next block
limit <= registers;
end
else begin
limit <= 16;
end
a <= h0; //prep hash constants
b <= h1;
c <= h2;
d <= h3;
e <= h4;
f <= h5;
g <= h6;
h <= h7;
state <= HASH;
end
HASH: begin
if(opcode == 2'b00)begin //md5
if(counter < 64)begin
if(counter < 16)begin
{a, b, c, d} <= md5_op(a, b, c, d, w[counter], counter);
end
end
if(counter > 15)begin
{a, b, c, d} <= md5_op(a, b, c, d, w[md5_g(counter)], counter);
end
counter <= counter + 1;
if(counter==64)begin
h0 <= h0 + a;
h1 <= h1 + b;
h2 <= h2 + c;
h3 <= h3 + d;
state <= POST_HASH;
end
end
if(opcode == 2'b01)begin //sha1
if(counter < 80)begin
if(counter < 16)begin
//debug
//$display("t == %d\n",t);
{a,b,c,d,e} <= hash_op(a, b, c, d, e, w[counter], counter);
counter <= counter + 1;
end
if( counter >= 16)begin
//debug
// $display("t == %d\n",t);
{a,b,c,d,e} <= hash_op(a, b, c, d, e, ((w[counter-a1]^w[counter-b1]^w[counter-c1]^w[counter-d1])<<1 |
(w[counter-a1]^w[counter-b1]^w[counter-c1]^w[counter-d1])>>31), counter);
w[counter-e1]<= ((w[counter-a1]^w[counter-b1]^w[counter-c1]^w[counter-d1])<<1 |
(w[counter-a1]^w[counter-b1]^w[counter-c1]^w[counter-d1])>>31);
counter <= counter + 1;
if((((counter+1)-3)%16)==0)begin
a1 <= a1 + 16;
end
if((((counter+1)-8)%16)==0)begin
b1 <= b1 + 16;
end
if((((counter+1)-14)%16)==0)begin
c1 <= c1 + 16;
end
if(((counter+1)%16)==0)begin
d1 <= d1 + 16;
e1 <= e1 + 16;
end
end
end
if (counter == 80)begin
h0 <= h0 + a;
h1 <= h1 + b;
h2 <= h2 + c;
h3 <= h3 + d;
h4 <= h4 + e;
a1 <= 3;
b1 <= 8;
c1 <= 14;
d1 <= 16;
e1 <= 16;
state <= POST_HASH;
end
end
if(opcode == 2'b10) begin //sha256
if(counter < 64)begin
if(counter < 16)begin
{a, b, c, d, e, f, g, h} <= sha256_op(a, b, c, d, e, f, g, h, w[counter], counter);
end
if(counter> 14 )begin
w[15] <= w[0] + (rightrotate(w[1], 7) ^ rightrotate(w[1], 18) ^ (w[1] >> 3)) +
w[9] + (rightrotate(w[14], 17) ^ rightrotate(w[14], 19) ^ (w[14] >> 10));
for (int i=0; i<15; i++)
w[i] <= w[i+1];
end
if(counter > 15)begin
{a, b, c, d, e, f, g, h} <= sha256_op(a, b, c, d, e, f, g, h, w[15], counter);
end
counter <= counter + 1;
end
if (counter == 64)begin
h0 <= h0 + a;
h1 <= h1 + b;
h2 <= h2 + c;
h3 <= h3 + d;
h4 <= h4 + e;
h5 <= h5 + f;
h6 <= h6 + g;
h7 <= h7 + h;
state <= POST_HASH;
end
end
end
POST_HASH:begin
w[0] <= 0; //set block to zeros again
w[1] <= 0;
w[2] <= 0;
w[3] <= 0;
w[4] <= 0;
w[5] <= 0;
w[6] <= 0;
w[7] <= 0;
w[8] <= 0;
w[9] <= 0;
w[10] <= 0;
w[11] <= 0;
w[12] <= 0;
w[13] <= 0;
w[14] <= 0;
w[15] <= 0;
counter <= 0;
current_block <= current_block - 1; //decrement block counter
if(limit > 0)begin
state <= READ;
mem_addr <= mem_addr - 2;
end
if(limit < 1 )begin
state <= PAD;
end
if(current_block == 1)begin
state <= OUTPUT;
//mem_addr <= temp_addr //uncomment if doing more than one hash
end
end
OUTPUT:begin
mem_we <= 1;
mem_addr <= output_addr; //comment out if doing more than one hash
if(counter == 0)begin
mem_write_data <= h0;
end
if(counter == 1)begin
mem_write_data <= h1;
end
if(counter == 2)begin
mem_write_data <= h2;
end
if(counter == 3)begin
mem_write_data<=h3;
if (opcode == 2'b00) begin
state <= DONE;
end
end
if(counter==4)begin
mem_write_data <= h4;
if (opcode == 2'b01)begin
state <= DONE;
end
end
if(counter == 5)begin
mem_write_data <= h5;
end
if(counter == 6)begin
mem_write_data <= h6;
end
if(counter == 7)begin
mem_write_data <= h7;
if (opcode == 2'b10)begin
state <= DONE;
end
end
mem_addr <= output_addr + counter;
counter <= counter + 1;
//temp_address <= output_addr + counter; //uncomment if doing more than one hash
end
DONE: begin
/* //debug
$display("-------- FINAL RESULT --------");
$display("h0== %x\n",h0);
$display("h1 == %x\n",h1);
$display("h2 == %x\n",h2);
$display("h3 == %x\n",h3);
$display("h4 == %x\n",h4);
*/
done <= 1;
state <= SETUP;
end
endcase
end
endmodule
module tb_super_hash_processor();
logic clk, reset_n, start;
logic [ 1:0] opcode;
logic [ 31:0] message_addr, size, output_addr;
logic done, mem_clk, mem_we;
logic [ 15:0] mem_addr;
logic [ 31:0] mem_write_data;
logic [ 31:0] mem_read_data;
logic [127:0] md5_hash; // results here
logic [159:0] sha1_hash; // results here
logic [255:0] sha256_hash; // results here
logic [ 31:0] dpsram[0:16383]; // each row has 32 bits
logic [ 31:0] dpsram_tb[0:16383]; // for result testing, testbench only
logic [ 31:0] message_seed; // modify message_seed below
int message_size = 505; // in bytes
int pad_length;
int t, m;
int outloop;
int cycles;
int total_cycles;
int rounds;
logic correct;
logic [127:0] md5_digest;
logic [159:0] sha1_digest;
logic [255:0] sha256_digest;
logic [ 31:0] h0;
logic [ 31:0] h1;
logic [ 31:0] h2;
logic [ 31:0] h3;
logic [ 31:0] h4;
logic [ 31:0] h5;
logic [ 31:0] h6;
logic [ 31:0] h7;
logic [ 31:0] a, b, c, d, e, f, g, h;
logic [ 31:0] s1, s0;
logic [ 31:0] w[0:79];
// instantiate your design
super_hash_processor super_hash_processor_inst (clk, reset_n, start, opcode, message_addr, size, output_addr, done,
mem_clk, mem_we, mem_addr, mem_write_data, mem_read_data);
parameter string hnames[0:2] = {"MD5", "SHA1", "SHA256"};
// ---------------------------------------------------------------------------------------
// MD5 S constants
parameter byte S[0:63] = '{
8'd7, 8'd12, 8'd17, 8'd22, 8'd7, 8'd12, 8'd17, 8'd22, 8'd7, 8'd12, 8'd17, 8'd22, 8'd7, 8'd12, 8'd17, 8'd22,
8'd5, 8'd9, 8'd14, 8'd20, 8'd5, 8'd9, 8'd14, 8'd20, 8'd5, 8'd9, 8'd14, 8'd20, 8'd5, 8'd9, 8'd14, 8'd20,
8'd4, 8'd11, 8'd16, 8'd23, 8'd4, 8'd11, 8'd16, 8'd23, 8'd4, 8'd11, 8'd16, 8'd23, 8'd4, 8'd11, 8'd16, 8'd23,
8'd6, 8'd10, 8'd15, 8'd21, 8'd6, 8'd10, 8'd15, 8'd21, 8'd6, 8'd10, 8'd15, 8'd21, 8'd6, 8'd10, 8'd15, 8'd21
};
// MD5 K constants
parameter int md5_k[0:63] = '{
32'hd76aa478, 32'he8c7b756, 32'h242070db, 32'hc1bdceee,
32'hf57c0faf, 32'h4787c62a, 32'ha8304613, 32'hfd469501,
32'h698098d8, 32'h8b44f7af, 32'hffff5bb1, 32'h895cd7be,
32'h6b901122, 32'hfd987193, 32'ha679438e, 32'h49b40821,
32'hf61e2562, 32'hc040b340, 32'h265e5a51, 32'he9b6c7aa,
32'hd62f105d, 32'h02441453, 32'hd8a1e681, 32'he7d3fbc8,
32'h21e1cde6, 32'hc33707d6, 32'hf4d50d87, 32'h455a14ed,
32'ha9e3e905, 32'hfcefa3f8, 32'h676f02d9, 32'h8d2a4c8a,
32'hfffa3942, 32'h8771f681, 32'h6d9d6122, 32'hfde5380c,
32'ha4beea44, 32'h4bdecfa9, 32'hf6bb4b60, 32'hbebfbc70,
32'h289b7ec6, 32'heaa127fa, 32'hd4ef3085, 32'h04881d05,
32'hd9d4d039, 32'he6db99e5, 32'h1fa27cf8, 32'hc4ac5665,
32'hf4292244, 32'h432aff97, 32'hab9423a7, 32'hfc93a039,
32'h655b59c3, 32'h8f0ccc92, 32'hffeff47d, 32'h85845dd1,
32'h6fa87e4f, 32'hfe2ce6e0, 32'ha3014314, 32'h4e0811a1,
32'hf7537e82, 32'hbd3af235, 32'h2ad7d2bb, 32'heb86d391
};
// MD5 g
function logic[3:0] md5_g(input logic [7:0] t);
begin
if (t <= 15)
md5_g = t;
else if (t <= 31)
md5_g = (5*t + 1) % 16;
else if (t <= 47)
md5_g = (3*t + 5) % 16;
else
md5_g = (7*t) % 16;
end
endfunction
// MD5 f
function logic[31:0] md5_f(input logic [7:0] t);
begin
if (t <= 15)
md5_f = (b & c) | ((~b) & d);
else if (t <= 31)
md5_f = (d & b) | ((~d) & c);
else if (t <= 47)
md5_f = b ^ c ^ d;
else
md5_f = c ^ (b | (~d));
end
endfunction
// MD5 hash round
function logic[127:0] md5_op(input logic [31:0] a, b, c, d, w,
input logic [7:0] t);
logic [31:0] t1, t2; // internal signals
begin
t1 = a + md5_f(t) + md5_k[t] + w;
t2 = b + ((t1 << S[t])|(t1 >> (32-S[t])));
md5_op = {d, t2, b, c};
end
endfunction
// ---------------------------------------------------------------------------------------
// SHA1 f
function logic [31:0] sha1_f(input logic [7:0] t);
begin
if (t <= 19)
sha1_f = (b & c) | ((~b) & d);
else if (t <= 39)
sha1_f = b ^ c ^ d;
else if (t <= 59)
sha1_f = (b & c) | (b & d) | (c & d);
else
sha1_f = b ^ c ^ d;
end
endfunction
// SHA1 k
function logic [31:0] sha1_k(input logic [7:0] t);
begin
if (t <= 19)
sha1_k = 32'h5a827999;
else if (t <= 39)
sha1_k = 32'h6ed9eba1;
else if (t <= 59)
sha1_k = 32'h8f1bbcdc;
else
sha1_k = 32'hca62c1d6;
end
endfunction
// SHA1 hash round
function logic [159:0] sha1_op(input logic [31:0] a, b, c, d, e, w,
input logic [7:0] t);
logic [31:0] temp, tc; // internal signals
begin
temp = ((a << 5)|(a >> 27)) + sha1_f(t) + e + sha1_k(t) + w;
tc = ((b << 30)|(b >> 2));
sha1_op = {temp, a, tc, c, d};
end
endfunction
// ---------------------------------------------------------------------------------------
// SHA256 K constants
parameter int sha256_k[0:63] = '{
32'h428a2f98, 32'h71374491, 32'hb5c0fbcf, 32'he9b5dba5, 32'h3956c25b, 32'h59f111f1, 32'h923f82a4, 32'hab1c5ed5,
32'hd807aa98, 32'h12835b01, 32'h243185be, 32'h550c7dc3, 32'h72be5d74, 32'h80deb1fe, 32'h9bdc06a7, 32'hc19bf174,
32'he49b69c1, 32'hefbe4786, 32'h0fc19dc6, 32'h240ca1cc, 32'h2de92c6f, 32'h4a7484aa, 32'h5cb0a9dc, 32'h76f988da,
32'h983e5152, 32'ha831c66d, 32'hb00327c8, 32'hbf597fc7, 32'hc6e00bf3, 32'hd5a79147, 32'h06ca6351, 32'h14292967,
32'h27b70a85, 32'h2e1b2138, 32'h4d2c6dfc, 32'h53380d13, 32'h650a7354, 32'h766a0abb, 32'h81c2c92e, 32'h92722c85,
32'ha2bfe8a1, 32'ha81a664b, 32'hc24b8b70, 32'hc76c51a3, 32'hd192e819, 32'hd6990624, 32'hf40e3585, 32'h106aa070,
32'h19a4c116, 32'h1e376c08, 32'h2748774c, 32'h34b0bcb5, 32'h391c0cb3, 32'h4ed8aa4a, 32'h5b9cca4f, 32'h682e6ff3,
32'h748f82ee, 32'h78a5636f, 32'h84c87814, 32'h8cc70208, 32'h90befffa, 32'ha4506ceb, 32'hbef9a3f7, 32'hc67178f2
};
// SHA256 hash round
function logic [255:0] sha256_op(input logic [31:0] a, b, c, d, e, f, g, h, w,
input logic [7:0] t);
logic [31:0] S1, S0, ch, maj, t1, t2; // internal signals
begin
S1 = rightrotate(e, 6) ^ rightrotate(e, 11) ^ rightrotate(e, 25);
ch = (e & f) ^ ((~e) & g);
t1 = h + S1 + ch + sha256_k[t] + w;
S0 = rightrotate(a, 2) ^ rightrotate(a, 13) ^ rightrotate(a, 22);
maj = (a & b) ^ (a & c) ^ (b & c);
t2 = S0 + maj;
sha256_op = {t1 + t2, a, b, c, d + t1, e, f, g};
end
endfunction
// ---------------------------------------------------------------------------------------
// left rotation
function logic [31:0] leftrotate(input logic [31:0] x);
begin
leftrotate = (x << 1) | (x >> 31);
end
endfunction
// right rotation
function logic [31:0] rightrotate(input logic [31:0] x,
input logic [7:0] r);
begin
rightrotate = (x >> r) | (x << (32-r));
end
endfunction
// convert from little-endian to big-endian
function logic [31:0] changeEndian(input logic [31:0] value);
changeEndian = {value[7:0], value[15:8], value[23:16], value[31:24]};
endfunction
// ---------------------------------------------------------------------------------------
// clock generator
always begin
#10;
clk = 1'b1;
#10
clk = 1'b0;
end
// main testbench
initial
begin
total_cycles = 0;
for (opcode = 0; opcode < 3; opcode = opcode + 1) begin
// RESET HASH CO-PROCESSOR
@(posedge clk) reset_n = 0;
for (m = 0; m < 2; m = m + 1) @(posedge clk);
reset_n = 1;
for (m = 0; m < 2; m = m + 1) @(posedge clk);
// SET MESSAGE LOCATION
size = message_size;
case (opcode)
2'b00: begin // md5
message_addr = 32'd1000;
message_seed = 32'h01234567;
end
2'b01: begin // sha1
message_addr = 32'd2000;
message_seed = 32'h34567012;
end
default: begin // sha256
message_addr = 32'd3000;
message_seed = 32'h45670123;
end
endcase
output_addr = message_addr + ((message_size-1))/4 + 1;
// CREATE AND DISPLAY MESSAGETEXT
$display("-----------\n");
$display("Messagetext\n");
$display("-----------\n");
dpsram[message_addr+0] = message_seed;
dpsram_tb[0] = changeEndian(message_seed); // change Endian // for testbench only
$display("%x\n", dpsram[message_addr]);
for (m = 1; m < (message_size-1)/4+1; m = m + 1) begin // data generation
dpsram[message_addr+m] = (dpsram[message_addr+m-1]<<1)|(dpsram[message_addr+m-1]>>31);
dpsram_tb[m] = changeEndian(dpsram[message_addr+m]); // change Endian
$display("%x\n", dpsram[message_addr+m]);
end
// START PROCESSOR
start = 1'b1;
for (m = 0; m < 2; m = m + 1) @(posedge clk);
start = 1'b0;
// PERFORM PADDING OF MESSAGE
// calculate total number of bytes after padding (before appending total length)
if ((message_size + 1) % 64 <= 56 && (message_size + 1) % 64 > 0)
pad_length = (message_size/64)*64 + 56;
else
pad_length = (message_size/64+1)*64 + 56;
case (message_size % 4) // pad bit 1
0: dpsram_tb[message_size/4] = 32'h80000000;
1: dpsram_tb[message_size/4] = dpsram_tb[message_size/4] & 32'h FF000000 | 32'h 00800000;
2: dpsram_tb[message_size/4] = dpsram_tb[message_size/4] & 32'h FFFF0000 | 32'h 00008000;
3: dpsram_tb[message_size/4] = dpsram_tb[message_size/4] & 32'h FFFFFF00 | 32'h 00000080;
endcase
for (m = message_size/4+1; m < pad_length/4; m = m + 1) begin
dpsram_tb[m] = 32'h00000000;
end
dpsram_tb[pad_length/4] = message_size >> 29; // append length of message in bits (before pre-processing)
dpsram_tb[pad_length/4+1] = message_size * 8;
pad_length = pad_length + 8; // final length after pre-processing
outloop = pad_length/64; // break message into 512-bit chunks (64 bytes)
// COMPUTE NUMBER OF ROUNDS
if (opcode == 2'b01) // sha1
rounds = 80;
else // md5 or sha256
rounds = 64;
// SET INITIAL HASH
case (opcode)
2'b00: begin // md5
h0 = 32'h67452301;
h1 = 32'hEFCDAB89;
h2 = 32'h98BADCFE;
h3 = 32'h10325476;
h4 = 32'h00000000;
h5 = 32'h00000000;
h6 = 32'h00000000;
h7 = 32'h00000000;
end
2'b01: begin // sha1
h0 = 32'h67452301;
h1 = 32'hEFCDAB89;
h2 = 32'h98BADCFE;
h3 = 32'h10325476;
h4 = 32'hC3D2E1F0;
h5 = 32'h00000000;
h6 = 32'h00000000;
h7 = 32'h00000000;
end
default: begin // sha256
h0 = 32'h6a09e667;
h1 = 32'hbb67ae85;
h2 = 32'h3c6ef372;
h3 = 32'ha54ff53a;
h4 = 32'h510e527f;
h5 = 32'h9b05688c;
h6 = 32'h1f83d9ab;
h7 = 32'h5be0cd19;
end
endcase
// COMPUTE SHA256 HASH
for (m = 0; m < outloop; m = m + 1) begin
// W ARRAY EXPANSION
for (t = 0; t < rounds; t = t + 1) begin
if (t < 16) begin
w[t] = dpsram_tb[t+m*16];
end else begin
case (opcode)
2'b00: begin // md5
w[t] = w[md5_g(t)];
end
2'b01: begin // sha1
w[t] = leftrotate(w[t-3] ^ w[t-8] ^ w[t-14] ^ w[t-16]);
end
default: begin // sha256
s0 = rightrotate(w[t-15], 7) ^ rightrotate(w[t-15], 18) ^ (w[t-15] >> 3);
s1 = rightrotate(w[t-2], 17) ^ rightrotate(w[t-2], 19) ^ (w[t-2] >> 10);
w[t] = w[t-16] + s0 + w[t-7] + s1;
end
endcase
end
end
// INITIAL HASH AT ROUND K
a = h0;
b = h1;
c = h2;
d = h3;
e = h4;
f = h5;
g = h6;
h = h7;
// HASH ROUNDS
for (t = 0; t < rounds; t = t + 1) begin
case (opcode)
2'b00:
{a, b, c, d} = md5_op(a, b, c, d, w[t], t);
2'b01:
{a, b, c, d, e} = sha1_op(a, b, c, d, e, w[t], t);
default:
{a, b, c, d, e, f, g, h} = sha256_op(a, b, c, d, e, f, g, h, w[t], t);
endcase
end
// FINAL HASH
h0 = h0 + a;
h1 = h1 + b;
h2 = h2 + c;
h3 = h3 + d;
h4 = h4 + e;
h5 = h5 + f;
h6 = h6 + g;
h7 = h7 + h;
end
md5_digest = {h0, h1, h2, h3};
sha1_digest = {h0, h1, h2, h3, h4};
sha256_digest = {h0, h1, h2, h3, h4, h5, h6, h7};
// WAIT UNTIL ENTIRE FRAME IS HASHED, THEN DISPLAY HASH RESULT
wait (done == 1);
// DISPLAY HASH RESULT
$display("-----------------------\n");
$display("correct %s hash result is:\n", hnames[opcode]);
$display("-----------------------\n");
case (opcode)
2'b00: // md5
$display("%x\n", md5_digest);
2'b01: // sha1
$display("%x\n", sha1_digest);
default: // sha256
$display("%x\n", sha256_digest);
endcase
md5_hash = {
dpsram[output_addr],
dpsram[output_addr+1],
dpsram[output_addr+2],
dpsram[output_addr+3]
};
sha1_hash = {
dpsram[output_addr],
dpsram[output_addr+1],
dpsram[output_addr+2],
dpsram[output_addr+3],
dpsram[output_addr+4]
};
sha256_hash = {
dpsram[output_addr],
dpsram[output_addr+1],
dpsram[output_addr+2],
dpsram[output_addr+3],
dpsram[output_addr+4],
dpsram[output_addr+5],
dpsram[output_addr+6],
dpsram[output_addr+7]
};
$display("-----------------------\n");
$display("Your %s result is: \n", hnames[opcode]);
$display("-----------------------\n");
case (opcode)
2'b00: // md5
$display("%x\n", md5_hash);
2'b01: // sha1
$display("%x\n", sha1_hash);
default: // sha256
$display("%x\n", sha256_hash);
endcase
$display("***************************\n");
correct = 1'b0;
case (opcode)
2'b00: // md5
correct = (md5_digest == md5_hash);
2'b01: // sha1
correct = (sha1_digest == sha1_hash);
default: // sha256
correct = (sha256_digest == sha256_hash);
endcase
if (correct) begin
$display("Congratulations! You have the correct %s hash result!\n", hnames[opcode]);
$display("Total number of %s cycles: %d\n\n", hnames[opcode], cycles);
end else begin
$display("Error! The %s hash result is wrong!\n", hnames[opcode]);
end
total_cycles = total_cycles + cycles;
$display("***************************\n");
end
$display("FINAL TOTAL NUMBER OF CYCLES: %d\n\n", total_cycles);
$display("***************************\n");
$stop;
end
// memory model
always @(posedge mem_clk)
begin
if (mem_we) // write
dpsram[mem_addr] = mem_write_data;
else // read
mem_read_data = dpsram[mem_addr];
end
// track # of cycles
always @(posedge clk)
begin
if (!reset_n)
cycles = 0;
else
cycles = cycles + 1;
end
endmodule