FPGA Super Hash Processor Design

General Overview and Theory

    The goal of this project is to implement a FPGA core that can compute the SHA1, SHA256, and MD5 hash result of any given message. This design is implemented and tested for an Intel Aria II FPGA using Quartus Prime and Modalism.

What is a hash and why do we need them?

    Any function that takes arbitrary data of any size as an input and outputs new data of a fixed size can be considered a hash function and its results are considered hashes. Hashes can be used to verify the integrity of data, efficient data look ups, cryptographic algorithms, and the handling of large data sets.


Overview of the Hashing Algorithms

    The SHA1, SHA256, and MD5 hash algorithms are three of the most popular hash functions in use today. Although the SHA1 and MD5 algorithms are older and can exploited these days, they are still used in many legacy and non-security critical systems. The SHA256 algorithm is a more modern and secure and thus it is used in many systems that require security such as the Bitcoin blockchain.

    Although the computation of these three algorithms are different, they all require the message to be preprocessed to create a input using the guidelines below:

1. The processed input must be an integer multiple of 512 bits (512, 1024, 2048, etc.)

2. The last 64 bits of proceed input must be equal to the size of the message (64-bit binary representation)

3. After the message there is a mandatory padding bit of value 1 plus padding bits of all zeros to make the input a multiple of 512 bits. If the message + padding bit of 1 + size is already a multiple of 512 bits, no zeros are needed.

Examples:

1. If the given message is 204 bits long, the processed input will look as follows:

    Input[0:203] = message

    Input[204] = 1

    Input [205:447] = 0

    Input[448:511] = size of message in a 64-bit binary representation

2. If the given message is 564 bits long, the processed input will look as follows:

    Input[0:563] = message

    Input[564] = 1

    Input[565:958] = 0

    Input[959:1023] = size of message in a 64-bit binary representation

    After the message is processed, it is split into individual 512-bit blocks. Then each block sequentially goes through rounds of computation which gives our result, also known as the digest. 

  Block Sizes [bits] Digest Size [bits] Computation Rounds Per Block
MD5 512 128 64
SHA1 512 160 80
SHA256 512 256 64

 


Basic Core and Testbench Layout

The Super Hash processor module takes the following inputs and outputs:

INPUTS:

                Opcode: The desired hash operation (MD5, SHA256, SHA 1)

                Message Address: Location of the message in memory

                Size: Size of the message

                Output Address: Location where the digest must be written in memory

                Memory Read Data: Data read from memory

                Clock: System clock for module

                Reset: When set low, module is reset

OUTPUTS:

                Memory Clock: Clock for memory (synced with system clock)

                Memory Write Enable: Enable pin to make memory writeable

                Done: Output pin which is set high when module has computed digest


Implementation:

    This project is implemented using a state machine design with the following eight states.

STATE 1: SETUP

    This is the initial starting state and the state the machine resets to if the reset pin is set low. In this state we initialize our hash constants (h1, h2,…) and fill the first processed block with zeros before it is filled with its actual values.  It’s important to do this now because each 512 bit block is processed sequentially and individually. The block is represented by 16 32bit registers ( w[0], w[1], …. w[15]) which total 512 bits. By always initially filling the 16 registers with zeros, we can just re write the registers we need and save the hassle of figuring out how many zeros are needed for padding purposes. Once this is done we switch over the IIDLE state.

STATE 2: IDLE

    Here we determine set the current_block variable to the number 512 bit blocks we will need given the size of the message. We also set the registers variable to the total number of 32 bit words our message uses, set our counter variable to zero, set our limit variable to the number of registers we will need to read for this current block (max of 16), and set the mem_addr output pin to the address of the first word in the message. Once this is all done we move to the reading state.

STATE 3 READ:

    Here we simply keep reading words from memory, convert them to big Endian, and store them into our w[] registers. Each time the counter and mem_addr are incremented. Once the counter is greater than the limit, we reset the counter, update the registers variable with the remaining number of registers, and move to the PAD state.

STATE 4 PAD:

    Here we go through a set of conditional statements which determine if and where padding is required. Since we initially fill the block with 0s, all we are concerned about is the “1 bit” of padding and the addition of the “size pad” to the last 64 bits of the final block. Finally we update the limit variable for the next block (we do this now and not in the previous state because it depends on the registers variable and that was set the previous cycle) and set our hash function variables (a,b,c,d…) before moving to the HASH state.

STATE 5 HASH:

    This is the state where all the number crunching for the hash function is done. Once the computation is done, hashing variables are updates, and we move to the POST_HASH sate.

STATE 6 POST_HASH:

    In this state we update the current_block variable, set our counter variable to zero, and set our block to all zeros again. We then determine the next stage of action. If there are more words to be read, we point the mem_addr output pin to the correct place and switch to the READ state. If there are no more words to be read but we require a block of padding we move to the PAD state. If all the blocks have been hashed, we move to the OUTPUT state.

STATE 7 OUTPUT:

    Once the hash results have all been computed we begin outputting the result  by setting the mem_addr pin the output address and writing our results. When finished we move to the DONE state.

STATE 8 DONE:

    The done pin is set high and we reset back to the SETUP state.


Results

  MD5 ONLY SHA1 ONLY SHA256 ONLY ALL THREE
FMAX [Mhz] 98.3 147.36 114.56 95.74
ALUTs 2211 2307 2406 3962
LOGIC REGISTERS 954 1044 1208 1236
AREA 3165 3351 3614 5198
CYCLES (505 byte message) 754 899 758 2411
DELAY [microseconds] 7.6703967 6.10070575 6.616620112 25.18279
AREA*DELAY [miliseconds * area] 24.276806 20.443465 23.91246508 130.9001

   

    The results above were achieved using the "performance high effort" compiler option on a Windows 10 64-Bit machine running Quartus Prime Lite Edition v18. Overall the system achieved a good Fmax and the area was quite low making this a compact design. However there is still much more room for improvement. The largest gains can be realized by breaking down the computation into smaller functions and implementing pipelining to eliminate superficial data depencides. The second version of this core will focus on those two things.


System Verilog Code:

module super_hash_processor(input logic clk, reset_n, start,
input logic [1:0] opcode,
input logic [31:0] message_addr, size, output_addr,
output logic done, mem_clk, mem_we,
output logic [15:0] mem_addr,
output logic [31:0] mem_write_data,
input logic [31:0] mem_read_data);

enum logic [3:0] {SETUP=4'b0000, IDLE=4'b0001, READ=4'b0010, PAD=4'b0011, HASH=4'b0100, POST_HASH=4'b0101, OUTPUT=4'b0110, DONE=4'b1110} state;
logic [31:0] counter;
//logic [31:0] temp_address; //uncomment if doing more than one hash
logic [31:0] current_block;
logic [31:0] limit;
logic [31:0] registers;
logic [31:0] w[0:15];
logic [31:0] a, b, c, d, e, f, fsha1, g, h, k, temp;
logic [31:0] h0, h1, h2, h3, h4, h5, h6, h7;
logic [31:0] a1, b1, c1, d1, e1;

assign mem_clk = clk;

// right rotation
function logic [31:0] rightrotate(input logic [31:0] x,
                                  input logic [7:0] r);
	begin
		 rightrotate = (x >> r) | (x << (32-r));
	end
endfunction

// SHA256 K constants
parameter int sha256_k[0:63] = '{
   32'h428a2f98, 32'h71374491, 32'hb5c0fbcf, 32'he9b5dba5, 32'h3956c25b, 32'h59f111f1, 32'h923f82a4, 32'hab1c5ed5,
   32'hd807aa98, 32'h12835b01, 32'h243185be, 32'h550c7dc3, 32'h72be5d74, 32'h80deb1fe, 32'h9bdc06a7, 32'hc19bf174,
   32'he49b69c1, 32'hefbe4786, 32'h0fc19dc6, 32'h240ca1cc, 32'h2de92c6f, 32'h4a7484aa, 32'h5cb0a9dc, 32'h76f988da,
   32'h983e5152, 32'ha831c66d, 32'hb00327c8, 32'hbf597fc7, 32'hc6e00bf3, 32'hd5a79147, 32'h06ca6351, 32'h14292967,
   32'h27b70a85, 32'h2e1b2138, 32'h4d2c6dfc, 32'h53380d13, 32'h650a7354, 32'h766a0abb, 32'h81c2c92e, 32'h92722c85,
   32'ha2bfe8a1, 32'ha81a664b, 32'hc24b8b70, 32'hc76c51a3, 32'hd192e819, 32'hd6990624, 32'hf40e3585, 32'h106aa070,
   32'h19a4c116, 32'h1e376c08, 32'h2748774c, 32'h34b0bcb5, 32'h391c0cb3, 32'h4ed8aa4a, 32'h5b9cca4f, 32'h682e6ff3,
   32'h748f82ee, 32'h78a5636f, 32'h84c87814, 32'h8cc70208, 32'h90befffa, 32'ha4506ceb, 32'hbef9a3f7, 32'hc67178f2
};

// SHA256 hash round
function logic [255:0] sha256_op(input logic [31:0] a, b, c, d, e, f, g, h, w,
                                 input logic [7:0] t);
    logic [31:0] S1, S0, ch, maj, t1, t2; // internal signals
begin
    S1 = rightrotate(e, 6) ^ rightrotate(e, 11) ^ rightrotate(e, 25);
    ch = (e & f) ^ ((~e) & g);
    t1 = h + S1 + ch + sha256_k[t] + w;
    S0 = rightrotate(a, 2) ^ rightrotate(a, 13) ^ rightrotate(a, 22);
    maj = (a & b) ^ (a & c) ^ (b & c);
    t2 = S0 + maj;

    sha256_op = {t1 + t2, a, b, c, d + t1, e, f, g};
end
endfunction

// MD5 S constants
parameter byte S[0:63] = '{
    8'd7, 8'd12, 8'd17, 8'd22, 8'd7, 8'd12, 8'd17, 8'd22, 8'd7, 8'd12, 8'd17, 8'd22, 8'd7, 8'd12, 8'd17, 8'd22,
    8'd5, 8'd9,  8'd14, 8'd20, 8'd5, 8'd9,  8'd14, 8'd20, 8'd5, 8'd9,  8'd14, 8'd20, 8'd5, 8'd9,  8'd14, 8'd20,
    8'd4, 8'd11, 8'd16, 8'd23, 8'd4, 8'd11, 8'd16, 8'd23, 8'd4, 8'd11, 8'd16, 8'd23, 8'd4, 8'd11, 8'd16, 8'd23,
    8'd6, 8'd10, 8'd15, 8'd21, 8'd6, 8'd10, 8'd15, 8'd21, 8'd6, 8'd10, 8'd15, 8'd21, 8'd6, 8'd10, 8'd15, 8'd21
};

// MD5 K constants
parameter int md5_k[0:63] = '{
    32'hd76aa478, 32'he8c7b756, 32'h242070db, 32'hc1bdceee,
    32'hf57c0faf, 32'h4787c62a, 32'ha8304613, 32'hfd469501,
    32'h698098d8, 32'h8b44f7af, 32'hffff5bb1, 32'h895cd7be,
    32'h6b901122, 32'hfd987193, 32'ha679438e, 32'h49b40821,
    32'hf61e2562, 32'hc040b340, 32'h265e5a51, 32'he9b6c7aa,
    32'hd62f105d, 32'h02441453, 32'hd8a1e681, 32'he7d3fbc8,
    32'h21e1cde6, 32'hc33707d6, 32'hf4d50d87, 32'h455a14ed,
    32'ha9e3e905, 32'hfcefa3f8, 32'h676f02d9, 32'h8d2a4c8a,
    32'hfffa3942, 32'h8771f681, 32'h6d9d6122, 32'hfde5380c,
    32'ha4beea44, 32'h4bdecfa9, 32'hf6bb4b60, 32'hbebfbc70,
    32'h289b7ec6, 32'heaa127fa, 32'hd4ef3085, 32'h04881d05,
    32'hd9d4d039, 32'he6db99e5, 32'h1fa27cf8, 32'hc4ac5665,
    32'hf4292244, 32'h432aff97, 32'hab9423a7, 32'hfc93a039,
    32'h655b59c3, 32'h8f0ccc92, 32'hffeff47d, 32'h85845dd1,
    32'h6fa87e4f, 32'hfe2ce6e0, 32'ha3014314, 32'h4e0811a1,
    32'hf7537e82, 32'hbd3af235, 32'h2ad7d2bb, 32'heb86d391
};

// MD5 g
function logic[3:0] md5_g(input logic [7:0] t);
begin
   if (t <= 15)
       md5_g = t;
   else if (t <= 31)
       md5_g = (5*t + 1) % 16;
   else if (t <= 47)
       md5_g = (3*t + 5) % 16;
   else
       md5_g = (7*t) % 16;
end
endfunction

// MD5 f
function logic[31:0] md5_f(input logic [7:0] t);
begin
    if (t <= 15)
        md5_f = (b & c) | ((~b) & d);
    else if (t <= 31)
        md5_f = (d & b) | ((~d) & c);
    else if (t <= 47)
        md5_f = b ^ c ^ d;
    else
        md5_f = c ^ (b | (~d));
end
endfunction

// MD5 hash round
function logic[127:0] md5_op(input logic [31:0] a, b, c, d, w,
                             input logic [7:0] t);
    logic [31:0] t1, t2; // internal signals
begin
	//debug
	/*
	$display("w == %x\n",w);
	$display("--------------------------\n");
	*/
    t1 = a + md5_f(t) + md5_k[t] + w;
    t2 = b + ((t1 << S[t])|(t1 >> (32-S[t])));
    md5_op = {d, t2, b, c};
end
endfunction

//sha1 hash
 function logic [159:0] hash_op(input logic [31:0] a, b, c, d, e, w, input logic [31:0] t);
	if(t<=19)begin
		k = 32'h5A827999;
		fsha1 = (b & c) | ( (~b) & d);
	end else 	
	
	if(t <=39)begin
		k = 32'h6ED9EBA1;
		fsha1 = b ^ c ^ d;
	end else
	
	if(t<=59)begin
		k = 32'h8F1BBCDC;
		fsha1 = (b & c) | (b & d) | (c & d);
	end else 
	
	begin
		k = 32'hCA62C1D6;	
		fsha1 = b ^ c ^ d;
	end
	//debug
	/*	
	$display("a == %x\n",a);
	$display("b == %x\n",b);
	$display("c == %x\n",c);
	$display("d == %x\n",d);
	$display("e == %x\n",e);
	$display("f == %x\n",f);	
	$display("w == %x\n",w);
	*/	
	temp = {a[26:0],a[31:27]} + fsha1 + e + k + w;
	//debug
	/*
	$display("temp == %x\n",temp);	
	$display("-------------------------------");	
	*/
	e = d;
	d = c;
	c = {b[1:0],b[31:2]};
	b = a;
	a = temp;
	hash_op = {a, b, c, d, e};
endfunction

// convert from little-endian to big-endian
function logic [31:0] changeEndian(input logic [31:0] value);
	changeEndian = {value[7:0], value[15:8], value[23:16], value[31:24]};
endfunction
  
//appending pading "1"  
function logic [31:0] magic(input logic [31:0] value);
begin
	if(size%4 == 1)begin
		magic = ((value & 32'hFF000000) | 32'h00800000);
	end
	if(size%4 == 2) begin
		magic = ((value & 32'hFFFF0000) | 32'h00008000);
	end
	if(size%4 == 3) begin
		magic = ((value & 32'hFFFFFF00) | 32'h00000080);
	end
end
endfunction  
  
//determine number of blocks
function logic [31:0] determine_num_blocks(input logic [31:0] size);
	determine_num_blocks = ((((size)+8)/64)+1);
endfunction


  always_ff @(posedge clk, negedge reset_n)
  begin
    if (!reset_n) begin
      state <= SETUP;
    end else
      casex (state)
			SETUP: begin
			w[0]<=0; //set block to zeros to avoid padding 0s
			w[1]<=0;
			w[2]<=0;
			w[3]<=0; 
			w[4]<=0; 
			w[5]<=0;
			w[6]<=0;
			w[7]<=0;
			w[8]<=0;
			w[9]<=0;
			w[10]<=0;
			w[11]<=0;
			w[12]<=0;
			w[13]<=0;
			w[14]<=0;
			w[15]<=0;
			if(opcode == 2'b01 || opcode == 2'b00) begin // set constants depending on operation
				h0 <= 32'h67452301;
				h1 <= 32'hEFCDAB89;
				h2 <= 32'h98BADCFE;
				h3 <= 32'h10325476;
				h4 <= 32'hC3D2E1F0;
			end
			if(opcode == 2'b10) begin
				h0 <= 32'h6a09e667;
				h1 <= 32'hbb67ae85;
				h2 <= 32'h3c6ef372;
				h3 <= 32'ha54ff53a;
				h4 <= 32'h510e527f;
				h5 <= 32'h9b05688c;
				h6 <= 32'h1f83d9ab;
				h7 <= 32'h5be0cd19;
			end
			//temp_addr <= output_addr; //uncomment if doing more than one hash 
			state <= IDLE;
			done <= 0;
		end	
			
      IDLE: // start
          if(start) begin // READ first word
				current_block<=determine_num_blocks(size);
				registers <= size/4 + size[0];
				if(size >= 64) begin //if message is more than 512 bits
					limit <= 16;
				end
				if (size < 64) begin //if message is less than 512 bits
					limit <= size/4 + size[0];
				end					
            counter <= 0;
            state <= READ;	
				mem_addr <= message_addr - 1; //account for reading dealys		
          end
		
		READ: begin
			mem_addr <= mem_addr + 1;
			w[counter-2] <= changeEndian(mem_read_data); //-2 to account for delays
			counter <= counter + 1;
			state <= READ;
			if (counter>limit)begin
				counter <= 0;
				registers <= registers - limit;
				state <= PAD;
			end
		end
			
		PAD: begin
			if(current_block == 1)begin
				w[15] <= (size << 3); //append size
				if(size % 4 == 0)begin
					w[0] <= 32'h80000000; //full block of padding
				end
				if(limit < 16) begin
					if(size % 4 > 0)begin //if unfinished word need to magic
						w[limit-1] <= magic(w[limit-1]);
					end
					if(size % 4 == 0)begin //if finished word need to make w[limit] = 1 
						w[limit] <= 32'h80000000;
					end
				end
			end
			if(current_block == 2 && size % 4 > 0 && registers < 1)begin //unfinished last word
					w[limit-1] <= magic(w[limit-1]);
			end
			if(registers<16)begin //set limit for next block
				limit <= registers;
			end
			else begin
				limit <= 16;
			end
			a <= h0; //prep hash constants
			b <= h1;
			c <= h2;
			d <= h3;
			e <= h4;		
			f <= h5;
			g <= h6;
			h <= h7;
			state <= HASH;
		end

		HASH: begin
			if(opcode == 2'b00)begin //md5
				if(counter < 64)begin
					if(counter < 16)begin
						{a, b, c, d} <= md5_op(a, b, c, d, w[counter], counter);					
					end	
				end
				if(counter > 15)begin
					{a, b, c, d} <= md5_op(a, b, c, d, w[md5_g(counter)], counter);
				end				
				counter <= counter + 1;
				if(counter==64)begin
					h0 <= h0 + a;
					h1 <= h1 + b;
					h2 <= h2 + c;
					h3 <= h3 + d;
					state <= POST_HASH;	
				end
			end				
			if(opcode == 2'b01)begin //sha1
				if(counter < 80)begin	
					if(counter < 16)begin
						//debug
						//$display("t == %d\n",t);
						{a,b,c,d,e} <= hash_op(a, b, c, d, e, w[counter], counter);
						counter <= counter + 1;
					end	
					if( counter >= 16)begin
						//debug
					  // $display("t == %d\n",t);
						{a,b,c,d,e} <= hash_op(a, b, c, d, e, ((w[counter-a1]^w[counter-b1]^w[counter-c1]^w[counter-d1])<<1 | 
						(w[counter-a1]^w[counter-b1]^w[counter-c1]^w[counter-d1])>>31), counter);
						w[counter-e1]<= ((w[counter-a1]^w[counter-b1]^w[counter-c1]^w[counter-d1])<<1 | 
						(w[counter-a1]^w[counter-b1]^w[counter-c1]^w[counter-d1])>>31);
						counter <= counter + 1;	
						if((((counter+1)-3)%16)==0)begin
							a1 <= a1 + 16;
						end
						if((((counter+1)-8)%16)==0)begin
							b1 <= b1 + 16;
						end	
						if((((counter+1)-14)%16)==0)begin
							c1 <= c1 + 16;
						end			
						if(((counter+1)%16)==0)begin
							d1 <= d1 + 16;
							e1 <= e1 + 16;					
						end						
					end	 
				end
				if (counter == 80)begin
					h0 <= h0 + a;
					h1 <= h1 + b;
					h2 <= h2 + c;
					h3 <= h3 + d;
					h4 <= h4 + e;
					a1 <= 3;
					b1 <= 8;
					c1 <= 14;
					d1 <= 16;
					e1 <= 16;
					state <= POST_HASH;	
				end
			end
			if(opcode == 2'b10) begin //sha256
				if(counter < 64)begin
					if(counter < 16)begin		
						{a, b, c, d, e, f, g, h} <= sha256_op(a, b, c, d, e, f, g, h, w[counter], counter);
					end
					if(counter> 14 )begin
						w[15] <= w[0] + (rightrotate(w[1],   7) ^ rightrotate(w[1],  18) ^ (w[1]  >>  3)) +
						w[9] + (rightrotate(w[14], 17) ^ rightrotate(w[14], 19) ^ (w[14] >> 10));
						for (int i=0; i<15; i++) 
							w[i] <= w[i+1];	
					end
					if(counter > 15)begin
						{a, b, c, d, e, f, g, h} <= sha256_op(a, b, c, d, e, f, g, h, w[15], counter);
					end
					counter <= counter + 1;
				end
				if (counter == 64)begin
					h0 <= h0 + a;
					h1 <= h1 + b;
					h2 <= h2 + c;
					h3 <= h3 + d;
					h4 <= h4 + e;
					h5 <= h5 + f;
					h6 <= h6 + g;
					h7 <= h7 + h;
					state <= POST_HASH;
				end
			end	
		end		
		
		POST_HASH:begin
				w[0] <= 0; //set block to zeros again
				w[1] <= 0;
				w[2] <= 0;
				w[3] <= 0; 
				w[4] <= 0; 
				w[5] <= 0;
				w[6] <= 0;
				w[7] <= 0;
				w[8] <= 0;
				w[9] <= 0;
				w[10] <= 0;
				w[11] <= 0;
				w[12] <= 0;
				w[13] <= 0;
				w[14] <= 0;
				w[15] <= 0;
				counter <= 0;
				current_block <= current_block - 1; //decrement block counter
				if(limit > 0)begin
					state <= READ;
					mem_addr <= mem_addr - 2;
				end
				if(limit < 1 )begin
					state <= PAD;
				end
				if(current_block == 1)begin
					state <= OUTPUT;
					//mem_addr <= temp_addr //uncomment if doing more than one hash
				end
		end	
			
		OUTPUT:begin	
		    mem_we <= 1;
		    mem_addr <= output_addr; //comment out if doing more than one hash
			 if(counter == 0)begin
				mem_write_data <= h0;
			 end
			 if(counter == 1)begin
				mem_write_data <= h1;
			 end	
			 if(counter == 2)begin
				mem_write_data <= h2;
			 end	
			 if(counter == 3)begin
			 mem_write_data<=h3;
				if (opcode == 2'b00) begin
					state <= DONE;
				end
			 end	
			 if(counter==4)begin
				mem_write_data <= h4;
			 	if (opcode == 2'b01)begin
					state <= DONE;
				end
			 end
			 if(counter == 5)begin
				mem_write_data <= h5;
			 end		
			 if(counter == 6)begin
				mem_write_data <= h6;
			 end				
			 if(counter == 7)begin
				mem_write_data <= h7;
				if (opcode == 2'b10)begin
					state <= DONE;
				end
			 end				 
          mem_addr <= output_addr + counter;
			 counter <= counter + 1;
			 //temp_address <= output_addr + counter; //uncomment if doing more than one hash
			 
		end	 
			
      DONE: begin
		/* //debug
			$display("-------- FINAL RESULT --------");		
			$display("h0== %x\n",h0);
			$display("h1 == %x\n",h1);
			$display("h2 == %x\n",h2);
			$display("h3 == %x\n",h3);
			$display("h4 == %x\n",h4);
	   */		
         done <= 1;
         state <= SETUP;
		end	
      endcase
end		
  endmodule

 

module tb_super_hash_processor();

logic           clk, reset_n, start;
logic   [  1:0] opcode;
logic   [ 31:0] message_addr, size, output_addr;
logic           done, mem_clk, mem_we;
logic   [ 15:0] mem_addr;
logic   [ 31:0] mem_write_data;
logic   [ 31:0] mem_read_data;

logic   [127:0] md5_hash; // results here
logic   [159:0] sha1_hash; // results here
logic   [255:0] sha256_hash; // results here

logic   [ 31:0] dpsram[0:16383]; // each row has 32 bits
logic   [ 31:0] dpsram_tb[0:16383]; // for result testing, testbench only

logic   [ 31:0] message_seed; // modify message_seed below

int             message_size = 505; // in bytes 
int             pad_length;

int             t, m;
int             outloop;
int             cycles;
int             total_cycles;
int             rounds;

logic           correct;

logic   [127:0] md5_digest;
logic   [159:0] sha1_digest;
logic   [255:0] sha256_digest;

logic   [ 31:0] h0;
logic   [ 31:0] h1;
logic   [ 31:0] h2;
logic   [ 31:0] h3;
logic   [ 31:0] h4;
logic   [ 31:0] h5;
logic   [ 31:0] h6;
logic   [ 31:0] h7;

logic   [ 31:0] a, b, c, d, e, f, g, h;
logic   [ 31:0] s1, s0;

logic   [ 31:0] w[0:79];

// instantiate your design
super_hash_processor super_hash_processor_inst (clk, reset_n, start, opcode, message_addr, size, output_addr, done,
    mem_clk, mem_we, mem_addr, mem_write_data, mem_read_data);

parameter string hnames[0:2] = {"MD5", "SHA1", "SHA256"};

// ---------------------------------------------------------------------------------------

// MD5 S constants
parameter byte S[0:63] = '{
    8'd7, 8'd12, 8'd17, 8'd22, 8'd7, 8'd12, 8'd17, 8'd22, 8'd7, 8'd12, 8'd17, 8'd22, 8'd7, 8'd12, 8'd17, 8'd22,
    8'd5, 8'd9,  8'd14, 8'd20, 8'd5, 8'd9,  8'd14, 8'd20, 8'd5, 8'd9,  8'd14, 8'd20, 8'd5, 8'd9,  8'd14, 8'd20,
    8'd4, 8'd11, 8'd16, 8'd23, 8'd4, 8'd11, 8'd16, 8'd23, 8'd4, 8'd11, 8'd16, 8'd23, 8'd4, 8'd11, 8'd16, 8'd23,
    8'd6, 8'd10, 8'd15, 8'd21, 8'd6, 8'd10, 8'd15, 8'd21, 8'd6, 8'd10, 8'd15, 8'd21, 8'd6, 8'd10, 8'd15, 8'd21
};

// MD5 K constants
parameter int md5_k[0:63] = '{
    32'hd76aa478, 32'he8c7b756, 32'h242070db, 32'hc1bdceee,
    32'hf57c0faf, 32'h4787c62a, 32'ha8304613, 32'hfd469501,
    32'h698098d8, 32'h8b44f7af, 32'hffff5bb1, 32'h895cd7be,
    32'h6b901122, 32'hfd987193, 32'ha679438e, 32'h49b40821,
    32'hf61e2562, 32'hc040b340, 32'h265e5a51, 32'he9b6c7aa,
    32'hd62f105d, 32'h02441453, 32'hd8a1e681, 32'he7d3fbc8,
    32'h21e1cde6, 32'hc33707d6, 32'hf4d50d87, 32'h455a14ed,
    32'ha9e3e905, 32'hfcefa3f8, 32'h676f02d9, 32'h8d2a4c8a,
    32'hfffa3942, 32'h8771f681, 32'h6d9d6122, 32'hfde5380c,
    32'ha4beea44, 32'h4bdecfa9, 32'hf6bb4b60, 32'hbebfbc70,
    32'h289b7ec6, 32'heaa127fa, 32'hd4ef3085, 32'h04881d05,
    32'hd9d4d039, 32'he6db99e5, 32'h1fa27cf8, 32'hc4ac5665,
    32'hf4292244, 32'h432aff97, 32'hab9423a7, 32'hfc93a039,
    32'h655b59c3, 32'h8f0ccc92, 32'hffeff47d, 32'h85845dd1,
    32'h6fa87e4f, 32'hfe2ce6e0, 32'ha3014314, 32'h4e0811a1,
    32'hf7537e82, 32'hbd3af235, 32'h2ad7d2bb, 32'heb86d391
};

// MD5 g
function logic[3:0] md5_g(input logic [7:0] t);
begin
   if (t <= 15)
       md5_g = t;
   else if (t <= 31)
       md5_g = (5*t + 1) % 16;
   else if (t <= 47)
       md5_g = (3*t + 5) % 16;
   else
       md5_g = (7*t) % 16;
end
endfunction

// MD5 f
function logic[31:0] md5_f(input logic [7:0] t);
begin
    if (t <= 15)
        md5_f = (b & c) | ((~b) & d);
    else if (t <= 31)
        md5_f = (d & b) | ((~d) & c);
    else if (t <= 47)
        md5_f = b ^ c ^ d;
    else
        md5_f = c ^ (b | (~d));
end
endfunction

// MD5 hash round
function logic[127:0] md5_op(input logic [31:0] a, b, c, d, w,
                             input logic [7:0] t);
    logic [31:0] t1, t2; // internal signals
begin
    t1 = a + md5_f(t) + md5_k[t] + w;
    t2 = b + ((t1 << S[t])|(t1 >> (32-S[t])));
    md5_op = {d, t2, b, c};
end
endfunction

// ---------------------------------------------------------------------------------------

// SHA1 f
function logic [31:0] sha1_f(input logic [7:0] t);
begin
   if (t <= 19)
       sha1_f = (b & c) | ((~b) & d);
   else if (t <= 39)
       sha1_f = b ^ c ^ d;
   else if (t <= 59)
       sha1_f = (b & c) | (b & d) | (c & d);
   else
       sha1_f = b ^ c ^ d;
end
endfunction

// SHA1 k
function logic [31:0] sha1_k(input logic [7:0] t);
begin
   if (t <= 19)
       sha1_k = 32'h5a827999;
   else if (t <= 39)
       sha1_k = 32'h6ed9eba1;
   else if (t <= 59)
       sha1_k = 32'h8f1bbcdc;
   else
       sha1_k = 32'hca62c1d6;
end
endfunction

// SHA1 hash round
function logic [159:0] sha1_op(input logic [31:0] a, b, c, d, e, w,
                               input logic [7:0] t);
   logic [31:0] temp, tc; // internal signals
begin
   temp = ((a << 5)|(a >> 27)) + sha1_f(t) + e + sha1_k(t) + w;
   tc = ((b << 30)|(b >> 2));
   sha1_op = {temp, a, tc, c, d};
end
endfunction

// ---------------------------------------------------------------------------------------

// SHA256 K constants
parameter int sha256_k[0:63] = '{
   32'h428a2f98, 32'h71374491, 32'hb5c0fbcf, 32'he9b5dba5, 32'h3956c25b, 32'h59f111f1, 32'h923f82a4, 32'hab1c5ed5,
   32'hd807aa98, 32'h12835b01, 32'h243185be, 32'h550c7dc3, 32'h72be5d74, 32'h80deb1fe, 32'h9bdc06a7, 32'hc19bf174,
   32'he49b69c1, 32'hefbe4786, 32'h0fc19dc6, 32'h240ca1cc, 32'h2de92c6f, 32'h4a7484aa, 32'h5cb0a9dc, 32'h76f988da,
   32'h983e5152, 32'ha831c66d, 32'hb00327c8, 32'hbf597fc7, 32'hc6e00bf3, 32'hd5a79147, 32'h06ca6351, 32'h14292967,
   32'h27b70a85, 32'h2e1b2138, 32'h4d2c6dfc, 32'h53380d13, 32'h650a7354, 32'h766a0abb, 32'h81c2c92e, 32'h92722c85,
   32'ha2bfe8a1, 32'ha81a664b, 32'hc24b8b70, 32'hc76c51a3, 32'hd192e819, 32'hd6990624, 32'hf40e3585, 32'h106aa070,
   32'h19a4c116, 32'h1e376c08, 32'h2748774c, 32'h34b0bcb5, 32'h391c0cb3, 32'h4ed8aa4a, 32'h5b9cca4f, 32'h682e6ff3,
   32'h748f82ee, 32'h78a5636f, 32'h84c87814, 32'h8cc70208, 32'h90befffa, 32'ha4506ceb, 32'hbef9a3f7, 32'hc67178f2
};

// SHA256 hash round
function logic [255:0] sha256_op(input logic [31:0] a, b, c, d, e, f, g, h, w,
                                 input logic [7:0] t);
    logic [31:0] S1, S0, ch, maj, t1, t2; // internal signals
begin
    S1 = rightrotate(e, 6) ^ rightrotate(e, 11) ^ rightrotate(e, 25);
    ch = (e & f) ^ ((~e) & g);
    t1 = h + S1 + ch + sha256_k[t] + w;
    S0 = rightrotate(a, 2) ^ rightrotate(a, 13) ^ rightrotate(a, 22);
    maj = (a & b) ^ (a & c) ^ (b & c);
    t2 = S0 + maj;

    sha256_op = {t1 + t2, a, b, c, d + t1, e, f, g};
end
endfunction

// ---------------------------------------------------------------------------------------

// left rotation
function logic [31:0] leftrotate(input logic [31:0] x);
begin
    leftrotate = (x << 1) | (x >> 31);
end
endfunction

// right rotation
function logic [31:0] rightrotate(input logic [31:0] x,
                                  input logic [7:0] r);
begin
    rightrotate = (x >> r) | (x << (32-r));
end
endfunction

// convert from little-endian to big-endian
function logic [31:0] changeEndian(input logic [31:0] value);
    changeEndian = {value[7:0], value[15:8], value[23:16], value[31:24]};
endfunction

// ---------------------------------------------------------------------------------------

// clock generator
always begin
    #10;
    clk = 1'b1;
    #10
    clk = 1'b0;
end

// main testbench
initial
begin
  total_cycles = 0;

  for (opcode = 0; opcode < 3; opcode = opcode + 1) begin
    // RESET HASH CO-PROCESSOR

    @(posedge clk) reset_n = 0;
    for (m = 0; m < 2; m = m + 1) @(posedge clk);
    reset_n = 1;
    for (m = 0; m < 2; m = m + 1) @(posedge clk);

    // SET MESSAGE LOCATION

    size = message_size;

    case (opcode)
    2'b00: begin // md5
        message_addr = 32'd1000;
        message_seed = 32'h01234567; 
      end
    2'b01: begin // sha1
        message_addr = 32'd2000;
        message_seed = 32'h34567012; 
      end
    default: begin // sha256
        message_addr = 32'd3000;
        message_seed = 32'h45670123; 
      end
    endcase

    output_addr = message_addr + ((message_size-1))/4 + 1;

    // CREATE AND DISPLAY MESSAGETEXT

    $display("-----------\n");
    $display("Messagetext\n");
    $display("-----------\n");

    dpsram[message_addr+0] = message_seed;
    dpsram_tb[0] = changeEndian(message_seed); // change Endian // for testbench only

    $display("%x\n", dpsram[message_addr]);

    for (m = 1; m < (message_size-1)/4+1; m = m + 1) begin // data generation
        dpsram[message_addr+m] = (dpsram[message_addr+m-1]<<1)|(dpsram[message_addr+m-1]>>31);
        dpsram_tb[m] = changeEndian(dpsram[message_addr+m]); // change Endian
        $display("%x\n", dpsram[message_addr+m]);
    end

    // START PROCESSOR

    start = 1'b1;
    for (m = 0; m < 2; m = m + 1) @(posedge clk);
    start = 1'b0;

    // PERFORM PADDING OF MESSAGE

    // calculate total number of bytes after padding (before appending total length)
    if ((message_size + 1) % 64 <= 56 && (message_size + 1) % 64 > 0)
        pad_length = (message_size/64)*64 + 56;
    else
        pad_length = (message_size/64+1)*64 + 56;

    case (message_size % 4) // pad bit 1
    0: dpsram_tb[message_size/4] = 32'h80000000;
    1: dpsram_tb[message_size/4] = dpsram_tb[message_size/4] & 32'h FF000000 | 32'h 00800000;
    2: dpsram_tb[message_size/4] = dpsram_tb[message_size/4] & 32'h FFFF0000 | 32'h 00008000;
    3: dpsram_tb[message_size/4] = dpsram_tb[message_size/4] & 32'h FFFFFF00 | 32'h 00000080;
    endcase

    for (m = message_size/4+1; m < pad_length/4; m = m + 1) begin
        dpsram_tb[m] = 32'h00000000;
    end

    dpsram_tb[pad_length/4] = message_size >> 29; // append length of message in bits (before pre-processing)
    dpsram_tb[pad_length/4+1] = message_size * 8;
    pad_length = pad_length + 8; // final length after pre-processing

    outloop = pad_length/64; // break message into 512-bit chunks (64 bytes)

    // COMPUTE NUMBER OF ROUNDS

    if (opcode == 2'b01) // sha1
        rounds = 80;
    else // md5 or sha256
        rounds = 64;

    // SET INITIAL HASH

    case (opcode)
    2'b00: begin // md5
        h0 = 32'h67452301;
        h1 = 32'hEFCDAB89;
        h2 = 32'h98BADCFE;
        h3 = 32'h10325476;
        h4 = 32'h00000000;
        h5 = 32'h00000000;
        h6 = 32'h00000000;
        h7 = 32'h00000000;
      end
    2'b01: begin // sha1
        h0 = 32'h67452301;
        h1 = 32'hEFCDAB89;
        h2 = 32'h98BADCFE;
        h3 = 32'h10325476;
        h4 = 32'hC3D2E1F0;
        h5 = 32'h00000000;
        h6 = 32'h00000000;
        h7 = 32'h00000000;
      end
    default: begin // sha256
        h0 = 32'h6a09e667;
        h1 = 32'hbb67ae85;
        h2 = 32'h3c6ef372;
        h3 = 32'ha54ff53a;
        h4 = 32'h510e527f;
        h5 = 32'h9b05688c;
        h6 = 32'h1f83d9ab;
        h7 = 32'h5be0cd19;
      end
    endcase

    // COMPUTE SHA256 HASH

    for (m = 0; m < outloop; m = m + 1) begin
        // W ARRAY EXPANSION

        for (t = 0; t < rounds; t = t + 1) begin
            if (t < 16) begin
                w[t] = dpsram_tb[t+m*16];
            end else begin
              case (opcode)
              2'b00: begin // md5
                w[t] = w[md5_g(t)];
              end
              2'b01: begin // sha1
                w[t] = leftrotate(w[t-3] ^ w[t-8] ^ w[t-14] ^ w[t-16]);
              end
              default: begin // sha256
                s0 = rightrotate(w[t-15], 7) ^ rightrotate(w[t-15], 18) ^ (w[t-15] >> 3);
                s1 = rightrotate(w[t-2], 17) ^ rightrotate(w[t-2], 19) ^ (w[t-2] >> 10);
                w[t] = w[t-16] + s0 + w[t-7] + s1;
              end
              endcase
            end
        end

        // INITIAL HASH AT ROUND K

        a = h0;
        b = h1;
        c = h2;
        d = h3;
        e = h4;
        f = h5;
        g = h6;
        h = h7;

        // HASH ROUNDS

        for (t = 0; t < rounds; t = t + 1) begin
            case (opcode)
            2'b00: 
                {a, b, c, d} = md5_op(a, b, c, d, w[t], t);
            2'b01: 
                {a, b, c, d, e} = sha1_op(a, b, c, d, e, w[t], t);
            default:
                {a, b, c, d, e, f, g, h} = sha256_op(a, b, c, d, e, f, g, h, w[t], t);
            endcase
        end

        // FINAL HASH

        h0 = h0 + a;
        h1 = h1 + b;
        h2 = h2 + c;
        h3 = h3 + d;
        h4 = h4 + e;
        h5 = h5 + f;
        h6 = h6 + g;
        h7 = h7 + h;
    end

    md5_digest = {h0, h1, h2, h3};
    sha1_digest = {h0, h1, h2, h3, h4};
    sha256_digest = {h0, h1, h2, h3, h4, h5, h6, h7};

    // WAIT UNTIL ENTIRE FRAME IS HASHED, THEN DISPLAY HASH RESULT

    wait (done == 1);

    // DISPLAY HASH RESULT

    $display("-----------------------\n");
    $display("correct %s hash result is:\n", hnames[opcode]);
    $display("-----------------------\n");
    case (opcode)
    2'b00: // md5
        $display("%x\n", md5_digest);
    2'b01: // sha1
        $display("%x\n", sha1_digest);
    default: // sha256
        $display("%x\n", sha256_digest);
    endcase

    md5_hash = {
        dpsram[output_addr],
        dpsram[output_addr+1],
        dpsram[output_addr+2],
        dpsram[output_addr+3]
    };

    sha1_hash = {
        dpsram[output_addr],
        dpsram[output_addr+1],
        dpsram[output_addr+2],
        dpsram[output_addr+3],
        dpsram[output_addr+4]
    };

    sha256_hash = {
        dpsram[output_addr],
        dpsram[output_addr+1],
        dpsram[output_addr+2],
        dpsram[output_addr+3],
        dpsram[output_addr+4],
        dpsram[output_addr+5],
        dpsram[output_addr+6],
        dpsram[output_addr+7]
    };

    $display("-----------------------\n");
    $display("Your %s result is:        \n", hnames[opcode]);
    $display("-----------------------\n");
    case (opcode)
    2'b00: // md5
        $display("%x\n", md5_hash);
    2'b01: // sha1
        $display("%x\n", sha1_hash);
    default: // sha256
        $display("%x\n", sha256_hash);
    endcase

    $display("***************************\n");

    correct = 1'b0;
    case (opcode)
    2'b00: // md5
        correct = (md5_digest == md5_hash);
    2'b01: // sha1
        correct = (sha1_digest == sha1_hash);
    default: // sha256
        correct = (sha256_digest == sha256_hash);
    endcase

    if (correct) begin
        $display("Congratulations! You have the correct %s hash result!\n", hnames[opcode]);
        $display("Total number of %s cycles: %d\n\n", hnames[opcode], cycles);
    end else begin
        $display("Error! The %s hash result is wrong!\n", hnames[opcode]);
    end

    total_cycles = total_cycles + cycles;

    $display("***************************\n");
  end
  $display("FINAL TOTAL NUMBER OF CYCLES: %d\n\n", total_cycles);
  $display("***************************\n");

  $stop;
end

// memory model
always @(posedge mem_clk)
begin
    if (mem_we) // write
        dpsram[mem_addr] = mem_write_data;
    else // read
        mem_read_data = dpsram[mem_addr];
end

// track # of cycles
always @(posedge clk)
begin
    if (!reset_n)
        cycles = 0;
    else
        cycles = cycles + 1;
end

endmodule