Area-Efficient FPGA Implementation of Cryptographic SHA3-512

International Journal of Engineering Trends and Technology (IJETT) – Volume 21 Number 9 – March 2015
Nayana M S¹, Mrs. Bindu A U²
¹ VLSI & Embedded Systems, Dept. of ECE, SIET, Tumkur, India.
² Assistant Professor, Dept. of ECE, SIET, Tumkur, India.
Abstract: A Secure Hash Algorithm (SHA) produces a condensed representation of binary data. A cryptographic hash function is a deterministic process that takes an arbitrary block of data as input and produces a fixed-size output, known as the hash value. These functions were initially introduced to provide information security, integrity and authentication. In recent years there have been serious and alarming cryptanalytic attacks on several commonly used hash functions, such as MD4, MD5, SHA0, SHA1 and SHA2. This culminated in the design of SHA3, used here with a 512-bit output and based on the Keccak algorithm. The design is logically optimized for area efficiency, throughput, operating frequency and latency by integrating the Rho, Pi and Chi steps of the algorithm into a single step. SHA3 also provides stringent security properties, including pre-image resistance and collision resistance. This work presents a compact design of the newly selected Secure Hash Algorithm (SHA-3) that divides the basic Keccak architecture into a padder module and a permutation module, reflecting the sponge construction. The modules are designed, simulated and verified using the Xilinx ISE Design Suite 14.5 software tool and implemented on a Xilinx Spartan 6 Field Programmable Gate Array (FPGA) device.
Keywords - cryptographic hash function; SHA1, SHA2
and SHA3; Keccak algorithm; sponge construction.
I. INTRODUCTION
In recent years, security has become a major risk in transmission media, because the growth of the Internet and of multimedia content such as audio, video and images makes digital content easy to obtain over the net. This causes several problems, such as infringement of ownership and illegal distribution of copies.
The method followed to address these security issues is based on cryptography, and specifically on hash functions. Cryptography is a method of storing and transmitting data in a particular form so that only those for whom it is intended can read and process it. It is one of the most useful fields in wireless communication and personal communication systems, where information security has become an increasingly important area of interest.
ISSN: 2231-5381
To build a secure cryptographic portable electronic device, the selected algorithm must be trusted, time-tested and widely peer-reviewed in the global cryptographic community. Cryptographic algorithms address specific information security requirements such as data authentication (but not encryption), data confidentiality and data integrity. Authentication services assure the recipient that the message is from the source it claims to be from. Data integrity assures that information and programs are changed only in a specified and authorized manner. Data confidentiality assures that private or confidential information is not made available to unauthorized individuals.
A cryptographic hash function should be highly sensitive to the smallest change in the input message: a change in a single bit of the input should produce a large change in the output hash value. The message can be a plaintext file, a piece of software or an executable program.
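The sensitivity described above can be observed with a short experiment. The following is a minimal sketch that uses the SHA3-512 implementation in Python's standard `hashlib` module (not the FPGA core of this paper); the test message and the helper name `bit_difference` are illustrative assumptions.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


message = bytearray(b"The quick brown fox jumps over the lazy dog")
flipped = bytearray(message)
flipped[0] ^= 0x01  # flip a single bit of the input

digest_a = hashlib.sha3_512(bytes(message)).digest()
digest_b = hashlib.sha3_512(bytes(flipped)).digest()

changed = bit_difference(digest_a, digest_b)
print(f"{changed} of {len(digest_a) * 8} output bits changed")
```

For a well-behaved hash function, roughly half of the 512 output bits are expected to change.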
The output of SHA is also called a “message digest” or “fingerprint” because it is a condensed representation of electronic data that is easy to generate for a given file. Hash algorithms are typically composed of a compression function that operates on fixed-length pieces of the input; the process is repeated until all the input blocks are consumed. SHA-3 possesses the following important and stringent security properties:
Collision resistance: It is computationally infeasible to produce two messages with the same message digest.
Pre-image resistance: It is computationally infeasible to recreate a message from a given message digest.
This work presents an efficient design and implementation of the Keccak SHA-3 standard by dividing the basic architecture [Fig. 2] into smaller modules that directly reflect the sponge construction [Fig. 1], from which the algorithm can be easily generated.
http://www.ijettjournal.org
FPGAs are an ideal platform for implementing cryptographic algorithms, because modern FPGAs are equipped with enhanced embedded resources such as block RAMs (BRAMs), dedicated memory controller blocks (MCBs), PLLs, global clock lines and Digital Signal Processing (DSP) blocks, in addition to LUTs and CLBs, all of which can be used to optimize implementations.
The rest of the paper is organized as follows: Section II briefly presents the hash technology, Section III introduces the proposed architecture, Section IV presents the FPGA synthesis results and comparisons with previous works, and the conclusions are discussed in the last section.
Figure 1: Sponge construction
II. HASH TECHNOLOGY
SHA3 supports four fixed-output-length variants, i.e. n ∈ {224, 256, 384, 512}. The hash function belongs to the family of sponge functions. The sponge construction (shown in Figure 1) is a simple iterated construction for building a function F with variable-length input and arbitrary output length, based on a fixed-length transformation or permutation f operating on a fixed number b of bits. Here b is called the width.
The sponge construction builds a function SPONGE[f, pad, r] using a fixed-length transformation or permutation f, a sponge-compliant padding rule “pad” and a bit-rate parameter r. A finite-length output is obtained by truncating the output to its first ℓ bits. An instance of the sponge construction is called a sponge function. The sponge construction operates on a state of b = r + c bits, where r is the rate and c is the capacity. The sum r + c determines the width of the permutation used in the sponge construction and is restricted to values in {25, 50, 100, 200, 400, 800, 1600}.
The sponge construction processes the message in two phases:
Absorption: The sponge state initially consists of all zeros. The first input block of length r is XORed with the first r bits of the state, and the transformation f is applied to the state. The next input block is then XORed into the state in the same way and the state is transformed again. This continues until all the input is consumed.
Squeezing: The outer part of the state (its first r bits) is iteratively returned as output blocks, interleaved with applications of the function f. The number of iterations is determined by the requested number of bits ℓ.
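Finally, the output is truncated to its first ℓ bits. The c-bit inner state is never directly affected by the input blocks and is never output during the squeezing phase. The capacity c determines the attainable security level of the construction.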
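The two phases can be sketched in software as follows. This is a minimal byte-oriented model under stated assumptions, not the hardware design of this paper: `rate` and `width` are given in bytes, the padding is the pad10*1 rule applied at byte granularity, and `toy_permutation` is a hypothetical, insecure stand-in for f that only serves to make the example runnable.

```python
from typing import Callable


def sponge(f: Callable[[bytearray], bytearray], rate: int, width: int,
           message: bytes, out_len: int) -> bytes:
    """Sponge over bytes: capacity = width - rate, output truncated to out_len bytes."""
    # Padding (pad10*1 at byte granularity): a 1 bit, zeros, and a final 1 bit.
    padded = bytearray(message) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80

    state = bytearray(width)  # the state starts as all zeros

    # Absorbing: XOR each rate-sized block into the outer part, then apply f.
    for off in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[off + i]
        state = f(state)

    # Squeezing: output the outer part, applying f between output blocks.
    out = bytearray(state[:rate])
    while len(out) < out_len:
        state = f(state)
        out += state[:rate]
    return bytes(out[:out_len])  # truncate to the requested length


def toy_permutation(state: bytearray) -> bytearray:
    """Hypothetical stand-in for f: invertible, but NOT cryptographically secure."""
    rotated = state[1:] + state[:1]  # rotate the state by one byte
    return bytearray((b + i) % 256 for i, b in enumerate(rotated))


digest = sponge(toy_permutation, rate=8, width=25, message=b"abc", out_len=20)
print(digest.hex())
```

The inner `width - rate` bytes of the state are never XORed with message bytes and never copied to the output, which mirrors the role of the capacity described above.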
The four output lengths n ∈ {224, 256, 384, 512}, together with the corresponding required capacity, rate and associated security levels, are listed in Table I.
Table I: Output lengths supported by SHA3.

SHA-3 instance   Output length   Collision resistance   Pre-image resistance   Required capacity (c)   Required rate (r)
SHA3-224         n = 224         s <= 112               s <= 224               448                     1152
SHA3-256         n = 256         s <= 128               s <= 256               512                     1088
SHA3-384         n = 384         s <= 192               s <= 384               768                     832
SHA3-512         n = 512         s <= 256               s <= 512               1024                    576

s: security strength level.
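The entries of Table I follow from two relations: SHA-3 fixes the capacity at c = 2n, and the rate is r = b - c with b = 1600. A small sketch that regenerates the table values (the function name `sha3_parameters` is an illustrative assumption):

```python
WIDTH = 1600  # b = r + c, the state width in bits


def sha3_parameters(n: int) -> dict:
    """Derive capacity, rate and security strengths for an n-bit SHA-3 digest."""
    capacity = 2 * n          # SHA-3 fixes c = 2n
    rate = WIDTH - capacity   # r = b - c
    return {
        "capacity": capacity,
        "rate": rate,
        "collision": n // 2,  # collision resistance strength in bits
        "preimage": n,        # pre-image resistance strength in bits
    }


for n in (224, 256, 384, 512):
    print(n, sha3_parameters(n))
```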
The sequential Keccak SHA3-512 architecture is shown in Figure 2. The architecture has a 128-bit data input, which avoids a wider input bus. The next block is the padder, which pads the input data with the required number of zeros to form the 1600-bit state and then applies inversion to each byte. The output of the padder is forwarded to a 2 x 1 multiplexer (MUX), which drives the compression box of the architecture: under the control signal Ctrl 1, it selects the padder output for the first round and the feedback data for the other twenty-three rounds.
The basic architecture is divided into two modules: 1) the padder module and 2) the permutation module (shown in Figures 3 and 4 respectively), which directly reflect the sponge construction. The total area and the operating frequency of the design are compared with other SHA-3 implementations in Table II.
Figure 2: Basic block diagram of Keccak SHA3-512.
When Ctrl 1 is low, the MUX selects the input data; when it is high, the MUX selects the feedback data. The padded message is copied directly to Reg_A, the 1600 bits are arranged as a 5x5 matrix of 64-bit words, and the resulting bits are forwarded to the Compression-Box (C-Box). The C-Box implements the compression function of the SHA-3 algorithm, which comprises the theta (θ), rho (ρ), pi (π), chi (χ) and iota (ι) steps. The key feature of this design is that the rho (ρ), pi (π) and chi (χ) steps of the C-Box are implemented as a single step, which saves hardware resources and logically optimizes the design. After 24 iterations are completed, the final output is forwarded to Reg B for storage in order to synchronize the data path. The last component in the architecture is the truncating component, where inversion per byte is performed on the output bits, which are then truncated to the desired hash output length.
III. PROPOSED METHODOLOGY
Figure 3: Block Diagram of padder module
Padder module: The padder module consists of Reg A, a shifter, a 2:1 MUX and a 576-bit buffer, as shown in Fig. 3. The first 32 bits of the given message are temporarily stored in Reg_A and forwarded to the shifter. When the control signal In_ready is high, the 32 input bits are ready; when In_ready is low, all blocks of the message have been consumed. The shifter left-shifts the data by 32 bits and forwards it to the buffer. The buffer is 576 bits wide, and a new 32-bit word is consumed only when the buffer is not full. In the second round, the data in the buffer is left-shifted by 32 bits and concatenated with the new 32 input bits. The process continues until the 576-bit buffer is full, at which point the padder output is forwarded to the permutation module. The next data blocks are padded once the padder module receives the acknowledgement signal ackn from the permutation module.
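The shift-and-concatenate behaviour of the buffer can be modelled in a few lines. The sketch below is a behavioural model only, written under the assumption that words enter at the least significant end; the class and method names (`PadderBuffer`, `push`, `take_block`) are hypothetical and do not come from the design files. Since 576 / 32 = 18, the buffer is full after 18 input words.

```python
WORD_BITS = 32
BUFFER_BITS = 576  # rate r of SHA3-512
WORDS_PER_BLOCK = BUFFER_BITS // WORD_BITS


class PadderBuffer:
    """Behavioural model of the 576-bit padder buffer (names are hypothetical)."""

    def __init__(self) -> None:
        self.value = 0  # buffer contents held as an integer
        self.count = 0  # number of 32-bit words currently stored

    @property
    def full(self) -> bool:
        return self.count == WORDS_PER_BLOCK

    def push(self, word: int) -> None:
        """Shift the buffer left by 32 bits and append the new 32-bit word."""
        if self.full:
            raise RuntimeError("buffer full: wait for ackn from the permutation module")
        mask = (1 << BUFFER_BITS) - 1
        self.value = ((self.value << WORD_BITS) | (word & 0xFFFFFFFF)) & mask
        self.count += 1

    def take_block(self) -> int:
        """Hand the 576-bit block to the permutation module and clear the buffer."""
        block, self.value, self.count = self.value, 0, 0
        return block


buf = PadderBuffer()
for word in range(1, WORDS_PER_BLOCK + 1):
    buf.push(word)
print(buf.full, hex(buf.take_block()))
```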
Design techniques have previously been proposed for the basic SHA-3 architecture (shown in Fig. 2) in order to achieve better timing performance. To meet the main objective of this work, i.e. the low-area constraint, the basic architecture is designed using a divide-and-conquer approach, from which the required algorithm can be easily obtained by generating the sponge function.
In SHA-3, the Truncating block becomes active only when the control signal Message_full is high, which happens if and only if all the input data blocks have been consumed. The permutation operation mixes the data such that any manipulation of the confidential files to be transmitted leads to a change in the hash value. The c capacity bits, i.e. the 1024 bits initialized to zero, are never directly affected by the input blocks and are never output during the squeezing phase. The capacity c determines the attainable security level of the construction.
IV. IMPLEMENTATION RESULTS AND COMPARISON
The design has been implemented and verified with the Xilinx ISE Design Suite, System Edition 14.6. The target device for the implementation was a Xilinx Virtex 6. Each step of the SHA-3 design was implemented and tested as an individual module. These modules were then instantiated in the top-level code of the design to examine its results in detail.
Figure 4: Block Diagram of permutation module
The permutation module performs two main functions: 1) f-permutation and 2) truncation, as shown in Fig. 4. The padded 576 bits of data are extended with 1024 zeros so that r + c = 1600. The 1600 bits are arranged as a 5x5 state array with a 64-bit word length. If the control signal First_round is high, the padded data is applied to the transformation block and First_round is immediately disabled, so that no more padded data is accepted. As soon as the padded data has been consumed by the permutation module, an acknowledgement signal Ackn is sent to the padder module so that it can pad the next block of data.
Transformation is the main stage in the permutation module. Each round is subdivided into five steps, i.e. Theta (θ), Rho (ρ), Pi (π), Chi (χ) and Iota (ι) [4]. The transformed data is stored temporarily in the register and goes through 24 rounds of transformation. The choice of 24 rounds reflects the trade-off between performance and safety margin made in the design of the algorithm, and the resulting hash function is designed to be collision resistant. Round constants are 64-bit constant values that are substituted during the transformations. A counter tracks the round number, and the round constant is selected according to the count value.
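As a software reference for the five steps and the 24 rounds, the following sketch implements the Keccak-f[1600] permutation and wraps it in the SHA3-512 sponge (rate 576 bits, i.e. 72 bytes). It is a plain Python model written from the FIPS 202 definitions, not the hardware architecture of this paper: rho and pi share one loop here, chi is kept separate, and the round constants are generated with the specification's LFSR instead of a counter-indexed table. All function names are illustrative assumptions; the result is cross-checked against `hashlib`.

```python
import hashlib

MASK = (1 << 64) - 1


def rol64(a: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((a << n) | (a >> (64 - n))) & MASK


def keccak_f1600(state: bytearray) -> bytearray:
    """Apply 24 rounds of theta, rho, pi, chi and iota to a 200-byte state."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    r = 1  # LFSR state used to derive the round constants
    for _ in range(24):
        # theta: XOR each lane with the parities of two neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to its new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear mixing along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR the round constant into lane (0, 0)
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def sha3_512(message: bytes) -> bytes:
    """SHA3-512: absorb 72-byte blocks, then output the first 64 bytes of the state."""
    rate = 72
    padded = bytearray(message) + b"\x06"  # SHA-3 suffix bits plus first padding bit
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80                     # final padding bit
    state = bytearray(200)
    for off in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[off + i]
        state = keccak_f1600(state)
    return bytes(state[:64])


print(sha3_512(b"abc") == hashlib.sha3_512(b"abc").digest())
```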
The Truncating block performs the squeezing operation by discarding the remaining LSBs; the 512 MSBs obtained are the final hash value of SHA-3.
Table II shows the implementation results of the above SHA-3 hash core in terms of area, frequency and throughput (TP). The maximum operating frequency achieved is 368.72 MHz with a throughput of 8.5 Gbps; the design occupies 220 CLB slices and requires 24 clock cycles to reach the final hash value.
The results of the proposed design are compared in Table II with previously reported FPGA-based hardware designs of SHA-3 in the open literature in terms of area, frequency and throughput. The focus of this work is to use minimal area resources while maintaining sufficient TP.
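The relation between frequency, cycle count and throughput can be checked numerically. The sketch below assumes the commonly used formula TP = (block size × clock frequency) / (clock cycles per block) with a block size equal to the SHA3-512 rate of 576 bits; the paper does not state which formula was used, so this is an assumption. Under it, 24 cycles at 368.72 MHz gives about 8.85 Gbps, while the reported 8.5 Gbps corresponds to about 25 cycles per block; the 7.224 Gbps listed for [3] in Table II matches 24 cycles at 301.02 MHz.

```python
def throughput_gbps(block_bits: int, freq_mhz: float, cycles: int) -> float:
    """TP = block size x clock frequency / cycles per block, in Gbps."""
    return block_bits * freq_mhz * 1e6 / cycles / 1e9


RATE_BITS = 576  # SHA3-512 message block size r

print(round(throughput_gbps(RATE_BITS, 368.72, 24), 3))  # proposed design, 24 cycles
print(round(throughput_gbps(RATE_BITS, 368.72, 25), 3))  # proposed design, 25 cycles
print(round(throughput_gbps(RATE_BITS, 301.02, 24), 3))  # design [3], 24 cycles
```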
The design reported by S. Kerckhof et al. [11] uses the fewest area resources but needs 2154 clock cycles to produce the final hash value, which results in a lower TP than the other designs.
Table II: Comparison results of SHA3-512

Implementation     Technology   Slices   Frequency (MHz)   TP (Gbps)
Proposed design    Virtex6      220      368.72            8.5
[3]                V5           240      301.02            7.224
[11]               V6           188      285               0.08
[13]               V4           2024     143               6.07
[14]               V5           1229     238.4             1.0805
[10]               V5           2573     285               5.70
[12]               V5           1197     263.16            6.32
[15]               V5           1220     -                 6.56
[2]                V5           444      265               0.07
[16]               0.13 µm      -        250               10.67
[17]               0.13 µm      -        0.1               4.4 Mbps

[2] Guido Bertoni, Joan Daemen, Michaël Peeters and Gilles Van Assche, “Keccak sponge function family main document”, version 1.2, April 2009.
[3] FIPS-202, “Federal Information Processing Standards Publication FIPS-202, Secure Hash Algorithm-3 (SHA-3)”, 2014.
[4] Alia Arshad, Dur-e-Shahwar Kundi and Arshad Aziz, “Compact Implementation of SHA3-512 on FPGA”, Department of Electrical Engineering, National University of Sciences and Technology, Islamabad, Pakistan.

The TP reported by A. Akin et al. [13] and K. Gaj et al. [14] is better than that of the previous designs but requires many more hardware resources.
The designs reported by K. Latif et al. [12] and E. Homsirikamol et al. [15] show better TPs of 6.32 and 6.56 Gbps respectively, which are still lower than that of our compact design, and these designs use a large number of slices. The comparison shows that, with a TP of 8.5 Gbps, our design outperforms the previously reported FPGA implementations listed in Table II in terms of throughput.
V. CONCLUSION
This work presents a design for a compact hardware implementation of SHA3-512. The trade-off between area and throughput is well balanced: among the FPGA designs compared in Table II, the proposed design achieves the highest throughput and the second-smallest slice count. The contributing factors are the logical optimization obtained by using the divide-and-conquer technique in building the architecture, merging the three transforms rho, pi and chi into a single transform, and exploiting maximum parallelism in the algorithm. This optimization reduces overall latency, which significantly enhances system performance.
REFERENCES
[1] William Stallings, “Cryptography and Network Security: Principles and Practice”, 5th edition.
[5] G. Bertoni, J. Daemen, M. Peeters and G. Van Assche, “The KECCAK reference”, version 3.0, January 2011.
[6] G. Bertoni, J. Daemen, M. Peeters and G. Van Assche, “Keccak Specifications”, Submission to NIST (Round 3), January 2011.
[7] Ram Krishna Dahal, Jagdish Bhatta and Tanka Nath Dhamala, “Performance analysis of SHA-2 and SHA-3 finalists”, Central Department of Computer Science & IT, Tribhuvan University, Kathmandu, Nepal.
[8] Peter Pessl and Michael Hutter, “Pushing the Limits of SHA-3 Hardware Implementations to Fit on RFID”, Institute for Applied Information Processing and Communications (IAIK), Graz University of Technology, Inffeldgasse 16a, 8010 Graz, Austria.
[9] Deepthi Barbara Nickolas (PG Scholar) and A. Sivasankar (Assistant Professor), “Design of FPGA Based Encryption Algorithm using KECCAK Hashing Functions”, Department of ECE, Anna University: Regional Center, Madurai, Tamilnadu, India.
[10] George Provelengios (National and Kapodistrian University of Athens, Athens, Greece), Paris Kitsos (Computer Science, Hellenic Open University, Patras, Greece), Christos Koulamas (Industrial Systems Institute, Patras, Greece) and Nicolas Sklavos (KNOSSOSnet Research Group, Technological Educational Institute of Patras, Patras, Greece), “FPGA-Based Design Approaches of Keccak Hash Function”.
[11] Stéphanie Kerckhof, François Durvaux, Nicolas
Veyrat-Charvillon, Francesco Regazzoni, Guerric
Meurice de Dormale, François-Xavier Standaert,
“Compact FPGA implementations of the five SHA-3
finalists”, 10th IFIP Smart Card Research and
Advanced Applications 2011 (CARDIS 2011),
Leuven, Belgium, pp. 217-233, September 14-16,
2011.
[12] Kashif Latif, M Muzaffar Rao, Arshad Aziz and Athar Mahboob, “Efficient hardware implementations and hardware performance evaluation of SHA-3 finalists”, NIST Third SHA-3 Candidate Conference, Washington D.C., March 22-23, 2012.
[13] Abdulkadir Akin, Aydin Aysu, Onur Can Ulusel and Erkay Savas, “Efficient hardware implementations of high throughput SHA-3 candidates Keccak, Luffa and Blue Midnight Wish for single- and multi-message hashing”, NIST 2nd SHA-3 Candidate Conference, Santa Barbara, August 23-24, 2010.
[14] K. Gaj, E. Homsirikamol and M. Rogawski, “Comprehensive comparison of hardware performance of fourteen round 2 SHA-3 candidates with 512-bit outputs using field programmable gate arrays”, 2nd SHA-3 Candidate Conference, pp. 23-24, August 2010.
[15] E. Homsirikamol, M. Rogawski and K. Gaj, “Comparing hardware performance of round 3 SHA-3 candidates using multiple hardware architectures in Xilinx and Altera FPGAs”, ECRYPT II Hash Workshop, pp. 1–15, 19-20 May 2011.
[16] Xu Guo, Meeta Srivastav, Sinan Huang, Dinesh Ganta, Michael B. Henry, Leyla Nazhandali and Patrick Schaumont, “Silicon implementation of SHA-3 finalists: BLAKE, Grostl, JH, Keccak and Skein”, ECRYPT II Hash Workshop 2011, Tallinn, Estonia, 19-20 May 2011.
[17] Elif Bilge Kavun and Tolga Yalcin, “A
lightweight implementation of Keccak hash function
for radio-frequency identification applications”,
Radio Frequency Identification: Security and Privacy
Issues, Lecture Notes in Computer Science, 2010.