How to interface QDR-II+ synchronous SRAM with high-speed FPGAs (Part 1)

Here's a closer look at the challenges and pitfalls, as well as the techniques to
optimise the system.
By Reshmi Ravindran
Applications Engineer
Cypress Semiconductor
and
Ajay Bharadwaj
Senior Applications Engineer
Cypress Semiconductor
Quad data rate (QDR) synchronous static random access memory (SRAM) is an essential part of next-generation networking equipment operating at ever higher throughput rates. QDR SRAM offers lower latency than dynamic random access memory (DRAM), and its random transaction rate is also higher than that of DRAM.
QDR SRAMs suit high-bandwidth applications such as lookup tables, packet buffers and linked lists. SRAMs are also a popular choice for Level 2 (L2) cache in FPGA-based systems. QDR SRAMs are typically interfaced to application-specific network processors or high-speed FPGAs, and getting the best performance from both the processor and the memory requires interfacing the two properly. This article takes a closer look at the challenges and pitfalls, and at the techniques to optimise the system.
Basics of QDR-II+ SRAM
The latest QDR SRAMs operate at up to 633 MHz and are claimed to offer an improved data valid window, which makes it easier for host processors to capture data at high speeds. QDR uses two separate input/output (I/O) ports: a read port for reading from the memory and a write port for writing into it. The read and write ports have independent clock domains. On each port, data is transferred on both the rising and falling edges of the clock (double data rate), so with both ports active four data words move per clock cycle; hence the name quad data rate.
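The quad-data-rate arithmetic can be made concrete with a short calculation of theoretical peak bandwidth. The sketch below is illustrative only: the function name is hypothetical, and it assumes a ×18 device clocked at the 633 MHz quoted above. It gives a peak figure; real throughput also depends on the controller and the access pattern.

```python
def qdr_peak_bandwidth_gbps(clock_mhz: float, width_bits: int) -> dict:
    """Theoretical peak bandwidth of a QDR SRAM.

    Each port (read and write) moves two words per clock (DDR), and the two
    ports run concurrently, giving four words per clock in total.
    """
    per_port = clock_mhz * 1e6 * 2 * width_bits / 1e9  # Gb/s on one port
    return {
        "read_gbps": per_port,
        "write_gbps": per_port,
        "total_gbps": 2 * per_port,
    }


if __name__ == "__main__":
    # ASSUMPTION: x18 device at 633 MHz, for illustration.
    bw = qdr_peak_bandwidth_gbps(633, 18)
    print(f"read {bw['read_gbps']:.3f} Gb/s, write {bw['write_gbps']:.3f} Gb/s, "
          f"total {bw['total_gbps']:.3f} Gb/s")
```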
Let's review the hardware details of interfacing a QDR-II+ SRAM with an FPGA. QDR-II+ SRAMs are available in densities from 18 Mb to 144 Mb and are internally organised as two or four blocks. They come as burst-of-two or burst-of-four devices; the name indicates the minimum number of data words that can be written to or read from the memory in a single transaction.
Consider an 18 Mb QDR-II+ SRAM with 18 data lines, which is organised as 1M × 18. In a burst-of-two device, each read or write transaction transfers 36 bits (two 18-bit words). In a burst-of-four device, each transaction transfers 72 bits (four 18-bit words). Internally, a burst-of-two device has two memory blocks of 512K × 18 each, and a burst-of-four device has four memory blocks of 256K × 18 each.
The burst-of-two device has 19 address lines and the burst-of-four device has 18 (figure 1). Both devices have 18 data lines for writing into the device and a separate set of 18 data lines for reading from it. The address is indicated by “A,” the write port by “D,” and the read port by “Q.”
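The address-line counts follow from the organisation: each address selects a whole burst, so the number of addressable locations is the number of words divided by the burst length. A minimal sketch, assuming "Mb" means 2^20 bits as is conventional for memory densities; the function name is hypothetical.

```python
def address_lines(density_mbit: int, width_bits: int, burst: int) -> int:
    """Number of address pins for a burst-oriented SRAM.

    One address selects `burst` consecutive words, so only
    words / burst locations need to be addressed.
    """
    words = density_mbit * 2**20 // width_bits  # total words in the device
    locations = words // burst                  # addressable burst locations
    return (locations - 1).bit_length()         # ceil(log2(locations))


if __name__ == "__main__":
    # 18 Mb organised as 1M x 18 (2**20 words)
    print("burst-of-two :", address_lines(18, 18, 2))
    print("burst-of-four:", address_lines(18, 18, 4))
```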
Power requirements
The main power supplies for the SRAM are Vdd and Vddq. Vdd is the core supply: it powers the memory core and keeps the memory contents intact. Vddq is the I/O supply and powers the input/output circuitry, so the voltage levels on the output lines are a function of Vddq. In high-speed systems, the core and I/O voltages are typically different.
Over the last few years, operating voltages have dropped drastically to save power. Using separate core and I/O supplies helps keep the high switching noise of the I/O from disturbing the core supply. Proper bypassing and decoupling are needed to ensure the power integrity of the system. This is especially important for reliable operation when the memory is located far from the power supply on the board and the same supply powers multiple chips in the design. A decoupling capacitor suppresses voltage swings on the power and ground lines and provides a low-impedance return path between the power and ground planes.
EE Times-India | eetindia.com
Copyright © 2012 eMedia Asia Ltd.
Both the Vdd and Vddq pins must be decoupled with multiple capacitors. Small capacitors with low series inductance, placed in parallel with larger bulk capacitors, supply the burst current demanded by high-frequency transitions on the power supply. The small decoupling capacitors must be placed as close to the memory as possible, and the bulk capacitors close to the decoupling capacitors. This minimises the area of the current loops and hence reduces radiated emissions from the system.
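Why both small and bulk capacitors are needed can be seen from a simple model of a real capacitor as a series R-L-C: below its self-resonant frequency it behaves as a capacitor, and above it the equivalent series inductance (ESL) dominates. The sketch below compares a bulk capacitor, a small capacitor and the two in parallel. The component values are hypothetical, chosen only for illustration; real values come from the capacitor datasheets, and the full power-delivery network should be simulated.

```python
import math

# HYPOTHETICAL component values, for illustration only.
BULK = dict(c_farads=10e-6, esl_h=2e-9, esr_ohms=0.01)
SMALL = dict(c_farads=0.1e-6, esl_h=0.5e-9, esr_ohms=0.02)


def cap_impedance(f_hz: float, c_farads: float, esl_h: float, esr_ohms: float) -> complex:
    """Impedance of a real capacitor modelled as a series R-L-C."""
    w = 2 * math.pi * f_hz
    return complex(esr_ohms, w * esl_h - 1 / (w * c_farads))


def parallel(*impedances: complex) -> complex:
    """Combined impedance of branches connected in parallel."""
    return 1 / sum(1 / z for z in impedances)


def self_resonant_freq_hz(c_farads: float, esl_h: float) -> float:
    """Frequency at which the ESL and capacitive reactances cancel."""
    return 1 / (2 * math.pi * math.sqrt(c_farads * esl_h))


if __name__ == "__main__":
    for f in (1e6, 10e6, 100e6, 500e6):
        zb = cap_impedance(f, **BULK)
        zs = cap_impedance(f, **SMALL)
        print(f"{f / 1e6:6.0f} MHz  bulk {abs(zb):.4f}  small {abs(zs):.4f}  "
              f"both {abs(parallel(zb, zs)):.4f} ohm")
```

Capacitors of different values in parallel can also form anti-resonance peaks between their self-resonant frequencies, which is one reason why board-level simulation of the decoupling network is still recommended.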
Figure 1a: Block diagram of 18 Mb (1M × 18) QDR-II+ burst-of-two SRAM.
Figure 1b: Block diagram of 18 Mb (1M × 18) QDR-II+ burst-of-four SRAM.
Clocking system for QDR-II+ device
The clocking system for QDR-II+ devices can be divided into input clocks and output clocks. The input clocks, K and K#, are driven to the memory by the external controller. They are not a true differential pair but two single-ended clocks that are 180° out of phase with each other. The rising edge of the K clock captures the synchronous inputs, and all accesses are initiated on it. The K and K# clocks pass all synchronous data inputs through the input registers into the memory core, and they also pass data from the memory core to the output registers.
The other set of clocks are the output clocks, CQ and CQ#, which simplify data capture in high-speed systems. They are generated by the QDR-II+ device and are called echo clocks; CQ is referenced to the K clock and CQ# to the K# clock. The data on the Q pins is source-synchronous with the echo clocks. Because the read data is edge-aligned with the echo clocks, the receiver has to phase-shift the echo clocks, typically by about a quarter of a clock period, so that their edges fall near the centre of the data valid window. The echo clocks can be phase-shifted through board trace delay or with on-chip circuitry in the FPGA. If circuitry within the FPGA is used to shift the echo clocks, the trace lengths of CQ and CQ# must match those of the Q pins so that the FPGA can phase-shift the clocks and capture the data accordingly.
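With double-data-rate read data, each bit lasts half a clock period, so moving an edge-aligned echo clock to the centre of a bit needs a delay of about T/4. The sketch below computes that delay and, if it were implemented as extra board trace, the corresponding length. The propagation figure of about 6.7 ps/mm is an assumption (a rough value for FR-4 stripline); use the value for your own stack-up. Function names are hypothetical.

```python
def quarter_period_ps(clock_mhz: float) -> float:
    """Delay that centres an edge-aligned clock in a DDR bit: T / 4."""
    period_ps = 1e6 / clock_mhz  # clock period in picoseconds
    return period_ps / 4


def trace_length_for_delay_mm(delay_ps: float, ps_per_mm: float = 6.7) -> float:
    """Extra trace length needed for a given delay.

    ASSUMPTION: ~6.7 ps/mm is a rough FR-4 stripline figure;
    substitute the propagation delay of your own stack-up.
    """
    return delay_ps / ps_per_mm


if __name__ == "__main__":
    d = quarter_period_ps(633)
    print(f"quarter period at 633 MHz: {d:.1f} ps")
    print(f"equivalent trace length:   {trace_length_for_delay_mm(d):.1f} mm")
```

At the highest clock rates a fixed board delay cannot track voltage and temperature variation, which is one reason on-chip FPGA delay circuitry is often preferred.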
A phase-locked loop (PLL) inside the chip generates the echo clocks. The advantage of this arrangement is that jitter on the K/K# clocks is largely filtered by the PLL instead of propagating directly to the output clocks. The DOFF# pin on the QDR-II+ device switches the internal PLL on or off. If DOFF# is tied high during power-up and the K/K# clocks are stable for 20 µs, the PLL locks and the echo clocks are generated synchronously with the K/K# clocks. When DOFF# is driven low, the PLL is switched off and the memory operates with reduced performance. The PLL also needs a minimum K/K# clock frequency to lock; the QDR-II+ SRAM manufacturer specifies this frequency in the datasheet. A clock below this minimum will not lock the PLL, which can affect memory performance.
PLL lock is critical for proper operation of the memory device. The following conditions must be satisfied for the PLL to lock to the correct frequency:
• DOFF# must be high.
• Stable K/K# clocks must be provided for the time specified in the datasheet (20 µs).
• The K/K# clock frequency must be at or above the minimum specified in the datasheet.
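A controller's bring-up code might check the lock conditions before issuing memory accesses. The sketch below encodes them as a simple predicate. Everything in it is hypothetical: the `ClockStatus` fields stand in for however a real design observes DOFF# and the K/K# clocks, the minimum frequency must be taken from the memory datasheet (the value in the example is a placeholder), and the 20 µs default is the lock time quoted above.

```python
from dataclasses import dataclass


@dataclass
class ClockStatus:
    """HYPOTHETICAL snapshot of the signals that govern PLL lock."""
    doff_n_high: bool    # level on the DOFF# pin
    k_freq_mhz: float    # frequency of the K/K# clocks
    stable_us: float     # how long K/K# have been stable


def pll_should_be_locked(s: ClockStatus, min_freq_mhz: float,
                         lock_time_us: float = 20.0) -> bool:
    """True if all datasheet conditions for echo-clock PLL lock are met."""
    return (s.doff_n_high
            and s.k_freq_mhz >= min_freq_mhz
            and s.stable_us >= lock_time_us)


if __name__ == "__main__":
    # PLACEHOLDER minimum frequency; read the real value from the datasheet.
    status = ClockStatus(doff_n_high=True, k_freq_mhz=400.0, stable_us=25.0)
    print("PLL expected to be locked:", pll_should_be_locked(status, min_freq_mhz=120.0))
```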
External controllers can switch off the device PLL to train the memory. With the PLL off, the maximum operating speed of the system is limited; the FPGA uses this mode to check that the devices operate correctly before actual memory operations begin.
Read and write operation
Read and write operations are controlled by RPS#, WPS# and BWS#, and the QVLD output indicates valid read data. RPS# is sampled on the rising edge of the K clock, and a read operation is initiated when RPS# is low. Similarly, a write operation is initiated on the rising edge of the K clock when WPS# is low. The byte write select signals, BWS#, are sampled together with the write data and allow individual bytes of a word to be written selectively: deasserting (driving high) a BWS# signal causes the corresponding byte of data to be ignored, so it is not written to the memory. The trace lengths of the address lines, the D lines and the control lines should be closely matched. QVLD is an output signal that indicates valid output data and is edge-aligned with the CQ and CQ# echo clocks.
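The effect of the active-low byte write selects on a stored word can be modelled in a few lines. This is a behavioural sketch, not a description of the device internals, and it assumes a ×18 part with two byte write selects (BWS0#, BWS1#), each covering a 9-bit byte; check the datasheet for the exact mapping of BWS# pins to data bits. Names are hypothetical.

```python
BYTE_BITS = 9   # ASSUMPTION: 9-bit bytes on a x18 part
NUM_BYTES = 2   # ASSUMPTION: BWS0# and BWS1#


def apply_byte_write(old_word: int, new_word: int, bws_n: int) -> int:
    """Return the stored 18-bit word after one write beat.

    Bit i of `bws_n` is the level on BWSi#. The signals are active low:
    0 means byte i is written, 1 means byte i is ignored.
    """
    result = old_word
    for i in range(NUM_BYTES):
        mask = ((1 << BYTE_BITS) - 1) << (i * BYTE_BITS)
        if not (bws_n >> i) & 1:  # BWSi# low: this byte is enabled
            result = (result & ~mask) | (new_word & mask)
    return result


if __name__ == "__main__":
    # Write zeros with only BWS0# asserted: the low byte is cleared.
    print(hex(apply_byte_write(0x3FFFF, 0x00000, bws_n=0b10)))
```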
Programmable output driver impedance
The QDR-II+ SRAM has a pin called ZQ. A resistor must be connected between this pin and ground, and its value sets the output driver impedance: the resistor must be five times the desired output impedance of the driver. For example, to obtain an output impedance of 50 Ω, the ZQ resistor should be 250 Ω.
For high-speed digital devices, matching the driver impedance to the transmission-line impedance is critical to the signal integrity of the overall system. The impedance is matched by making the source impedance equal to the line and load impedance. Changing the value of the resistor connected to the ZQ pin changes the Zsource of the output drivers accordingly (figure 2).
Zload must be equal to Zline, which in turn must be equal to Zsource; here, Zsource is controlled by ZQ. Zline is the characteristic impedance of the PCB trace, which can be designed to equal Zsource, since the trace from the memory to the memory controller behaves as a transmission line. Zload should also be matched near the FPGA end. The entire system has to be simulated using the IBIS models available for the memory to determine the actual termination values.
Figure 2: Configuration of output driver impedance.
Termination of signal lines
Signal integrity is a very important aspect of high-speed digital design, and of QDR-II+ SRAM interfacing in particular. The inputs and outputs of QDR-II+ SRAMs use high-speed transceiver logic (HSTL). HSTL is a standard I/O interface for digital ICs in which input levels are referenced to a reference voltage rather than to ground. This allows smaller I/O signal swings and improves performance through better signal integrity. HSTL has become a de facto standard for high-speed digital systems. HSTL requires a reference voltage of 50% of the I/O supply (Vref = Vddq/2), which must be provided to the Vref pin of the QDR-II+ SRAM.
Figure 3: HSTL I/O levels.
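Because HSTL thresholds are defined relative to Vref rather than to fixed voltages, the receiver levels follow directly from Vddq. The sketch below assumes Vref = Vddq/2 and the commonly quoted HSTL input thresholds of ±0.1 V (DC) and ±0.2 V (AC) around Vref; these margins are assumptions to be confirmed against the memory and FPGA datasheets. The function name is hypothetical.

```python
def hstl_levels(vddq: float, dc_margin: float = 0.1, ac_margin: float = 0.2) -> dict:
    """Reference and input thresholds for an HSTL receiver.

    ASSUMPTION: Vref = Vddq / 2 and thresholds of +/-0.1 V (DC) and
    +/-0.2 V (AC) around Vref; confirm against the relevant datasheets.
    """
    vref = vddq / 2
    return {
        "vref": vref,
        "vih_dc": vref + dc_margin,  # minimum voltage read as a DC high
        "vil_dc": vref - dc_margin,  # maximum voltage read as a DC low
        "vih_ac": vref + ac_margin,  # minimum AC high for switching
        "vil_ac": vref - ac_margin,  # maximum AC low for switching
    }


if __name__ == "__main__":
    for name, volts in hstl_levels(1.5).items():  # ASSUMPTION: Vddq = 1.5 V
        print(f"{name:7s} {volts:.2f} V")
```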
It is very important to terminate all high-frequency signals, because mismatched impedances cause signals to reflect back and forth along the transmission lines, producing ringing that impacts the reliability of the system. To eliminate reflection at the source, the source impedance must match the transmission-line impedance. To eliminate reflection at the load, the load impedance must match the impedance of the trace.
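The size of a reflection is given by the standard reflection coefficient, Γ = (Zt − Z0) / (Zt + Z0), where Z0 is the line impedance and Zt the impedance at the end of the line (source or load). A matched end gives Γ = 0; an open, high-impedance input reflects almost all of the incident wave. A minimal sketch, with a hypothetical function name and illustrative impedance values:

```python
def reflection_coefficient(z_term: float, z_line: float) -> float:
    """Fraction of an incident voltage wave reflected where a line of
    impedance z_line meets a termination of impedance z_term."""
    return (z_term - z_line) / (z_term + z_line)


if __name__ == "__main__":
    z0 = 50.0  # ASSUMPTION: 50 ohm trace, for illustration
    for label, zt in (("matched 50 ohm", 50.0),
                      ("40 ohm driver", 40.0),
                      ("unterminated input", 1e6)):
        print(f"{label:20s} gamma = {reflection_coefficient(zt, z0):+.3f}")
```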
Although multiple termination schemes are available, the most popular and recommended method is to terminate the signal at the load with a pull-up resistor to Vddq/2 (figure 4). This scheme requires a separate termination voltage source that can both sink and source current, because the terminated lines drive current into it when high and draw current from it when low. The value of the pull-up resistor can be adjusted to match the load and ensure proper signal integrity.
Figure 4: Parallel termination at the load.
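A DC model of parallel termination shows why the Vddq/2 supply must both sink and source current: a driver at the high level pushes current into it, and a driver at the low level pulls the same current out of it. The sketch below ignores trace resistance and transient behaviour; the 1.5 V supply and the 50 Ω source and termination values are illustrative assumptions, and the function name is hypothetical.

```python
def parallel_termination_dc(vddq: float, r_source: float, r_term: float) -> dict:
    """DC currents and receiver voltages for a driver with output impedance
    r_source, terminated at the receiver by r_term to Vtt = Vddq / 2.

    Positive current flows from the driver into the termination supply.
    """
    vtt = vddq / 2
    results = {}
    for level, v_drive in (("high", vddq), ("low", 0.0)):
        current = (v_drive - vtt) / (r_source + r_term)  # sign flips with level
        v_receiver = vtt + current * r_term              # voltage at the load
        results[level] = {"current_a": current, "v_receiver": v_receiver}
    return results


if __name__ == "__main__":
    # ASSUMPTION: Vddq = 1.5 V, 50 ohm driver, 50 ohm termination.
    for level, r in parallel_termination_dc(1.5, 50.0, 50.0).items():
        print(f"{level:4s}: {r['current_a'] * 1e3:+.2f} mA into Vtt, "
              f"{r['v_receiver']:.3f} V at receiver")
```

The equal and opposite currents are the reason the termination supply needs a regulator that can sink as well as source; an ordinary source-only supply would be pulled up when most lines are driven high.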
All input pins of the QDR-II+ SRAM must be terminated for proper signal integrity. The K and K# clocks must each be terminated separately with a pull-up resistor to Vddq/2; because K/K# are not a fully differential pair, a common termination resistor between them is not recommended. The output driver impedance must be matched to the board trace impedance for best performance. Some parts in the QDR-II+ families have on-die termination: termination resistors inside the chip that can be programmed to the required value. On-die termination reduces the number of external components and hence saves board space.
Figure 5: Hardware connections for the interface between QDR-II+ SRAM and FPGA.
The block diagram in figure 5 summarises all the connections required for designing the hardware for QDR-II+ SRAM.
About the author
Reshmi Ravindran works as an Applications Engineer at Cypress Semiconductor and supports Cypress’ SRAM products. She holds a Master's degree in VLSI & Embedded Systems from Model Engineering College, India, and a Bachelor's degree in Electronics and Communication from Govt. Rajiv Gandhi Institute of Technology, India.
Ajay Bharadwaj is a Senior Applications Engineer at Cypress Semiconductor. He holds a Bachelor’s degree in Electronics and Communication Engineering and was a co-founder of a medical device start-up. His interests include analogue design, digital design and entrepreneurship.