Introduction
This report details the final project I completed for the Fall 2005 offering of UCSC’s CMPE 125 course, Logic Design with Verilog. The projects leading up to this one included the design of a simple pipelined CPU, an SRAM controller, and an SRAM tester.
For this and prior projects, we used the Altera Excalibur development board (Fig. 1). This board features an APEX FPGA and an SO-DIMM slot, in which we placed one 64-megabyte SDRAM module from Micron. The part numbers of the FPGA and of the SDRAM chips on the module are EP20K200EFC484-2X and MT48LC8M16A2P-75, respectively; the data sheets can be found at http://www.altera.com/literature/ds/apex.pdf and http://download.micron.com/pdf/datasheets/dram/sdram/128MSDRAM.pdf.

Fig. 1: Altera APEX / Excalibur Development Kit
The SDRAM module contained four of the previously mentioned SDRAM chips, addressed such that each chip stored 16 bits of one 64-bit memory location. Internally, each address is three-dimensional: 4 banks, 4096 rows per bank, and 512 columns per row, giving the module a total of 8M memory addresses of 64 bits each.
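The address geometry above can be sketched in a few lines of Python. The bank/row/column counts come from the report; the ordering of the fields within a flat address (bank in the high bits, column in the low bits) and the name split_address are assumptions made only for illustration.

```python
BANKS, ROWS, COLS = 4, 4096, 512  # geometry stated in the report

def split_address(addr):
    """Split a flat 64-bit-word address into (bank, row, column).

    Field ordering is an assumption: bank is highest, column is lowest.
    """
    col = addr % COLS
    row = (addr // COLS) % ROWS
    bank = addr // (COLS * ROWS)
    return bank, row, col

total_words = BANKS * ROWS * COLS
print(total_words)                     # 8388608, i.e. 8M addresses
print(split_address(total_words - 1))  # (3, 4095, 511)
```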
We used Quartus version 4.2 with ModelSim to develop and simulate our Verilog, and we built and ran C code using the Nios Cross Development Kit.
The goal of the project was simple: design a Verilog module that interfaces between a Nios CPU and the SDRAM hardware, both controlling the read/write functions of the SDRAM and performing an exhaustive test for electrically stuck address lines, data lines, or data bits. For example, address line A0 might be shorted to A1, so that both lines behave identically rather than independently; or one bit within the SDRAM might never change from binary 0 to binary 1. A CPU-controlled “fault-injection logic” module was given to us for testing faults on a known-good SDRAM chip, but we were also expected to recognize actual bad RAM. The only constraints on our tester were that it had to check for and recognize any possible electrical error, and it had to do so in under 45 seconds.
Some of the main challenges of this project included learning the interface and signal timing for the SDRAM; working with the Nios Avalon bus architecture to present useful data to the CPU and any running C code; and, most importantly, devising a testing algorithm that meets the project requirements.
Design
This project required a careful study of the timing characteristics of the SDRAM device. SDRAM is synchronous with the clock supplied to it, so the frequency of that clock determines many of the timing requirements of the control signals. The system clock available for this project had a frequency of 33⅓ MHz, or a period of 30 ns. I determined that a CAS latency of 2 is sufficient to meet the requirements of the chip at this clock period.
Once the timing issues had been worked out, the next task was to create a simple controller finite state machine (FSM) to generate the proper signals for a read or write command. The state diagram for this is included as Appendix A.
SDRAM can operate in single-read / single-write mode, or in one of several burst modes, in which each read or write command operates sequentially through several columns of a row. Since the memory tester must write to and read from every bit of the memory, and since there was a hard limit on the runtime of the testing algorithm, I decided to implement full-page burst reading and writing, meaning all 512 columns in one row are read or written in one command. This reduces the time the SDRAM spends waiting for additional addressing commands or for the CAS latency to elapse, thus reducing the total run time of the algorithm.
Since SDRAM is dynamic, it must be refreshed periodically. For the device I used, every row must receive an auto-refresh command at least once every 64 ms. Divided by the number of rows (4096), this comes to one auto-refresh command every 15.625 µs at most. Coincidentally, one full-page read or write command requires at most 516 clock cycles at 30 ns, which works out to 15.48 µs, leaving 4 clock cycles before a refresh command would be considered late. Combining this with the assumption that my memory tester spends no idle time during its operation, I took advantage of this timing in my controller state machine design and hard-coded a refresh command to immediately follow every burst read or write.
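The refresh budget described above is simple arithmetic, and a short Python check makes the slack explicit. All figures are taken from the report; the variable names are mine.

```python
CLOCK_NS = 30            # system clock period
REFRESH_WINDOW_MS = 64   # every row must be refreshed within this window
ROWS = 4096
BURST_CYCLES = 516       # worst-case full-page read or write, per the report

refresh_interval_ns = REFRESH_WINDOW_MS * 1_000_000 / ROWS
burst_ns = BURST_CYCLES * CLOCK_NS
slack_cycles = int((refresh_interval_ns - burst_ns) // CLOCK_NS)

print(refresh_interval_ns / 1000)  # 15.625 (microseconds between refreshes)
print(burst_ns / 1000)             # 15.48  (microseconds per full-page burst)
print(slack_cycles)                # 4      (whole clock cycles to spare)
```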
During a read or a write, one column of data is transferred every clock cycle. For the controller to know precisely when a full page has been read or written, it needs a timer that can count 512 clock cycles. This timer does not need to count in natural binary, because its intermediate values are not used. Since I had decided to learn how to implement Linear Feedback Shift Registers (LFSRs) for previous assignments, and since they can be used in place of adders for a much faster and smaller solution to counting to a power of 2, I implemented a 9-bit LFSR (2^9 - 1 = 511 states) to perform this timing function. I seed this LFSR with a value that results in the 511th value being the all-ones state. To determine whether the LFSR contains the all-ones state, the design counts the number of consecutive ones shifted into the LFSR using a second 9-bit shift register, which is reset to zero whenever a zero is shifted in. The most significant bit of this register is then used as a terminal count signal, which is received by the controller.
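A behavioral sketch of this timer in Python may make the mechanism easier to follow. The report does not give the feedback polynomial, so the taps used here (bits 9 and 5, a commonly tabulated maximal-length choice) are an assumption, as are the function names; this is a model of the idea, not the original Verilog.

```python
WIDTH = 9
ALL_ONES = (1 << WIDTH) - 1  # 0x1FF

def lfsr_step(state):
    """Advance the LFSR one clock; return (new_state, bit_shifted_in).

    Taps are an assumption: XOR of bit 9 and bit 5 (1-indexed).
    """
    new_bit = ((state >> 8) ^ (state >> 4)) & 1
    return ((state << 1) | new_bit) & ALL_ONES, new_bit

def clocks_until_terminal_count(seed):
    """Clock the LFSR until the consecutive-ones register raises its MSB."""
    state, ones, clocks = seed, 0, 0
    while not (ones >> (WIDTH - 1)) & 1:
        state, bit = lfsr_step(state)
        # Second shift register: shift in a 1, or clear on any 0.
        ones = ((ones << 1) | 1) & ALL_ONES if bit else 0
        clocks += 1
    return clocks, state

# The state right after all-ones is the seed whose 511th value is all-ones.
SEED = lfsr_step(ALL_ONES)[0]
clocks, final_state = clocks_until_terminal_count(SEED)
print(hex(SEED), clocks, hex(final_state))  # 0x1fe 510 0x1ff
```

Counting the seed itself as the first value, 510 clocks later the register holds its 511th value, the all-ones state, and the terminal count goes high on that same clock.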
To write to every row, I had to implement another counter that produces a terminal count at 4096. I did this using a 14-bit LFSR, in a manner similar to the previous one.
SDRAM must also be initialized before it can be used. Essentially, the requirement is to send a short series of specific commands once at least 100 µs have passed since the device powered up. This initialization sequence is controlled by a second FSM. As another convenience, I added a shift register to my 9-bit LFSR to count seven terminal counts, or 107.31 µs, to act as the 100 µs timer for this state machine. The implementation of this FSM is fairly trivial, so an exact state diagram is not given. At the end of the initialization sequence, it issues a signal indicating that the controller FSM can begin to accept commands.
The final FSM I designed implements the memory tester algorithm. Going back to the requirements of the project, the tester must look for three things: stuck data lines, stuck address lines, and stuck data bits. Most of my peers chose to implement a testing algorithm that tests for all three faults in one operation. Since each individual data bit must be checked, and we have a data bus of 64 bits, this approach results in an algorithm that requires at least 64 reads and 64 writes to every address. A quick calculation shows that those operations would take at least 32.46 seconds. Instead, I decided to search for an algorithm that takes advantage of the fact that we are testing four individual chips, each 16 bits wide, and performs four data bit tests in one read/write. Any algorithm that attempts to reduce the problem down to those four chips must first run tests ensuring that there are no stuck data lines or stuck address lines between the chips, since the same data will be written to all of them at once.
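The 32.46-second figure can be reproduced in Python if each full-page operation is charged the worst-case 516 cycles and the module is treated as 16,384 pages (4 banks of 4096 rows). That accounting is my assumption about how the number was derived; the report does not spell it out.

```python
CLOCK_NS = 30
PAGE_CYCLES = 516      # worst-case cycles per full-page read or write
PAGES = 4 * 4096       # banks * rows per bank
OPS_PER_ADDRESS = 128  # 64 writes + 64 reads for a 64-bit walking pattern

total_cycles = OPS_PER_ADDRESS * PAGES * PAGE_CYCLES
seconds = total_cycles * CLOCK_NS / 1e9
print(round(seconds, 2))  # 32.46
```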
Therefore, my memory tester algorithm has three stages. The first stage checks for stuck data lines by performing a 64-bit walking-ones read/write on a single address. Unfortunately, my controller FSM can only perform full-page reads and writes, so the implementation of this test writes the pattern to 512 addresses. By walking a single 1 across all 64 bits, the test discovers data lines that are stuck either inside or outside each chip.
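As an illustration of why a walking-ones pattern exposes data-line faults, here is a small Python model. The fault model (bits stuck low, or pairs of lines shorted and behaving as a wired-OR) and the names make_bus and data_line_test are hypothetical simplifications, not the course's fault-injection logic.

```python
WIDTH = 64

def walking_ones(width=WIDTH):
    """Yield each pattern with exactly one bit set, LSB first."""
    for i in range(width):
        yield 1 << i

def make_bus(stuck_low=(), shorted=()):
    """Model a write-then-read through a possibly faulty data bus.

    stuck_low: bit indices that always read back 0.
    shorted:   (a, b) pairs of bit indices tied together (wired-OR).
    """
    def transfer(value):
        for a, b in shorted:
            if ((value >> a) | (value >> b)) & 1:
                value |= (1 << a) | (1 << b)
        for bit in stuck_low:
            value &= ~(1 << bit)
        return value
    return transfer

def data_line_test(transfer):
    """Return every pattern that did not read back as written."""
    return [p for p in walking_ones() if transfer(p) != p]

print(data_line_test(make_bus()))                  # []
print(data_line_test(make_bus(stuck_low=[5])))     # [32]
print(data_line_test(make_bus(shorted=[(0, 1)])))  # [1, 2]
```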
The second stage checks for stuck address lines. Consider the simplified case in which only two addresses exist, 0 and 1. A test that first writes pattern X to both addresses, then reads pattern X and writes pattern Y at each address in turn, will discover that the addresses are stuck together when it finds pattern Y before it expects to. However, since the SDRAM module has internal address lines for each column, “pattern X” and “pattern Y” must each consist of 512 patterns, unique among themselves and from each other. Fortunately, I had already created a device that generates 512 unique patterns: the 9-bit LFSR. Therefore, my implementation of this test writes the contents of the 9-bit LFSR to every address, then reads that data and writes the inverted value back for every address.
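The simplified address test can be modelled in Python as follows. This sketch uses a single fixed pattern X and its inverse Y per address rather than the 512 LFSR-generated patterns, and the wired-OR model of two shorted address lines is an assumption chosen for illustration.

```python
def make_memory(shorted=None):
    """Tiny RAM model; shorted=(a, b) ties address lines a and b together."""
    cells = {}

    def effective(addr):
        if shorted:
            a, b = shorted
            if ((addr >> a) | (addr >> b)) & 1:
                addr |= (1 << a) | (1 << b)
        return addr

    def write(addr, value):
        cells[effective(addr)] = value

    def read(addr):
        return cells[effective(addr)]

    return write, read

def address_line_test(write, read, addr_bits, x=0x55, y=0xAA):
    """Return the first address where Y appears too early, or None."""
    size = 1 << addr_bits
    for addr in range(size):   # pass 1: fill everything with X
        write(addr, x)
    for addr in range(size):   # pass 2: expect X, then replace it with Y
        if read(addr) != x:
            return addr
        write(addr, y)
    return None

print(address_line_test(*make_memory(), addr_bits=2))                # None
print(address_line_test(*make_memory(shorted=(0, 1)), addr_bits=2))  # 2
```

With lines 0 and 1 shorted, addresses 1, 2, and 3 all land on the same cell, so the read at address 2 finds the Y that was written through address 1.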
After the previous two stages complete successfully, the tester knows that all data and address lines are valid, and it can perform the reduced test for individual misbehaving bits. It uses another walking-ones pattern, this time only 16 bits wide. This pattern is repeated across the entire 64-bit data bus, so that each chip gets one copy of the pattern. Each pattern in turn is written to and read from every address until all of the patterns have been tested.
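Replicating a 16-bit pattern across the four chips can be sketched in Python like this. The helper name replicate16 is hypothetical; the multiplication trick simply copies the 16-bit value into each 16-bit lane of the 64-bit word.

```python
def replicate16(pattern16):
    """Copy a 16-bit pattern into all four 16-bit lanes of a 64-bit word."""
    return pattern16 * 0x0001_0001_0001_0001

# One 64-bit test word per position of the walking one.
patterns = [replicate16(1 << i) for i in range(16)]

print(len(patterns))           # 16
print(f"{patterns[0]:016x}")   # 0001000100010001
print(f"{patterns[15]:016x}")  # 8000800080008000
```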
Additional timing analysis shows that this implementation requires a total of 35 full-page operations on every address, which results in an overall run time of approximately 8 seconds. The final state diagram for this FSM is included as Appendix B.
The top-level design of this project included the interface to the Nios processor and Avalon bus. C code run on the Nios can request several different types of data from the memory tester, including its current status and whether or not the test was successful. These requests are handled as simple signals that control the contents of the data bus output to the processor. A block diagram of my entire design is shown in Fig. 2.

Fig. 2: System-level block diagram
Results
This project was a tremendously interesting debugging experience. With Murphy’s Law in effect, everything that could go wrong did. The debugging techniques I developed during this project are perhaps the most important things I learned in the class as a whole.
The hardest bug I encountered was that my finished tester performed as expected in the ModelSim environment, but hung during the pass-through test when run on the board. My first step in debugging this was to ensure that the phase-locked loop (PLL) was set to properly synchronize the FPGA clock with the SDRAM clock, and it was. Then, after a series of small changes, I discovered that a Full Compile command issued in Quartus would sometimes not notice changes made to files lower in the hierarchy, and thus I was not seeing the changes I was making. After I sorted that out, I added a state register to my top-level design that was constantly updated with the bitwise OR of itself and the state bits from my memory tester FSM. Since I used one-hot encoding for my state bits, this told me which states my FSM had visited. I then printed this register at regular intervals from the C code. This confirmed that my memory tester FSM was never getting past a certain state, and was always looping back.
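The sticky-OR debug register can be modelled in a few lines of Python. The state names and the example trace are hypothetical; only the technique (OR-ing each one-hot state into a register that is never cleared) comes from the report.

```python
STATE_NAMES = ["IDLE", "WRITE", "READ", "COMPARE", "DONE"]  # hypothetical

def visited_states(state_trace):
    """OR together every one-hot state seen, like the sticky register."""
    visited = 0
    for state in state_trace:
        visited |= state
    return visited

def decode(visited):
    """List the names of all states whose bit is set in the register."""
    return [name for i, name in enumerate(STATE_NAMES) if (visited >> i) & 1]

# An FSM that loops between WRITE and READ and never reaches COMPARE.
trace = [0b00001, 0b00010, 0b00100, 0b00010, 0b00100, 0b00010]
print(decode(visited_states(trace)))  # ['IDLE', 'WRITE', 'READ']
```

Because each state owns exactly one bit, the bits that stay clear identify the states that were never reached.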
Further inspection showed that the FSM never transitioned from state AL_inc_14A to AL_read_A (shown in Appendix B), which means the signal tc_14blfsr was never going high at that time. Double-checking the simulation, I found that there it was in fact going high at the correct time. As a test, I removed the pipelined method of detecting the LFSR terminal count and replaced it with a reduction AND of all the bits of the LFSR, which goes high at the same expected time (the all-ones state). This fixed the problem. Since the problem originated with the pipelined terminal count, and it was not apparent in simulation, I decided in the interest of time not to investigate it further.
Once all the problems were sorted out, my design ran as expected! The pass-through test takes approximately 8.6 seconds, and with the fault-injection logic turned on, it detects all faults in at most 250 milliseconds. The reason for the large discrepancy is that the fault-injection logic cannot properly simulate a single stuck bit; it can only simulate an entire stuck data line. Since most of the tester's run time is spent covering single stuck bits, and since my implementation does this last, it makes sense that any other fault is reported within that much shorter time.
Conclusions
Overall, my design and my chosen algorithm worked very well. However, there are three major areas where I think my design is less than perfect. First, because of my chosen algorithm, my memory tester FSM was rather complicated. Almost every state had a conditional transition (or two)! But for that algorithm, I believe my implementation is optimal.
Second, while the use of LFSRs did reduce the amount of logic necessary for the design and kept the clock out low, they also introduced a significant amount of design complexity, which eventually resulted in the major bug described in the Results section. Finally, my design’s functionality suffered very slightly because it cannot perform reads and writes of arbitrary length, although it made up for that in simplicity.
Appendices
Altera Development Board image Copyright © Altera
All other images Copyright © 2005 Dan Healy
Errors? Comments? Job offers? Contact Dan Healy: hl_tdc@yahoo.com