Final Report: UCSC’s CMPE 125 Class, Fall ’05

Verilog SDRAM Controller & Memory Tester

 

By: Dan Healy

2005/12/05

 

 

Introduction

 

            This report details the final project I completed for the Fall 2005 offering of UCSC’s CMPE 125 course: Logic Design with Verilog.  The projects that led up to this one included design of a simple pipelined CPU, an SRAM controller, and an SRAM tester.

 

For this and prior projects, we used the Altera Excalibur development board (Fig. 1).  This board features an APEX FPGA and an SO-DIMM slot, in which we placed one 64-megabyte SDRAM module from Micron, built from 128-megabit chips.  The part numbers are EP20K200EFC484-2X and MT48LC8M16A2P-75, respectively, and the data sheets can be found at http://www.altera.com/literature/ds/apex.pdf and http://download.micron.com/pdf/datasheets/dram/sdram/128MSDRAM.pdf.

 

Fig. 1: Altera APEX / Excalibur Development Kit

 

The SDRAM module contained four of the previously mentioned SDRAM chips, wired so that each chip stored 16 bits of every 64-bit memory location.  Internally, each address has three components: one of 4 banks, one of 4096 rows per bank, and one of 512 columns per row.  This gives the module a total of 8M memory addresses of 64 bits each.
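The address geometry can be checked with a few lines of Python.  This is only a sketch: the bank, row, and column counts come from the text above, but the bit ordering inside split_address (bank in the top bits, column in the low bits) is an assumption made for illustration, and the function name is hypothetical.

```python
# Geometry taken from the report; the bit ordering below is an assumption.
BANKS, ROWS, COLS = 4, 4096, 512

TOTAL_WORDS = BANKS * ROWS * COLS   # number of addressable 64-bit words


def split_address(addr):
    """Split a linear word address into (bank, row, column).

    Assumed (hypothetical) layout: bank in the top 2 bits, row in the
    next 12 bits, column in the low 9 bits.
    """
    if not 0 <= addr < TOTAL_WORDS:
        raise ValueError("address out of range")
    col = addr % COLS
    row = (addr // COLS) % ROWS
    bank = addr // (COLS * ROWS)
    return bank, row, col


print(TOTAL_WORDS == 8 * 2**20)        # True: 8M addresses
print(split_address(TOTAL_WORDS - 1))  # (3, 4095, 511)
```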

 

We used Quartus version 4.2 with ModelSim to develop and simulate our Verilog, and we built and ran C code using the Nios Cross Development Kit.

 

The goal of the project was simple: design a Verilog module that interfaces between a Nios CPU and the actual SDRAM hardware, and that both controls the read/write functions of the SDRAM and performs an exhaustive test for electrically stuck address lines, data lines, or data bits.  For example, address line A0 might be shorted to A1, so that both lines behave identically rather than independently; or one bit within the SDRAM might never change from binary 0 to binary 1.  A CPU-controlled “fault-injection logic” module was given to us for testing faults on a known-good SDRAM chip, but we were also expected to recognize actual bad RAM.  The only constraints on our tester were that it had to check for and recognize any possible electrical error, and that it had to do so in under 45 seconds.

 

Some of the main challenges of this project included learning the interface and signal timing for the SDRAM; working with the Nios Avalon bus architecture to present useful data to the CPU and any running C code; and, most importantly, devising a testing algorithm that meets the project requirements.

 

 

 

Design

 

            This project required a careful study of the timing characteristics of the SDRAM device.  SDRAM is synchronous with the clock supplied to it, so the frequency of that clock determines many of the timing requirements of the controlling signals.  The system clock that was available for this project had a frequency of 33⅓MHz, or a period of 30ns.  I determined that a CAS latency of 2 is sufficient to meet the requirements of the chip, given this clock period.

 

            Once the timing issues had been worked out, the next task was to create a simple controller Finite State Machine to generate the proper signals associated with a read or write command.  The state diagram for this is included as Appendix A.

 

            SDRAM can operate in single read / single write mode, or in one of several burst modes, in which each read or write command steps sequentially through several columns of a row.  Since a write to and read from every bit of the memory is a requirement of the memory tester, and since there was a hard limit on the runtime of the testing algorithm, I decided to implement full page burst reading and writing, meaning that all 512 columns in one row are written or read by a single command.  This reduces the time the SDRAM spends waiting for additional addressing commands or for the CAS latency to elapse, and so reduces the total run time of the algorithm.

 

Since SDRAM is dynamic, it must be refreshed periodically.  As a requirement for the device I used, each row must be issued an auto-refresh command every 64ms.  Divided by the number of rows (4096), this comes to one auto-refresh command per 15.625us, at maximum.  Coincidentally, one full-page read or write command will require at most 516 clock cycles at 30ns, which works out to 15.48us – leaving 4 clock cycles afterwards before a refresh command would be considered late.  Combining this with the assumption that my memory tester will not spend any idle time during its operation, I took advantage of this timing in my controller state machine design, and hard coded a refresh command to immediately follow every burst read or write.
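The refresh budget above can be reproduced as a quick calculation.  The constants are the figures quoted in this report; the variable names are only illustrative.

```python
CLOCK_NS = 30                    # system clock period
REFRESH_WINDOW_NS = 64_000_000   # every row refreshed within 64 ms
ROWS = 4096
BURST_CYCLES = 516               # worst-case full-page read or write

# Average interval allowed between auto-refresh commands.
refresh_interval_ns = REFRESH_WINDOW_NS / ROWS

# Time taken by one full-page burst.
burst_ns = BURST_CYCLES * CLOCK_NS

# Whole clock cycles left over before the next refresh would be late.
slack_cycles = int((refresh_interval_ns - burst_ns) // CLOCK_NS)

print(refresh_interval_ns / 1000)  # 15.625 (us)
print(burst_ns / 1000)             # 15.48 (us)
print(slack_cycles)                # 4
```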

 

During a read or a write, one column of data is transmitted every clock cycle.  For the controller to know precisely when a full page has been read or written, it needs a timer that can count 512 clock cycles.  This timer does not need to count in natural binary, because its intermediate values are never used.  Since I had decided to learn the implementation of Linear Feedback Shift Registers for previous assignments, and since they can replace adders with a much faster and smaller circuit when counting to a power of 2, I implemented a 9-bit LFSR (2^9 - 1 = 511 states) to perform this timing function.  I seed this LFSR with a value that results in the 511th value being the all-ones state.  To detect the all-ones state, a second 9-bit shift register counts the consecutive ones shifted into the LFSR; it is reset to zero whenever a zero is shifted in.  The most significant bit of this register is then used as a terminal count signal, which is received by the controller.
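A small software model may make the counting scheme easier to follow.  This is a sketch under stated assumptions, not the Verilog from Appendix E: the feedback taps (bits 9 and 5, a commonly used maximal-length choice for 9 bits) and the function names are my own, since the report does not list the taps actually used.

```python
WIDTH = 9
MASK = (1 << WIDTH) - 1   # 0x1FF, the all-ones state


def lfsr_step(state):
    """Advance a 9-bit Fibonacci LFSR one clock; return (state, bit_in)."""
    bit_in = ((state >> 8) ^ (state >> 4)) & 1   # assumed taps: bits 9 and 5
    return ((state << 1) | bit_in) & MASK, bit_in


def values_until_terminal_count(seed):
    """Count values produced (seed included) until the run detector fires."""
    state, run, values = seed, 0, 1
    while not (run >> (WIDTH - 1)) & 1:
        state, bit_in = lfsr_step(state)
        # Second shift register: shift in a 1, or clear on any 0.
        run = ((run << 1) | 1) & MASK if bit_in else 0
        values += 1
    assert state == MASK   # the detector fires exactly on the all-ones state
    return values


# Seeding with the state that follows all-ones makes all-ones the 511th value.
SEED, _ = lfsr_step(MASK)
print(hex(SEED), values_until_terminal_count(SEED))  # 0x1fe 511
```

The run detector never compares all nine LFSR bits at once; it only watches the single bit being shifted in, which is what keeps the terminal-count logic small.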

 

To write to every row, I had to implement another counter that produces a terminal count once all 4096 rows in each of the 4 banks have been visited.  I did this using a 14-bit LFSR (4 × 4096 = 2^14 rows), in a manner similar to the previous one.

 

SDRAM must also be initialized before it can be used.  Essentially, the requirements are to send a short series of specific commands after at least 100us has passed since the power up of the device.  This initialization sequence is controlled by a second FSM.  As another convenience, I added an additional shift register to my 9-bit LFSR to count seven terminal counts, or 107.31us, to act as a 100us timer for this state machine.  The implementation of this FSM is fairly trivial, so an exact state diagram will not be given.  At the end of the initialization sequence, it issues a signal indicating that the controller FSM can begin to accept commands.
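The choice of seven terminal counts follows from the LFSR period and the clock period.  The short calculation below is illustrative only; the names are not taken from the design.

```python
import math

CLOCK_NS = 30
LFSR_PERIOD = 511             # states of the 9-bit LFSR
POWER_UP_DELAY_NS = 100_000   # at least 100 us after power-up

# Time between two terminal counts of the 9-bit LFSR.
tc_interval_ns = LFSR_PERIOD * CLOCK_NS

# Smallest number of terminal counts that covers the required delay.
tc_needed = math.ceil(POWER_UP_DELAY_NS / tc_interval_ns)

print(tc_needed)                          # 7
print(tc_needed * tc_interval_ns / 1000)  # 107.31 (us)
```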

 

The final FSM I designed implements the memory tester algorithm.  Going back to the requirements of the project, the tester must look for three things: stuck data lines, stuck address lines, and stuck data bits.  Most of my peers chose to implement a testing algorithm that tests for all three faults in one operation.  Since each individual data bit must be checked, and we have a data bus of 64 bits, this approach results in an algorithm that requires at least 64 reads and 64 writes to every address.  The math shows that those operations would cost a total of at least 32.46 seconds.  Instead, I decided to search for an algorithm that takes advantage of the fact that we are testing four individual chips, each with a 16-bit data width, and performs four data bit tests in one read/write.  Any algorithm that reduces the problem to those four chips must first run tests ensuring that there are no stuck data lines or stuck address lines between the chips, since the same data will be written to all of them at once.

 

Therefore, my memory tester algorithm has three stages.  The first stage checks for stuck data lines by performing a 64-bit walking-ones read/write on a single address.  Unfortunately, my controller FSM can only perform full page reads and writes, so the implementation of this test writes the pattern to 512 addresses.  Walking a single 1 across all 64 bits exposes data lines that are stuck either inside a chip or between chips.
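The first stage can be modelled in software as follows.  This is a sketch only: the fault model (lines stuck low, or pairs of shorted lines that both read back the OR of their values) and all function names are assumptions made for illustration, not the behaviour of the provided fault-injection logic.

```python
WIDTH = 64


def walking_ones(width=WIDTH):
    """Yield words with exactly one bit set, from bit 0 upwards."""
    for bit in range(width):
        yield 1 << bit


def make_bus(stuck_low=(), shorted=()):
    """Model a write followed by a read-back over a possibly faulty bus."""
    def transfer(word):
        for bit in stuck_low:
            word &= ~(1 << bit)
        for a, b in shorted:
            # Assumed fault model: shorted lines both read back the OR.
            if ((word >> a) | (word >> b)) & 1:
                word |= (1 << a) | (1 << b)
        return word
    return transfer


def find_bad_data_lines(transfer, width=WIDTH):
    """Return the sorted data-line numbers that ever read back wrong."""
    bad = set()
    for pattern in walking_ones(width):
        diff = transfer(pattern) ^ pattern
        bad.update(bit for bit in range(width) if (diff >> bit) & 1)
    return sorted(bad)


print(find_bad_data_lines(make_bus()))                   # []
print(find_bad_data_lines(make_bus(stuck_low=(5,))))     # [5]
print(find_bad_data_lines(make_bus(shorted=((0, 1),))))  # [0, 1]
```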

 

The second stage checks for stuck address lines.  Consider the simplified case in which only two address lines exist, 0 and 1.  A test which writes pattern X to both addresses first, then reads pattern X and writes pattern Y to each address in turn will discover if those addresses are stuck together when it finds pattern Y before it expects to.  However, since the SDRAM module has internal address lines for each column, “pattern X” and “pattern Y” must consist of 512 patterns each, unique to themselves and each other.  Fortunately, I had already created a device that creates 512 unique patterns – the 9-bit LFSR.  Therefore, my implementation of this test writes the contents of the 9-bit LFSR to every address, then reads that data and writes the inverted value back for every address.
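The idea behind the second stage can be sketched on a toy memory with one word per address.  Everything here is simplified and assumed for illustration: the 4-bit address space, the fixed X and Y values (the tester itself uses 512 LFSR values per page and their inverses), and the fault model in which two shorted lines both see the AND of their values.

```python
ADDR_BITS = 4
X, Y = 0x5A, 0xA5   # Y is the 8-bit inverse of X


def shorted(a, b):
    """Hypothetical fault: address lines a and b both see their AND."""
    def decode(addr):
        both = (addr >> a) & (addr >> b) & 1
        addr &= ~((1 << a) | (1 << b))
        return addr | (both << a) | (both << b)
    return decode


def first_address_fault(decode=lambda addr: addr):
    """Return the first address that reads Y early, or None if healthy."""
    memory = {}
    size = 1 << ADDR_BITS
    for addr in range(size):      # pass 1: write X everywhere
        memory[decode(addr)] = X
    for addr in range(size):      # pass 2: expect X, then write Y
        if memory[decode(addr)] != X:
            return addr           # Y seen too early: two addresses alias
        memory[decode(addr)] = Y
    return None


print(first_address_fault())               # None
print(first_address_fault(shorted(0, 1)))  # 1
print(first_address_fault(shorted(2, 3)))  # 4
```

In the faulty cases, the first address that decodes to an already rewritten location returns Y instead of X, which is exactly the "pattern Y before it expects to" condition described above.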

 

After the previous two stages complete successfully, the tester knows that all data and address lines are valid, and it can perform the reduced test for individual misbehaving bits.  It implements another walking-ones pattern, this time only 16 bits wide.  This pattern is physically repeated across the entire 64-bit data bus, so that each chip gets one copy of the pattern.  Each pattern in turn is written to and read back from every address until all of the patterns have been used.
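Replicating a 16-bit pattern across the 64-bit bus is a one-line operation; the sketch below uses a hypothetical helper name and simply lists the sixteen words such a stage would write.

```python
def replicate16(pattern16):
    """Copy one 16-bit pattern into all four 16-bit lanes of a 64-bit word."""
    if not 0 <= pattern16 < (1 << 16):
        raise ValueError("pattern must fit in 16 bits")
    return pattern16 * 0x0001_0001_0001_0001


# One word per walking-ones position; each chip sees the same 16-bit value.
STAGE3_PATTERNS = [replicate16(1 << bit) for bit in range(16)]

print(hex(STAGE3_PATTERNS[0]))   # 0x1000100010001
print(hex(STAGE3_PATTERNS[15]))  # 0x8000800080008000
```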

 

Additional timing analysis shows that this implementation requires a total of 35 full page operations to every address, which results in an overall operation time of approximately 8 seconds.  The final state diagram for this FSM is included as Appendix B.

           

            The top-level design of this project included the interface to the Nios processor and Avalon bus.  C code run on the Nios can request several different types of data from the memory tester, including its current status and whether or not the test was successful.  These requests are handled as simple signals that control the contents of the data bus output to the processor.  A block diagram of my entire design is shown in Fig. 2.

 

Fig. 2: System-level block diagram

 

Results

 

            This project was a tremendously interesting debugging experience.  With Murphy’s Law in effect, everything that could go wrong, did.  The debugging techniques I developed from this project are perhaps the most important things I learned in the class as a whole.

 

            The hardest bug I encountered was that my finished tester performed as expected in the ModelSim environment, but hung during the pass-through test when run on the board.  My first step in debugging this was to ensure that the Phase-Locked Loop (PLL) was set to properly synchronize the FPGA clock with the SDRAM clock, and it was.  Then, after a series of small changes, I discovered that a Full Compile command issued in Quartus would sometimes not notice changes made to files lower in the hierarchy, and thus I was not seeing the changes I was making.  After I sorted that out, I added a state register to my top level design that was constantly updated with the bitwise OR of itself and the state bits from my memory tester FSM.  Since I used one-hot encoding for my state bits, this would tell me which states my FSM had visited.  I then displayed this as output at regular intervals in the C code.  This confirmed that my memory tester FSM was never getting past a certain state, and always looping back.
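The sticky state register is simple to model in software.  In this sketch the trace values and function names are made up for illustration; the point is only that OR-ing one-hot state words together leaves one bit set for every state that was ever entered.

```python
def visited_states(one_hot_trace):
    """OR together a sequence of one-hot state words, like the sticky register."""
    visited = 0
    for state in one_hot_trace:
        visited |= state
    return visited


def never_visited(visited, num_states):
    """List the state indices whose bit was never set."""
    return [i for i in range(num_states) if not (visited >> i) & 1]


# Hypothetical 4-state FSM that loops between states 1 and 2 forever.
trace = [0b0001, 0b0010, 0b0100, 0b0010, 0b0100]
seen = visited_states(trace)
print(bin(seen))               # 0b111
print(never_visited(seen, 4))  # [3]
```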

 

            Further inspection showed that the FSM never transitioned from state AL_inc_14A to AL_read_A (shown in Appendix B), which means the signal tc_14blfsr was never going high at that time.  Double checking the simulation, I found that there it was in fact going high at the correct time.  As a test, I removed the pipelined method of detecting the LFSR terminal count and replaced it with a unary (reduction) AND across all bits of the LFSR, which goes high at the same expected time (the all-ones state).  This, in fact, fixed the problem.  Since the problem originated with that pipelined terminal count, and it was not apparent in simulation, I decided in the interest of time not to track down its root cause.

 

            After all the problems were sorted out of my design, it ran as expected!  The pass-through test takes approximately 8.6 seconds, and with the fault-injection logic turned on, it detects all faults in at most 250 milliseconds.  The reason for the large discrepancy is that the fault-injection logic cannot properly simulate a single stuck bit, but can only simulate an entire stuck data line.  Since the majority of the time taken in the tester is to cover for single stuck bits, and since my implementation does this last, it makes sense that any other fault would show failure in that amount of time.

 

Conclusions

 

            Overall, my design and my chosen algorithm worked very well.  However, there are three major areas where I think my design is less than perfect.  First, because of my chosen algorithm, my memory tester FSM was rather complicated.  Almost every state had a conditional transition (or two)!  But for that algorithm, I believe my implementation is optimal.  Second, while the use of LFSRs did reduce the amount of logic necessary for the design and kept the clock out low, they also introduced a significant amount of design complexity, which eventually resulted in the major bug described above.  Finally, my design’s functionality suffered very slightly because it cannot perform reads and writes of arbitrary length, although it made up for that in simplicity.

 

 

Appendices

 

  A. SDRAM Controller FSM State Diagram
  B. Memory Tester FSM State Diagram
  C. Gantt Chart
  D. Full Compilation Reports
  E. Complete Verilog Code

 

Altera Development Board image Copyright © Altera

All other images Copyright © 2005 Dan Healy

Errors? Comments? Job offers? Contact Dan Healy: hl_tdc@yahoo.com