This is a multi-core implementation of the LC2K processor in (synthesizable?) SystemVerilog by Austin Maliszewski email@example.com.
About the LC2K
The LC2K is a toy processor that was designed as a teaching tool for students in EECS 370 at the University of Michigan. In 370, students implement four LC2K simulators in C. My implementation in SystemVerilog is independent of EECS 370; the only thing they have in common is that this processor is "binary" compatible with the LC2K from the class.
In addition to the 8 instructions (add, nand, lw, sw, beq, jalr, noop, halt) that the 370 LC2K supports, I've added 8 additional instructions to help facilitate multicore operations (lwl, swc) and to make the processor a little more intelligent. The new arithmetic operations are: sub, and, or, not, sl, and sr.
The general form of an arithmetic instruction is
[label] opcode r1 r2 rdest
The form of a load/store/beq is:
[label] opcode r1 r2 offset
where r1 and r2 are registers, and offset is a 16-bit two's complement number. A load does reg[r2] = mem[reg[r1] + offset]. A store does mem[reg[r1] + offset] = reg[r2]. The beq instruction compares reg[r1] and reg[r2] and, if they are equal, branches to PC + 1 + offset. (Given a label, a conforming assembler will generate the correct offset.)
The JALR instruction is:
[label] opcode r1 r2
It does, sequentially: reg[r2] = PC + 1, then branch to the address in reg[r1].
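The load/store/beq/jalr semantics above can be sketched as a tiny Python model. This is only an illustration of the behavior described here, not the SystemVerilog implementation: the `step` function, the `sign_extend16` helper, and the dict-based memory are hypothetical names, and the register file is assumed to hold 8 registers.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def step(op, r1, r2, offset, regs, mem, pc):
    """Apply one lw/sw/beq/jalr instruction and return the next PC.

    regs is a list of 8 ints; mem maps word address -> word (hypothetical model).
    """
    off = sign_extend16(offset)
    if op == "lw":
        regs[r2] = mem.get(regs[r1] + off, 0)
    elif op == "sw":
        mem[regs[r1] + off] = regs[r2]
    elif op == "beq":
        if regs[r1] == regs[r2]:
            return pc + 1 + off
    elif op == "jalr":
        regs[r2] = pc + 1      # link first ...
        return regs[r1]        # ... then branch, in that order
    else:
        raise ValueError("not modeled in this sketch: " + op)
    return pc + 1
```

Because jalr writes the link register before reading the target, `jalr r r` with the same register for both arguments simply falls through to PC + 1 in this model.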
List of Instructions
add (opcode 0000)
nand (opcode 0001)
lw (opcode 0010)
sw (opcode 0011)
beq (opcode 0100)
jalr (opcode 0101)
halt (opcode 0110)
noop (opcode 0111)
or (opcode 1000)
and (opcode 1001)
lwl (opcode 1010)
swc (opcode 1011) [r2 = 1 if swc succeeded; r2 = 0 otherwise]
not (opcode 1100) [note: takes 3 args, but ignores the middle one]
sl (opcode 1101)
sr (opcode 1110)
sub (opcode 1111)
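For anyone extending their own assembler, the list above can be written down as a lookup table. This is a minimal sketch in Python; the names `OPCODES`, `MNEMONICS`, `opcode_bits`, and `is_extension` are made up for illustration, and nothing here says where the 4-bit opcode sits inside the instruction word.

```python
# Mnemonic -> 4-bit opcode, copied from the list above.
OPCODES = {
    "add": 0b0000, "nand": 0b0001, "lw": 0b0010, "sw": 0b0011,
    "beq": 0b0100, "jalr": 0b0101, "halt": 0b0110, "noop": 0b0111,
    "or": 0b1000, "and": 0b1001, "lwl": 0b1010, "swc": 0b1011,
    "not": 0b1100, "sl": 0b1101, "sr": 0b1110, "sub": 0b1111,
}

# Reverse table for disassembly.
MNEMONICS = {code: name for name, code in OPCODES.items()}


def opcode_bits(mnemonic):
    """Return the opcode as a 4-character binary string."""
    return format(OPCODES[mnemonic], "04b")


def is_extension(mnemonic):
    """True for the 8 added instructions, whose opcodes have the top bit set."""
    return OPCODES[mnemonic] >= 0b1000
```

In this table the original eight instructions all have opcodes of the form 0xxx, and the eight additions are 1xxx.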
The core supports all the LC2K instructions from the course as well as load-lock and store-conditional instructions. It also implements all the additional instructions mentioned above. I believe the core correctly implements all the instructions, but I've only run a handful of test cases. More extensive testing needs to be done.
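To show how load-lock and store-conditional are typically used together, here is a toy Python model of a retry loop. It follows the usual textbook reservation scheme, which may differ from how this core tracks locks internally; `LLSCMemory` and `atomic_increment` are hypothetical names. The only detail taken from this README is that swc reports 1 on success and 0 on failure.

```python
class LLSCMemory:
    """Toy shared memory with one reservation per core (textbook model)."""

    def __init__(self):
        self.words = {}
        self.reserved = {}  # core id -> reserved address

    def lwl(self, core, addr):
        """Load a word and remember that this core reserved the address."""
        self.reserved[core] = addr
        return self.words.get(addr, 0)

    def sw(self, addr, value):
        """Plain store: breaks every reservation held on this address."""
        self.words[addr] = value
        for core, res in list(self.reserved.items()):
            if res == addr:
                del self.reserved[core]

    def swc(self, core, addr, value):
        """Store and return 1 if the reservation survived, else return 0."""
        if self.reserved.get(core) != addr:
            return 0
        self.sw(addr, value)  # also clears any other reservation on addr
        return 1


def atomic_increment(memory, core, addr):
    """Retry lwl/swc until the store-conditional succeeds."""
    while True:
        old = memory.lwl(core, addr)
        if memory.swc(core, addr, old + 1):
            return old + 1
```

The same loop shape (lwl, compute, swc, branch back on 0) is what a multicore LC2K program would use to build locks or counters.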
I've implemented both instruction and data caches. The data cache is write-back and write-allocate. It implements the MESI protocol, and has been exercised somewhat, but needs more testing.
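For readers unfamiliar with MESI, the state changes of a single cache line can be summarized in a few lines of Python. This is the textbook protocol, offered as a rough guide only; the event names and the `next_state` and `needs_writeback` helpers are invented for this sketch and are not taken from the SystemVerilog source.

```python
def next_state(state, event, others_have_copy=False):
    """Textbook MESI transition for one cache line.

    state: one of "M", "E", "S", "I".
    event: "local_read" / "local_write" for this core's accesses,
           "bus_read" / "bus_write" for another core's miss seen on the bus.
    """
    if event == "local_read":
        if state == "I":
            return "S" if others_have_copy else "E"
        return state
    if event == "local_write":
        return "M"  # write-allocate: a write miss also ends in M
    if event == "bus_read":
        return "I" if state == "I" else "S"
    if event == "bus_write":
        return "I"
    raise ValueError("unknown event: " + event)


def needs_writeback(state, event):
    """A modified line must supply its data when another core asks for it."""
    return state == "M" and event in ("bus_read", "bus_write")
```

With a write-back cache, only the M state ever holds data that memory does not, which is why it is the only state that forces a write-back on a snoop.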
NOTE: The instruction caches do not snoop. If you change the instructions for any processor, you MUST flush that processor's instruction cache if it is possible that the old instructions could be cached. Flushing your own data cache is also a good idea, but not necessary, since the instruction cache misses on the far core will cause BRLs to go out on the bus. If you don't flush your cache, the timing is less predictable.
How do I run it?
You need a SystemVerilog compatible simulator on which to run the core. Conveniently, you can download a free version of ModelSim from Altera's website. You can get it here: http://dl.altera.com/?product=modelsim_ae#tabs-2
Generating a Makefile
The default Makefile is for my installation of ModelSim ASE. You'll need to ensure that the ModelSim executables (vlib, vlog, vsim) are in your path and that you update the Makefile to point at the location of the ModelSim directory.
How many cores?
When you say "make", the core is compiled and then is simulated. It will run the program in the file program.mem. It will run a dual core processor by default. If you want to run with a quad core processor, you can say "make quad". You can say "make single" to run the single core processor.
A note on this: Any program written for a smaller number of cores should work fine on a testbench running a larger number of cores. That said, the single core testbench will probably be faster for running single core tests. Much of the logic in the other cores is gated when they're not running, so unless the simulator is terrible (which it might be?), it won't add too much extra work to simulate them.
I've gotten some version of it to run on all of the big simulators. For reasons that I don't understand, it doesn't run on Icarus Verilog, which was what I was originally targeting.
The VCS branch has a Makefile for VCS that should work as long as you have VCS in your path.
The ncsim branch has a Makefile for NCSim that should work as long as you have ncvlog, ncelab, and ncsim in your path.
The console is not available (yet) in anything other than ModelSim.
Managing the cores is done through memory-mapped registers. The following structure exists in the memory system for each core, where n is the index of the core.
0x4000_n000             ; Core State (0 = halted, 1 = running)
0x4000_n001             ; PC
0x4000_n002             ; Flush Caches (bit 0 = icache, bit 1 = dcache)
                        ; NB: The core is effectively halted while flushing the dcache.
                        ; Reading back this register will tell you if a flush is in progress for either cache.
0x4000_n003             ; Architectural reset (PC = 0; register = 0; caches zeroed.)
0x4000_n010-0x4000_n017 ; Registers
0xffff_ffff             ; Who am I? (CPUID, read only).
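The addresses in this map can be computed mechanically from the core index. The Python sketch below assumes that n is a single hex digit (so cores 0 through 15) occupying bits 15:12 of the address, which is how the 0x4000_n000 notation reads; the constant and function names are hypothetical.

```python
CORE_BASE = 0x4000_0000

# Offsets within one core's control block, from the map above.
REG_STATE = 0x000
REG_PC = 0x001
REG_FLUSH = 0x002
REG_RESET = 0x003
REG_FILE_BASE = 0x010   # registers 0..7 live at 0x010..0x017

CPUID_ADDR = 0xFFFF_FFFF

# Bit masks for the flush register.
FLUSH_ICACHE = 1 << 0
FLUSH_DCACHE = 1 << 1


def core_reg_addr(core, offset):
    """Address of a control register for the given core index."""
    if not 0 <= core <= 0xF:
        raise ValueError("core index must fit in one hex digit")
    return CORE_BASE | (core << 12) | offset


def core_gpr_addr(core, reg):
    """Address of general-purpose register `reg` (0..7) of a core."""
    if not 0 <= reg <= 7:
        raise ValueError("register index must be 0..7")
    return core_reg_addr(core, REG_FILE_BASE + reg)
```

For example, `hex(core_reg_addr(1, REG_FLUSH))` gives the address core 0 would write `FLUSH_ICACHE` to after changing core 1's instructions.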
IMPORTANT: Playing with the state of a running core is inherently dangerous. You certainly can change the PC or change the registers on a running core, but it is probably a non-optimal way to do virtually anything you'd want to do with it. Ultimately, don't do stupid things.
REALLY IMPORTANT: Sending an architectural reset is exactly "pulling reset" on the whole pipeline. Everything in progress stops, the dcache is cleared without writing back and all the registers are zeroed. The value you write to the register must be exactly 0xDEADC0DE in order for the reset to occur. The core state remains unchanged, to allow you to call reset on yourself.
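The reset rules above can be restated as a small behavioral model. This Python sketch only mirrors what the paragraph says (magic value required, dirty data dropped, run state untouched); `CoreModel` and its fields are hypothetical and greatly simplified compared to the real pipeline.

```python
RESET_MAGIC = 0xDEADC0DE


class CoreModel:
    """Simplified view of the state an architectural reset touches."""

    def __init__(self):
        self.running = True
        self.pc = 0
        self.regs = [0] * 8
        self.dcache = {}  # address -> dirty word not yet written back

    def write_reset_register(self, value):
        """Return True if the write triggered a reset, False otherwise."""
        if value != RESET_MAGIC:
            return False          # any other value is ignored
        self.pc = 0
        self.regs = [0] * 8
        self.dcache.clear()       # dirty lines are dropped, not written back
        # self.running is deliberately left alone, so a core can reset itself
        return True
```

The important consequence is that anything still sitting dirty in the dcache is lost, so flush the dcache first if the data matters.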
I'm working on getting some sort of a console working. I've added a new console module that is accessed by memory-mapped registers. It lives at 0x4001_0000. You can read it to read from the console, or write to it to write to the console. Those memory accesses are blocking. (Nonblocking version at 0x4001_0001).
Right now, I only have the console working on the single core testbench, but getting it working on the other ones should be pretty easy.
To use the console, run the single core testbench and attach to the pipes test_i and test_o (for input and output respectively). That is, you can do "cat test_o" to see the output and you can "echo something > test_i" to send input.
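If you would rather script the console than use cat and echo, the same pipes can be opened from Python. This is a minimal sketch assuming test_i and test_o are in the current directory; the function names are made up. Note that opening a named pipe normally blocks until the other side (the simulator) has it open too, and that `read_output` only returns once the writer closes its end.

```python
def send_input(text, pipe_path="test_i"):
    """Write text to the console input pipe (like `echo -n text > test_i`)."""
    with open(pipe_path, "w") as pipe:
        pipe.write(text)


def read_output(pipe_path="test_o"):
    """Read everything from the console output pipe (like `cat test_o`)."""
    with open(pipe_path, "r") as pipe:
        return pipe.read()
```

Unlike echo, `send_input` does not append a newline, so add one yourself if the program running on the core expects it.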
Just a brief note on academic integrity. I believe that publishing this project isn't a violation of academic integrity. Turning this implementation into something that you could submit to EECS 370 is almost certainly much more work and much harder than actually doing it.
Likewise, my implementation is also not cycle accurate to any of the EECS370 projects. It could perhaps be used to generate output from an LC2K processor for verification purposes, but the format of that output isn't anything close to that of the LC2K processors in EECS 370. It's certainly beyond being diff-able.
That said, if you do have concerns about this project and academic integrity, please don't hesitate to contact me.
I've also committed an assembler for the LC2K. It's provided as a Linux binary only. Having IA'ed for EECS 370, I'm well aware of the number of LC2K assemblers floating around the web in source-code form, and don't wish to further contribute to that. I do intend to put together an assembler in Python, or something suitably different from C to make it difficult to turn it in to 370.
You'll need to use my assembler to assemble multicore programs, or at a minimum, you'll need to add lwl (opcode 1010) and swc (opcode 1011) to your own assembler.