Commits

Jordan Earls  committed bf2c3b8

Initial import of tinycpu code

Files changed (24)

+Copyright (c) 2012 Jordan "Earlz/hckr83" Earls  <http://lastyearswishes.com>
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+1. Redistributions of source code must retain the above copyright
+   notice, this list of conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions and the following disclaimer in the
+   documentation and/or other materials provided with the distribution.
+3. The name of the author may not be used to endorse or promote products
+   derived from this software without specific prior written permission.
+   
+THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
+INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
+AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL
+THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+# vhdl files
+FILES = src/*
+VHDLEX = .vhd
+ 
+# testbench
+TESTBENCHPATH = testbench/${TESTBENCH}$(VHDLEX)
+ 
+#GHDL CONFIG
+GHDL_CMD = ghdl
+GHDL_FLAGS  = --ieee=synopsys --warn-no-vital-generic
+ 
+SIMDIR = simulation
+# Simulation break condition
+#GHDL_SIM_OPT = --assert-level=error
+GHDL_SIM_OPT = --stop-time=1000ns
+ 
+WAVEFORM_VIEWER = gtkwave
+ 
+all: compile run view
+ 
+new :
+	echo "Setting up project ${PROJECT}"
+	mkdir src testbench simulation	
+ 
+compile :
+ifeq ($(strip $(TESTBENCH)),)
+		@echo "TESTBENCH not set. Use TESTBENCH=value to set it."
+		@exit 2
+endif                                                                                             
+ 
+	mkdir -p simulation
+	$(GHDL_CMD) -i $(GHDL_FLAGS) --workdir=simulation --work=work $(TESTBENCHPATH) $(FILES)
+	$(GHDL_CMD) -m  $(GHDL_FLAGS) --workdir=simulation --work=work $(TESTBENCH)
+	@mv $(TESTBENCH) simulation/$(TESTBENCH)                                                                                
+ 
+run :
+	@$(SIMDIR)/$(TESTBENCH) $(GHDL_SIM_OPT) --wave=$(SIMDIR)/$(TESTBENCH).ghw                                     
+ 
+view :
+	$(WAVEFORM_VIEWER) --dump=$(SIMDIR)/$(TESTBENCH).ghw                                                 
+ 
+clean :
+	$(GHDL_CMD) --clean --workdir=simulation

File README.md.txt

+==TinyCPU==
+
+TinyCPU is an 8-bit processor designed to be small, yet fairly fast. 
+
+Goals:
+
+The goal of TinyCPU is a small 8-bit processor that can be embedded with minimal logic, yet is fast enough to do what's needed.
+To that end, instructions are laid out so that they are trivial to decode. For instance, nearly all ALU opcodes fit within 2 opcode groups,
+and the ALU is arranged so that no translation is needed to decode these groups. It is also designed to be fast. Because XST failed to synthesize
+any of my attempts at a multi-port register file, I decided to make it braindead simple and just provide a port for every register. This means that
+every register can be accessed at the same time, so I don't have to worry about how many registers an opcode accesses, which in turn enables
+very rich opcodes. Also, with the standard opcode format, decoding should hopefully be a breeze, involving only 2 or 3 states.
+
+Features:
+1. Single clock cycle for all instructions without memory access
+2. Two clock cycles for memory access instructions
+3. 7 general purpose registers arranged as 2 banks of 4 registers, as well as 2 fixed registers
+4. IP and SP are treated as normal registers, enabling very intuitive opcodes such as "push and move" without polluting the opcode space
+5. Able to use up to 255 input and output ports
+6. Fixed opcode size of 2 bytes
+7. Capable of addressing up to 65536 bytes of memory with 4 segment registers for "extended" memory accesses
+8. Conditional execution is built into every opcode
+9. Von Neumann machine, i.e. data is code and vice versa
+
+
+Plans:
+
+Although a lot of the processor is well underway and coded, there is still some minor planning taking place. The instruction list is still not formalized
+and as of this writing, there is still room for 3 "full" opcodes, and 4 opcodes in a group not completely allocated.
+
+Software:
+
+I can already tell getting software running on this will be difficult, though I have a plan for loading software through the UART built into the papilio-one.
+Also, I will create a fairly awesome assembler for this architecture using the DSL capabilities of Ruby. I created a prototype x86 assembler in Ruby before, so
+it shouldn't be a big deal, and it should be a lot easier than writing an assembler in, say, C. Also, I have no immediate plans to port a C compiler.
+This is mainly because of the small segment size (just 256 bytes), though I'm considering adding a way to "extend" segments without changing the opcode
+format.
+
+Oddities:
+
+I used this opportunity to try out my "JIT/JIF" comparison mechanism. Basically, instead of doing something like
+
+cmp r0,r1
+jgt .greater
+mov r0,0xFF
+.greater:
+mov r1,0x00
+
+You can instead do
+cmp_greater_than r0,r1
+jit .greater --jit=jump if true
+mov r0,0xFF
+.greater:
+mov r1,0x00
+
+or because of the awesome conditional execution that's built in:
+
+cmp_greater_than r0,r1
+mov_if_true r0,0xFF
+mov r1,0x00
+
+
+Shortcomings:
+
+This truth-register mechanism is unlike anything I've ever seen, and I'm really curious as to how it will behave in actual logic. Because of how it works, conditional jumps are needed
+a lot less often, which in the future could mean fewer cache misses (if I ever implement a cache, that is). Its only bad part is that multiple comparisons are needed
+when doing something like `if r0>0 and r0<10 then r3=0`:
+
+mov r4,0
+mov r5,10
+cmp_greater r0,r4
+jif .skip
+cmp_lessthan r0,r5
+mov_if_true r3,0
+.skip:
+;continue on
+
+Another apparent issue is that code size is going to be difficult to keep down, especially since each segment can only contain 128 instructions.
+One possible solution is adding an "overflow into segment" option: when IP rolls over from 255 to 0, CS is also incremented by 1.
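The 128-instruction limit and the proposed "overflow into segment" option at the end of the README can be sketched in Ruby. This is only an illustration under stated assumptions: the names `SEGMENT_SIZE`, `INSTRUCTION_SIZE` and `advance` are made up for this note, and it assumes IP advances by one 2-byte opcode at a time. The real hardware may behave differently.

```ruby
# Hypothetical model of the "overflow into segment" idea; not the actual hardware.
SEGMENT_SIZE     = 256  # bytes addressable by an 8-bit IP
INSTRUCTION_SIZE = 2    # fixed opcode size in bytes

# Number of instructions that fit in one segment.
INSTRUCTIONS_PER_SEGMENT = SEGMENT_SIZE / INSTRUCTION_SIZE

# Advance IP by one instruction. When IP wraps past the end of the
# segment and carryover is enabled, CS is incremented as well.
def advance(cs, ip, carryover: true)
  next_ip = (ip + INSTRUCTION_SIZE) % SEGMENT_SIZE
  wrapped = next_ip < ip
  next_cs = (wrapped && carryover) ? (cs + 1) % 256 : cs
  [next_cs, next_ip]
end

puts INSTRUCTIONS_PER_SEGMENT        # 128
p advance(0, 254)                    # [1, 0]
p advance(0, 254, carryover: false)  # [0, 0]
```

With carryover disabled the code simply wraps around inside the same segment, which is the behaviour the README describes as the current limitation.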

File assembler/asm.rb

+PREFIX = "x\"";
+SUFFIX = "\"";
+SEPERATOR = ", ";
+
+
+
+class OpcodeByte1
+  attr_accessor :op, :register, :cond;
+  def to_hex
+    s = (op << 4 | register.number << 1 | cond).to_s(16);
+    if s.length == 1
+      "0"+s;
+    elsif s.length == 0
+      "00";
+    else
+      s
+    end
+  end
+end
+
+class OpcodeByte2
+  attr_accessor :cond, :reg2, :useextra, :reg3;
+  def to_hex
+    s=(cond << 7 | reg2.number << 4 | useextra << 3 | reg3.number).to_s(16);
+    if s.length == 1
+      "0"+s;
+    elsif s.length==0
+      "00";
+    else
+      s;
+    end
+  end
+end
+  
+
+class Register8
+  attr_accessor :number
+  def initialize(num)
+	@number=num
+  end
+end
+class OpcodeOption
+  attr_accessor :number
+  def initialize(num)
+    @number=num;
+  end
+end
+
+$iftr = 0; #0 for no condition, 1 for if TR, 2 for if not TR
+$useextra = 0;
+$position = 0;
+
+def set_cond(o1, o2)
+  if $iftr==0 then
+    o1.cond=0;
+    o2.cond=0;
+  elsif $iftr==1 then
+    o1.cond=1;
+    o2.cond=0;
+  else
+    o1.cond=0;
+    o2.cond=1;
+  end
+end
+def output_op(value)
+  print PREFIX + value + SUFFIX; #print, not printf: the data must not be treated as a format string
+  printf SEPERATOR;
+  $position+=2;
+end
+
+
+def mov_r8_imm8(reg,imm)
+  o = OpcodeByte1.new();
+  o.op = 0;
+  o.register=reg;
+  if $iftr<2 then
+    o.cond=$iftr;
+  else
+    raise "if_tr_notset is not allowed with this opcode";
+  end
+  output_op(o.to_hex.rjust(2,"0") + imm.to_s(16).rjust(2,"0"))
+end
+def mov_rm8_imm8(reg,imm)
+  o=OpcodeByte1.new();
+  o.op=1;
+  o.register=reg;
+  if $iftr<2 then
+    o.cond=$iftr;
+  else
+    raise "if_tr_notset is not allowed with this opcode";
+  end
+  output_op(o.to_hex.rjust(2,"0") + imm.to_s(16).rjust(2,"0"));
+end
+
+def do_group_reg_reg(opcode,group,reg1,reg2)
+  o1 = OpcodeByte1.new()
+  o1.op=opcode;
+  o1.register=reg1;
+  o2 = OpcodeByte2.new()
+  o2.useextra=$useextra;
+  o2.reg2=reg2;
+  o2.reg3=OpcodeOption.new(group); #opcode group
+  set_cond(o1,o2)
+  output_op(o1.to_hex.rjust(2,"0") + o2.to_hex.rjust(2,"0"))
+end
+def do_subgroup_reg(opcode,group,subgroup,reg1)
+  o1 = OpcodeByte1.new()
+  o1.op=opcode;
+  o1.register=reg1;
+  o2 = OpcodeByte2.new()
+  o2.useextra=$useextra;
+  o2.reg2=OpcodeOption.new(subgroup);
+  o2.reg3=OpcodeOption.new(group); #opcode group
+  set_cond(o1,o2)
+  output_op(o1.to_hex.rjust(2,"0") + o2.to_hex.rjust(2,"0"))
+end
+
+def and_reg_reg(reg1, reg2)
+  do_group_reg_reg(4,0,reg1,reg2)
+end;
+def or_reg_reg(reg1, reg2)
+  do_group_reg_reg(4,1,reg1,reg2)
+end;
+def xor_reg_reg(reg1, reg2)
+  do_group_reg_reg(4,2,reg1,reg2)
+end;
+def not_reg_reg(reg1, reg2)
+  do_group_reg_reg(4,3,reg1,reg2)
+end;
+def lsh_reg_reg(reg1, reg2)
+  do_group_reg_reg(4,4,reg1,reg2)
+end;
+def rsh_reg_reg(reg1, reg2)
+  do_group_reg_reg(4,5,reg1,reg2)
+end;
+def lro_reg_reg(reg1, reg2)
+  do_group_reg_reg(4,6,reg1,reg2)
+end;
+def rro_reg_reg(reg1, reg2)
+  do_group_reg_reg(4,7,reg1,reg2)
+end;
+#comparisons
+def cmpgt_reg_reg(reg1, reg2)
+  do_group_reg_reg(3,0,reg1,reg2)
+end;
+def cmpgte_reg_reg(reg1, reg2)
+  do_group_reg_reg(3,1,reg1,reg2)
+end;
+def cmplt_reg_reg(reg1, reg2)
+  do_group_reg_reg(3,2,reg1,reg2)
+end;
+def cmplte_reg_reg(reg1, reg2)
+  do_group_reg_reg(3,3,reg1,reg2)
+end;
+def cmpeq_reg_reg(reg1, reg2)
+  do_group_reg_reg(3,4,reg1,reg2)
+end;
+def cmpneq_reg_reg(reg1, reg2)
+  do_group_reg_reg(3,5,reg1,reg2)
+end;
+def cmpeq_reg_0(reg1)
+  do_group_reg_reg(3,6,reg1,Register8.new(0)) #last arg isn't used
+end;
+def cmpneq_reg_0(reg1)
+  do_group_reg_reg(3,7,reg1,Register8.new(0))
+end;
+
+def mov_reg_mreg(reg1, reg2)
+  do_group_reg_reg(5,2,reg1,reg2)
+end
+def mov_mreg_reg(reg1, reg2)
+  do_group_reg_reg(5,3,reg1,reg2)
+end
+def mov_reg_reg(reg1, reg2)
+  do_group_reg_reg(5,1,reg1,reg2)
+end
+
+  
+
+def mov(arg1,arg2)
+  if arg1.kind_of? Register8 and arg2.kind_of? Integer and arg2<0x100 then
+    mov_r8_imm8 arg1,arg2 
+  elsif arg1.kind_of? Array and arg2.kind_of? Integer and arg2<0x100 then
+    if arg1.length>1 or arg1.length<1 or not arg1[0].kind_of? Register8 then
+      raise "memory reference is not correct. Only a register is allowed";
+    end
+    reg=arg1[0];
+    mov_rm8_imm8 reg, arg2
+  elsif arg1.kind_of? Array and arg2.kind_of? Register8 then
+    if arg1.length>1 or arg1.length<1 or not arg1[0].kind_of? Register8 then
+      raise "memory reference is not correct. Only a register is allowed";
+    end
+    mov_mreg_reg arg1[0], arg2
+  elsif arg1.kind_of? Register8 and arg2.kind_of? Array then
+    if arg2.length>1 or arg2.length<1 or not arg2[0].kind_of? Register8 then
+      raise "memory reference is not correct. Only a register is allowed";
+    end
+    mov_reg_mreg arg1,arg2[0]
+  elsif arg1.kind_of? Register8 and arg2.kind_of? Register8 then
+    mov_reg_reg arg1, arg2
+  else
+    raise "No suitable mov opcode found";
+  end
+end
+def and_(arg1,arg2)
+  if arg1.kind_of? Register8 and arg2.kind_of? Register8 then
+    and_reg_reg arg1,arg2
+  else
+    raise "No suitable and opcode found";
+  end
+end
+def or_(arg1,arg2)
+  if arg1.kind_of? Register8 and arg2.kind_of? Register8 then
+    or_reg_reg arg1,arg2
+  else
+    raise "No suitable or opcode found";
+  end
+end
+def xor_(arg1,arg2)
+  if arg1.kind_of? Register8 and arg2.kind_of? Register8 then
+    xor_reg_reg arg1,arg2
+  else
+    raise "No suitable xor opcode found";
+  end
+end
+def not_(arg1,arg2)
+  if arg1.kind_of? Register8 and arg2.kind_of? Register8 then
+    not_reg_reg arg1,arg2
+  else
+    raise "No suitable not opcode found";
+  end
+end
+def rsh(arg1,arg2)
+  if arg1.kind_of? Register8 and arg2.kind_of? Register8 then
+    rsh_reg_reg arg1,arg2
+  else
+    raise "No suitable rsh opcode found";
+  end
+end
+def lsh(arg1,arg2)
+  if arg1.kind_of? Register8 and arg2.kind_of? Register8 then
+    lsh_reg_reg arg1,arg2
+  else
+    raise "No suitable lsh opcode found";
+  end
+end
+def rro(arg1,arg2)
+  if arg1.kind_of? Register8 and arg2.kind_of? Register8 then
+    rro_reg_reg arg1,arg2
+  else
+    raise "No suitable rro opcode found";
+  end
+end
+def lro(arg1,arg2)
+  if arg1.kind_of? Register8 and arg2.kind_of? Register8 then
+    lro_reg_reg arg1,arg2
+  else
+    raise "No suitable lro opcode found";
+  end
+end
+
+def cmpgt(arg1,arg2)
+  if arg1.kind_of? Register8 and arg2.kind_of? Register8 then
+    cmpgt_reg_reg arg1,arg2
+  else
+    raise "No suitable cmpgt opcode found";
+  end
+end
+def cmpgte(arg1,arg2)
+  if arg1.kind_of? Register8 and arg2.kind_of? Register8 then
+    cmpgte_reg_reg arg1,arg2
+  else
+    raise "No suitable cmpgte opcode found";
+  end
+end
+def cmplt(arg1,arg2)
+  if arg1.kind_of? Register8 and arg2.kind_of? Register8 then
+    cmplt_reg_reg arg1,arg2
+  else
+    raise "No suitable cmplt opcode found";
+  end
+end
+def cmplte(arg1,arg2)
+  if arg1.kind_of? Register8 and arg2.kind_of? Register8 then
+    cmplte_reg_reg arg1,arg2
+  else
+    raise "No suitable cmplte opcode found";
+  end
+end
+def cmpeq(arg1,arg2)
+  if arg1.kind_of? Register8 and arg2.kind_of? Register8 then
+    cmpeq_reg_reg arg1,arg2
+  elsif arg1.kind_of? Register8 and arg2.kind_of? Integer and arg2==0 then
+    cmpeq_reg_0 arg1
+  else
+    raise "No suitable cmpeq opcode found";
+  end
+end
+def cmpneq(arg1,arg2)
+  if arg1.kind_of? Register8 and arg2.kind_of? Register8 then
+    cmpneq_reg_reg arg1,arg2
+  elsif arg1.kind_of? Register8 and arg2.kind_of? Integer and arg2==0 then
+    cmpneq_reg_0 arg1
+  else
+    raise "No suitable cmpneq opcode found";
+  end
+end
+
+class Label
+  attr_accessor :name, :pos
+  def initialize(name, pos)
+    @name=name;
+    @pos=pos;
+  end
+end
+$labellist={}
+def new_label(name)
+  $labellist[name.to_s]=$position;
+end
+def lbl(name)
+  $labellist[name.to_s];
+end
+  
+  
+def if_tr_set
+  $iftr = 1
+  yield
+ensure
+  $iftr = 0 #always restore, even if the block raises
+end
+
+
+r0=Register8.new(0)
+r1=Register8.new(1)
+r2=Register8.new(2)
+r3=Register8.new(3)
+r4=Register8.new(4)
+r5=Register8.new(5)
+sp=Register8.new(6)
+ip=Register8.new(7)
+
+
+#test code follows. Only do it here for convenience.. real usage should prefix assembly files with `require "./asm.rb"` (Ruby 1.9.2+ no longer searches the working directory)
+
+
+#port0(0) is LED port0(1) is a button
+
+mov r4, 1
+mov r5, 0xFD
+#mov r5, 0x01 #the port bitmask
+mov [r4],r5
+mov r3, 0
+mov [r3], 0
+mov r2, 0x02
+#poll for button
+new_label :loop
+mov r0, [r3]
+and_ r0, r2 #isolate just the button at pin 2
+cmpneq r0, 0
+if_tr_set{
+  mov [r3], 0x01
+}
+cmpeq r0,0
+if_tr_set{
+  mov [r3], 0x00
+}
+mov ip, lbl(:loop)
+
+printf("\n");
+while $position<64
+  printf("x\"0000\", ")
+  $position+=2;
+end
+puts "\nsize:" + $position.to_s
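The bit packing done by `OpcodeByte1#to_hex` and `OpcodeByte2#to_hex` can be checked by hand with a small stand-alone sketch. The helper names below (`encode_byte1`, `encode_byte2`, `opcode_hex`) are hypothetical and not part of asm.rb. Only the field layout is taken from the code above.

```ruby
# Stand-alone sketch of the two opcode bytes. The field layout follows
# OpcodeByte1#to_hex and OpcodeByte2#to_hex; the helper names are made up.
def encode_byte1(op, reg, cond)
  (op << 4) | (reg << 1) | cond
end

def encode_byte2(cond, reg2, useextra, group)
  (cond << 7) | (reg2 << 4) | (useextra << 3) | group
end

# Format a 16-bit opcode as four hex digits, the form output_op receives.
def opcode_hex(byte1, byte2)
  format("%02x%02x", byte1, byte2)
end

# mov r4, 1    -> opcode 0, register 4, no condition, immediate 0x01
mov_imm = opcode_hex(encode_byte1(0, 4, 0), 0x01)

# mov [r4], r5 -> opcode 5, group choice 3 (mov [reg], reg)
mov_store = opcode_hex(encode_byte1(5, 4, 0), encode_byte2(0, 5, 0, 3))

puts mov_imm    # 0801
puts mov_store  # 5853
```

These are the first and third instructions of the test program at the end of asm.rb, so the values can be compared against the assembler's real output.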

File docs/assembler.txt

+The assembler is really a big hack job. I didn't want to write a complex assembler in C or C++, so I decided to use Ruby's power as a DSL to create one.
+Ruby does all the heavy syntax work, and I just do a bit more meta-programming than anyone probably should.
+
+Because of how it's made, you can of course use Ruby itself for any kind of assembly generation, such as loops. Keep in mind, though, that the program file outputs machine code when it runs.
+
+A simple example file:
+
+----
+require "./asm.rb"
+
+mov r0, 0x0F
+mov r1, 0x1C
+and_ r0, r1
+
+-----
+
+And you run it by doing something like `ruby MyAssemblyFile.rb`. 
+
+If you just want to write some assembly code and not worry about how my assembler works, then it's quite simple.
+Just use the command above, make sure to use `require "./asm.rb"` and make sure that asm.rb is in your working directory.
+
+The assembler is definitely an `intel` style assembler. By that, I mean target registers are on the left, and source registers are on the right.
+However, because of our unique CPU architecture, there are some interesting looking constructs.
+
+For instance, to mark a block as `only execute if TR is set`, you'd use:
+
+mov r0, 10
+mov r1, 20
+cmpgt r0, r1 #reads as TR=r0 > r1
+if_tr_set{
+  mov r0, 40
+}
+
+
+
+
+Also, here is a quick lookup table for the assembly words that aren't obvious:
+
+rsh -- right shift
+lsh -- left shift
+rro -- right rotate
+lro -- left rotate
+cmpgt -- compare greater than
+cmpgte -- compare greater than or equal
+cmplt -- compare less than
+cmplte -- compare less than or equal
+cmpeq -- compare equal
+cmpneq -- compare not equal
+
+To avoid all sorts of hell with Ruby (`and`, `or` and `not` are Ruby keywords), some assembler words must be suffixed with a _
+These are listed below:
+or_
+and_
+xor_
+not_
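The point about using plain Ruby for assembly generation can be shown with a self-contained sketch. It does not load asm.rb. Instead it uses a hypothetical stand-in `mov` that only records instruction text, so the names `$program` and `operand` are assumptions made for this note.

```ruby
# Hypothetical stand-in for asm.rb: it only records instructions as
# text instead of emitting machine code.
Register8 = Struct.new(:number)
$program = []

def mov(dst, src)
  # Render a register as rN and an integer as a hex immediate.
  operand = ->(x) { x.is_a?(Register8) ? "r#{x.number}" : format("0x%02x", x) }
  $program << "mov #{operand.(dst)}, #{operand.(src)}"
end

r0 = Register8.new(0)
r1 = Register8.new(1)

# Plain Ruby control flow generates repeated assembly at assembly time.
[r0, r1].each_with_index do |reg, i|
  mov reg, 0x10 + i
end

puts $program
```

The loop runs while the file is being assembled, so the CPU only ever sees the two resulting `mov` instructions, not a loop.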

File docs/design.md.txt

+This is the design of TinyCPU. Its goals are as follows:
+
+1. 8-bit registers and operations (8 bit processor)
+2. 16-bit address bus
+3. fixed 16-bit instruction length
+4. use a small amount of "rich" instructions to do powerful things
+5. 1 instruction per clock cycle
+
+I/O: 
+I/O uses a memory-mapped approach.
+Howto: 
+
+Basically, absolute addresses 0-32 are reserved for ports. Right now there is only a Port0. This port is 8 bits wide.
+Each even address is the port address. Each odd address is the read/write bitmap for the port below it.
+
+So to make pins 7 to 4 write and pins 3 to 0 read, you'd assign 0xF0 to address 0x0001
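The direction bitmap can be computed rather than written out by hand. A minimal sketch, assuming (as the 0xF0 example implies) that a set bit means write and a clear bit means read. `direction_mask` and the two constants are hypothetical names.

```ruby
# Hypothetical helper: build the direction bitmap for a port, where a
# set bit marks a pin as write (output) and a clear bit as read (input).
def direction_mask(write_pins)
  write_pins.reduce(0) { |mask, pin| mask | (1 << pin) }
end

# Port0 lives at address 0x0000; its bitmap sits at the odd address above it.
PORT0_DATA      = 0x0000
PORT0_DIRECTION = PORT0_DATA + 1

mask = direction_mask([7, 6, 5, 4])
puts format("address 0x%04x <- 0x%02x", PORT0_DIRECTION, mask)  # address 0x0001 <- 0xf0
```

The assembler's test program uses 0xFD for the same register, which under this reading makes every pin except pin 1 (the button) an output.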
+
+
+
+BIG CHANGE:
+So, apparently making a single-cycle CPU is extremely hard... so instead, we'll be striving for a 2-cycle CPU.
+Usual cycles:
+1-cycle: mov, jmp, etc general data movement
+2-cycle: ALU operations
+1-cycle with memory wait(so really 2 cycle): all instructions that reference memory
+
+
+Relative moves:
+In order to make the segment-carryover feature useful, there are a few options for moving a "relative" amount to a register, including IP and SP.
+A relative move differs from most other opcodes in that the relative factor is treated as a signed value.
+So for instance, after
+mov r0,50
+mov_relative r0, -10
+
+r0 will end up being 40. Although this feature won't see much use in general registers, IP and SP are special because of the option of using the
+segment-carryover feature. This means that SP and IP, while being 8-bit registers, can function very similarly to a 16-bit register, enabling full usage of the available address space.
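The signed interpretation of the relative factor can be sketched as follows. This assumes the immediate is an 8-bit two's-complement value and leaves out the segment carryover entirely. `to_signed8` and `mov_relative` are hypothetical helper names, not opcodes or assembler functions.

```ruby
# Sketch of a relative move on an 8-bit register, treating the
# immediate byte as a signed two's-complement value (an assumption).
def to_signed8(byte)
  byte >= 0x80 ? byte - 0x100 : byte
end

# Returns the new 8-bit register value (wraps modulo 256).
def mov_relative(reg_value, imm8)
  (reg_value + to_signed8(imm8)) & 0xFF
end

imm = -10 & 0xFF             # -10 encodes as 0xF6
puts format("0x%02x", imm)   # 0xf6
puts mov_relative(50, imm)   # 40
```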
+
+Register list:
+r0-r5 general purpose registers
+sp stack pointer (represented as r6)
+ip instruction pointer register (represented as r7)
+cs, ds, es, ss segment registers (code segment, data segment, extra segment, stack segment)
+tr truth register for conditionals
+
+general opcode format
+
+first byte:
+first 4 bits: actual instruction
+next 3 bits: (target) register
+last 1 bit: conditional
+
+second byte: 
+first 1 bit: second portion of condition (if not immediate) (1 for only if false)
+next 1 bit: use extra segment
+next 3 bits: other register. If not 3rd register
+last 3 bits: extra opcode information or third register. such as for ADD it could be target=source+third_register
+
+...or second byte is immediate value
+
+For opcodes requiring 3 registers but without room for them, the target register is assumed to also be the second operand. Such as for AND, target=source AND target
+
+short list of instructions: (not final, still planning)
+immediates:
+1. move reg, immediate
+2. move [reg], immediate
+3. push and move reg, immediate (or call immediate)
+4. move (relative) reg, immediate
+
+mini-group 5. Root opcode is 5, register is to tell which opcode( up to 8). No register room, only immediate
+push immediate
+XX
+XX
+XX
+XX
+XX
+XX
+XX
+
+
+groups: (limited to 2 registers and no immediates. each group has 8 opcodes)
+group 1:
+move(store) [reg],reg
+move(load) reg,[reg]
+out reg1,reg2 (output to port reg1 value reg2)
+in reg1,reg2 (input from port reg2 and store in reg1)
+XX
+XX
+move segmentreg,reg
+move reg,segmentreg
+
+group 2:
+and reg1,reg2 (reg1=reg1 and reg2)
+or reg, reg
+xor reg,reg
+not reg1,reg2 (reg1=not reg2)
+left shift reg,reg
+right shift reg,reg
+rotate right reg,reg
+rotate left reg,reg
+
+group 3: compares
+is greater than reg1,reg2 (TR=reg1>reg2)
+is greater or equal to reg,reg
+is less than reg,reg
+is less than or equal to reg,reg
+is equal to reg,reg
+is not equal to reg,reg
+equals 0 reg
+not equals 0 reg
+
+group 4:
+push segmentreg
+pop segmentreg
+push and move reg, reg (or call reg) 
+exchange reg,reg
+exchange reg,seg
+XX
+XX
+
+group 5:
+XX
+XX
+far jmp reg1, reg2 (CS=reg1 and IP=reg2)
+far call reg1,reg2
+far jmp [reg] (first byte is CS, second byte is IP)
+push extended segmentreg, reg (equivalent to push seg; push reg)
+pop extended segmentreg, reg (equivalent to pop reg; pop seg)
+reset processor (will completely reset the processor to starting state, but not RAM or anything else)
+
+group 6:
+set default register bank to 0 (can be condensed to 1 opcode)
+set default register bank to 1
+push extended reg, reg
+pop extended reg,reg
+enable carryover seg
+disable carryover seg
+mov relative reg, reg
+exchange reg, reg
+
+super group: Super groups only have room for 1 register argument. Each subgroup has 8 opcodes, capable of 8 subgroups.
+subgroup 0: 
+push reg
+pop reg
+set TR
+reset TR
+increment reg
+decrement reg
+set register bank 0
+set register bank 1
+subgroup 1:
+enable carryover seg
+disable carryover seg
+
+
+
+3 register instructions:
+1. add reg1, reg2, reg3 (reg1=reg2+reg3)
+2. sub reg1, reg2, reg3
+
+
+opcodes used: 14 of 16. 2 more opcodes available. Decide what to do with the room later.
+
+Possible candidates for opcode compression include
+* equals 0 and not equals 0 (room for 7 sub-opcodes each) (not doing that because it'd screw with the easy ALU code)
+
+
+
+
+conditionals
+0 -- always
+1 -- only if true
+for only if false, there should basically be another compare or if applicable an always afterwards
+
+
+limitations that shouldn't be passed with instructions
+* Doing 2 memory references 
+* pushing a memory reference (equates to 2 memory references)
+
+Note it is possible however to read and write 16bits at one time to the memory to consecutive addresses that are 16-bit aligned.
+
+
+segments:
+DS is used in all "normal" memory references
+SS is used in all push and pop instructions
+ES is used when the ExtraSegment bit is set for either push/pop or normal memory references
+CS is only used for fetching instructions
+
+Segment carryover:
+In order to overcome the limitation of only having a 256-byte segment, there is a workaround option to "pretend" that IP is a 16-bit register.
+When CS carryover is enabled and IP rolls over (from 255 to 0 or whatever), CS will be incremented. This makes it so that if you start at address 0:0,
+you can continue as far as needed into the address space without having to do ugly far jumps at each of the borders.
+Carryover can only be done on CS and SS. The required circuitry is not implemented for DS or ES due to the extreme level of complexity it would require; also,
+it would only lead to unnecessarily complex code.
+
+Also of note is that `move relative` implements a "carryover" component. This component will work on either IP or SP, and uses CS and SS respectively. 
+If used on other registers, there will be no carry over functionality, though it can be used as an easy way to add or subtract an immediate from a register. 
+
+
+
+States needed:
+0. reset
+1. decode current instruction (All without memory capable within 1 clock cycle)
+2. increment IP(and SP if needed) and fetch next instruction
+3. Write 1 register to memory
+4. Read 1 register from memory
+5. Write 2 registers to memory
+6. Read 2 registers from memory
+7. Write 1 register to memory and setup increment of sp
+8. Write 2 registers to memory and setup double increment of sp
+9. Read 1 register from memory and setup decrement of sp
+10. Read 2 registers from memory and setup double decrement of sp
+11. 
+
+
+
+registerfile map:
+0000: general r0
+0001: general r1
+0010: general r2
+0011: general r3
+0100: general r4
+0101: general r5
+0110: SP (r6)
+0111: IP (r7)
+1000: second bank r0
+1001: second bank r1
+1010: second bank r2
+1011: second bank r3
+1100: CS
+1101: DS
+1110: ES
+1111: SS
+
+Banking works like if(regnumber(2) = '0') then regnumber(3)=regbank; end if;
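The banking rule can be checked against the registerfile map with a few lines of Ruby. This is a sketch under the assumption that `regnumber` starts out as the 3-bit register field from the opcode and that `regbank` is 0 or 1. `registerfile_index` is a hypothetical name.

```ruby
# Hypothetical model of the banking rule: for register numbers with
# bit 2 clear (r0-r3), bit 3 of the register-file index is the bank.
# Registers r4, r5, SP and IP are not banked.
def registerfile_index(regnumber, regbank)
  if (regnumber & 0b0100).zero?
    (regnumber & 0b0011) | (regbank << 3)
  else
    regnumber & 0b0111
  end
end

puts format("%04b", registerfile_index(2, 1))  # 1010 (second bank r2)
puts format("%04b", registerfile_index(5, 1))  # 0101 (r5, unaffected)
```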
+
+
+ALU operations
+00000 and reg1,reg2 (reg1=reg1 and reg2)
+00001 or reg, reg
+00010 xor reg,reg
+00011 not reg1,reg2 (reg1=not reg2)
+00100 left shift reg,reg (logical)
+00101 right shift reg,reg (logical)
+00110 rotate right reg,reg
+00111 rotate left reg,reg
+
+01000 is greater than reg1,reg2 (TR=reg1>reg2)
+01001 is greater or equal to reg,reg
+01010 is less than reg,reg
+01011 is less than or equal to reg,reg
+01100 is equal to reg,reg
+01101 is not equal to reg,reg
+01110 equals 0 reg
+01111 not equals 0 reg
+
+10000 Set TR
+10001 Reset TR
+10011 Increment 
+10010 Decrement
+10100 Add
+10101 Subtract
+
+
+
+Alignment restrictions:
+In general, there are very few times that a full 16-bit read or 16-bit write is done. These are the times:
+
+* Extended push
+* Extended pop
+* instruction fetch
+
+Because of this, and because I want 2 clock cycles to be the longest instruction, I must place some alignment restrictions on the CPU.
+So, IP must be aligned to a 16-bit address (must be an even number). And SP must also be aligned to a 16-bit address.
+Though I don't plan on putting any "real" restriction on setting either to an odd address, nothing will actually work right if you do.
+
+Stack Details:
+Because of the need for 16-bit writes and reads of the stack, even though we're usually only using 8-bit values, we always end up pushing 2 bytes at a time.
+The stack works the opposite way from the 8086's. push X will move X to SS:SP and then increment SP by 2.
+Let's take an example program:
+--SS is 0
+mov sp, 10
+push 0xff
+
+after this, 0x00FF will be moved to SS:SP (0x0010) and then sp will be incremented by 2. If we push an 8-bit value, the value is put in the least-significant byte, and the MSB is 0
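The push behaviour can be modelled roughly as below. Several things here are assumptions and not confirmed by the design notes: that the absolute address is `(SS << 8) | SP`, that memory can be modelled one 16-bit word per address, and that SP starts at 0x10 to match the 0x0010 address in the example. Carryover on SS is not modelled.

```ruby
# Hypothetical model of push: store a 16-bit word at SS:SP, then add 2 to SP.
# Assumes the absolute address is (SS << 8) | SP; memory is modelled per word.
def push(memory, ss, sp, value)
  address = (ss << 8) | sp
  memory[address] = value & 0xFFFF  # an 8-bit value leaves the upper byte 0
  (sp + 2) & 0xFF                   # new SP (no SS carryover modelled)
end

memory = {}
sp = push(memory, 0, 0x10, 0xFF)

puts format("mem[0x%04x] = 0x%04x, sp = 0x%02x", 0x0010, memory[0x0010], sp)
```

Note the direction: SP grows upward after the write, which is the reverse of the 8086's pre-decrementing push.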
+
+
+
+On Reset:
+
+On reset, all general registers are set to 0
+CS is set to 1, IP is set to 0. SS is set to 2 and SP is set to 0. 
+Carryover is set on CS and not set on SS. DS and ES are 0. TR is false.
+Register bank 0 is selected. 
+
+Electrical operation:
+On power-on, RESET should be high for at least 2 clock cycles. HOLD can optionally be high as well after these two clock cycles.
+When HOLD is no longer needed, it should just be turned low and an extra clock cycle should be waited on for it to return to RESET state
+When RESET is held low, the processor will execute. It takes 3 clock cycles for the processor to "catch up" to actually executing instructions
+
+
+
+Register order: 
+The order of registers is read from left to right with left being the most significant bit of the 16-bit opcode.
+So for instance, 
+0101_*000*0_0*111*_0011 is `mov [r0], IP/r7`. The register portions of the opcode are surrounded by asterisks
+
+
+Implemented opcode list:
+legend:
+r = register choice
+R = register choice or opcode choice for sub groups
+C = conditional portion
+s = segment register choice
+i = immediate data
+N = not used
+o = opcode choice (for groups)
+_ = space for readability
+
+0000_rrrC_iiii_iiii
+mov reg, immediate
+
+0001_rrrC_iiii_iiii
+mov [reg], immediate
+
+group 3 comparisons
+0011_rrrC_Crrr_Nooo
+opcode choices
+000: is greater than reg1,reg2 (TR=reg1>reg2)
+001: is greater or equal to reg,reg
+010: is less than reg,reg
+011: is less than or equal to reg,reg
+100: is equal to reg,reg
+101: is not equal to reg,reg
+110: equals 0 reg
+111: not equals 0 reg
+
+group 4 bitwise
+0100_rrrC_Crrr_Nooo
+opcode choices
+000: and reg1,reg2 (reg1=reg1 and reg2)
+001: or reg, reg
+010: xor reg,reg
+011: not reg1,reg2 (reg1=not reg2)
+100: left shift reg,reg
+101: right shift reg,reg
+110: rotate right reg,reg
+111: rotate left reg,reg
+
+group 5 misc
+0101_rrrC_CRRR_sooo
+opcode choices:
+000: subgroup 5-0
+  RRR choices:
+  000: push reg
+  001: pop reg
+001: mov reg, reg
+010: mov reg, [reg]
+011: mov [reg], reg
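The group layout in the opcode list above (`oooo_rrrC_CRRR_sooo`) can be unpacked with shifts and masks. The sketch below is illustrative only: the `Decoded` struct and its field names are made up, and the second C bit is read as "only if false", following the general opcode format section.

```ruby
# Sketch of a decoder for the 16-bit group layout oooo_rrrC_CRRR_sooo.
# Field names are illustrative, not taken from the VHDL.
Decoded = Struct.new(:opcode, :reg1, :if_true, :if_false, :reg2, :s_bit, :choice)

def decode(word)
  Decoded.new(
    (word >> 12) & 0xF,  # root opcode
    (word >> 9)  & 0x7,  # first (target) register
    (word >> 8)  & 0x1,  # C: only if TR is true
    (word >> 7)  & 0x1,  # C: only if TR is false
    (word >> 4)  & 0x7,  # second register or subgroup choice
    (word >> 3)  & 0x1,  # s bit (the assembler's useextra)
    word & 0x7           # opcode choice within the group
  )
end

d = decode(0x5853)  # group 5, choice 011: mov [reg], reg
puts "op=#{d.opcode} reg1=r#{d.reg1} reg2=r#{d.reg2} choice=#{d.choice}"
# op=5 reg1=r4 reg2=r5 choice=3
```

0x5853 is what the assembler emits for `mov [r4], r5`, so the decoder and the encoder agree on this case.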
+
+
+
+
+
+
+
+
+
+
+library IEEE;
+use IEEE.STD_LOGIC_1164.ALL;
+use IEEE.NUMERIC_STD.ALL;
+use work.tinycpu.all;
+
+
+
+
+entity alu is
+        
+  port(
+    Op: in std_logic_vector(4 downto 0);
+    DataIn1: in std_logic_vector(7 downto 0);
+    DataIn2: in std_logic_vector(7 downto 0);
+    DataOut: out std_logic_vector(7 downto 0);
+    TR: out std_logic
+   );
+end alu;
+
+architecture Behavioral of alu is
+begin
+  process(DataIn1, DataIn2, Op)
+  begin
+    TR <= '0';
+    case Op is 
+--bitwise operations
+      when "00000" => --and
+        DataOut <= DataIn1 and DataIn2;
+      when "00001" => --or
+        DataOut <= DataIn1 or DataIn2;
+      when "00010" => --xor
+        DataOut <= DataIn1 xor DataIn2;
+      when "00011" => --not
+        DataOut <= not DataIn2; --ignore DataIn1 here so that mapping these operations is as simple as possible
+      when "00100" => --left shift (logical)
+        DataOut <= std_logic_vector(shift_left(unsigned(DataIn1),to_integer(unsigned(DataIn2(2 downto 0)))));
+      when "00101" => --right shift(logical)
+        DataOut <= std_logic_vector(shift_right(unsigned(DataIn1),to_integer(unsigned(DataIn2(2 downto 0))))); 
+      when "00110" => --left rotate
+        DataOut <= std_logic_vector(rotate_left(unsigned(DataIn1),to_integer(unsigned(DataIn2(2 downto 0))))); 
+      when "00111" => --right rotate
+        DataOut <= std_logic_vector(rotate_right(unsigned(DataIn1),to_integer(unsigned(DataIn2(2 downto 0))))); 
+--comparisons
+      when "01000" => --greater than
+        DataOut <= "00000000";
+        if(to_integer(unsigned(DataIn1)) > to_integer(unsigned(DataIn2))) then
+          TR <= '1';
+        else
+          TR <= '0';
+        end if;
+      when "01001" => --greater than or equal
+        DataOut <= "00000000";
+        if(to_integer(unsigned(DataIn1)) >= to_integer(unsigned(DataIn2))) then
+          TR <= '1';
+        else
+          TR <= '0';
+        end if;
+      when "01010" => --less than
+        DataOut <= "00000000";
+        if(to_integer(unsigned(DataIn1)) < to_integer(unsigned(DataIn2))) then
+          TR <= '1';
+        else
+          TR <= '0';
+        end if;
+      when "01011" => --less than or equal
+        DataOut <= "00000000";
+        if(to_integer(unsigned(DataIn1)) <= to_integer(unsigned(DataIn2))) then
+          TR <= '1';
+        else
+          TR <= '0';
+        end if;
+      when "01100" => --equals to
+        DataOut <= "00000000";
+        if(to_integer(unsigned(DataIn1)) = to_integer(unsigned(DataIn2))) then
+          TR <= '1';
+        else
+          TR <= '0';
+        end if;
+      when "01101" => --not equal
+        DataOut <= "00000000";
+        if(to_integer(unsigned(DataIn1)) /= to_integer(unsigned(DataIn2))) then
+          TR <= '1';
+        else
+          TR <= '0';
+        end if;
+      when "01110" => --equal to 0
+        DataOut <= "00000000";
+        if(to_integer(unsigned(DataIn1)) = 0) then
+          TR <= '1';
+        else
+          TR <= '0';
+        end if;
+      when "01111" => --not equal to 0
+        DataOut <= "00000000";
+        if(to_integer(unsigned(DataIn1)) /= 0) then
+          TR <= '1';
+        else
+          TR <= '0';
+        end if;
+--other operations
+      when "10000" => --set TR
+        DataOut <= "00000000";
+        TR <= '1';
+      when "10001" => --reset TR
+        DataOut <= "00000000";
+        TR <= '0';
+      when "10010" => --increment
+        DataOut <= std_logic_vector(unsigned(DataIn1) + 1); 
+      when "10011" => --decrement
+        DataOut <= std_logic_vector(unsigned(DataIn1) - 1); 
+      when "10100" => --add
+        DataOut <= std_logic_vector(unsigned(DataIn1) + unsigned(DataIn2));
+      when "10101" => --subtract
+        DataOut <= std_logic_vector(unsigned(DataIn1) - unsigned(DataIn2)); 
+
+      when others => 
+        DataOut <= "00000000";
+        TR <= '1';
+    end case;
+  end process;
+end Behavioral;

File src/blockram.vhd

+--RAM module
+--2 banks of 256*8 bits (512 bytes total)
+--simultaneous write/read support
+--16 bit or 8 bit data bus
+--8 bit address bus (one address per 16 bit word)
+--On Reset, will load a "default" RAM image
+
+library IEEE;
+use IEEE.STD_LOGIC_1164.ALL;
+use ieee.std_logic_arith.all;
+use IEEE.NUMERIC_STD.ALL;
+use ieee.std_logic_unsigned.all; 
+
+entity blockram is
+  port(
+    Address: in std_logic_vector(7 downto 0); --memory address
+    WriteEnable: in std_logic_vector(1 downto 0); --write 1 byte at a time option
+    Enable: in std_logic; 
+    Clock: in std_logic;
+    DataIn: in std_logic_vector(15 downto 0);
+    DataOut: out std_logic_vector(15 downto 0)
+  );
+end blockram;
+
+architecture Behavioral of blockram is
+    type ram_type is array (255 downto 0) of std_logic_vector (7 downto 0);
+    signal RAM0: ram_type; --Spartan 3Es don't natively support byte-wide write enables, so we'll just emulate it with 2 banks of RAM
+    signal RAM1: ram_type;
+    signal di0, di1: std_logic_vector(7 downto 0);
+    signal do : std_logic_vector(15 downto 0);
+begin
+  di0 <= DataIn(7 downto 0) when WriteEnable(0)='1' else do(7 downto 0);
+  di1 <= DataIn(15 downto 8) when WriteEnable(1)='1' else do(15 downto 8);
+  process (Clock)
+  begin
+    if rising_edge(Clock) then
+      if Enable = '1' then
+        if WriteEnable(0)='1' then
+          RAM0(conv_integer(Address)) <= di0;
+        else
+          do(7 downto 0) <= RAM0(conv_integer(Address)) ;
+        end if;
+        if WriteEnable(1)='1' then
+          RAM1(conv_integer(Address)) <= di1;
+        else
+          do(15 downto 8) <= RAM1(conv_integer(Address));
+        end if;
+      end if;
+    end if;
+  end process;
+  DataOut <= do;
+end Behavioral;

File src/bootrom.vhd

+
+library ieee;
+use ieee.std_logic_1164.all;
+use IEEE.NUMERIC_STD.ALL;
+
+entity bootrom is
+port (CLK : in std_logic;
+      EN : in std_logic;
+      ADDR : in std_logic_vector(4 downto 0);
+      DATA : out std_logic_vector(15 downto 0));
+end bootrom;
+
+architecture syn of bootrom is
+  constant ROMSIZE: integer := 64;
+  type ROM_TYPE is array(0 to ROMSIZE/2-1) of std_logic_vector(15 downto 0);
+  signal ROM: ROM_TYPE := (x"0801", x"0afd", x"5853", x"0600", x"1600", x"0402", x"5032", x"4020", x"3007", x"1701", x"3006", x"1700", x"0e0c", 
+x"0000", x"0000", x"0000", x"0000", x"0000", x"0000", x"0000", x"0000", x"0000", x"0000", x"0000", x"0000", x"0000", x"0000", x"0000", x"0000", x"0000", x"0000", x"0000");
+  signal rdata : std_logic_vector(15 downto 0);
+begin
+
+    rdata <= ROM(to_integer(unsigned(ADDR)));
+
+    process (CLK)
+    begin
+        if (CLK'event and CLK = '1') then
+            if (EN = '1') then
+                DATA <= rdata;
+            end if;
+        end if;
+    end process;
+
+end syn;
+
+                

File src/carryover.vhd

+library IEEE;
+use IEEE.STD_LOGIC_1164.ALL;
+use IEEE.NUMERIC_STD.ALL;
+use work.tinycpu.all;
+
+entity carryover is 
+  port(
+    EnableCarry: in std_logic; --When disabled, SegmentIn goes to SegmentOut
+    DataIn: in std_logic_vector(7 downto 0);
+    SegmentIn: in std_logic_vector(7 downto 0);
+    Addend: in std_logic_vector(7 downto 0); --How much to increase DataIn by (as a signed number). Believe it or not, that's the actual word for what we need.
+    DataOut: out std_logic_vector(7 downto 0);
+    SegmentOut: out std_logic_vector(7 downto 0);
+    Clock: in std_logic
+--    Debug: out std_logic_vector(8 downto 0)
+   );
+end carryover;
+
+architecture Behavioral of carryover is
+  signal temp: std_logic_vector(8 downto 0) := "000000000";
+  signal temp2: std_logic_vector(7 downto 0);
+begin
+  --treat as unsigned because signedness doesn't matter for the addition itself; carry and borrow are worked out from the sign bits below
+  process(DataIn, SegmentIn,Addend, EnableCarry)
+    
+  begin
+    --if rising_edge(Clock) then
+      temp <= std_logic_vector(unsigned('0' & DataIn) + unsigned( Addend)); 
+  --    if ('1' and ((not Addend(7)) and DataIn(7) and temp(8)))='1' then 
+      if (EnableCarry and ((not Addend(7)) and DataIn(7) and not temp(8)))='1' then 
+        SegmentOut <= std_logic_vector(unsigned(SegmentIn)+1);
+      elsif (EnableCarry and (Addend(7) and not DataIn(7) and temp(8)))='1' then 
+        SegmentOut <= std_logic_vector(unsigned(SegmentIn)-1);
+      else
+        SegmentOut <= SegmentIn;
+      end if;
+    --end if;
+  end process;
+  --Debug <= Temp;
+  DataOut <= temp(7 downto 0);
+end Behavioral;

File src/core.vhd

+--Core module. 
+--This module basically connects everything together and decodes the opcodes.
+--The only thing above this is toplevel.vhd which actually sets the pinout for the FPGA
+
+
+library IEEE;
+use IEEE.STD_LOGIC_1164.ALL;
+use IEEE.NUMERIC_STD.ALL;
+use work.tinycpu.all;
+
+entity core is 
+  port(
+    --memory interface 
+    MemAddr: out std_logic_vector(15 downto 0); --memory address (in bytes)
+    MemWW: out std_logic; --memory writeword
+    MemWE: out std_logic; --memory writeenable
+    MemIn: in std_logic_vector(15 downto 0);
+    MemOut: out std_logic_vector(15 downto 0);
+    --general interface
+    Clock: in std_logic;
+    Reset: in std_logic; --When this is high, CPU will reset within 1 clock cycle.
+    --Enable: in std_logic; --When this is high, the CPU executes as normal, when low the CPU stops at the next clock cycle(maintaining all state)
+    Hold: in std_logic; --when high, CPU pauses execution and places Memory interfaces into high impedance state so the memory can be used by other components
+    HoldAck: out std_logic; --when high, CPU acknowledged hold and buses are in high Z
+    --todo: port interface
+
+    --debug ports:
+    DebugIR: out std_logic_vector(15 downto 0); --current instruction
+    DebugIP: out std_logic_vector(7 downto 0); --current IP
+    DebugCS: out std_logic_vector(7 downto 0); --current code segment
+    DebugTR: out std_logic; --current value of TR
+    DebugR0: out std_logic_vector(7 downto 0)
+   );
+end core;
+
+architecture Behavioral of core is
+  component fetch is 
+    port(
+      Enable: in std_logic;
+      AddressIn: in std_logic_vector(15 downto 0);
+      Clock: in std_logic;
+      DataIn: in std_logic_vector(15 downto 0); --interface from memory
+      IROut: out std_logic_vector(15 downto 0);
+      AddressOut: out std_logic_vector(15 downto 0) --interface to memory
+    );
+  end component;
+  component alu is
+    port(
+      Op: in std_logic_vector(4 downto 0);
+      DataIn1: in std_logic_vector(7 downto 0);
+      DataIn2: in std_logic_vector(7 downto 0);
+      DataOut: out std_logic_vector(7 downto 0);
+      TR: out std_logic
+    );
+  end component;
+  component carryover is 
+    port(
+      EnableCarry: in std_logic; --When disabled, SegmentIn goes to SegmentOut
+      DataIn: in std_logic_vector(7 downto 0);
+      SegmentIn: in std_logic_vector(7 downto 0);
+      Addend: in std_logic_vector(7 downto 0); --How much to increase DataIn by (as a signed number). Believe it or not, that's the actual word for what we need.
+      DataOut: out std_logic_vector(7 downto 0);
+      SegmentOut: out std_logic_vector(7 downto 0);
+      Clock: in std_logic
+    );
+  end component;
+  component registerfile is
+  port(
+    WriteEnable: in regwritetype;
+    DataIn: in regdatatype;
+    Clock: in std_logic;
+    DataOut: out regdatatype
+  );
+  end component;
+
+  constant REGIP: integer := 7;
+  constant REGSP: integer := 6;
+  constant REGSS: integer := 15;
+  constant REGES: integer := 14;
+  constant REGDS: integer := 13;
+  constant REGCS: integer := 12;
+
+  type ProcessorState is (
+    ResetProcessor,
+    FirstFetch1, --the fetcher needs two clock cycles to catch up
+    FirstFetch2,
+    FirstFetch3,
+    Execute,
+    WaitForMemory,
+    HoldMemory,
+    WaitForAlu -- wait for settling is needed when using the ALU
+  );
+  signal state: ProcessorState;
+  signal HeldState: ProcessorState; --state the processor was in when HOLD was activated
+
+  --carryout signals
+  signal CarryCS: std_logic;
+  signal CarrySS: std_logic;
+  signal IPAddend: std_logic_vector(7 downto 0);
+  signal SPAddend: std_logic_vector(7 downto 0);
+  signal IPCarryOut: std_logic_vector(7 downto 0);
+  signal CSCarryOut: std_logic_vector(7 downto 0);
+  signal SPCarryOut: std_logic_vector(7 downto 0);
+  signal SSCarryOut: std_logic_vector(7 downto 0);
+
+  --register signals
+  signal regWE:regwritetype;
+  signal regIn: regdatatype;
+  signal regOut: regdatatype;
+  --fetch signals
+  signal fetchEN: std_logic;
+  signal IR: std_logic_vector(15 downto 0);
+  --alu signals
+  signal AluOp: std_logic_vector(4 downto 0);
+  signal AluIn1: std_logic_vector(7 downto 0);
+  signal AluIn2: std_logic_vector(7 downto 0);
+  signal AluOut: std_logic_vector(7 downto 0);
+  signal AluTR: std_logic;
+  signal TR: std_logic;
+  signal TRData: std_logic;
+  signal UseAluTR: std_logic;
+  
+  --control signals
+  signal InReset: std_logic;
+  signal OpAddress: std_logic_vector(15 downto 0); --memory address to use for operation of an instruction
+  signal OpDataIn: std_logic_vector(15 downto 0); 
+  signal OpDataOut: std_logic_vector(15 downto 0);
+  signal OpWW: std_logic;
+  signal OpWE: std_logic;
+  signal OpDestReg1: std_logic_vector(3 downto 0);
+  signal OpUseReg2: std_logic;
+  signal OpDestReg2: std_logic_vector(3 downto 0);
+
+  --opcode shortcut signals
+  signal opmain: std_logic_vector(3 downto 0);
+  signal opimmd: std_logic_vector(7 downto 0);
+  signal opcond1: std_logic; --first conditional bit
+  signal opcond2: std_logic; --second conditional bit
+  signal opreg1: std_logic_vector(2 downto 0);
+  signal opreg2: std_logic_vector(2 downto 0);
+  signal opreg3: std_logic_vector(2 downto 0);
+  signal opseges: std_logic; --use ES segment
+
+  signal regbank: std_logic;
+  
+  signal fetcheraddress: std_logic_vector(15 downto 0);
+
+  
+  signal bankreg1: std_logic_vector(3 downto 0); --these signals have register bank stuff baked in
+  signal bankreg2: std_logic_vector(3 downto 0);
+  signal bankreg3: std_logic_vector(3 downto 0);
+  signal FetchMemAddr: std_logic_vector(15 downto 0);
+
+  signal UsuallySS: std_logic_vector(3 downto 0);
+  signal UsuallyDS: std_logic_vector(3 downto 0);
+  signal AluRegOut: std_logic_vector(3 downto 0);
+begin
+  reg: registerfile port map(
+    WriteEnable => regWE,
+    DataIn => regIn,
+    Clock => Clock,
+    DataOut => regOut
+  );
+  carryovercs: carryover port map(
+    EnableCarry => CarryCS,
+    DataIn => regOut(REGIP),
+    SegmentIn => regOut(REGCS),
+    Addend => IPAddend,
+    DataOut => IPCarryOut,
+    SegmentOut => CSCarryOut,
+    Clock => Clock
+  );
+  carryoverss: carryover port map(
+    EnableCarry => CarrySS,
+    DataIn => regOut(REGSP),
+    SegmentIn => RegOut(REGSS),
+    Addend => SPAddend,
+    DataOut => SPCarryOut,
+    SegmentOut => SSCarryOut,
+    Clock => Clock
+  );
+  fetcher: fetch port map(
+    Enable => fetchEN,
+    AddressIn => fetcheraddress, 
+    Clock => Clock,
+    DataIn => MemIn,
+    IROut => IR,
+    AddressOut => FetchMemAddr
+  );
+  cpualu: alu port map(
+    Op => AluOp,
+    DataIn1 => AluIn1,
+    DataIn2 => AluIn2,
+    DataOut => AluOut,
+    TR => AluTR
+  );
+  fetcheraddress <= regIn(REGCS) & regIn(REGIP);
+  MemAddr <= OpAddress when state=WaitForMemory else FetchMemAddr;
+  MemOut <= OpDataOut when (state=WaitForMemory and OpWE='1') else "ZZZZZZZZZZZZZZZZ" when state=HoldMemory else x"0000";
+  MemWE <= OpWE when state=WaitForMemory else 'Z' when state=HoldMemory else '0';
+  MemWW <= OpWW when state=WaitForMemory else 'Z' when state=HoldMemory else '0';
+  OpDataIn <= MemIn;
+  --opcode shortcuts
+  opmain <= IR(15 downto 12);
+  opimmd <= IR(7 downto 0);
+  opcond1 <= IR(8);
+  opcond2 <= IR(7);
+  opreg1 <= IR(11 downto 9);
+  opreg3 <= IR(2 downto 0);
+  opreg2 <= IR(6 downto 4);
+  opseges <= IR(3);
+  --debug ports
+  DebugCS <= regOut(REGCS);
+  DebugIP <= regOut(REGIP);
+  DebugR0 <= regOut(0);
+  DebugIR <= IR;
+  DebugTR <= TR;
+  --register addresses with registerbank baked in
+  bankreg1 <= ('1' & opreg1) when (regbank='1' and opreg1(2)='0') else '0' & opreg1;
+  bankreg2 <= ('1' & opreg2) when (regbank='1' and opreg2(2)='0') else '0' & opreg2;
+  bankreg3 <= ('1' & opreg3) when (regbank='1' and opreg3(2)='0') else '0' & opreg3;
+  --UsuallySegment shortcuts (only used when not an immediate)
+  UsuallyDS <= "1101" when opseges='0' else "1110";
+  UsuallySS <= "1111" when opseges='0' else "1110";
+  TR <= TRData when UseAluTR='0' else AluTR;
+  
+  foo: process(Clock, Hold, state, IR, inreset, reset, regin, regout, IPCarryOut, CSCarryOut)
+  begin
+    if rising_edge(Clock) then
+
+    --states
+      if reset='1' and hold='0' then
+        InReset <= '1';
+        state <= ResetProcessor;
+        HoldAck <= '0';
+        CarryCS <= '1';
+        CarrySS <= '0';
+        regWE <= (others => '1');
+        regIn <= (others => "00000000");
+        regIn(REGCS) <= x"01";
+        regIn(REGSS) <= x"02";
+        IPAddend <= x"00";
+        SPAddend <= x"00";
+        AluOp <= "10001"; --reset TR in ALU
+        regbank <= '0';
+        fetchEN <= '1';
+        OpDataOut <= "ZZZZZZZZZZZZZZZZ";
+        OpAddress <= x"0000";
+        OpWE <= '0';
+        opWW <= '0';
+        TRData <= '0';
+        UseAluTR <= '0';
+        OpDestReg1<= x"0";
+        OpDestReg2 <= x"0";
+        OpUseReg2 <= '0';
+        --finish up
+      elsif InReset='1' and reset='0' and Hold='0' then --reset is done, start executing
+        InReset <= '0';
+        fetchEN <= '1';
+        state <= FirstFetch1;
+      elsif Hold = '1' and (state=HoldMemory or state=Execute or state=ResetProcessor) then
+        --do not hold immediately if waiting on memory or if waiting on the first fetch of an instruction after reset
+        state <= HoldMemory;
+        HoldAck <= '1';
+        FetchEN <= '0';
+      elsif Hold='0' and state=HoldMemory then
+        if reset='1' or InReset='1' then
+          state <= ResetProcessor;
+        else
+          state <= Execute;
+        end if;
+        FetchEN <= '1';
+      elsif state=FirstFetch1 then --we have to let IR get loaded before we can execute.
+        --regWE <= (others => '0');
+        fetchEN <= '1'; --already enabled, but anyway
+        --regWE <= (others => '0');
+        IPAddend <= x"02";
+        SPAddend <= x"00"; --no addend unless pushing or popping
+        RegWE <= (others => '0');
+        regIn(REGIP) <= IPCarryOut;
+        regWE(REGIP) <= '1';
+        regWE(REGCS) <= '1';
+        regIn(REGCS) <= CSCarryOut;
+        state <= Execute; 
+      elsif state=FirstFetch2 then
+        state <= FirstFetch3;
+        
+      elsif state=FirstFetch3 then
+        state <= Execute;
+      elsif state=WaitForMemory then
+        state <= Execute;
+        FetchEn <= '1';
+        IpAddend <= x"02";
+        --SpAddend <= x"00";
+        --SP can change here... really I don't *think* it can change from within Execute... so maybe that's redundant
+        regIn(REGSP) <= SPCarryOut; --with addend being 0, it'll just write SP to SP so it won't change, but this makes code easier for me
+        regIn(REGSS) <= SSCarryOut;
+        regWE(REGSP) <= '1';
+        regWE(REGSS) <= '1';
+        if OpWE='0' then
+          regIn(to_integer(unsigned(OpDestReg1))) <= OpDataIn(7 downto 0);
+          regWE(to_integer(unsigned(OpDestReg1))) <= '1';
+          if OpUseReg2='1' then
+            regIn(to_integer(unsigned(OpDestReg2))) <= OpDataIn(15 downto 8);
+            regWE(to_integer(unsigned(OpDestReg2))) <= '1';
+          end if;
+        end if;
+      elsif state=WaitForAlu then
+        state <= Execute;
+        regIn(to_integer(unsigned(AluRegOut))) <= AluOut;
+        regWE(to_integer(unsigned(AluRegOut))) <= '1';
+        FetchEN <= '1';
+        IPAddend <= x"02";
+        SPAddend <= x"00";
+      end if;
+
+
+      if state=Execute then
+        fetchEN <= '1';
+        --reset to "usual"
+        IPAddend <= x"02";
+        SPAddend <= x"00"; --no addend unless pushing or popping
+        RegWE <= (others => '0');
+        regIn(REGIP) <= IPCarryOut;
+        regWE(REGIP) <= '1';
+        regWE(REGCS) <= '1';
+        regIn(REGCS) <= CSCarryOut;
+        OpUseReg2 <= '0';
+        OpAddress <= "ZZZZZZZZZZZZZZZZ";
+        if UseAluTR='1' then
+          UseAluTR<='0';
+        end if;
+        --actual decoding
+        if opcond1='0' or (opcond1='1' and TR='1') then
+          case opmain is 
+            when "0000" => --mov reg,imm
+              regIn(to_integer(unsigned(bankreg1))) <= opimmd;
+              regWE(to_integer(unsigned(bankreg1))) <= '1';
+            when "0001" => --mov [reg],imm
+              OpAddress <= regOut(REGDS) & regOut(to_integer(unsigned(bankreg1)));
+              OpWE <= '1';
+              OpDataOut <= x"00" & opimmd;
+              OpWW <= '0';
+              state <= WaitForMemory;
+              IPAddend <= x"00"; --disable all this because we have to wait a cycle to write memory
+              FetchEN <= '0';
+            when "0011" => --group 3 comparisons
+              TRData <= AluTR;
+              UseAluTR <= '1';
+              AluOp <= "01" & opreg3; --nothing hard here, ALU does it all for us
+              AluIn1 <= regOut(to_integer(unsigned(bankreg1)));
+              AluIn2 <= regOut(to_integer(unsigned(bankreg2)));
+            when "0100" => --group 4 bitwise operations
+              --setup wait state
+              State <= WaitForAlu;
+              FetchEN <= '0';
+              IPAddend <= x"00";
+              AluOp <= "00" & opreg3; --nothing hard here, ALU does it all for us
+              AluIn1 <= regOut(to_integer(unsigned(bankreg1)));
+              AluIn2 <= regOut(to_integer(unsigned(bankreg2)));
+              AluRegOut <= bankreg1;
+              --regIn(to_integer(unsigned(bankreg1))) <= AluOut;
+              --regWE(to_integer(unsigned(bankreg1))) <= '1';
+            when "0101" => --group 5
+              case opreg3 is
+                when "000" => --subgroup 5-0
+                  case opreg2 is
+                    when "000" => --push reg
+                      SpAddend <= x"02"; --set SP to increment
+                      OpAddress <= regOut(to_integer(unsigned(UsuallySS))) & regOut(REGSP);
+                      OpWE <= '1';
+                      OpDataOut <= x"00" & regOut(to_integer(unsigned(bankreg1)));
+                      OpWW <= '1';
+                      state <= WaitForMemory;
+                      IPAddend <= x"00";
+                      FetchEN <= '0';
+                    when "001" => --pop reg
+                      SPAddend <= x"FE"; --set SP to decrement
+                      --TODO account for carryover properties
+                      OpAddress <= regOut(to_integer(unsigned(UsuallySS))) & std_logic_vector(unsigned(regOut(REGSP))-2); --decrement 2 here "early" 
+                      OpWE <= '0';
+                      OpDestReg1 <= bankreg1;
+                      --regIn(to_integer(unsigned(bankreg1))) <= OpData(7 downto 0);
+                      OpWW <= '0';
+                      state <= WaitForMemory;
+                      IPAddend <= x"00";
+                      FetchEN <= '0';
+                    when others =>
+                      --synthesis off
+                      report "Not implemented subgroup 5-0" severity error;
+                      --synthesis on
+                  end case;
+                when "001" => --mov reg, reg
+                  regIn(to_integer(unsigned(bankreg1))) <= regOut(to_integer(unsigned(bankreg2)));
+                  regWE(to_integer(unsigned(bankreg1))) <= '1';
+                when "010" => --mov reg, [reg] (load)
+                  OpDestReg1 <= bankreg1;
+                  OpWE <= '0';
+                  OpAddress <= regOut(to_integer(unsigned(UsuallyDS))) & regOut(to_integer(unsigned(bankreg2)));
+                  IpAddend <= x"00";
+                  FetchEN <= '0';
+                  state <= WaitForMemory;
+                when "011" => --mov [reg], reg (store)
+                  OpDataOut <= x"00" & regOut(to_integer(unsigned(bankreg2)));
+                  OpWW <= '0';
+                  OpWE <= '1';
+                  OpAddress <= regOut(to_integer(unsigned(UsuallyDS))) & regOut(to_integer(unsigned(bankreg1)));
+                  IpAddend <= x"00";
+                  FetchEN <= '0';
+                  state <= WaitForMemory;
+                when others =>
+                  --synthesis off
+                  report "Not implemented group 5" severity error;
+                  --synthesis on
+              end case;
+            when others => 
+              --synthesis off
+              report "Not implemented" severity error;
+              --synthesis on
+          end case;
+        end if;
+      end if;
+
+    end if;
+
+
+    
+  end process;
+
+
+
+
+
+
+
+
+  
+end Behavioral;

File src/fetch.vhd

+--This component interfaces with the memory controller and fetches the next instruction according to IP and CS
+--Each instruction is 16 bits.
+
+--How it works: IROut keeps the instruction that was fetched in the "last" clock cycle.
+--What is basically required is that AddressIn must be the value that CS:IP "will be" in the next clock cycle
+--This can cause some (in my opinion) odd logic at times, but should not have any problems synthesizing
+
+
+
+
+library IEEE;
+use IEEE.STD_LOGIC_1164.ALL;
+use IEEE.NUMERIC_STD.ALL;
+use work.tinycpu.all;
+
+entity fetch is 
+  port(
+    Enable: in std_logic;
+    AddressIn: in std_logic_vector(15 downto 0);
+    Clock: in std_logic;
+    DataIn: in std_logic_vector(15 downto 0); --interface from memory
+    IROut: out std_logic_vector(15 downto 0);
+    AddressOut: out std_logic_vector(15 downto 0) --interface to memory
+   );
+end fetch;
+
+architecture Behavioral of fetch is
+  signal IR: std_logic_vector(15 downto 0);
+begin
+  process(Clock, AddressIn, DataIn, Enable)
+  begin
+    --if(rising_edge(Clock)) then
+      if(Enable='1') then
+        IR <= DataIn;
+        AddressOut <= AddressIn;
+      else
+        IR <= x"FFFF"; --avoid a latch
+        AddressOut <= "ZZZZZZZZZZZZZZZZ";
+      end if;
+    --end if;
+  end process;
+  --AddressOut <= AddressIn when Enable='1' else "ZZZZZZZZZZZZZZZZ";
+  IROut <= IR;
+end Behavioral;

File src/memory.vhd

+--Memory management component
+--By having this separate, it should be fairly easy to add RAMs or ROMs later
+--This basically lets the CPU not have to worry about how memory "Really" works
+--currently just one RAM: blockram.vhd, mapped directly above the register/IO area (addresses 0 - 15)
+
+library IEEE;
+use IEEE.STD_LOGIC_1164.ALL;
+use IEEE.NUMERIC_STD.ALL;
+
+
+
+entity memory is
+  port(
+    Address: in std_logic_vector(15 downto 0); --memory address (in bytes)
+    WriteWord: in std_logic; --if set, will write a full 16-bit word instead of a byte. Address must be aligned to 16-bit address. (bottom bit must be 0)
+    WriteEnable: in std_logic;
+    Clock: in std_logic;
+    DataIn: in std_logic_vector(15 downto 0);
+    DataOut: out std_logic_vector(15 downto 0);
+
+    Port0: inout std_logic_vector(7 downto 0)
+--    Reset: in std_logic
+    
+    --RAM/ROM interface (RAMA is built in to here)
+    --RAMBDataIn: out std_logic_vector(15 downto 0);
+    --RAMBDataOut: in std_logic_vector(15 downto 0);
+    --RAMBAddress: out std_logic_vector(15 downto 0);
+    --RAMBWriteEnable: out std_logic_vector(1 downto 0);
+  );
+end memory;
+
+architecture Behavioral of memory is
+
+  component blockram
+    port(
+      Address: in std_logic_vector(7 downto 0); --memory address
+      WriteEnable: in std_logic_vector(1 downto 0); --write or read
+      Enable: in std_logic; 
+      Clock: in std_logic;
+      DataIn: in std_logic_vector(15 downto 0);
+      DataOut: out std_logic_vector(15 downto 0)
+    );
+  end component;
+
+  constant R1START: integer := 15;
+  constant R1END: integer := 1023+15;
+  signal addr: std_logic_vector(15 downto 0) := (others => '0');
+  signal R1addr: std_logic_vector(7 downto 0);
+  signal we: std_logic_vector(1 downto 0);
+  signal datawrite: std_logic_vector(15 downto 0);
+  signal dataread: std_logic_vector(15 downto 0);
+  --signal en: std_logic;
+  signal R1we: std_logic_vector(1 downto 0);
+  signal R1en: std_logic;
+  signal R1in: std_logic_vector(15 downto 0);
+  signal R1out: std_logic_vector(15 downto 0);
+
+  signal port0we: std_logic_vector(7 downto 0);
+  signal port0temp: std_logic_vector(7 downto 0);
+  signal port0out: std_logic_vector(7 downto 0);
+  signal port0in: std_logic_vector(7 downto 0);
+  
+begin
+  R1: blockram port map (R1addr, R1we, R1en, Clock, R1in, R1out);
+
+
+  gen2: for I in 0 to 7 generate
+    port0(I) <= port0out(I) when port0we(I)='1' else 'Z';
+    port0in(I) <= port0out(I) when port0we(I)='1' else port0(I);
+  end generate gen2;
+
+  addrwe: process(Address, WriteWord, WriteEnable, DataIn)
+  begin
+    addr <= Address(15 downto 1) & '0';
+    if WriteEnable='1' then
+      if WriteWord='1' then
+        we <= "11";
+        datawrite <= DataIn;
+      else
+        if Address(0)='0' then
+          we <= "01";
+          datawrite <= x"00" & DataIn(7 downto 0); --not really necessary
+        else
+          we <= "10";
+          datawrite <= DataIn(7 downto 0) & x"00";
+        end if;
+      end if;
+    else
+      datawrite <= x"0000";
+      we <= "00";
+    end if;
+  end process;
+  
+  assignram: process (we, datawrite, addr, r1out, port0, WriteEnable, Address, Clock, port0temp, port0we, DataIn)
+  variable tmp: integer;
+  variable tmp2: integer;
+  variable found: boolean := false;
+  begin
+    tmp := to_integer(unsigned(addr));
+    tmp2 := to_integer(unsigned(Address));
+    if tmp2 <= 15 then --internal registers/mapped IO
+      if rising_edge(Clock) then
+        if WriteWord='0' then
+          if tmp2=0 then
+            --dataread <= x"0000";
+            
+            gen: for I in 0 to 7 loop
+              if WriteEnable='1' then
+                if port0we(I)='1' then --1-bit port set to WRITE mode
+                  
+                  Port0out(I) <= DataIn(I);
+                  if I=0 then
+                   -- report string(DataIn(I));
+                    --assert(DataIn(I)='1') report "XXXXX" severity note;
+                    --port0(I) <= '1';
+                  end if;
+                  port0temp(I) <= DataIn(I);
+                  --dataread(I) <= DataIn(I);
+                else
+                  port0out(I) <= '0';
+                  --port0(I) <= 'Z';
+                  --port0temp(I) <= '0';
+                  --dataread(I) <= port0(I);
+                end if;
+              end if;
+            end loop gen;
+          elsif tmp2=1 then
+            --dataread <= x"00" & port0we;
+            if WriteEnable='1' then
+              port0we <= DataIn(7 downto 0);
+              --dataread<=x"00" & DataIn(7 downto 0);
+              setwe: for I in 0 to 7 loop
+                if DataIn(I)='0' then
+                  --port0(I) <= 'Z';
+                  port0temp(I) <= '0';
+                else
+                  if port0temp(I)='0' then
+                    port0out(I) <= '0';
+                    port0temp(I) <= '0';
+                  else
+                    port0out(I) <= port0temp(I);
+                  end if;
+                end if;
+              end loop setwe;
+            else
+              --dataread <= x"00" & port0we;
+            end if;
+          else
+            --synthesis off
+            report "Memory address is outside of bounds of RAM and registers" severity warning;
+            --synthesis on
+          end if;
+        
+        else
+          --synthesis off
+          report "WriteWord is not allowed in register area. Ignoring access" severity warning;
+          --synthesis on
+        end if;
+      end if;
+      dataread <= x"00" & port0in;
+--       outgen: for I in 0 to 7 loop
+--         if tmp2=0 then
+--           
+--           if port0we(I)='1' then
+--             if WriteEnable='1' then
+--               dataread(I) <= DataIn(I);
+--             else
+--               dataread(I) <= port0temp(I);
+--             end if;
+--           else
+--             if I=1 then
+--               --assert(port0(I)='1') report "XXX" severity note;
+--             end if;
+--             if port0(I)='1' then
+--               dataread(I) <= '1';
+--             else
+--               dataread(I) <= '0';
+--             end if;
+--             --dataread(I) <='1'; -- port0(I);
+--           end if;
+--         elsif tmp2=1 then
+--           if WriteEnable='1' then
+--             dataread(I) <= DataIn(I);
+--           else
+--             dataread(I) <= port0we(I);
+--           end if;
+--         else
+--           dataread(I) <= '0';
+--         end if;
+--       end loop outgen;
+      R1en <= '0';
+      R1we <= "00";
+      R1in <= x"0000";
+      R1addr <= x"00";
+    elsif tmp >= R1START and tmp <= R1END then --RAM bank1
+      --map all to R1
+      found := true;
+      R1en <= '1';
+      R1we <= we;
+      R1in <= datawrite;
+      dataread <= R1out;
+      R1addr <= addr(8 downto 1);
+    else
+      R1en <= '0';
+      R1we <= "00";
+      R1in <= x"0000";
+      R1addr <= x"00";
+      dataread <= x"0000";
+    end if;
+  end process;
+
+  readdata: process(Address, dataread)
+  begin
+    if to_integer(unsigned(Address))>15 then
+      if Address(0) = '0' then
+        DataOut <= dataread;
+      else
+        DataOut <= x"00" & dataread(15 downto 8);
+      end if;
+    else
+      DataOut <= x"00" & dataread(7 downto 0);
+    end if;
+  end process;
+end Behavioral;

File src/registerfile.vhd

+--registerfile module
+--16 registers, read/write port for all registers. 
+--8 bit registers
+
+library IEEE;
+use IEEE.STD_LOGIC_1164.ALL;
+use IEEE.NUMERIC_STD.ALL;
+use ieee.std_logic_unsigned.all; 
+use work.tinycpu.all;
+
+entity registerfile is
+
+port(
+  WriteEnable: in regwritetype;
+  DataIn: in regdatatype;
+  Clock: in std_logic;
+  DataOut: out regdatatype
+);
+end registerfile;
+
+architecture Behavioral of registerfile is
+  type registerstype is array(0 to 15) of std_logic_vector(7 downto 0);
+  signal registers: registerstype;
+  --attribute ram_style : string;
+  --attribute ram_style of registers: signal is "distributed";
+begin
+  regs: for I in 0 to 15 generate
+    process(WriteEnable(I), DataIn(I), Clock)
+    begin
+      if rising_edge(Clock) then --I really hope this one falling_edge component doesn't bite me in the ass later
+        if(WriteEnable(I) = '1') then
+          registers(I) <= DataIn(I);
+        end if;
+      end if;
+    end process;
+    DataOut(I) <= registers(I) when WriteEnable(I)='0' else DataIn(I);
+     -- DataOut(I) <= registers(I);
+  end generate regs;
+end Behavioral;

File src/tinycpu.vhd

+
+library IEEE;
+use IEEE.STD_LOGIC_1164.all;
+
+package tinycpu is
+
+        type regdatatype is array(15 downto 0) of std_logic_vector(7 downto 0);
+        type regwritetype is array(15 downto 0) of std_logic;
+
+end tinycpu;
+
+package body tinycpu is
+
+ 
+end tinycpu;
+--Top-level component
+--Wires the CPU core, the memory component, and the boot ROM together
+--While Reset is high (and DMA is low), the boot ROM contents are copied into memory starting at 0x0100
+--When DMA is high, the external Address, WriteEnable, and Data ports drive memory directly
+
+library IEEE;
+use IEEE.STD_LOGIC_1164.ALL;
+use IEEE.NUMERIC_STD.ALL;
+
+
+
+entity top is
+  port(
+    Reset: in std_logic;
+    Hold: in std_logic;
+    HoldAck: out std_logic;
+    Clock: in std_logic;
+    DMA: in std_logic; --when high, Address, WriteEnable, and Data are connected to memory
+    Address: in std_logic_vector(15 downto 0); --memory address (in bytes)
+    WriteEnable: in std_logic;
+    Data: inout std_logic_vector(15 downto 0);
+    Port0: inout std_logic_vector(7 downto 0);
+    --debug ports
+    DebugR0: out std_logic_vector(7 downto 0)
+  );
+end top;
+
+architecture Behavioral of top is
+
+  component memory is
+    port(
+      Address: in std_logic_vector(15 downto 0); --memory address (in bytes)
+      WriteWord: in std_logic; --if set, will write a full 16-bit word instead of a byte. Address must be aligned to 16-bit address. (bottom bit must be 0)
+      WriteEnable: in std_logic;
+      Clock: in std_logic;
+      DataIn: in std_logic_vector(15 downto 0);
+      DataOut: out std_logic_vector(15 downto 0);
+      Port0: inout std_logic_vector(7 downto 0)
+    );
+  end component;
+
+  component core is 
+    port(
+      --memory interface 
+      MemAddr: out std_logic_vector(15 downto 0); --memory address (in bytes)
+      MemWW: out std_logic; --memory writeword
+      MemWE: out std_logic; --memory writeenable
+      MemIn: in std_logic_vector(15 downto 0);
+      MemOut: out std_logic_vector(15 downto 0);
+      --general interface
+      Clock: in std_logic;
+      Reset: in std_logic; --When this is high, CPU will reset within 1 clock cycle. 
+      --Enable: in std_logic; --When this is high, the CPU executes as normal, when low the CPU stops at the next clock cycle(maintaining all state)
+      Hold: in std_logic; --when high, CPU pauses execution and places Memory interfaces into high impedance state so the memory can be used by other components
+      HoldAck: out std_logic; --when high, CPU acknowledged hold and buses are in high Z
+      --todo: port interface
+
+      --debug ports:
+      DebugIR: out std_logic_vector(15 downto 0); --current instruction
+      DebugIP: out std_logic_vector(7 downto 0); --current IP
+      DebugCS: out std_logic_vector(7 downto 0); --current code segment
+      DebugTR: out std_logic; --current value of TR
+      DebugR0: out std_logic_vector(7 downto 0)
+    );
+  end component;
+  component bootrom is
+    port(
+        CLK : in std_logic;
+        EN : in std_logic;
+        ADDR : in std_logic_vector(4 downto 0);
+        DATA : out std_logic_vector(15 downto 0)
+    );
+  end component;
+  signal cpuaddr: std_logic_vector(15 downto 0);
+  signal cpuww: std_logic;
+  signal cpuwe: std_logic;
+  signal cpumemin: std_logic_vector(15 downto 0);
+  signal cpumemout: std_logic_vector(15 downto 0);
+  signal debugir: std_logic_vector(15 downto 0);
+  signal debugip: std_logic_vector(7 downto 0);
+  signal debugcs: std_logic_vector(7 downto 0);
+  signal debugtr: std_logic;
+
+  signal MemAddress: std_logic_vector(15 downto 0); --memory address (in bytes)
+  signal MemWriteWord: std_logic; --if set, will write a full 16-bit word instead of a byte. Address must be aligned to 16-bit address. (bottom bit must be 0)
+  signal MemWriteEnable: std_logic;
+  signal MemDataIn: std_logic_vector(15 downto 0);
+  signal MemDataOut: std_logic_vector(15 downto 0);
+
+  signal BootAddress: std_logic_vector(4 downto 0);
+  signal BootMemAddress: std_logic_vector(15 downto 0);
+  signal BootDataIn: std_logic_vector(15 downto 0);
+  signal BootDataOut: std_logic_vector(15 downto 0);
+  signal BootDone: std_logic;
+  signal BootFirst: std_logic;
+  signal Port0Temp: std_logic_vector(7 downto 0);
+  constant ROMSIZE: integer := 64;
+  signal counter: std_logic_vector(4 downto 0);
+begin
+  cpu: core port map (
+    MemAddr => cpuaddr,
+    MemWW => cpuww,
+    MemWE => cpuwe,
+    MemIn => cpumemin,
+    MemOut => cpumemout,
+    Clock => Clock,
+    Reset => Reset,
+    Hold => Hold,
+    HoldAck => HoldAck,
+    DebugIR => DebugIR,
+    DebugIP => DebugIP,
+    DebugCS => DebugCS,
+    DebugTR => DebugTR,
+    DebugR0 => DebugR0
+  );
+  mem: memory port map(
+    Address => MemAddress,
+    WriteWord => MemWriteWord,
+    WriteEnable => MemWriteEnable,
+    Clock => Clock,
+    DataIn => MemDataIn,
+    DataOut => MemDataOut,
+    Port0 => Port0Temp --Port0
+  );
+  rom: bootrom port map(
+    clk => clock,
+    EN => '1',
+    Addr => BootAddress,
+    Data => BootDataOut
+  );
+  Port0 <= Port0Temp when Reset='0' else x"FF";
+  MemAddress <= cpuaddr when (DMA='0' and Reset='0') else BootMemAddress when (Reset='1' and DMA='0') else Address;
+  MemWriteWord <= cpuww when DMA='0' and Reset='0' else '1'; --boot loading and DMA always write full words
+  MemWriteEnable <= cpuwe when DMA='0' and Reset='0' else '1' when Reset='1' and DMA='0' else WriteEnable;
+  MemDataIn <= cpumemout when DMA='0' and Reset='0' else Data when WriteEnable='1' else BootDataIn when Reset='1' and DMA='0' else "ZZZZZZZZZZZZZZZZ";
+  cpumemin <= MemDataOut;
+  Data <= MemDataOut when DMA='1' and Reset='0' and WriteEnable='0' else "ZZZZZZZZZZZZZZZZ";
+  bootload: process(Clock, Reset)
+  begin
+    if rising_edge(clock) then
+      if Reset='0' then
+        counter <= "00000";
+        BootDone <= '0';
+        BootAddress <= "00000";
+        BootDataIn <= BootDataOut;
+        BootFirst <= '1';
+      elsif Reset='1' and BootFirst='1' then
+        BootMemAddress <= "00000001000" & "00000";
+        BootAddress <= "00001";
+        --BootDataIn <= BootDataOut;
+        counter <= "00001";
+        BootFirst <= '0';
+      elsif Reset='1' and BootDone='0' then
+        BootMemAddress <= "0000000100" & std_logic_vector(unsigned(counter)-1) & "0";
+        BootAddress <= std_logic_vector(unsigned(counter) + 1);
+        BootDataIn <= BootDataOut;
+        counter <= std_logic_vector(unsigned(counter) + 1);
+        if to_integer(unsigned(counter))>=(ROMSIZE/2-2) then
+          BootDone <= '1';
+        end if;
+      else
+        null; --boot copy finished; hold all boot signals until Reset goes low again
+      end if;
+    end if;
+  end process;
+end Behavioral;
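
The bootload process above builds BootMemAddress by concatenating bit strings, which hides the arithmetic. The sketch below checks what those concatenations work out to: the BootFirst branch targets 0x0100, and each later step targets 0x0100 + 2*(counter-1). The entity name bootaddr_check is an illustrative assumption. The range 1 to 30 is read off the code above, where counter starts at 1 and BootDone is set once counter reaches ROMSIZE/2-2.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical self-checking entity; not one of the files in this commit.
entity bootaddr_check is
end bootaddr_check;

architecture sketch of bootaddr_check is
begin
  check: process
    variable counter: std_logic_vector(4 downto 0);
    variable addr: std_logic_vector(15 downto 0);
  begin
    -- Address used in the BootFirst branch.
    addr := "00000001000" & "00000";
    assert addr = x"0100"
      report "BootFirst address should be 0x0100"
      severity error;

    -- Addresses used while BootDone is '0'; same expression as in bootload.
    for c in 1 to 30 loop
      counter := std_logic_vector(to_unsigned(c, 5));
      addr := "0000000100" & std_logic_vector(unsigned(counter) - 1) & "0";
      assert to_integer(unsigned(addr)) = 256 + 2 * (c - 1)
        report "boot address should be 0x0100 + 2*(counter-1)"
        severity error;
    end loop;
    wait;
  end process;
end sketch;
```

With counter = 1 the expression yields 0x0100 again, so the first word address is written in both the BootFirst step and the first regular step. With counter = 30 it yields 0x013A.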

File testbench/alu_tb.vhd

+LIBRARY ieee;
+USE ieee.std_logic_1164.ALL;
+USE ieee.numeric_std.ALL;
+use work.tinycpu.all;
+
+ENTITY alu_tb IS
+END alu_tb;
+ 
+ARCHITECTURE behavior OF alu_tb IS 
+ 
+-- Component Declaration for the Unit Under Test (UUT)
+ 
+  component alu is 
+    port(
+      Op: in std_logic_vector(4 downto 0);
+      DataIn1: in std_logic_vector(7 downto 0);
+      DataIn2: in std_logic_vector(7 downto 0);
+      DataOut: out std_logic_vector(7 downto 0);
+      TR: out std_lo