If a program is stored in a slow external memory device (e.g. a serial Flash chip), it must be copied to RAM before it can be executed. While it is possible to design a RAM loading machine in Verilog or VHDL, this task can also be accomplished by the CPU itself, provided that the program RAM is accessible from the data bus.
To do this we will need a small ROM consisting of eight 32-bit words which will be accessible to the CPU via the instruction bus (it doesn't have to be connected to the data bus). Such a tiny ROM can be implemented on LUTs without using RAM blocks.
To make the bootloader as small as possible, we place a few limitations on the system address space layout:
Let's also assume that the source address is 0x04000000 (any address consisting mostly of zeros will work; the code below assumes that only the most significant byte is non-zero), and that the program size is 2048 words (i.e. 8192 bytes; any size less than 1048576 bytes will work).
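The constraints on these two parameters can be checked with a few lines of arithmetic. The sketch below uses Python purely as a calculator; the function name `check_boot_params` is hypothetical, and the requirement that the lower 24 bits of the source address are zero reflects the assumption made in this example rather than a documented limit of the CPU.

```python
WORD_BYTES = 4        # one 32-bit word
MAX_SIZE = 1048576    # program size in bytes must stay below this value (2**20)

def check_boot_params(source_address, program_words):
    """Return (most significant address byte, size in bytes) or raise ValueError."""
    size_bytes = program_words * WORD_BYTES
    if not 0 < size_bytes < MAX_SIZE:
        raise ValueError("program size out of range")
    # Assumption of this example: only the most significant byte is non-zero.
    if source_address & 0x00FFFFFF:
        raise ValueError("lower 24 bits of the source address must be zero")
    return source_address >> 24, size_bytes

msb, size = check_boot_params(0x04000000, 2048)
print(hex(msb), size)
```

The two returned values correspond to the most significant address byte and the program size in bytes used by the bootloader code.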
// The most significant byte of the source address
// (others are assumed to be zero)
#define SOURCE_ADDRESS_MSB 0x04

// Program size in bytes
#define PROGRAM_SIZE 8192

// Register variables
#define src r1
#define dst r2
#define size r3
#define loop_ptr r4

    sl src, SOURCE_ADDRESS_MSB, 24 // load source pointer to r1
    // dst (r2) is already 0 after reset
    lcs size, PROGRAM_SIZE
    lcs loop_ptr, Loop

Loop:
    lw r0, src
    sw dst, r0
    add src, src, 4
    add dst, dst, 4
    cjmpul loop_ptr, dst, size
The source address is loaded using the sl instruction instead of lc since the former occupies only one word. Once the required number of words has been copied, the final cjmpul instruction no longer jumps back to the start of the loop. Execution continues past it, the instruction pointer overflows, and control is transferred to address 0x00000000, which is the start of the program RAM.
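To make the control flow easier to follow, the copy loop can be modelled in software. The following Python sketch is not part of the bootloader; it mirrors the instructions one by one, treating the source memory as a list of 32-bit words. The name `run_bootloader` is hypothetical, and `ROM_LAST_WORD` assumes that the final instruction sits in the last word of the address space, which is what makes the instruction pointer overflow to zero.

```python
MASK32 = 0xFFFFFFFF
ROM_LAST_WORD = 0xFFFFFFFC  # assumed address of the final cjmpul instruction

def run_bootloader(source, source_base=0x04000000, program_size=8192):
    """Model the copy loop; `source` is a list of 32-bit words at source_base."""
    ram = {}
    src = source_base      # sl src, SOURCE_ADDRESS_MSB, 24
    dst = 0                # r2 is zero after reset
    size = program_size    # lcs size, PROGRAM_SIZE
    while True:
        word = source[(src - source_base) // 4]  # lw r0, src
        ram[dst] = word                          # sw dst, r0
        src = (src + 4) & MASK32                 # add src, src, 4
        dst = (dst + 4) & MASK32                 # add dst, dst, 4
        if not dst < size:                       # cjmpul: jump back while dst < size
            break
    return ram, dst

image = list(range(2048))                 # stand-in for the program image
ram, end = run_bootloader(image)
next_ip = (ROM_LAST_WORD + 4) & MASK32    # fall-through wraps around
print(len(ram), hex(end), hex(next_ip))
```

Note that the loop body always runs at least once, because the comparison is made only after a word has been copied.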
To compile the code, we must specify the base address explicitly, for example:
As a result, the following machine code will be generated:
70010418 A0032000 BF04FFEC 22000100 33000200 42010104 42020204 CB040302
module bootrom(
    input clk_i,
    input [2:0] addr_i,
    output reg [31:0] data_o
);

// Synchronous read: data_o is registered on the rising clock edge.
// Non-blocking assignments are used to avoid simulation races.
always @(posedge clk_i) begin
    case (addr_i)
    3'b000: data_o <= 32'h70010418;
    3'b001: data_o <= 32'hA0032000;
    3'b010: data_o <= 32'hBF04FFEC;
    3'b011: data_o <= 32'h22000100;
    3'b100: data_o <= 32'h33000200;
    3'b101: data_o <= 32'h42010104;
    3'b110: data_o <= 32'h42020204;
    default: data_o <= 32'hCB040302;
    endcase
end

endmodule
library ieee;
use ieee.std_logic_1164.all;

entity bootrom is
    port(
        clk_i: in std_logic;
        addr_i: in std_logic_vector(2 downto 0);
        data_o: out std_logic_vector(31 downto 0)
    );
end entity;

architecture rtl of bootrom is

begin

process(clk_i) begin
    if rising_edge(clk_i) then
        case addr_i is
        when "000" => data_o<=X"70010418";
        when "001" => data_o<=X"A0032000";
        when "010" => data_o<=X"BF04FFEC";
        when "011" => data_o<=X"22000100";
        when "100" => data_o<=X"33000200";
        when "101" => data_o<=X"42010104";
        when "110" => data_o<=X"42020204";
        when others => data_o<=X"CB040302";
        end case;
    end if;
end process;

end architecture;
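If the bootloader parameters change, the case arms have to be updated to match the new machine code. Rather than editing them by hand, they could be generated from the assembler output. The Python sketch below shows one possible way to do this for the VHDL variant; the helper name `vhdl_case_arms` is hypothetical, and the word list is simply the machine code listed above.

```python
WORDS = ("70010418 A0032000 BF04FFEC 22000100 "
         "33000200 42010104 42020204 CB040302").split()

def vhdl_case_arms(words):
    """Return one VHDL case arm per ROM word; the last word becomes 'others'."""
    addr_bits = (len(words) - 1).bit_length()   # 3 address bits for 8 words
    arms = []
    for index, word in enumerate(words):
        if index == len(words) - 1:
            choice = "others"
        else:
            choice = '"' + format(index, "0{}b".format(addr_bits)) + '"'
        arms.append('when {} => data_o<=X"{}";'.format(choice, word))
    return arms

arms = vhdl_case_arms(WORDS)
print("\n".join(arms))
```

The Verilog case arms can be produced in the same manner by changing the format strings.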