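A Simple 5-Stage MIPS Pipelined CPU Design

Date: 2024-03-01 08:59:39

Update (2017.11.08): the DataRAM clock should be connected inverted, i.e. clka(~clk) rather than clka(clk).

This was a fairly large lab, so I am posting it here as a keepsake. Pipeline hazards will be dealt with in a follow-up.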

I. Purposes and Requirements

1. Purposes

    The general purpose is to construct a simple pipelined CPU (that is, one that does not resolve pipeline hazards). The specific purposes are as follows:

1) Understand the principles of a pipelined CPU

2) Understand the basic units of a pipelined CPU

3) Understand the workflow of the 5 stages

4) Master the design method of a simple pipelined CPU

5) Master methods of verifying a simple pipelined CPU with programs

 

2. Requirements

    1) Design the CPU Controller

    2) Based on the CPU controller, design the datapath of the 5-stage pipelined CPU

        a. 5 Stages

        b. Register File

        c. Memory (Instruction and Data)

        d. Other basic units

    3) Verify the pipelined CPU with a program and observe the program's execution

 

II. Contents and Principles

1. Contents

    Design a simple pipelined CPU (without resolving pipeline hazards). The design should be done in ISE and verified on the experiment board. In addition, one needs to write instructions to execute on the CPU to show that hazards DO exist.

 

2. Principles

    My design is mainly based on the following schematic:

    [Figure: the schematic the design is based on]

    A pipelined CPU gets its high performance from executing instructions in separate stages (MIPS has 5: IF, ID, EX, MEM, and WB), so that several instructions are in flight at once. Ideally, the throughput of the MIPS pipeline is therefore almost 5 times that of a single-cycle CPU. Note that the gain is NOT exactly 5 times because of other costs in a real design, especially when hazards exist. Pipeline hazards can badly break a pipelined CPU if they are not handled properly. Here, however, we simply ignore hazards and just rewrite the single-cycle CPU into a new one with 5 stages.
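    The "almost 5 times" figure can be made concrete with a textbook estimate. This is only a sketch under idealized assumptions that are not part of the original lab: every stage takes the same time $\tau$, the pipeline registers add no delay, and no stalls occur. The symbols $n$ (number of instructions), $k$ (number of stages) and $S$ (speedup) are introduced here purely for illustration.

```latex
\begin{align*}
T_{\text{single}} &= n \cdot k\,\tau \\
T_{\text{pipe}}   &= (k + n - 1)\,\tau \\
S &= \frac{T_{\text{single}}}{T_{\text{pipe}}} = \frac{n k}{k + n - 1}
     \xrightarrow{\; n \to \infty \;} k
\end{align*}
```

    For $k = 5$ and the 9-instruction test program used later in this report, $S = 45/13 \approx 3.46$; the ratio only approaches 5 for long instruction streams.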

    At any given moment, each stage is working on a different instruction rather than all stages working on the same one:

    [Figure: instructions overlapping across the five pipeline stages]

    The key to designing a pipelined CPU is to pass the right values generated by the current stage on to the next stage.

    In IF, we send PC+4 and the instruction to ID. In ID, we decode the instruction and send ALL the control signals to EX, together with PC+4, the two register file outputs, the sign-extended immediate, and the rt and rd fields. In EX, we send some of the control signals to MEM (RegWrite, MemRead, MemWrite, MemtoReg and Branch), along with the branch address, the ALU result, the second register file output (RtData), and the register write-back address. In MEM, we send RegWrite, MemtoReg, the ALU result, the write-back address and the output of the DataRAM to WB. In WB, we write the data back to the register file.

 

III. Procedures and Data Records

1. Procedures

    1) Create a new project in ISE, making sure the device settings for the top module are correct (choose Kintex7).

    2) Design IF

    IF means "instruction fetch". It includes the PC register, a multiplexer for the PC value, and a 32-bit adder. The code is as follows:

    module IF(clk, reset, branch_or_pc,
              branch_addr, next_pc_if, inst_if,
              pc
    );
    input clk;
    input reset;
    input branch_or_pc;          // Branch & ALU zero
    input  [31:0] branch_addr;   // branch target address
    output [31:0] next_pc_if;    // pc + 4
    output [31:0] inst_if;       // instruction read from the ROM
    output [31:0] pc;

    // Multiplexer for the next PC value
    reg [31:0] pc_in;
    always @(*) begin
        case (branch_or_pc)
            1'b0: pc_in = next_pc_if;    // no branch and no jump
            1'b1: pc_in = branch_addr;   // branch taken
        endcase
    end

    // PC register
    reg [31:0] pc;
    always @(posedge clk) begin
        if (reset) pc <= 32'b0;   // reset
        else       pc <= pc_in;
    end

    // Adder that computes the next PC
    adder_32bits adder32_bits_if(
        .a(pc),
        .b(32'd4),
        .c(next_pc_if)
    );

    // Instruction ROM
    InstructionROM InstructionROM(.a(pc[11:2]), .spo(inst_if));
    endmodule

    Notice that next_pc_if and inst_if need to be sent to the next stage. The ROM stores the instructions and is used to test the pipelined CPU. Bits 11 down to 2 of pc are sent to the ROM as its address, as shown in the code; the low two bits are dropped because each instruction occupies 4 bytes.

    3) Design ID

    ID means "instruction decode". This module includes the CPU controller, which is responsible for decoding, and the register file. The code of ID is as follows:

    module ID(clk, reset, inst_id,
              RegWrite_wb, RegWriteAddr_wb, RegWriteData_wb,
              RegDst_id, MemtoReg_id, RegWrite_id,
              MemWrite_id, MemRead_id, ALUCode_id,
              ALUSrcB_id, Branch_id,
              Imm_id, RsData_id, RtData_id,
              RtAddr_id, RdAddr_id
    );
    input clk;
    input reset;
    input [31:0] inst_id;   // instruction from IF

    // Inputs from the WB stage
    input RegWrite_wb;
    input [4:0]  RegWriteAddr_wb;
    input [31:0] RegWriteData_wb;

    // The eight control signal outputs
    output RegWrite_id;
    output RegDst_id;
    output MemRead_id;
    output MemWrite_id;
    output ALUSrcB_id;
    output Branch_id;
    output MemtoReg_id;
    output [2:0] ALUCode_id;

    // Other outputs
    output [31:0] Imm_id;      // sign-extended immediate
    output [31:0] RsData_id;   // register file output 1
    output [31:0] RtData_id;   // register file output 2
    output [4:0]  RtAddr_id;   // rt
    output [4:0]  RdAddr_id;   // rd

    assign RtAddr_id = inst_id[20:16];   // rt
    assign RdAddr_id = inst_id[15:11];   // rd
    assign Imm_id = {{16{inst_id[15]}}, inst_id[15:0]};   // sign-extend to a 32-bit immediate

    /* Control unit */
    CtrlUnit CtrlUnit(
        // input
        .inst(inst_id),
        // outputs
        .RegWrite(RegWrite_id), .RegDst(RegDst_id),
        .Branch(Branch_id), .MemRead(MemRead_id), .MemWrite(MemWrite_id),
        .ALUCode(ALUCode_id), .ALUSrc_B(ALUSrcB_id),
        .MemtoReg(MemtoReg_id)
    );

    /* Register file */
    RegisterFiles RegisterFiles(
        // inputs; the write port is driven by the WB stage
        .clk(clk), .rst(reset), .L_S(RegWrite_wb),
        .R_addr_A(inst_id[25:21]), .R_addr_B(inst_id[20:16]),
        .Wt_addr(RegWriteAddr_wb), .wt_data(RegWriteData_wb),
        // outputs
        .rdata_A(RsData_id), .rdata_B(RtData_id)
    );

    endmodule

    The following shows the design details of the control unit:

    module CtrlUnit(inst, RegWrite, RegDst,
                    Branch, MemRead,
                    MemWrite, ALUCode,
                    ALUSrc_B,
                    MemtoReg
    );
    input [31:0] inst;
    output RegWrite;
    output RegDst;
    output Branch;
    output MemRead;
    output MemWrite;
    output [2:0] ALUCode;
    output ALUSrc_B;
    output MemtoReg;   // 1: write-back data comes from memory

    wire [5:0] op;
    wire [5:0] func;
    assign op   = inst[31:26];   // op field
    assign func = inst[5:0];     // func field

    // R-type instructions
    parameter R_type_op = 6'b000000;
    parameter ADD_func  = 6'b100000;
    parameter AND_func  = 6'b100100;
    parameter XOR_func  = 6'b100110;
    parameter OR_func   = 6'b100101;
    parameter NOR_func  = 6'b100111;
    parameter SUB_func  = 6'b100010;

    // R-type decode
    wire ADD, AND, NOR, OR, SUB, XOR, R_type;
    assign ADD = (op == R_type_op) && (func == ADD_func);
    assign AND = (op == R_type_op) && (func == AND_func);
    assign NOR = (op == R_type_op) && (func == NOR_func);
    assign OR  = (op == R_type_op) && (func == OR_func);
    assign SUB = (op == R_type_op) && (func == SUB_func);
    assign XOR = (op == R_type_op) && (func == XOR_func);
    assign R_type = ADD || AND || NOR || OR || SUB || XOR;

    // Branch decode
    // Note: MEM takes the branch when Branch & zero, so BNE currently behaves like BEQ
    parameter BEQ_op = 6'b000100;
    parameter BNE_op = 6'b000101;
    wire BEQ, BNE, Branch;
    assign BEQ = (op == BEQ_op);
    assign BNE = (op == BNE_op);
    assign Branch = BEQ || BNE;

    // I-type instruction decode
    parameter ADDI_op = 6'b001000;
    parameter ANDI_op = 6'b001100;
    parameter XORI_op = 6'b001110;
    parameter ORI_op  = 6'b001101;
    wire ADDI, ANDI, XORI, ORI, I_type;
    assign ADDI = (op == ADDI_op);
    assign ANDI = (op == ANDI_op);
    assign XORI = (op == XORI_op);
    assign ORI  = (op == ORI_op);
    assign I_type = ADDI || ANDI || XORI || ORI;

    // SW, LW instruction decode
    parameter SW_op = 6'b101011;
    parameter LW_op = 6'b100011;
    wire SW, LW;
    assign SW = (op == SW_op);
    assign LW = (op == LW_op);

    // Control signals
    assign RegWrite = LW || R_type || I_type;   // instructions that write a register
    assign RegDst   = R_type;                   // RegDst=1 selects rd; only R-type does this
    assign MemWrite = SW;
    assign MemRead  = LW;
    assign MemtoReg = LW;
    assign ALUSrc_B = LW || SW || I_type;

    // ALUCode
    // These encodings are my own; they only need to match the ones used in the ALU
    parameter alu_add = 3'b010;
    parameter alu_sub = 3'b110;
    parameter alu_and = 3'b000;
    parameter alu_or  = 3'b001;
    parameter alu_xor = 3'b011;
    parameter alu_nor = 3'b100;

    reg [2:0] ALUCode;
    always @(*) begin
        if (op == R_type_op) begin
            case (func)
                ADD_func: ALUCode = alu_add;
                AND_func: ALUCode = alu_and;
                XOR_func: ALUCode = alu_xor;
                OR_func:  ALUCode = alu_or;
                NOR_func: ALUCode = alu_nor;
                SUB_func: ALUCode = alu_sub;
                default:  ALUCode = alu_add;
            endcase
        end
        else begin
            case (op)
                BEQ_op:  ALUCode = alu_sub;
                BNE_op:  ALUCode = alu_sub;
                ADDI_op: ALUCode = alu_add;
                ANDI_op: ALUCode = alu_and;
                XORI_op: ALUCode = alu_xor;
                ORI_op:  ALUCode = alu_or;
                SW_op:   ALUCode = alu_add;
                LW_op:   ALUCode = alu_add;
                default: ALUCode = alu_add;
            endcase
        end
    end

    endmodule

    The details for register files:

    module RegisterFiles(
        input clk, rst, L_S,
        input [4:0] R_addr_A, R_addr_B, Wt_addr,
        input [31:0] wt_data,
        output [31:0] rdata_A, rdata_B
    );
    reg [31:0] register [1:31];   // r0 always reads as 0, so only r1..r31 are stored
    integer i;
    assign rdata_A = (R_addr_A == 0) ? 0 : register[R_addr_A];
    assign rdata_B = (R_addr_B == 0) ? 0 : register[R_addr_B];

    // Registers are written on the rising clock edge; rst clears them asynchronously
    always @(posedge clk or posedge rst) begin
        if (rst == 1)
            for (i = 1; i < 32; i = i + 1)
                register[i] <= 0;
        else if ((Wt_addr != 0) && (L_S == 1))
            register[Wt_addr] <= wt_data;
    end

    endmodule

    The register file is the same as the one used previously in the Computer Organization course.

    4) Design EX

    EX means "execution". It contains the ALU and an adder.

    The ALU is illustrated as follows:

    The ALU was designed by drawing schematics, so further details are omitted here. Its components are the same as those used in the Computer Organization course.
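    Since the schematic itself is not reproduced in this text, the following is a minimal behavioral sketch of an ALU that would match the ALUCode values defined in CtrlUnit and the port names used where the ALU is instantiated in EX. The module name ALU_behavioral, the overflow logic and the default case are assumptions made for illustration; the original ALU was built from schematic components and may differ.

```verilog
// Behavioral sketch only; not the schematic-based ALU used in the lab.
module ALU_behavioral(
    input      [2:0]  ALU_operation,   // same encoding as ALUCode in CtrlUnit
    input      [31:0] A, B,
    output reg [31:0] res,
    output            zero,
    output reg        overflow
);
    always @(*) begin
        overflow = 1'b0;
        case (ALU_operation)
            3'b000: res = A & B;                 // alu_and
            3'b001: res = A | B;                 // alu_or
            3'b010: begin                        // alu_add
                res = A + B;
                // assumed: signed overflow when equal-sign operands give a different sign
                overflow = (A[31] == B[31]) && (res[31] != A[31]);
            end
            3'b011: res = A ^ B;                 // alu_xor
            3'b100: res = ~(A | B);              // alu_nor
            3'b110: begin                        // alu_sub
                res = A - B;
                overflow = (A[31] != B[31]) && (res[31] != A[31]);
            end
            default: res = 32'b0;                // assumed: unused codes return 0
        endcase
    end

    // zero is what the branch logic in MEM combines with Branch
    assign zero = (res == 32'b0);
endmodule
```

    With this encoding, a beq is decoded to alu_sub, so zero is 1 exactly when the two compared registers are equal.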

    The code for the EX module is shown below:

    module EX(clk, next_pc_ex,
              ALUCode_ex, ALUSrcB_ex,
              RegDst_ex,
              Imm_ex, RsData_ex, RtData_ex,
              RtAddr_ex, RdAddr_ex,
              // outputs
              Branch_addr_ex,
              alu_zero_ex, alu_res_ex, RegWriteAddr_ex
    );
    input clk;
    input [31:0] next_pc_ex;
    input [2:0]  ALUCode_ex;
    input ALUSrcB_ex;
    input RegDst_ex;
    input [31:0] Imm_ex;
    input [31:0] RsData_ex;
    input [31:0] RtData_ex;
    input [4:0]  RtAddr_ex;
    input [4:0]  RdAddr_ex;
    // outputs
    output [31:0] Branch_addr_ex;
    output alu_zero_ex;
    output [31:0] alu_res_ex;
    output reg [4:0] RegWriteAddr_ex;

    // Branch address: (pc + 4) + (immediate << 2)
    adder_32bits adder_32bits_ex(.a(next_pc_ex), .b(Imm_ex << 2), .c(Branch_addr_ex));

    // Multiplexer for ALUSrcB
    reg [31:0] alu_in;
    always @(*) begin
        case (ALUSrcB_ex)
            1'b0: alu_in = RtData_ex;   // second output of the register file
            1'b1: alu_in = Imm_ex;      // sign-extended immediate
        endcase
    end

    // ALU
    ALU ALU(.ALU_operation(ALUCode_ex), .A(RsData_ex), .B(alu_in),
            .res(alu_res_ex), .zero(alu_zero_ex), .overflow()   // overflow is left unconnected
    );

    // Multiplexer for the register write address
    always @(*) begin
        case (RegDst_ex)
            1'b0: RegWriteAddr_ex = RtAddr_ex;   // rt
            1'b1: RegWriteAddr_ex = RdAddr_ex;   // rd
        endcase
    end

    endmodule

    5) Design MEM

    module MEM(clk, MemRead_mem,
               MemWrite_mem, Branch_mem, alu_zero_mem,
               alu_res_mem, RtData_mem,
               branch_or_pc_mem, Dout_mem
    );
    input clk;
    // The MemRead signal is not really needed for now
    input MemRead_mem;
    input MemWrite_mem;
    input Branch_mem;
    input alu_zero_mem;
    input [31:0] alu_res_mem;
    input [31:0] RtData_mem;
    output branch_or_pc_mem;
    output [31:0] Dout_mem;

    DataRAM DataRAM(
        .clka(~clk),                          // input clka; inverted, per the 2017.11.08 update (originally clk)
        .wea(~MemRead_mem & MemWrite_mem),    // input [0:0] wea
        .addra(alu_res_mem[11:2]),            // input [9:0] addra
        .dina(RtData_mem),                    // input [31:0] dina
        .douta(Dout_mem)                      // output [31:0] douta
    );

    // AND gate that produces the branch-taken signal
    and_1bit and_1bit(.a(Branch_mem), .b(alu_zero_mem), .c(branch_or_pc_mem));

    endmodule

    An adder module is also needed. It is used in IF to compute PC+4 and in EX to compute the branch target address.

    module adder_32bits(
        input [31:0] a,
        input [31:0] b,
        output [31:0] c
    );

    assign c = a + b;
    endmodule

 

    6) Design WB

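    This stage is simple and can be implemented directly in the top module:

    /* WB */
    // Only one multiplexer, implemented directly in the top level
    // It selects the data to write back
    reg [31:0] reg_data_wb;

    always @(*) begin
        case (MemtoReg_wb)
            1'b0: reg_data_wb = alu_res_wb;   // from the ALU
            1'b1: reg_data_wb = Dout_wb;      // from the RAM
        endcase
    end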

    

    7) Design top module

    Finally we come to the top module. It simply wires together the stages and the pipeline registers between them, as shown below:

    module MipsPipelineCPU(clk, reset, inst_if,
                           alu_res_ex, Dout_mem,
                           RtData_id, PC_out
    );
    // CPU inputs:  clk, reset
    // CPU outputs: PC address, instruction, ALU result, register file output B, memory output
    // Each of these is routed to an output as soon as it is produced
    input clk;     // 100 MHz
    input reset;
    output [31:0] inst_if;      // instruction, sent to data2 of the top level
    output [31:0] alu_res_ex;   // ALU result, sent to data4
    output [31:0] Dout_mem;     // memory output, sent to data6 (Data_in in the schematic)
    output [31:0] RtData_id;    // register file output B, sent to data5 (Data_out in the schematic)
    output [31:0] PC_out;       // pc, sent to data7

    /* IF stage */
    wire branch_or_pc_mem;         // actually belongs to the MEM stage
    wire [31:0] Branch_addr_mem;   // actually belongs to the MEM stage
    wire [31:0] next_pc_if;
    wire [31:0] inst_if;
    IF IF(
        // inputs
        .clk(clk),
        .reset(reset),
        .branch_or_pc(branch_or_pc_mem),   // fed back from MEM
        .branch_addr(Branch_addr_mem),     // fed back from the EX/MEM register
        // outputs
        .next_pc_if(next_pc_if),
        .inst_if(inst_if),
        .pc(PC_out)                        // current pc
    );

    /* IF/ID pipeline register */
    wire [31:0] next_pc_id;
    wire [31:0] inst_id;
    flipflop #(.WIDTH(32)) IF_ID1(.clk(clk), .reset(reset), .in(inst_if),    .out(inst_id));      // instruction
    flipflop #(.WIDTH(32)) IF_ID2(.clk(clk), .reset(reset), .in(next_pc_if), .out(next_pc_id));   // pc + 4

    // Note: the WB-stage signals RegWrite_wb and RegWriteAddr_wb are declared here;
    // take care not to declare them again in the WB section
    wire [4:0] RtAddr_id, RdAddr_id;
    wire RegWrite_wb, MemtoReg_id, RegWrite_id, MemWrite_id;
    wire MemRead_id, ALUSrcB_id, RegDst_id, Branch_id;
    wire [4:0] RegWriteAddr_wb;
    wire [2:0] ALUCode_id;
    wire [31:0] Imm_id, RsData_id, RtData_id;

    /* ID stage */
    reg  [31:0] reg_data_wb;       // output of the WB multiplexer, declared here before its first use
    wire [31:0] RegWriteData_wb;   // belongs to the WB stage
    assign RegWriteData_wb = reg_data_wb;

    ID ID(.clk(clk), .reset(reset), .inst_id(inst_id),
        .RegWrite_wb(RegWrite_wb), .RegWriteAddr_wb(RegWriteAddr_wb),
        .RegWriteData_wb(RegWriteData_wb),   // the write data is selected in WB, where it is named reg_data_wb
        .RegWrite_id(RegWrite_id), .RegDst_id(RegDst_id), .MemtoReg_id(MemtoReg_id),
        .MemWrite_id(MemWrite_id), .MemRead_id(MemRead_id),
        .ALUCode_id(ALUCode_id), .ALUSrcB_id(ALUSrcB_id),
        .Branch_id(Branch_id), .Imm_id(Imm_id), .RsData_id(RsData_id), .RtData_id(RtData_id),
        .RtAddr_id(RtAddr_id), .RdAddr_id(RdAddr_id));

    /* ID/EX pipeline register: 14 signals in total */
    wire [4:0] RtAddr_ex, RdAddr_ex;
    wire MemtoReg_ex, RegWrite_ex, MemWrite_ex;
    wire MemRead_ex, ALUSrcB_ex, RegDst_ex, Branch_ex;
    wire [2:0] ALUCode_ex;
    wire [31:0] Imm_ex, RsData_ex, RtData_ex, next_pc_ex;
    flipflop #(.WIDTH(1))  ID_EX1 (.clk(clk), .reset(reset), .in(RegWrite_id), .out(RegWrite_ex));   // RegWrite
    flipflop #(.WIDTH(1))  ID_EX2 (.clk(clk), .reset(reset), .in(RegDst_id),   .out(RegDst_ex));     // RegDst
    flipflop #(.WIDTH(1))  ID_EX3 (.clk(clk), .reset(reset), .in(MemRead_id),  .out(MemRead_ex));    // MemRead
    flipflop #(.WIDTH(1))  ID_EX4 (.clk(clk), .reset(reset), .in(MemWrite_id), .out(MemWrite_ex));   // MemWrite
    flipflop #(.WIDTH(1))  ID_EX5 (.clk(clk), .reset(reset), .in(ALUSrcB_id),  .out(ALUSrcB_ex));    // ALUSrcB
    flipflop #(.WIDTH(1))  ID_EX6 (.clk(clk), .reset(reset), .in(MemtoReg_id), .out(MemtoReg_ex));   // MemtoReg
    flipflop #(.WIDTH(1))  ID_EX7 (.clk(clk), .reset(reset), .in(Branch_id),   .out(Branch_ex));     // Branch
    flipflop #(.WIDTH(3))  ID_EX8 (.clk(clk), .reset(reset), .in(ALUCode_id),  .out(ALUCode_ex));    // ALUCode, width 3!
    flipflop #(.WIDTH(32)) ID_EX9 (.clk(clk), .reset(reset), .in(next_pc_id),  .out(next_pc_ex));    // pc + 4, 32 bits!
    flipflop #(.WIDTH(32)) ID_EX10(.clk(clk), .reset(reset), .in(RsData_id),   .out(RsData_ex));     // register file output A
    flipflop #(.WIDTH(32)) ID_EX11(.clk(clk), .reset(reset), .in(RtData_id),   .out(RtData_ex));     // register file output B
    flipflop #(.WIDTH(32)) ID_EX12(.clk(clk), .reset(reset), .in(Imm_id),      .out(Imm_ex));        // sign-extended immediate
    flipflop #(.WIDTH(5))  ID_EX13(.clk(clk), .reset(reset), .in(RtAddr_id),   .out(RtAddr_ex));     // rt, width 5!
    flipflop #(.WIDTH(5))  ID_EX14(.clk(clk), .reset(reset), .in(RdAddr_id),   .out(RdAddr_ex));     // rd

    /* EX stage */
    wire [31:0] Branch_addr_ex;
    wire [31:0] alu_res_ex;
    wire alu_zero_ex;
    wire [4:0] RegWriteAddr_ex;
    EX EX(.clk(clk), .next_pc_ex(next_pc_ex),
        .ALUCode_ex(ALUCode_ex), .ALUSrcB_ex(ALUSrcB_ex),
        .RegDst_ex(RegDst_ex),
        .Imm_ex(Imm_ex), .RsData_ex(RsData_ex), .RtData_ex(RtData_ex),
        .RtAddr_ex(RtAddr_ex), .RdAddr_ex(RdAddr_ex),
        // outputs
        .Branch_addr_ex(Branch_addr_ex),
        .alu_zero_ex(alu_zero_ex), .alu_res_ex(alu_res_ex),
        .RegWriteAddr_ex(RegWriteAddr_ex)
    );

    /* EX/MEM pipeline register */
    wire RegWrite_mem;
    wire MemRead_mem;
    wire MemWrite_mem;
    wire MemtoReg_mem;
    wire Branch_mem;
    wire [31:0] alu_res_mem;
    wire alu_zero_mem;
    wire [31:0] RtData_mem;
    wire [4:0] RegWriteAddr_mem;
    flipflop #(.WIDTH(1))  EX_MEM1 (.clk(clk), .reset(reset), .in(RegWrite_ex),     .out(RegWrite_mem));      // RegWrite
    flipflop #(.WIDTH(1))  EX_MEM2 (.clk(clk), .reset(reset), .in(MemRead_ex),      .out(MemRead_mem));       // MemRead
    flipflop #(.WIDTH(1))  EX_MEM3 (.clk(clk), .reset(reset), .in(MemWrite_ex),     .out(MemWrite_mem));      // MemWrite
    flipflop #(.WIDTH(1))  EX_MEM4 (.clk(clk), .reset(reset), .in(MemtoReg_ex),     .out(MemtoReg_mem));      // MemtoReg
    flipflop #(.WIDTH(1))  EX_MEM5 (.clk(clk), .reset(reset), .in(Branch_ex),       .out(Branch_mem));        // Branch
    flipflop #(.WIDTH(32)) EX_MEM6 (.clk(clk), .reset(reset), .in(Branch_addr_ex),  .out(Branch_addr_mem));   // branch address, fed back to IF
    flipflop #(.WIDTH(32)) EX_MEM7 (.clk(clk), .reset(reset), .in(alu_res_ex),      .out(alu_res_mem));       // ALU result
    flipflop #(.WIDTH(1))  EX_MEM8 (.clk(clk), .reset(reset), .in(alu_zero_ex),     .out(alu_zero_mem));      // ALU zero flag
    flipflop #(.WIDTH(32)) EX_MEM9 (.clk(clk), .reset(reset), .in(RtData_ex),       .out(RtData_mem));        // RtData
    flipflop #(.WIDTH(5))  EX_MEM10(.clk(clk), .reset(reset), .in(RegWriteAddr_ex), .out(RegWriteAddr_mem));  // write-back address

    /* MEM stage */
    wire [31:0] Dout_mem;

    MEM MEM(
        .clk(clk), .MemRead_mem(MemRead_mem), .MemWrite_mem(MemWrite_mem),
        .Branch_mem(Branch_mem),
        .alu_zero_mem(alu_zero_mem),
        .alu_res_mem(alu_res_mem), .RtData_mem(RtData_mem),
        .branch_or_pc_mem(branch_or_pc_mem), .Dout_mem(Dout_mem)   // branch_or_pc_mem is fed back to IF
    );

    /* MEM/WB pipeline register */
    wire [31:0] Dout_wb;
    wire [31:0] alu_res_wb;
    wire MemtoReg_wb;
    flipflop #(.WIDTH(1))  MEM_WB1(.clk(clk), .reset(reset), .in(RegWrite_mem),     .out(RegWrite_wb));       // RegWrite
    flipflop #(.WIDTH(1))  MEM_WB2(.clk(clk), .reset(reset), .in(MemtoReg_mem),     .out(MemtoReg_wb));       // MemtoReg
    flipflop #(.WIDTH(32)) MEM_WB3(.clk(clk), .reset(reset), .in(Dout_mem),         .out(Dout_wb));           // RAM output, 32 bits
    flipflop #(.WIDTH(32)) MEM_WB4(.clk(clk), .reset(reset), .in(alu_res_mem),      .out(alu_res_wb));        // ALU result
    flipflop #(.WIDTH(5))  MEM_WB5(.clk(clk), .reset(reset), .in(RegWriteAddr_mem), .out(RegWriteAddr_wb));   // RegWriteAddr, 5 bits

    /* WB stage */
    always @(*) begin
        case (MemtoReg_wb)
            1'b0: reg_data_wb = alu_res_wb;   // from the ALU
            1'b1: reg_data_wb = Dout_wb;      // from the RAM
        endcase
    end

    endmodule

    8) Debug

    This step is routine once the code has been written: simply run the program again and again and eliminate the bugs.

    9) Test

        a. The content of RAM

        memory_initialization_radix=16;

        memory_initialization_vector=

00000000, 00000000, 00000000, 00000000, 00000000, 00000002,

00000002, 00000000, 00000000, 00000000, 00000000, 00000000,

00000000, 00000000, 00000000, 00000000, 00000000, 00000000,

00000000, 00000000, 00000001, 00000004, 00000000, 00000000;

        b. Instructions in ROM

        memory_initialization_radix=16;

        memory_initialization_vector=

8c010014,8c020015,00221820,00001020,

00232022,00642824,00853027,ac060016,

10c7fff8;

            

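            The corresponding instructions are:

        8c010014    lw  r1,$20(r0)
        8c020015    lw  r2,$21(r0)
        00221820    add r3,r1,r2
        00001020    add r2,r0,r0
        00232022    sub r4,r1,r3
        00642824    and r5,r3,r4
        00853027    nor r6,r4,r5
        ac060016    sw  r6,$22(r0)
        10c7fff8    beq r6,r7,-8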
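    For simulating without the generated IP core, the ROM contents above can be mirrored in a small behavioral model. This is only a sketch: the module name InstructionROM_model is hypothetical, the port names a and spo are taken from the IF code, and returning 0 for addresses beyond the nine listed words is an assumption about how the core is initialized.

```verilog
// Behavioral stand-in for the InstructionROM IP core (simulation only).
module InstructionROM_model(
    input      [9:0]  a,     // word address, driven by pc[11:2] in IF
    output reg [31:0] spo    // read data, available without a clock
);
    always @(*) begin
        case (a)
            10'd0:   spo = 32'h8c010014;   // lw  r1,20(r0)
            10'd1:   spo = 32'h8c020015;   // lw  r2,21(r0)
            10'd2:   spo = 32'h00221820;   // add r3,r1,r2
            10'd3:   spo = 32'h00001020;   // add r2,r0,r0
            10'd4:   spo = 32'h00232022;   // sub r4,r1,r3
            10'd5:   spo = 32'h00642824;   // and r5,r3,r4
            10'd6:   spo = 32'h00853027;   // nor r6,r4,r5
            10'd7:   spo = 32'hac060016;   // sw  r6,22(r0)
            10'd8:   spo = 32'h10c7fff8;   // beq r6,r7,-8
            default: spo = 32'h00000000;   // assumed: unlisted words read as 0
        endcase
    end
endmodule
```

    The comments can be checked against the machine code by hand: for example, 00221820 has op 000000, rs 1, rt 2, rd 3 and func 100000, which CtrlUnit decodes as ADD.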

    10) Verify on the experiment box.

    

2. Data records

    1) Simulation

        The results for simulation are:

 

    From the picture we can see that:

    a. lw r1,$20(r0)

    This instruction is meant to load 2 into r1: with addra = alu_res_mem[11:2], address 14H selects word 5 of the DataRAM, which holds 00000002. From the simulation result we can see that the output of the ALU is 14H, the effective address.

    b. lw r2,$21(r0)

    Address 15H selects the same word holding 00000002, because the low two address bits are dropped. This instruction is therefore supposed to set r2 to 2, but that does NOT happen until its WB stage finishes. As a result, if any instruction tries to read r2 before then, a data hazard occurs!

    c. add r3,r1,r2

    Here comes the data hazard. Because r2 is not written back until WB, the value read for r2 is still 0, so the result of r1+r2 comes out as 0. From the simulation result we can see that the ALU output is 0 rather than the sum of the loaded values, which proves this point.

    d. add r2,r0,r0

    The output of the ALU is 0.

    e. sub r4,r1,r3

    The output of the ALU is still 0.

    f. and r5,r3,r4

    The output of the ALU is still 0.

    g. nor r6,r4,r5

    Here, because r4=0 and r5=0, r6 should be ffffffff, as shown in the simulation diagram above.

    h. sw r6,$22(r0)

    Nothing special.

    i. beq r6,r7,-8

    The last instruction. Notice that the jump will NOT really happen.

 

    2) Pictures

    I took some pictures here, merely as a keepsake.

 

IV. Analysis of Results

    Let's consider the test code and test data:

        a. The content of RAM

        memory_initialization_radix=16;

        memory_initialization_vector=

00000000, 00000000, 00000000, 00000000, 00000000, 00000002,

00000002, 00000000, 00000000, 00000000, 00000000, 00000000,

00000000, 00000000, 00000000, 00000000, 00000000, 00000000,

00000000, 00000000, 00000001, 00000004, 00000000, 00000000;

        b. Instructions in ROM

        memory_initialization_radix=16;

        memory_initialization_vector=

8c010014,8c020015,00221820,00001020,

00232022,00642824,00853027,ac060016,

10c7fff8;

    We can show that this is a pipelined CPU by observing the output of the ALU. The idea is quite simple: we observe the ALU output for each instruction and compare it with our expectation. If a pipeline hazard occurs, the output will differ from the expected value.
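    The expectation can be stated as a simple timing rule. The sketch below assumes what the code in this report implies: the register file is written on the rising clock edge that ends WB and is read combinationally in ID, there is no forwarding, and instruction number $i$ (counting from 1) is fetched in cycle $i$. The indices $i$, $j$ and the cycle numbering are introduced here for illustration only.

```latex
\begin{align*}
\text{producer } i &: \ \text{WB in cycle } i + 4,\ \text{new value readable from cycle } i + 5 \\
\text{consumer } j &: \ \text{reads its operands in ID in cycle } j + 1 \\
\text{fresh value} &\iff j + 1 \ge i + 5 \iff j - i \ge 4
\end{align*}
```

    For example, add r3,r1,r2 ($j = 3$) follows lw r2 ($i = 2$) with $j - i = 1$, so it reads the old r2. By the same rule, sub r4,r1,r3 ($j = 5$) comes only 2 instructions after add r3 ($i = 3$), so its r3 operand is stale as well.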

    Now let's analyze the experimental results instruction by instruction:

    a. lw r1,$20(r0)

    This instruction is meant to load 2 into r1: with addra = alu_res_mem[11:2], address 14H selects word 5 of the DataRAM, which holds 00000002. From the simulation result we can see that the output of the ALU is 14H, the effective address.

    b. lw r2,$21(r0)

    Address 15H selects the same word holding 00000002, because the low two address bits are dropped. This instruction is therefore supposed to set r2 to 2, but that does NOT happen until its WB stage finishes. As a result, if any instruction tries to read r2 before then, a data hazard occurs!

    c. add r3,r1,r2

    Here comes the data hazard. Because r2 is not written back until WB, the value read for r2 is still 0, so the result of r1+r2 comes out as 0. From the simulation result we can see that the ALU output is 0 rather than the sum of the loaded values, which proves this point.

    d. add r2,r0,r0

    The output of the ALU is 0.

    e. sub r4,r1,r3

    The output of the ALU is still 0.

    f. and r5,r3,r4

    The output of the ALU is still 0.

    g. nor r6,r4,r5

    Here, because r4=0 and r5=0, r6 should be ffffffff, as shown in the simulation diagram above.

    h. sw r6,$22(r0)

    Nothing special.

    i. beq r6,r7,-8

    The last instruction. The output of the ALU will be 0 (since r6=r7=0, 0-0=0).

 

V. Discussion and Reflections

    The experiment was quite hard and took me almost 3 weeks to complete. First, one should pay special attention to two questions: what needs to be sent from the current stage to the next one, and how should it be stored?

    For the first question, I got the answer from the schematic provided by the teacher; it is described in the design principles above. For the second question, I chose to use a flip-flop:

    [Figure: code of the flipflop module]

    On the rising edge of the clock, the flip-flop captures the output of the current stage and passes it on to the next stage. The design is elegant because the parameter WIDTH can be overridden for each instance in the top module (we need flip-flops of different widths for different signals).
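    As an illustration, a parameterized register with the same interface as the flipflop instances in the top module (ports clk, reset, in, out and the WIDTH parameter) might look like the sketch below. The default width and the synchronous, active-high reset are assumptions; the original module is not reproduced in this text and may differ.

```verilog
// Hypothetical reconstruction of the pipeline register; details are assumed.
module flipflop #(parameter WIDTH = 32) (
    input                  clk,
    input                  reset,
    input      [WIDTH-1:0] in,    // value produced by the current stage
    output reg [WIDTH-1:0] out    // value seen by the next stage
);
    always @(posedge clk) begin
        if (reset) out <= {WIDTH{1'b0}};   // assumed: synchronous, active-high reset
        else       out <= in;              // capture on every rising edge
    end
endmodule
```

    An instance such as flipflop #(.WIDTH(5)) then yields a 5-bit register, which is how the top module carries rt, rd and the write-back address between stages.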

    I also ran into some problems in this experiment. Here are the typical ones:

    1) The write-enable signal of the DataRAM should be ~MemRead_mem&MemWrite_mem. At the beginning I misunderstood the meaning of the wea port (wea actually means write enable) and sent the wrong signal to it, which made the output of the DataRAM always 00000000. After the correction, the expected 00000002 could be observed.

    2) The Anti-jitter module I used before did not perform well. When I pressed the button, the PC increased so quickly that I could not read any meaningful result from the LEDs. Later I switched to a better Anti-jitter module: I deleted the original one and regenerated a new symbol for it. Two of the original ports were removed because the new module is slightly different from the previous one, but these ports don't really matter.
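    For reference, debouncing of the kind described in 2) is commonly done with a counter that accepts a new button level only after it has stayed stable for a fixed time. The sketch below illustrates that general idea only: the module name debounce_sketch, its ports and the COUNT_MAX value (about 10 ms at the 100 MHz clock noted in the top module) are assumptions and are not taken from the Anti-jitter module used in the lab.

```verilog
// Hypothetical counter-based button debouncer (not the Anti-jitter module from the lab).
module debounce_sketch #(parameter COUNT_MAX = 1000000) (   // assumed: about 10 ms at 100 MHz
    input      clk,
    input      button_in,    // raw, bouncing button signal
    output reg button_out    // debounced level
);
    reg [31:0] counter = 32'd0;
    reg        sync_0 = 1'b0, sync_1 = 1'b0;
    initial button_out = 1'b0;

    always @(posedge clk) begin
        // two-stage synchronizer for the asynchronous button input
        sync_0 <= button_in;
        sync_1 <= sync_0;

        if (sync_1 == button_out) begin
            counter <= 32'd0;                  // input agrees with output: restart the count
        end else if (counter == COUNT_MAX - 1) begin
            button_out <= sync_1;              // input stayed different long enough: accept it
            counter <= 32'd0;
        end else begin
            counter <= counter + 32'd1;        // keep counting while the new level persists
        end
    end
endmodule
```

    With a debounced button driving the single-step clock, each press should advance the PC once instead of many times.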

    The new code of the Anti-jitter module is shown below:

    [Figure: code of the new Anti-jitter module]


    That's all for my report. Thanks for reading!