32-bit RISC-V CPU from Logic Gates
A complete 2-stage pipelined RV32I processor built from first principles in Logisim — ALU, register file, control unit, and memory system, all hand-wired from basic gates.

The Vision: Build a complete 32-bit RISC-V processor from absolutely nothing — no high-level modules, no black boxes, just logic gates, multiplexers, and flip-flops. Every component designed to understand how modern CPUs actually work at the transistor level.
The Challenge: Implementing a real instruction set architecture that can run actual programs while maintaining the educational transparency of seeing every logic gate. Balance complexity with comprehensibility, performance with clarity.
CPU Architecture Overview
This processor implements the RV32I base integer instruction set — the minimal RISC-V configuration capable of running real software. The design balances educational clarity with functional completeness.
Datapath Design Philosophy
Rather than optimizing for maximum performance, the architecture prioritizes understanding: every data path, control signal, and timing relationship is visible and traceable. The 2-stage pipeline provides instruction-level parallelism while remaining comprehensible.
Key Design Decisions:
- Harvard Architecture: Separate instruction and data memories eliminate structural hazards
- Single-Cycle Execute: Complex operations complete in one cycle, simplifying control logic
- Explicit Control Signals: No microcode — all control logic implemented as combinational circuits

Complete CPU datapath showing instruction fetch, decode, execute, and writeback stages

Arithmetic Logic Unit (ALU)
The ALU implements 13 operations covering all RV32I arithmetic and logical instructions:
Arithmetic Operations
Addition/Subtraction: 32-bit ripple-carry adder with configurable inversion for subtraction. Overflow detection for signed operations, though RV32I ignores overflow exceptions.
Multiplication Suite: Three multiply operations (MUL, MULH, MULHSU, MULHU) implemented using array multipliers. Each multiplier consists of 32×32 partial product generators feeding a Wallace tree reduction network.
Technical Challenge: 64-bit multiply results require careful handling. MULH operations return the upper 32 bits, requiring separate sign extension logic for signed vs unsigned operands.
Logical & Shift Operations
Bitwise Logic: AND, OR, XOR implemented as 32-bit parallel gate arrays
Shifts: Barrel shifter design enables any shift amount (0-31) in single cycle. Left shifts (SLL) and arithmetic/logical right shifts (SRA/SRL) use different sign-fill logic.
Barrel Shifter Architecture: 5-level shift network where each level can shift by 2^n positions (1, 2, 4, 8, 16). Control signals select which levels are active, composing to any desired shift amount.
Comparison Operations: Set-less-than operations leverage subtraction circuits with overflow-aware result interpretation for signed comparisons and magnitude-based logic for unsigned operations.
Register File Architecture
The register file provides the central storage for the CPU’s architectural state:
Storage Design
32 × 32-bit registers implemented using D flip-flop arrays. Register x0 is hardwired to zero through explicit gating logic — writes to x0 are ignored, reads always return 0.
Port Configuration:
- 2 Read Ports: Enables simultaneous access to both source operands
- 1 Write Port: Simplifies control logic, eliminates write conflicts
- Bypass Network: Internal forwarding paths reduce pipeline hazards
Technical Implementation Details
Address Decoding: 5-bit register addresses decoded using 5-to-32 decoders. Write enable signals gated with clock to prevent corruption during address changes.
Read Logic: Dual 32-to-1 multiplexers select appropriate register contents based on rs1 and rs2 fields. Zero detection logic hardwires register 0 outputs.
Write Logic: Write data routed through demultiplexer controlled by rd field. Clock gating ensures writes only occur when RegWrite control signal asserted.
Control Unit & Instruction Decode
The control unit orchestrates all CPU operations through pure combinational logic — no microcode or state machines.
Instruction Format Decoding
RISC-V uses 6 instruction formats (R, I, S, B, U, J) with different immediate field layouts. The immediate generator extracts and sign-extends these fields appropriately:
I-Type: 12-bit signed immediate for arithmetic and loads
S-Type: 12-bit signed immediate for stores (split across two fields)
B-Type: 12-bit signed branch offset (with implicit bit 0 = 0)
U-Type: 20-bit immediate for upper immediate instructions
J-Type: 20-bit signed jump offset with complex bit reordering
Control Signal Generation
The control unit generates 15 distinct control signals based on instruction opcode and function fields:
ALUSel[3:0]: ALU operation selection
ImmSel[2:0]: Immediate generator format
RegWrite: Register file write enable
MemRead/Write: Memory operation enables
Branch: Branch instruction indicator
Jump: Unconditional jump enable
PCWrite: Program counter update enable
ROM-Based Implementation: Control logic implemented as 128×15 ROM with opcode+funct3+funct7 as address. This approach makes control modifications easy while keeping critical path short.
Memory System Design
Instruction Memory (IMEM)
Harvard architecture uses separate instruction memory to eliminate structural hazards. 32-bit wide reads provide one complete instruction per cycle.
Address Translation: PC addresses are word-aligned (bottom 2 bits ignored). IMEM organized as array of 32-bit words with simple address decoding.
Data Memory (DMEM)
Supports byte, halfword, and word accesses as required by RV32I. Load/store unit handles address alignment and data formatting.
Load Operations: Sign-extension logic for LB/LH instructions, zero-extension for LBU/LHU. Byte-lane selection based on address bits [1:0].
Store Operations: Write enable generation for appropriate byte lanes. SB writes 1 byte, SH writes 2 bytes, SW writes 4 bytes.
Pipeline Implementation & Hazard Handling
2-Stage Pipeline Design
Stage 1: Instruction Fetch + Decode
- Fetch instruction from IMEM at PC address
- Decode instruction format and generate control signals
- Read source registers from register file
Stage 2: Execute + Writeback
- Perform ALU operation or memory access
- Update program counter based on branch/jump conditions
- Write results back to register file
Hazard Management
Data Hazards: Minimized by single-cycle execution model. Most instructions complete before next instruction needs their results.
Control Hazards: Branch prediction not implemented — simple stall-and-flush approach. Branch target computed in execute stage, with one-cycle penalty for taken branches.
Structural Hazards: Eliminated by Harvard architecture and dual-port register file design.
Testing & Validation Methodology
Instruction-Level Testing
Built comprehensive test suite covering every RV32I instruction:
Arithmetic Tests: Edge cases for overflow, zero operands, maximum values
Logic Tests: All bit patterns, shift amounts including edge cases
Branch Tests: All comparison types with boundary conditions
Memory Tests: Aligned/unaligned access, byte/half/word operations
Program-Level Validation
Fibonacci Sequence: Tests arithmetic, branches, and loops
Sorting Algorithm: Validates memory operations and complex control flow
Prime Number Sieve: Exercises all instruction types in realistic computation
RISC-V Compliance: Validated against official RV32I compliance tests to ensure architectural correctness.
Performance Analysis
Clock Frequency: Achieves 50MHz in Logisim simulation (limited by tool, not design) CPI (Cycles Per Instruction): 2.0 average due to pipeline structure Instruction Mix: Optimized for typical compiled C code patterns
Advanced Features & Optimizations
Branch Prediction Experiments
Implemented simple 1-bit branch predictor as optional enhancement:
- Single flip-flop per branch instruction stores last outcome
- ~65% prediction accuracy on typical code
- Reduces average branch penalty from 1.0 to 0.35 cycles
Forwarding Network
Added data forwarding paths to reduce pipeline stalls:
- ALU result forwarded directly to next instruction
- Memory load data bypassed to dependent instructions
- Reduces effective CPI from 2.0 to 1.4 on complex programs
Computer Architecture Insights
RISC Design Principles Validated
Building this CPU confirmed several RISC architecture principles:
Regularity Simplifies Control: Fixed 32-bit instruction format makes decoding straightforward, unlike variable-length x86 instructions.
Load/Store Architecture: Separating memory access from arithmetic reduces ALU complexity and enables better pipeline design.
Orthogonal Instruction Set: Independent opcode, format, and addressing modes minimize control logic complexity.
Hardware/Software Trade-offs
Simple Hardware, Complex Compiler: RISC-V pushes complexity to software (compiler), resulting in cleaner, faster hardware implementation.
Pipeline Depth vs Complexity: 2-stage pipeline provides good performance/complexity balance. Deeper pipelines require exponentially more hazard logic.
Real-World CPU Design Lessons
Control Logic Dominates Area: ~60% of the CPU gates are in control unit and decode logic, not arithmetic units.
Memory System Is Critical: Even with perfect CPU, memory latency limits performance. Real CPUs spend enormous effort on cache hierarchies.
Extensions & Future Work
Performance Enhancements
- Superscalar Execution: Issue multiple independent instructions per cycle
- Out-of-Order Execution: Dynamic instruction scheduling around hazards
- Branch Target Buffer: Hardware prediction for indirect jumps
Architectural Extensions
- RV32M Extension: Hardware divide unit and 64-bit multiply support
- Privilege Modes: User/supervisor modes with memory protection
- Interrupt Handling: Precise exception support with context switching
Educational Improvements
- Visualization Tools: Real-time display of pipeline state and data flow
- Performance Counters: Hardware monitoring of instruction mix, hazards, cache misses
- Debugging Interface: Single-step execution with register/memory examination
Real-World Impact & Applications
This project provides foundational understanding crucial for several career paths:
CPU Design Engineering
Understanding instruction decode, pipeline hazards, and control logic is essential for working on real processor designs at companies like Intel, AMD, ARM, or RISC-V startups.
Compiler Optimization
Knowledge of CPU internals enables better compiler design — understanding pipeline hazards, instruction latencies, and architectural limitations.
Systems Performance
Applications developers benefit from CPU architecture knowledge when optimizing performance-critical code, understanding cache behavior, or debugging performance issues.
FPGA & Hardware Acceleration
The skills demonstrated here translate directly to FPGA development, custom accelerator design, and hardware/software co-design projects.
This CPU demonstrates that modern computer architecture is understandable when approached systematically. While real processors have thousands of optimizations, the fundamental principles — instruction decode, arithmetic units, control flow, memory systems — remain the same. Building this foundation enables tackling much more complex hardware design challenges.