AprNes Testing Methodology

NES emulator accuracy verification through hardware test ROMs

174 Test ROMs
30+ Test Suites
5 Subsystems
<3 min Full Run

1. What We Test

AprNes uses hardware verification test ROMs originally written to validate real NES console behavior. These are actual NES programs (6502 machine code) that exercise specific hardware features and report pass/fail. The same ROMs are widely used across NES emulator projects to measure accuracy.

Test coverage spans all major NES subsystems; section 6 lists every suite with its ROM count.

2. Test ROM Sources

All ROMs come from the nes-test-roms collection, primarily authored by:

These are the same ROMs used by Mesen, Nestopia, FCEUX, and other reference emulators. Passing them indicates cycle-accurate or near-cycle-accurate emulation of the behaviors they exercise.

3. Test Runner Architecture

Bash Script (run_tests_report.sh)
  → MSBuild (compile emulator)
  → Headless Emulator (TestRunner.cs)
  → Result Detection ($6000 / screen scan)
  → HTML Report (JSON + screenshots)

Headless Mode

The emulator has a built-in TestRunner.cs that runs in headless mode — no window, no audio, no frame rate limiter. The CPU/PPU/APU all tick at maximum speed. A single test ROM typically completes in under 1 second.

Orchestration Script

A bash script (run_tests_report.sh) handles the full pipeline:

  1. Build the project with MSBuild
  2. Run each of the 174 ROM files through the headless emulator
  3. Capture the final frame as PNG, convert to lossless WebP
  4. Collect results into a JSON array
  5. Generate a single-file HTML report with embedded data
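
As an illustration of steps 2, 4 and 5, here is a minimal Python sketch of the run-and-collect loop. It is not the contents of run_tests_report.sh, which is a bash script. The function names, the record fields, and the one-folder-per-suite layout are assumptions made for the example. The command-line flags are the ones listed in section 7. The build step and the WebP conversion are left out.

```python
import json
import subprocess
from pathlib import Path


def build_command(exe, rom, screenshot, max_wait=30):
    """Assemble a headless invocation from flags documented in section 7."""
    return [
        str(exe), "--rom", str(rom), "--wait-result",
        "--max-wait", str(max_wait), "--screenshot", str(screenshot),
    ]


def run_suite(exe, rom_dir, out_dir, run=subprocess.run):
    """Run every .nes file under rom_dir and return one record per ROM.

    `run` is injectable so the loop can be exercised without the emulator.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for rom in sorted(Path(rom_dir).rglob("*.nes")):
        # Hypothetical naming: ROMs sharing a stem would overwrite each other.
        shot = out_dir / (rom.stem + ".png")
        proc = run(build_command(exe, rom, shot))
        results.append({
            "rom": rom.name,
            "suite": rom.parent.name,      # assumes one folder per suite
            "exit_code": proc.returncode,
            "passed": proc.returncode == 0,
            "screenshot": shot.name,
        })
    return results


def write_report_data(results, path):
    """Dump the records as a JSON array for a later report-generation step."""
    Path(path).write_text(json.dumps(results, indent=2), encoding="utf-8")
```

Passing the process runner as a parameter keeps the loop testable with a stub in place of the real executable.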

4. Result Detection Mechanisms

Mechanism A: $6000 Memory Protocol

Modern blargg test ROMs use a memory-mapped status protocol. The test runner polls address $6000 every frame:

$6000 Value   Meaning                       Action
$80           Test running                  Continue waiting
$81           Reset requested               Wait 100ms, then soft reset
$00           Test passed                   Exit with code 0
$01-$7F       Test failed (error code N)    Exit with code N

Result text is read from $6004+ as null-terminated ASCII. This gives detailed error messages like "Flag first set too late" or "Length counter not clocked correctly".
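
The table above can be sketched as a small decoding function. This is an illustrative Python version, not the code in TestRunner.cs. The `Poll` type, the function names, and the idea of passing memory as a byte array are assumptions. The sketch also does not guard against $6000 reading as $00 before the ROM has written anything, which a real runner has to handle somehow.

```python
from dataclasses import dataclass

STATUS_ADDR = 0x6000
TEXT_ADDR = 0x6004


@dataclass
class Poll:
    action: str     # "wait", "reset", or "exit"
    exit_code: int  # meaningful only when action == "exit"
    message: str    # result text, empty unless action == "exit"


def read_text(mem, start=TEXT_ADDR, limit=512):
    """Read null-terminated ASCII from `mem`, capped at `limit` bytes."""
    end = start
    while end < len(mem) and end - start < limit and mem[end] != 0:
        end += 1
    return bytes(mem[start:end]).decode("ascii", errors="replace").strip()


def poll_status(mem):
    """Decode the status byte at $6000 according to the table above."""
    status = mem[STATUS_ADDR]
    if status == 0x80:
        return Poll("wait", 0, "")
    if status == 0x81:
        return Poll("reset", 0, "")
    if status <= 0x7F:
        # $00 is a pass; $01-$7F is the test's own error code.
        return Poll("exit", status, read_text(mem))
    # Values above $81 are not covered by the table; assumed here: keep waiting.
    return Poll("wait", 0, "")
```

Calling `poll_status` once per emulated frame mirrors the polling cadence described above.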

Mechanism B: Screen Stability Detection

Older blargg tests (2005 era) don't use the $6000 protocol. They render results directly to the PPU nametable. The test runner handles these with a multi-step heuristic:

  1. After 120 frames (~2 sec), start sampling the screen buffer every frame
  2. Compute a hash of the framebuffer (sampling every 37th pixel for speed)
  3. When the hash stays identical for 90 consecutive frames (~1.5 sec), the screen is "stable"
  4. Scan the PPU nametable (character map) for known result strings:
    • "Passed" / "PASSED" → PASS
    • "Failed" / "FAILED" → FAIL
    • "$01" (hex on screen) → PASS
    • "$02" ~ "$FF" (hex on screen) → FAIL
    • "All tests complete" → PASS
    • " 0/" (zero error count) → PASS

This approach reads the PPU nametable directly (not OCR on pixels), making it fast and reliable.
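
The heuristic can be sketched in Python as follows. Several details are assumptions, since the text does not specify them: CRC32 as the hash, a framebuffer held as a flat list of 32-bit pixel values, the nametable already decoded to a string, and the order in which the result strings are checked (failure markers first).

```python
import re
import zlib

WARMUP_FRAMES = 120   # frames to skip before sampling starts
STABLE_FRAMES = 90    # identical hashes required to call the screen stable
SAMPLE_STRIDE = 37    # hash every 37th pixel


def sample_hash(framebuffer):
    """Hash a sparse sample of the framebuffer (CRC32 is an assumed choice)."""
    sampled = framebuffer[::SAMPLE_STRIDE]
    return zlib.crc32(b"".join(p.to_bytes(4, "little") for p in sampled))


class StabilityDetector:
    def __init__(self):
        self.frame = 0
        self.last = None
        self.run = 0

    def feed(self, framebuffer):
        """Return True once the hash has been identical for STABLE_FRAMES frames."""
        self.frame += 1
        if self.frame <= WARMUP_FRAMES:
            return False
        h = sample_hash(framebuffer)
        if h == self.last:
            self.run += 1
        else:
            self.last, self.run = h, 1
        return self.run >= STABLE_FRAMES


def classify(text):
    """Map decoded nametable text to "PASS", "FAIL", or None if nothing matches."""
    if "Failed" in text or "FAILED" in text:
        return "FAIL"
    if "Passed" in text or "PASSED" in text or "All tests complete" in text:
        return "PASS"
    m = re.search(r"\$([0-9A-Fa-f]{2})", text)
    if m:
        value = int(m.group(1), 16)
        if value == 0x01:
            return "PASS"
        if value >= 0x02:
            return "FAIL"
    if " 0/" in text:
        return "PASS"
    return None
```

Sampling every 37th pixel of a 256x240 frame hashes roughly 1,660 pixels instead of 61,440, which keeps the per-frame cost small when the emulator runs unthrottled.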

5. Automation Features

Auto Soft Reset

Some test ROMs write $81 to $6000 to request a console reset (testing power-on/reset behavior). The runner detects this and automatically performs a soft reset after a 100ms delay, mimicking a human pressing the reset button. It supports up to 10 sequential resets per ROM.

Simulated Controller Input

Controller read tests need actual button presses. The --input parameter schedules timed button events:

--input "A:2.0,B:4.0,Select:6.0,Start:8.0,Up:10.0,Down:12.0,Left:14.0,Right:16.0"

Each button is pressed at the specified time (seconds) and held for 10 frames (~166ms). This lets tests like read_joy3/test_buttons verify that all 8 buttons are correctly detected in sequence.
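
To make the schedule format concrete, here is a small Python sketch that parses an --input spec into press and release frames. How AprNes parses the option internally is not described here, so the function names, the conversion of seconds to frames at a nominal 60 fps, and the error handling are all assumptions.

```python
FPS = 60          # assumed nominal frame rate for converting seconds to frames
HOLD_FRAMES = 10  # hold duration stated above
BUTTONS = ("A", "B", "Select", "Start", "Up", "Down", "Left", "Right")


def parse_input_spec(spec):
    """Turn "A:2.0,B:4.0" into (button, press_frame, release_frame) tuples."""
    events = []
    for item in spec.split(","):
        name, _, seconds = item.strip().partition(":")
        if name not in BUTTONS:
            raise ValueError(f"unknown button: {name!r}")
        press = round(float(seconds) * FPS)
        events.append((name, press, press + HOLD_FRAMES))
    return sorted(events, key=lambda e: e[1])


def held_buttons(events, frame):
    """Return the set of buttons held down during the given frame."""
    return {name for name, press, release in events if press <= frame < release}
```

With the spec shown above, each button is held for a 10-frame window and the windows never overlap, so a test sees exactly one button at a time.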

Screenshot Capture

The final frame of each test is captured as a 256x240 PNG, then converted to lossless WebP (typically 60-80% smaller). Screenshots serve as visual evidence — many test ROMs display their results on screen as text, showing exactly what passed or failed.

Timeout Safety

Each ROM has a configurable --max-wait timeout (default 30 seconds, 120 for longer tests). If a test ROM enters an infinite loop or hangs, the runner terminates it gracefully and reports the last known state.

6. Test Suite Coverage

ROMs  Suite                 Coverage
4     apu_mixer             Channel mixing
6     apu_reset             APU power/reset
9     apu_test              APU frame counter
11    blargg_apu_2005       APU timing
2     blargg_cpu_test5      CPU instructions
5     blargg_ppu_tests      PPU basics
3     branch_timing         Branch cycle count
1     cpu_dummy_reads       Dummy read cycles
2     cpu_dummy_writes      Dummy write cycles
2     cpu_exec_space        Execution from I/O
6     cpu_interrupts_v2     NMI/IRQ interaction
2     cpu_reset             CPU reset behavior
1     cpu_timing_test6      Instruction timing
5     dmc_dma_during_read   DMC DMA conflicts
5     instr_misc            Misc instruction tests
17    instr_test-v3         All 6502 instructions
18    instr_test-v5         All 6502 instructions (v5)
3     instr_timing          Instruction cycle timing
6     mmc3_irq_tests        MMC3 IRQ counter
6     mmc3_test             MMC3 behavior
6     mmc3_test_2           MMC3 behavior (v2)
11    nes_instr_test        CPU instructions (alt)
1     oam_read              OAM read behavior
1     ppu_open_bus          PPU open bus
1     ppu_read_buffer       PPU read buffer
11    ppu_vbl_nmi           VBlank/NMI timing
4     read_joy3             Controller reading
2     sprdma_and_dmc_dma    DMA conflicts
11    sprite_hit_tests      Sprite 0 hit
5     sprite_overflow       Sprite overflow
7     vbl_nmi_timing        VBL/NMI timing

7. Command-Line Interface

The headless test runner is invoked directly via the emulator executable:

AprNes.exe --rom <file.nes> [options]

Option                Description
--rom <path>          ROM file to load (required)
--wait-result         Monitor $6000 / screen for test result
--max-wait <sec>      Timeout in seconds (default: 30)
--time <sec>          Run for exactly N seconds, then stop
--screenshot <path>   Save final frame as PNG
--log <path>          Write result line to file
--soft-reset <sec>    Trigger soft reset at N seconds
--input <spec>        Schedule button presses (e.g. "A:2.0,B:4.0")
--debug-log <path>    Write CPU trace log

Exit codes: 0 = pass, 1-127 = fail (test error code), 255 = timeout/no result.
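
A calling script can turn these exit codes into readable labels. The Python helper below is a hypothetical example: the function name and the label wording are assumptions, and codes outside the documented ranges are reported as unexpected rather than guessed at.

```python
def describe_exit_code(code):
    """Map a runner exit code to a short label, following the ranges above."""
    if code == 0:
        return "pass"
    if 1 <= code <= 127:
        return f"fail (test error code {code})"
    if code == 255:
        return "timeout/no result"
    # 128-254 are not documented above; flag them instead of guessing.
    return f"unexpected exit code {code}"
```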