NES emulator accuracy verification through hardware test ROMs
AprNes uses hardware verification test ROMs originally written to validate real NES console behavior. These are actual NES programs (6502 machine code) that exercise specific hardware features and report pass/fail. The same ROMs are widely used across NES emulator projects to measure accuracy.
Test coverage spans all major NES subsystems:
All ROMs come from the nes-test-roms collection, primarily authored by:
These are the same ROMs used by Mesen, Nestopia, FCEUX, and other reference emulators. Passing them is strong evidence of cycle-accurate or near-cycle-accurate emulation of the behaviors each ROM exercises.
The emulator has a built-in test runner (TestRunner.cs) that runs in headless mode: no window, no audio, no frame-rate limiter. The CPU, PPU, and APU all tick at maximum speed, so a single test ROM typically completes in under 1 second.
A bash script (run_tests_report.sh) handles the full pipeline:
Modern blargg test ROMs use a memory-mapped status protocol. The test runner polls address $6000 every frame:
| $6000 Value | Meaning | Action |
|---|---|---|
| $80 | Test running | Continue waiting |
| $81 | Reset requested | Wait 100ms, then soft reset |
| $00 | Test passed | Exit with code 0 |
| $01-$7F | Test failed (error code N) | Exit with code N |
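A minimal Python sketch of the decoding step, assuming the runner has a byte snapshot of PRG-RAM starting at $6000 (index 0 = $6000). blargg's readme files also specify that these ROMs write the signature bytes $DE $B0 $61 to $6001-$6003 so a runner can tell that the data at $6000+ is valid. The sketch checks for that signature, though whether TestRunner.cs does is not stated here. `Action` and `interpret_status` are hypothetical names, not the emulator's API.

```python
from enum import Enum
from typing import Optional, Tuple

# blargg's ROMs write these bytes to $6001-$6003 once $6000+ holds valid data.
SIGNATURE = bytes([0xDE, 0xB0, 0x61])


class Action(Enum):
    WAIT = "wait"      # $80, an undefined value, or protocol not active yet
    RESET = "reset"    # $81: soft reset requested
    FINISH = "finish"  # $00-$7F: final result code


def interpret_status(sram: bytes) -> Tuple[Action, Optional[int]]:
    """Decode the status byte; sram[0] is the byte at $6000 (hypothetical snapshot)."""
    if sram[1:4] != SIGNATURE:
        return Action.WAIT, None  # not (yet) a blargg-protocol ROM
    status = sram[0]
    if status == 0x80:
        return Action.WAIT, None
    if status == 0x81:
        return Action.RESET, None
    if status <= 0x7F:
        return Action.FINISH, status  # exit code: 0 = pass, 1-127 = error code
    return Action.WAIT, None  # $82-$FF are not defined by the protocol
```

The signature check matters because PRG-RAM may read as $00 before the ROM has written anything, depending on how the emulator initializes it. Without the check, that value would look like a pass.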
Result text is read from $6004+ as null-terminated ASCII. This gives detailed error messages like "Flag first set too late" or "Length counter not clocked correctly".
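Reading the message is a plain C-string scan. The sketch below again treats `sram` as a byte snapshot starting at $6000. The `limit` cap for a missing terminator is an assumption added for safety, and `read_result_text` is a hypothetical name.

```python
def read_result_text(sram: bytes, offset: int = 4, limit: int = 4096) -> str:
    """Return the NUL-terminated ASCII text starting at $6000 + offset.

    sram[0] is the byte at $6000, so the default offset 4 reads from $6004.
    `limit` bounds the scan if the terminator is missing (an assumption).
    """
    end = sram.find(0, offset, offset + limit)  # index of the terminating $00
    if end == -1:
        end = min(len(sram), offset + limit)  # no terminator: take what is there
    return sram[offset:end].decode("ascii", errors="replace")
```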
Older blargg tests (2005 era) don't use the $6000 protocol. They render results directly to the PPU nametable. The test runner handles these with a multi-step heuristic:
- "Passed" / "PASSED" → PASS
- "Failed" / "FAILED" → FAIL
- "$01" (hex on screen) → PASS
- "$02" through "$FF" (hex on screen) → FAIL
- "All tests complete" → PASS
- " 0/" (zero error count) → PASS

This approach reads the PPU nametable directly (not OCR on pixels), making it fast and reliable.
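The rules above can be sketched as a first-match classifier over the 960 tile bytes of one nametable (32x30 tiles). The sketch makes two assumptions: the ROM's font maps tile index N to ASCII character N, and the rules are checked in the order listed, with the first match winning. `nametable_text` and `classify_screen` are hypothetical names.

```python
import re
from typing import Optional

COLS, ROWS = 32, 30  # one nametable: 32x30 tiles = 960 bytes

# "$" followed by exactly two hex digits (so "$6000" is not read as "$60").
HEX_CODE = re.compile(r"\$([0-9A-Fa-f]{2})(?![0-9A-Fa-f])")


def nametable_text(tiles: bytes) -> str:
    """Turn tile indices into text, assuming tile N draws ASCII character N."""
    lines = []
    for row in range(ROWS):
        chunk = tiles[row * COLS:(row + 1) * COLS]
        lines.append("".join(chr(t) if 0x20 <= t < 0x7F else " " for t in chunk))
    return "\n".join(lines)


def classify_screen(tiles: bytes) -> Optional[str]:
    """Apply the on-screen rules in the listed order; None means no verdict yet."""
    text = nametable_text(tiles)
    if "Passed" in text or "PASSED" in text:
        return "PASS"
    if "Failed" in text or "FAILED" in text:
        return "FAIL"
    match = HEX_CODE.search(text)
    if match:
        code = int(match.group(1), 16)
        if code == 0x01:
            return "PASS"
        if code >= 0x02:
            return "FAIL"
    if "All tests complete" in text or " 0/" in text:
        return "PASS"
    return None
```

Returning None rather than FAIL for an unrecognized screen lets the runner keep waiting until the timeout. This matters because older ROMs may draw intermediate text before the final verdict.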
Some test ROMs write $81 to $6000 to request a console reset (to test power-on/reset behavior). The runner detects this and performs a soft reset after a 100ms delay, mimicking a human pressing the reset button. It supports up to 10 sequential resets per ROM.
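The reset handling amounts to a small state machine. This sketch takes the current $6000 byte and a timestamp each frame and returns True when the reset should be pressed. It assumes times are seconds of emulated time. It also waits for $6000 to leave $81 before arming again, because PRG-RAM survives a soft reset and the old $81 can remain visible for a few frames. `ResetScheduler` is a hypothetical name.

```python
from typing import Optional


class ResetScheduler:
    """Decides when to press reset after a ROM writes $81 to $6000.

    Times are seconds of emulated time (an assumption). The 100 ms delay and
    the 10-reset cap follow the description above.
    """

    def __init__(self, delay_s: float = 0.1, max_resets: int = 10) -> None:
        self.delay_s = delay_s
        self.max_resets = max_resets
        self.resets_done = 0
        self._armed = True                 # False right after a reset fires
        self._due_at: Optional[float] = None

    def on_frame(self, status: int, now_s: float) -> bool:
        """Call once per frame with the $6000 byte; True means reset now."""
        if status != 0x81:
            self._armed = True             # ROM has moved on; accept new requests
            self._due_at = None
            return False
        if not self._armed or self.resets_done >= self.max_resets:
            return False
        if self._due_at is None:
            self._due_at = now_s + self.delay_s  # first frame that saw $81
        if now_s >= self._due_at:
            self._due_at = None
            self._armed = False            # stale $81 survives the reset in PRG-RAM
            self.resets_done += 1
            return True
        return False
```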
Controller read tests need actual button presses. The --input parameter schedules timed button events:
```
--input "A:2.0,B:4.0,Select:6.0,Start:8.0,Up:10.0,Down:12.0,Left:14.0,Right:16.0"
```
Each button is pressed at the specified time (seconds) and held for 10 frames (~166ms). This lets tests like read_joy3/test_buttons verify that all 8 buttons are correctly detected in sequence.
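The following sketch shows how an `--input` spec could be turned into per-frame controller state. It assumes that times are converted to frames at the NTSC rate of about 60.0988 Hz. It also assumes the eight buttons are packed into a byte in the order a standard controller reports them (A first, Right last), with bit 0 = A. `parse_input_spec` and `buttons_held` are hypothetical names.

```python
from typing import List, Tuple

# Order in which a standard controller reports its buttons (A first, Right last).
BUTTONS = ["A", "B", "Select", "Start", "Up", "Down", "Left", "Right"]
FPS = 60.0988      # NTSC frame rate; assumed conversion from seconds to frames
HOLD_FRAMES = 10   # each press lasts 10 frames, per the description above

Event = Tuple[str, int, int]  # (button, first frame held, first frame released)


def parse_input_spec(spec: str) -> List[Event]:
    """Parse e.g. "A:2.0,B:4.0" into frame-based press events."""
    events: List[Event] = []
    for item in spec.split(","):
        name, _, when = item.strip().partition(":")
        if name not in BUTTONS or not when:
            raise ValueError(f"bad input event: {item!r}")
        start = round(float(when) * FPS)
        events.append((name, start, start + HOLD_FRAMES))
    return events


def buttons_held(events: List[Event], frame: int) -> int:
    """Controller state for one frame, packed with bit 0 = A ... bit 7 = Right."""
    mask = 0
    for name, start, end in events:
        if start <= frame < end:
            mask |= 1 << BUTTONS.index(name)
    return mask
```

Spacing the presses two seconds apart, as in the example spec, keeps each 10-frame hold well separated. A test can then attribute every detected press to exactly one button.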
The final frame of each test is captured as a 256x240 PNG, then converted to lossless WebP (typically 60-80% smaller). Screenshots serve as visual evidence — many test ROMs display their results on screen as text, showing exactly what passed or failed.
Each ROM has a configurable --max-wait timeout (default 30 seconds, 120 for longer tests). If a test ROM enters an infinite loop or hangs, the runner terminates it gracefully and reports the last known state.
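The timeout logic can be sketched as a frame-budget loop. `step_frame` and `read_status` stand in for hypothetical emulator hooks. `read_status` returns None until the status protocol is active, and otherwise returns the raw $6000 byte. Whether `--max-wait` counts emulated or wall-clock seconds is not stated here; this sketch assumes emulated time and counts NTSC frames.

```python
from dataclasses import dataclass
from typing import Callable, Optional

FPS = 60.0988       # assumed: the budget is counted in emulated NTSC frames
TIMEOUT_EXIT = 255  # exit code for timeout / no result


@dataclass
class RunResult:
    exit_code: int
    frames: int
    last_status: Optional[int]  # last $6000 value seen, for the report


def run_until_result(step_frame: Callable[[], None],
                     read_status: Callable[[], Optional[int]],
                     max_wait_s: float = 30.0) -> RunResult:
    """Step frames until $6000 holds a final code (0-127) or time runs out.

    step_frame and read_status are hypothetical emulator hooks; read_status
    returns None until the status protocol is active, else the raw $6000 byte.
    """
    max_frames = int(max_wait_s * FPS)
    last: Optional[int] = None
    for frame in range(1, max_frames + 1):
        step_frame()
        status = read_status()
        if status is None:
            continue
        last = status
        if status <= 0x7F:
            return RunResult(status, frame, status)
    return RunResult(TIMEOUT_EXIT, max_frames, last)
```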
The headless test runner is invoked directly via the emulator executable:
```
AprNes.exe --rom <file.nes> [options]
```
| Option | Description |
|---|---|
| `--rom <path>` | ROM file to load (required) |
| `--wait-result` | Monitor $6000 / screen for test result |
| `--max-wait <sec>` | Timeout in seconds (default: 30) |
| `--time <sec>` | Run for exactly N seconds, then stop |
| `--screenshot <path>` | Save final frame as PNG |
| `--log <path>` | Write result line to file |
| `--soft-reset <sec>` | Trigger soft reset at N seconds |
| `--input <spec>` | Schedule button presses (e.g. "A:2.0,B:4.0") |
| `--debug-log <path>` | Write CPU trace log |
Exit codes: 0 = pass, 1-127 = fail (test error code), 255 = timeout/no result.
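For batch runs, a driver only needs to build the command line and interpret the exit code. This Python sketch is a hypothetical stand-in, not the project's run_tests_report.sh. It uses only options from the table above and assumes AprNes.exe is on PATH. `build_command`, `describe_exit`, and `run_rom` are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(rom: Path, exe: str = "AprNes.exe", max_wait: int = 30,
                  screenshot: Optional[Path] = None) -> List[str]:
    """Assemble an AprNes.exe invocation from options in the table above."""
    cmd = [exe, "--rom", str(rom), "--wait-result", "--max-wait", str(max_wait)]
    if screenshot is not None:
        cmd += ["--screenshot", str(screenshot)]
    return cmd


def describe_exit(code: int) -> str:
    """Translate the documented exit codes into a report label."""
    if code == 0:
        return "PASS"
    if 1 <= code <= 127:
        return f"FAIL (error code {code})"
    if code == 255:
        return "TIMEOUT / no result"
    return f"UNKNOWN (exit {code})"  # not covered by the documented codes


def run_rom(rom: Path, **options) -> str:
    """Run one ROM headlessly and label the outcome (needs AprNes.exe on PATH)."""
    proc = subprocess.run(build_command(rom, **options), capture_output=True)
    return describe_exit(proc.returncode)
```

For a long-running ROM, `run_rom(Path("test.nes"), max_wait=120)` would match the longer timeout mentioned above. The file name here is only a placeholder.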