How to emulate a coin-op game on a "small memory" device
Intro
In a previous life I was paid to emulate coin-op games, lots of games. In total, I worked on more than 300 titles. The games were for closed source products and were unrelated to my open source projects and had nothing to do with more recognizable game emulators like MAME. They were sold in commercial game systems such as Chicago Gaming's "Ultimate Arcade 2", some mobile devices such as Microsoft's Pocket PC launch (PocketPak's) and Turner's GameTap system. I made my living by writing these game emulators for a period of about 10 years. Much of the code I wrote back then had to be very efficient because most of our platforms were embedded Arm devices. This same code can find new life on today's Arduino/embedded microcontrollers. As a starting project, I decided to port my Space Invaders emulator to run on moderately spec'd 32-bit MCUs. Taito's Space Invaders was one of the first raster graphics video games to exist and was composed of relatively simple hardware (by today's standards). That same hardware was copied by several competitors and used to publish other game titles in the late 70's and early 80's. The machine is composed of a 240x224 1-bit per pixel raster CRT display (oriented 90 degrees rotated), an Intel 8080 CPU (a compatible subset of the Zilog Z80) and a TI SN76477 + some analog circuitry for sound.
What is a game emulator?
The simplest way to explain game emulation is that the emulator creates a virtual environment that's sufficiently accurate to run the original game code unmodified. In other words, the emulator's job is to allow the original game code to behave as if it was running on its original hardware (generate video, audio and react to user input). It does this using nothing but software running on a completely different set of hardware. The actual means by which is does this is not very complex, but like most software, is made up of many small parts that all have to function correctly together. Let's look at the actual code of the Space Invaders emulator and see what's happening under the hood. The loop below is the main execution loop of the emulator. It executes each frame (60th of a second of activity) of the game by emulating the correct number of cycles of CPU execution followed by video and audio updates. There are other ways to do it, but this is probably the simplest to visualize. In the case of Space Invaders, the machine divides the display into top and bottom halves to prevent tearing/flicker of the moving objects. You can see that there are 2 time slices for each frame which generate different interrupts for the CPU.
while (!bUserAbort) {for (j=0; j<2; j++) {
iClocks = 2000000/120; // 2Mhz @ 60Hz executed in 2 phases
if (j & 1) {
ucIRQs |= INT_IRQ;
z80.ucIRQVal = 0xcf; /* RST 08 = 8080's IRQ */
} else {
ucIRQs |= INT_IRQ;
z80.ucIRQVal = 0xd7; /* RST 10 = 8080's NMI */
}
EXECZ80(&z80, emuh, &iClocks, &ucIRQs);
EMUPlaySamples();
} // for j
EMUScreenUpdate(lDirtyRect, NULL, 224, 240);
iFrame++; /* Count the number of frames */
} // while running
What changes were needed for embedded use?
If you're unfamiliar with MAME or other game emulator code, it generally uses a lot more memory than the original game had and requires a relatively fast machine to run the emulated games at their native speed. When working with Arduino's or other embedded microcontrollers, the capabilities of the target processor are usually the opposite - a very small memory size and relatively slow CPU. Some of the challenges with RAM usage in certain games just can't be overcome, but with a relatively modest challenge like Space Invaders, we can succeed in making it fit. These are the subsystems competing for RAM to emulate Space Invaders:
- Temporary storage to unzip the files containing the game- Z80 memory map (64K of address space)
- 224x240 bitmap for the playfield
- Any additional structures/variables needed to emulate the game
How Does it work?
What's not visible in the main loop above is the initialization and some of the additional activity that needs to occur during the CPU emulation. The hardware generating the video, audio and monitoring the user controls is connected to the CPU through I/O ports and memory mapped addresses. In order to emulate this in a generic way which allows the CPU emulator to be used for other purposes, a dynamic mechanism needs to exist to attach specific emulated hardware to specific port/memory addresses. The way I handle this (on small memory devices) is to have a set of read and write flag bits. Each bit represents 4k bytes of address range. For Z80 ports (IN/OUT instructions), I use a single function for the entire address range. To map the flag bits to the 64k Z80 address map, you would just divide the address by 4096 (shift right 12). To know if a specific address needs a special "handler", just test if the corresponding bit is set. In addition to the I/O emulation, this code needs to fit in a small memory device. I accomplish this by creating a sparse map of the Z80 64K address space by also dividing it into 4K pages and having an offset list pointing to each page. That way, the unused parts of memory don't have to be mapped to anything. Here's how the Z80WriteByte() function implements the sparse memory map and I/O logic:
void Z80WriteByte(unsigned short usAddr, unsigned char ucByte)
{
unsigned char c;
c = (unsigned char)(usAddr >> EMU_ADDR_SHIFT);
if (z80_writeflags & (1 << c)) { /* If special flag (ROM or hardware) */
*iRealClocks = iMyClocks; /* In case external hardware needs accurate timing */
(mem_handlers_z80[c].pfn_write)(usAddr, ucByte);
} else { /* Normal RAM, just write it and leave */
*(unsigned char *)((unsigned long)usAddr + z80_ulCPUOffs[c]) = ucByte;
}
} /* Z80WriteByte() */
The handler functions invoked by calling the various pfn_write pointers will check the address passed to them and then take appropriate action to emulate the hardware that corresponds to it. As indicated in the comment above, the write handler can also serve to prevent writes to addresses containing ROM. Some games purposely write into the ROM address range to thwart software emulation. Since an emulator will usually load the game code into RAM, this is a clever gotcha to prevent it from running.
Optimizing for Embedded
void InvadersVideoWrite(int usAddr, unsigned char ucByte)
{
int x, y, i;
if (ucRAM[usAddr - 0x2000] != ucByte){ // did it change?
ucRAM[usAddr - 0x2000] = ucByte;
usAddr -= 0x2400; // Subtract starting address of display RAM
x = usAddr >> 5;
y = 240 - ((usAddr & 0x1f) * 8);
if (y < 0 || x > 223)
return; // Not visible
i = (y >> 4); // Horizontal bands of 16 rows each
lDirtyRect |= (1 << i); // Mark this section as 'dirty'
}
} /* InvadersVideoWrite() */
Hacks / Modifications of the code
Wasted CPU cycles in the original game are also a good area to investigate for optimizing your emulator. Most coin-op game authors were not concerned with power savings because the games were powered from wall current. Coin-op game code was written to fit within the capabilities of the CPU. If a particular game has a very complex section of gameplay that stresses the CPU, that 'worst case' code must run not cause any additional latency or the user will notice. This also means that the average game activity is not stressing the CPU. Because of this situation, there will be lulls in CPU activity for the much of the game. Game authors usually used the vertical blank interval (VBlank) to set the pace of the game. If the work that the game had to accomplish was finished before the VBlank period was over, they would often make the CPU sit in a time-wasting loop to wait for the VBlank interrupt to start the next frame. These time wasting loops can represent a large percentage of the total CPU execution time. I've experimented with ways of exploiting this fact in my various emulators and have come up with two solutions - automatic and manual.
Automatic Shortcuts
The code below shows one of the optional 'hacks' that I use to shortcut these time wasting loops. A condition branch to a few bytes behind the current Program Counter usually means that no actual work is occurring and it's just counting down to waste cycles.
case 0xC2: // JP NZ,(nn)
if (!(regs->ucRegF & F_ZERO)) {
usAddr = p[1] + (p[2] << 8);
iMyClocks -= 5;
#ifdef SPEED_HACKS
if (PC - usAddr <= 3) // time wasting loop
iMyClocks = 0;
#endif // SPEED_HACKS
PC = usAddr;
} else
PC += 2;
break;
Manual Shortcuts
Another way to prevent time wasting loops from executing is to patch the game code to disable them. The risk in doing this is that game authors started to put self-checks into the code to make sure it hadn't been modified/pirated. The check was usually done as a simple checksum of the game code at power up. It's usually safe to patch the code after the self-test has completed. Some of the more clever game authors would be sneaky about the mode of failure when a checksum error occurred. Instead of making the game display an error or refuse to run, they would have the game purposely misbehave in unexpected ways. The emulator author would then potentially chase these bugs instead of realizing that the checksum had failed. To patch Z80 games, I usually replace the start of the time wasting loop with a HALT instruction. This instruction causes the CPU to wait for an interrupt before continuing. My implementation of the HALT instruction simply sets the remaining cycles to 0 which causes the current frame's time to end immediately. Each game may have multiple time wasting loops that each need to be discovered separately. A simple way to find them is to see what the value of the Program Counter is right before the next VBlank interrupt occurs. For Space Invaders, this value is 0x0A93. Here's the code at that address:
0x0A9E: LD A,(0x20C0) // A = 0x07
0x0AA1: DEC A
0x0AA2: JP NZ, (0x0A9E)
This code can execute thousands of times per frame depending on the game activity. On a different CPU, this code could cause problems because after the interrupt service routine executed, it would return to this loop. This would force all of the game logic to execute within the interrupt service routine. On the 8080/Z80, the interrupts can be configured to just jump to a specific address without stacking the current program counter. In the case of Space Invaders, it just jumps to the start of the logic for running the top or bottom half of the current frame.
Bringing it all together
In the video below, I have the emulator running on an Adafruit PyPortal - (120Mhz Cortex-M4 with 192K of RAM and an ILI9341 240x320 LCD display). For Space Invaders, this is more than enough CPU power and RAM to run my emulator at 60fps. I haven't hooked up sound or the user controls yet, but I'll cover those in my next blog post...
I've shared the current code with a Github Gist here:
My Github Sponsors Page
"I made my living by writing these game emulators for a period of about 10 years." I'm envious of your life.
ReplyDelete