Brief Introduction to Sound Engines
Music of the Past
Before consoles could play back audio files, game music and sound had to be stored as a series of instructions. The game would interpret those instructions and write values to the memory addresses that feed the console’s audio synthesis unit. I’ll be honest, you probably knew that, if you’re here.
With some consoles, it is very easy to produce a file that contains nothing from the game except for the music data. It can be emulated on modern hardware, and with such a file, you can listen to the music exactly as it plays in the game. For the NES, that file is called an NSF. Keep that in mind.
Sound engines for older consoles were not like modern flexible software. Their architectural decisions very much informed the kind of music that could be made, and especially the kind of sound design that could be used. With modern chiptune software, you can make music at virtually any tempo, with virtually any pitches, and place notes anywhere.
Not all of that is necessarily true with these older games’ sound engines. Many of them don’t even have a concept of tempo; instead they require you to define the exact length of every note in frames. Some of them restrict the number of pitches you can use even more than the sound chip already does.
Composers often brought sound engines with them between companies, especially if they worked freelance. Some developers had their own in-house sound engines. It was up in the air. Unlike with the SNES, Nintendo did not provide a sound engine to licensed NES developers, so they were forced to create or license their own.
If you listen to enough chiptune, you start to be able to pick out certain things by ear. You might start recognizing certain styles of instrument, or you might notice when a certain pattern is repeating a lot, and you can make deductions about the abilities of the sound engine that is playing back the music. Let’s talk about Rambo.
Brief Introduction to Rambo
Music of the Sylvester Stallone
Rambo for the NES is a game. I think that’s all that’s necessary for this article. The relevant thing is that said game has a soundtrack I happen to be a fan of. Here’s a link to the soundtrack so you can listen for yourself. Any good reversing project starts with being familiar with what you’re trying to reverse.
As much as I love this music, a few things stick out after hearing every song. Namely, there’s only one instrument being used on the pulse channels. Does that necessarily mean that the sound engine has no ability to change the instrument of the pulse channel? No. It could very well be a compositional choice.
But, given that this game is a licensed game released in the US by Acclaim, it’s probably safe to say that developer Pack-In-Video was contracted based on their price more so than their skill. To be clear, that doesn’t mean they were bad developers! I’d argue they were quite good even with licensed titles. Acclaim was just not known for outstanding quality among their game library.
Anyway, even with the few inferences I’ve made based on the sound design alone, I don’t think I was quite prepared for how primitive this sound engine actually is.
Getting Into It
(and by "it", I mean the game)
If you’ve ever watched the YouTube channel Displaced Gamers, whose skill at reverse engineering and video production I can only hope to achieve one day, you might be familiar with the multi-system emulator Mesen. It’s quite good, and features a whole host of useful debugging tools that make reversing not as difficult as it seems. In particular, we want to set our sights on the Memory Viewer.
When we open up the NSF file, we can see it load into memory the same way the ROM of the game would, except that everything here is just the music and nothing else. The Memory Viewer is basically a hex editor that represents the entire console’s memory. Isn’t it crazy how we can see all of this nowadays? Technology really is amazing.
Certain bytes are displayed with different colors, and the color depends on whether that address has been read from, written to, or executed as code on the current frame. Many emulators do this. Mesen is not unique in this regard. It’s the one I have installed, though. We really only care about the blue ones, as those are the ones that have been read from.
Basically, what we’re looking for is a series of bytes that is getting read from in time with the music. Once we find that, we can be pretty sure that that’s music data.
Look at that! These sections are getting read from in sequence in a way that follows the music. If we pause the emulator, the reading pauses too. This is the music data for this track. It's at 34A8.
Sequence Data
The Easy Stuff
There are a lot of patterns here. There’s a relatively high byte, followed by a super low byte, and another super low byte, and then back to another relatively high byte. A lot of repetition of the number 06 here, too.
In order to figure out what everything does, we’re first going to pause and restart the song, so we know where the start is and we don’t have to wait too long to hear changes. Let’s change this 4A to 3A. You generally only want to change one nybble (that’s half of a byte) at a time. Incremental is the name of the game here, in many ways.
Now the first note of the song is lower. Clearly, the first byte is a note. Undo this change, and write down what we’ve discovered. Change this 06 to 07.
Now the first note of the song takes a little longer to finish. This byte represents the note length. Undo this change, and write down what we’ve discovered. Finally, change this 04 to 05.
Now the first note of the song releases after a little longer. This one may be hard to hear, and if you struggle to hear any incremental change, keep changing it a little at a time until it does something you can hear. By changing it to 06, it now doesn’t release before the next note of the song.
The next three bytes, as you might notice if you try changing them, follow the same format. So do the next three. And the next three. And so on, and so forth, until this FE byte is reached and the song loops.
This is, well, an extremely unsophisticated format. Each note is made up of three bytes, and there aren't any commands that will set any kind of continuous effect, like, I don’t know, a repeated note length or release time. As you can imagine, this takes up a lot of space. Many sound engines provide the ability to loop a section a given number of times before continuing on with the rest of the song. Imagine you have a drumline like this:
You could store it as written, but that’s not efficient. Instead, imagine you could store it like this:
Obviously, these kinds of written notes are not typically found in sheet music, but I’m sure you can figure out what I mean here. You do often see a repeat symbol in sheet music, though.
This is common, and it simply means to play the section that these symbols bookend twice. Instead of computer memory, though, it saves paper.
The ability to repeat the same data multiple times can cut down drastically on space taken up by music data. A lot of music is the same thing many times consecutively, and this game’s soundtrack is no exception. It’s fascinating, however, that there seem to be no loop commands to speak of beyond an FF 00 00 to mark the loop point (it’s three bytes because the engine reads three bytes at a time) and an FE to go back to the note after the FF 00 00. If the engine has a more granular looping feature, it certainly isn’t utilized. At the same time, this also means this music data is incredibly easy to modify. I’ll bet you could transcribe this entire song just with the information contained in the prior 7 paragraphs.
Before we talk about where the note tables and instrument data are stored, though, a brief aside about the noise channel format. Since the only instrument available to the noise channel is a drum that fades out, this channel uses only two bytes per note: the first one representing the drum sound to play (there are several), and the second representing the note length. The loop point commands (FF, FE) are the same.
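If you wanted to turn those observations into a tool, a minimal sketch of a parser might look like the following. The function name is made up, the pulse bytes after the first note are invented for illustration, and the trailing FE on the noise sample is mine. I’m also assuming the FF marker takes up one full entry on the noise channel, the way it does on the pulse channels, which I haven’t verified.

```python
LOOP_POINT = 0xFF  # marks the place the song loops back to
LOOP_END = 0xFE    # jump back to the entry after the loop point


def parse_sequence(data: bytes, entry_size: int):
    """Split raw sequence data into fixed-size entries.

    entry_size is 3 for pulse data (note, length, release) and
    2 for noise data (drum index, length).
    Returns (entries, loop_index), where loop_index is the position in
    `entries` that playback returns to when FE is reached.
    """
    entries = []
    loop_index = 0
    pos = 0
    while pos < len(data) and data[pos] != LOOP_END:
        entry = tuple(data[pos:pos + entry_size])
        if len(entry) < entry_size:
            raise ValueError(f"truncated entry at offset {pos}")
        if entry[0] == LOOP_POINT:
            # Assumption: the marker fills a whole entry (FF 00 00 on pulse).
            loop_index = len(entries)
        else:
            entries.append(entry)
        pos += entry_size
    return entries, loop_index


# Hypothetical pulse data: only the first note (4A 06 04) is from the game.
pulse = bytes.fromhex("4A0604 FF0000 4C0604 FE")
# The noise bytes quoted in the drum section, with an FE appended by me.
noise = bytes.fromhex("0360 0318 030C 0024 FE")

print(parse_sequence(pulse, 3))
print(parse_sequence(noise, 2))
```

Keeping the entry size as a parameter means the same loop handles both channel types, which seems to match how little the two formats differ.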
Data Tables
The Hard Stuff
Identifying data tables is much easier with just the NSF than it is with the ROM. (It might surprise you to learn that games use tables of data for a lot more than just music!) We can use the same technique of looking for bytes that are being read. However, this time we don’t have to chase them down as the song is playing, since oftentimes, music will read from the same piece of data many times.
There are three main types of tables to look for that are common between most games regardless of their engine, all with different characteristics and patterns. The first one, and by far the most common, is the address table. The engine needs to know where music data is stored, and this is best accomplished with a list of the addresses.
Since a byte stores 8 bits, and the address space of the NES is large enough to require 16 bits to represent an address, address tables are stored as pairs of bytes, often with the most significant byte last. This is known as “little-endian”. The opposite of this is “big-endian”, where the most significant byte is in front. The only sound engine I’ve seen use big-endian is Capcom’s.
I will list data based on where it can be located using a hex editor and the NSF. I will also include at least the first 8 bytes of each table, so you can search for it in either the NSF file or the ROM using either a hex editor or an emulator. For instance, the address table is located at 8252, and its first 8 bytes are as follows:
FC BB 5D BC CD BC 43 BD [...]
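To make the byte order concrete, here is a small sketch that decodes those 8 bytes as little-endian pairs. The function name is my own, and it assumes the table is nothing but consecutive 16-bit addresses.

```python
def decode_address_table(raw: bytes) -> list:
    """Decode consecutive little-endian 16-bit addresses."""
    if len(raw) % 2:
        raise ValueError("address table must contain whole byte pairs")
    return [int.from_bytes(raw[i:i + 2], "little")
            for i in range(0, len(raw), 2)]


table = bytes.fromhex("FC BB 5D BC CD BC 43 BD")
print([f"{addr:04X}" for addr in decode_address_table(table)])
# prints ['BBFC', 'BC5D', 'BCCD', 'BD43']
```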
There isn’t a really good technique to identify this just by sight. The easiest way I’ve found is to pause the emulation on the first frame of a song, and find an area where several pairs of bytes are being read. If you skip to the next song, and find that different pairs of bytes in the same block are similarly being read on the first frame of a song, that’s probably the address table.
Musically, this area is uninteresting. Technically, this area is important to know about if you’re modifying the music. If your new music is not the exact same length in bytes as the old music, you’ll need to replace the entries in the address table with the new ones. This is the case with every game.
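As a rough illustration of that repointing step, here is a sketch that rebuilds a table from a starting address and a list of block lengths. It assumes the data blocks sit back to back in the same order as the table entries, which I have not confirmed for this game. The function name is hypothetical, the first three lengths are just the gaps between the four addresses shown above, and the last length is made up.

```python
def build_address_table(first_address: int, block_lengths: list) -> bytes:
    """Lay blocks out back to back and emit little-endian pointers."""
    table = bytearray()
    address = first_address
    for length in block_lengths:
        if address > 0xFFFF:
            raise ValueError("block starts outside the 16-bit address space")
        table += address.to_bytes(2, "little")
        address += length
    return bytes(table)


# Gaps between BBFC, BC5D, BCCD and BD43; the final 0x40 is invented.
original = build_address_table(0xBBFC, [0x61, 0x70, 0x76, 0x40])
print(original.hex(" ").upper())

# Growing the first block by 3 bytes moves every pointer after it.
edited = build_address_table(0xBBFC, [0x64, 0x70, 0x76, 0x40])
print(edited.hex(" ").upper())
```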
The second important type of table is the instrument table. If a game has any more complex sound design than can be accomplished with the NES’ built-in envelope generator, it has to define instruments somewhere. I happened to already know what the volume macro of Rambo’s single pulse channel instrument was due to having looked at the title theme with NSFImport before, but I happened upon it by chance. (Note the blue volume column.)
It’s at the end of a big block of 00s, starting at B547. Its volume macro is as follows:
09 0A 0B FF 01 01 01 04 04 04 04 04 04 04 04 03 03 03 03 03 03 03 03 02 02 02 02 02 02 02 02 01 01 01 01 01 01 01 01 00
The whole sequence is reproduced here since it’s just the one instrument.
We already know that this instrument has the capability to release. The FF here at the start should tell us that everything after this only plays when the note is released. I didn’t need to change any bytes to confirm that this is it, because it matches exactly with known data, and because it is constantly being read.
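Under that reading of FF, splitting the macro into its two halves is a one-liner. This is only a sketch of my interpretation, not the engine’s actual logic, and the function name is invented.

```python
RELEASE_MARKER = 0xFF

VOLUME_MACRO = bytes.fromhex(
    "09 0A 0B FF 01 01 01"
    " 04 04 04 04 04 04 04 04"
    " 03 03 03 03 03 03 03 03"
    " 02 02 02 02 02 02 02 02"
    " 01 01 01 01 01 01 01 01"
    " 00"
)


def split_volume_macro(macro: bytes):
    """Return (before_release, after_release) volume lists."""
    marker = macro.index(RELEASE_MARKER)  # raises ValueError if missing
    return list(macro[:marker]), list(macro[marker + 1:])


attack, release = split_volume_macro(VOLUME_MACRO)
print("held part:", attack)
print("release part:", release)
```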
But, this game also has drums! Where are the drums? This one was a little harder to figure out. First, we can look at the noise sequence data, and find that instead of a pitch, it is using single numbers as its notes. Now, the numbers it’s using don’t match with what those noise pitches are on the NES.
03 60 03 18 03 0C 00 24 [...]
The snare sound here, which is the one represented as 7-# in FamiTracker, is 03, and the hi-hat, which is F-# in FamiTracker, is 00. Those two pitches are 8 steps apart in FamiTracker, but the values here are only 3 apart. The logical assumption here is that these are indexes into another table. Whether that table stores particular noise pitches or unique instruments is yet to be determined, but we know we’re looking for a table.
Indeed, we find our table right at the end of the NSF file, at B900. Once again, this was found by looking for areas of memory being read in sequence with the music. The whole table is as follows:
00 00 00 00
01 00 03 01
01 00 03 01
02 00 08 08
03 00 03 01
00 00 00 00
00 00 00 00
01 00 03 01
01 00 03 01
4 bytes are being read at a time, which tells us that each drum is defined with 4 values.
One way to figure out how data is being interpreted is to disassemble the game’s 6502 machine code. The much simpler way to figure it out, without having to be familiar with 6502 assembly, is the previously described method of “changing the bytes until you notice something different”.
Via this method, we can find out that the first byte in each definition represents how long the sound takes to fade out. This makes sense. The shorter drums have 00, and the longer ones have 01. I have not figured out what the second byte does.
The third byte does two things. The high nybble, which is 0 in every definition here, controls whether or not the drum uses the NES’ periodic noise mode. If it’s higher than 7, it does; otherwise, it does not. The low nybble controls the pitch of the drum. It goes in the opposite direction of FamiTracker. Lower values are higher noise pitches, and vice versa.
Finally, the fourth byte seems to have something to do with the length of the drum sound. If it is any value other than 08, the drum is much shorter. I have not figured out how this works, unfortunately. I also have not figured out how the engine stores vibrato, but I’m willing to bet it’s controlled by code and not a table. If it’s not that, then I apologize.
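Putting the drum findings together, a sketch of a decoder for the 4-byte definitions could look like this. The field names are my own guesses at meaning, the second and fourth bytes are kept as raw numbers because I don’t understand them, and the FamiTracker conversion assumes the two scales are exact mirror images (which fits the snare and hi-hat above).

```python
from dataclasses import dataclass

DRUM_TABLE = bytes.fromhex(
    "00000000 01000301 01000301 02000808 03000301"
    "00000000 00000000 01000301 01000301"
)


@dataclass
class Drum:
    fade: int        # byte 1: how long the sound takes to fade out
    unknown: int     # byte 2: purpose not figured out
    periodic: bool   # byte 3, high nybble above 7: periodic noise mode
    pitch: int       # byte 3, low nybble: lower value = higher pitch
    length_raw: int  # byte 4: related to drum length, not understood


def parse_drum_table(raw: bytes) -> list:
    if len(raw) % 4:
        raise ValueError("drum table must be a multiple of 4 bytes")
    drums = []
    for i in range(0, len(raw), 4):
        fade, unknown, mode_pitch, length_raw = raw[i:i + 4]
        drums.append(Drum(fade, unknown, (mode_pitch >> 4) > 7,
                          mode_pitch & 0x0F, length_raw))
    return drums


def famitracker_noise_pitch(drum: Drum) -> str:
    # Assumption: FamiTracker's 0-# to F-# scale is the mirror image.
    return f"{0xF - drum.pitch:X}-#"


drums = parse_drum_table(DRUM_TABLE)
for index, drum in enumerate(drums):
    print(f"{index:02X}", drum, famitracker_noise_pitch(drum))
```

With this table, drum 03 comes out as 7-# and drum 00 as F-#, which lines up with the snare and hi-hat from the sequence data.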
The third important type of table is the pitch table. The way the NES works is that, instead of just writing a pitch to the audio registers, you write a period, as in a period of time. This is represented as an 11-bit number, and is tied to the clock speed of the console to generate a given frequency.
Because the value is 11 bits, it requires 2 bytes to control it. This means that, for the most part, the pitch table is going to look a lot like the address table, as in it’s going to be a series of pairs of bytes. This table is located at B890, and its first 8 bytes are as follows:
A7 02 81 02 5D 02 3B 02
It appears to be a pretty limited pitch table, and it is also stored in little-endian format. The lowest pitch is A7 02 (approximately E-2 in FamiTracker), represented by the value 34 in sequence data. The highest is 47 00 (approximately G-5 in FamiTracker), represented by 5B. Other games I’ve researched split the note byte into an octave/pitch pair. This one uses it as a raw index into this table. This is just speculation, but what this tells me is that this music data was generated from a series of macros.
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
A7 02 81 02 5D 02 3B 02 1B 02 FD 01 E0 01 C5 01
AC 01 94 01 7D 01 68 01 53 01 40 01 2E 01 1D 01
0D 01 FE 00 F0 00 E3 00 D6 00 CA 00 BE 00 B4 00
AA 00 A0 00 97 00 8F 00 87 00 7F 00 78 00 71 00
6B 00 65 00 5F 00 00 00 55 00 50 00 4C 00 47 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Something else worth noting is that the surrounding values of the pitch table are all 00s, but can still be accessed from note values. I’m not sure why they didn’t fill in the whole note range, as one of them (57) is also a 00 pair. Maybe they just only filled in ones that were used by the music? Kind of strange, but it’s not like this affects the final game, so it’s just an interesting aside.
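To sanity-check those note names, here is a sketch that looks up a note byte and converts the period to a frequency. It uses the standard pulse channel relationship on an NTSC console, frequency = CPU clock / (16 × (period + 1)), with a CPU clock of about 1.789773 MHz. The function names are mine, and treating every out-of-range note as a 00 pair is an assumption based on the surrounding zeros.

```python
NTSC_CPU_HZ = 1_789_773  # NTSC NES CPU clock
FIRST_NOTE = 0x34        # note byte that maps to the first filled entry

# The non-zero block of the pitch table, copied from the dump above.
RAW_PERIODS = bytes.fromhex(
    "A702 8102 5D02 3B02 1B02 FD01 E001 C501"
    "AC01 9401 7D01 6801 5301 4001 2E01 1D01"
    "0D01 FE00 F000 E300 D600 CA00 BE00 B400"
    "AA00 A000 9700 8F00 8700 7F00 7800 7100"
    "6B00 6500 5F00 0000 5500 5000 4C00 4700"
)
PERIODS = [int.from_bytes(RAW_PERIODS[i:i + 2], "little")
           for i in range(0, len(RAW_PERIODS), 2)]


def note_period(note: int) -> int:
    """Period for a sequence note byte; 0 where the table is unfilled."""
    index = note - FIRST_NOTE
    if 0 <= index < len(PERIODS):
        return PERIODS[index]
    return 0  # assumption: everything outside the block is a 00 pair


def period_to_hz(period: int):
    """Pulse channel frequency in Hz, or None for an empty entry."""
    if period == 0:
        return None
    return NTSC_CPU_HZ / (16 * (period + 1))


for note in (0x34, 0x57, 0x5B):
    print(f"{note:02X}", f"{note_period(note):03X}",
          period_to_hz(note_period(note)))
```

Note 34 lands at roughly 164.5 Hz and 5B at roughly 1553.6 Hz, which is in line with the E-2 and G-5 readings from FamiTracker.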
Wrapping Up
Thanks for reading
I’ve looked into several other games’ sound engines, but this is by far the simplest I’ve seen. I also have not heard it used in any other NES games, unlike most other engines. I guess the biggest lesson from this all is that if you’re smart at composition, you don’t really need to get super fancy with it. I certainly didn’t notice the lack of instrumental variety in this soundtrack until I started inspecting the data format.
And, anyway, if you ever decide to hack Rambo, maybe the music format being this simple is a benefit.