This is the second attempt at writing a post on this subject. The first was largely focused around an 8-bit CPU emulator I had written, or am writing, depending on my mood at the time. I decided to start again in a more general way, but still using it for examples.
Writing software is hard. When you begin, you have this idea, and you might even have an idea of how things will fit together. Generally those ideas are wrong. Don’t get me wrong: as I gain experience, more and more of my up-front ideas are proving correct, but for all but the most trivial projects, you cannot realistically imagine all possible modules and interactions until you start coding.
That being said, I know that large systems do occasionally go through months of design and planning, and it is possible to do exhaustive up-front design so that the implementation becomes a fairly straightforward process. From what I’ve seen though, I still don’t think it covers all eventualities.
So, what is the best way to make a software system, if not to plan at the beginning? Surely just coding ‘by the seat of your pants’ is a bad way to do it too because you end up with spiderweb dependencies, and generally stinky code.
I think the answer is ‘do just enough’.
Although that’s a very stupid answer. Let me take the aforementioned emulator as a case study. It is far from the prettiest, most well-designed code I have ever written, but I am quite proud of it for a few reasons. The way I went about writing it is quite simple. First, I decided on some coding standards. This was a very easy part, as I have largely been using the same ones throughout my ‘Tiny Little’ library projects. Then I decided on an interface. If I had started the project as a standalone emulator, I might have designed it differently, but I was designing it as a library which I could hook into various programs. Therefore I stubbed out the various public interface functions I thought I would need. There would, doubtless, be more added later, especially as I had no real experience with 8-bit CPUs or emulation at the time, but I set them out as a contract: changing any of the functions I had already defined would need a very good reason.
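To make the idea of stubbing out an interface as a contract concrete, here is a minimal sketch in C. Every name in it (`EmuContext`, `emu_create`, `emu_step` and so on) is made up for illustration and is not tlvm's real interface. The point is only that the signatures and error codes exist from day one, while the bodies honestly report that nothing is implemented yet.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not tlvm's real interface. */
typedef enum {
    EMU_OK = 0,
    EMU_INVALID_ARGUMENT,
    EMU_OUT_OF_MEMORY,
    EMU_NOT_IMPLEMENTED
} EmuResult;

typedef struct EmuContext {
    uint8_t *memory;     /* owned by the caller, not by the context */
    size_t memory_size;
} EmuContext;

/* Allocate an empty context. */
EmuResult emu_create(EmuContext **out_context)
{
    if (out_context == NULL) return EMU_INVALID_ARGUMENT;
    EmuContext *context = calloc(1, sizeof *context);
    if (context == NULL) return EMU_OUT_OF_MEMORY;
    *out_context = context;
    return EMU_OK;
}

/* Release a context created by emu_create. */
void emu_destroy(EmuContext *context)
{
    free(context);
}

/* Attach caller-owned memory to the context. */
EmuResult emu_set_memory(EmuContext *context, uint8_t *memory, size_t size)
{
    if (context == NULL || memory == NULL) return EMU_INVALID_ARGUMENT;
    context->memory = memory;
    context->memory_size = size;
    return EMU_OK;
}

/* Stub: the signature is the contract, the body comes later. */
EmuResult emu_step(EmuContext *context)
{
    if (context == NULL) return EMU_INVALID_ARGUMENT;
    return EMU_NOT_IMPLEMENTED;
}
```

A host program can already be written and compiled against these functions, and a test can check for `EMU_NOT_IMPLEMENTED` until the real behaviour arrives.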
Then I started hooking up my test suite.
I will not pretend to be the poster-boy for Test Driven Design; a large amount of the project is yet to be adequately unit tested. I also didn’t write a number of the tests until after writing the functionality, a habit I have to change. However, having a set of tests there did two things. First, and most obviously, it allowed me to quickly sanity check the things which I had written tests for. Second, it made it very easy to check that my dependencies weren’t getting out of hand. If I had to do a lot of setup for a test, then there were dependency problems and I needed to simplify something. In the case of ‘tlvm’ there were a number of tests where the setup did get a bit too long for my liking, but this was because I was hand-assembling some small test programs, and I favoured doing it in a more verbose way over a string of hex values, as it allowed me to visually check the test. It is test setup complexity that should be kept to a minimum, not necessarily test setup length.
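As a small illustration of verbose versus terse test setup, here is the same hand-assembled 8080 program written both ways. The opcode values are the standard Intel 8080 encodings; the constant and function names are my own invention for this sketch and are not taken from tlvm.

```c
#include <stdint.h>
#include <string.h>

/* Intel 8080 opcodes used by this test program (names are hypothetical). */
enum {
    OP_MVI_B = 0x06, /* MVI B, d8: load immediate byte into B */
    OP_MVI_A = 0x3E, /* MVI A, d8: load immediate byte into A */
    OP_HLT   = 0x76, /* HLT: halt the processor */
    OP_ADD_B = 0x80  /* ADD B: A = A + B */
};

/* Terse: short, but you need the opcode table in your head to review it. */
static const uint8_t program_hex[] = { 0x3E, 0x05, 0x06, 0x02, 0x80, 0x76 };

/* Verbose: longer, but each line can be checked by eye. */
static const uint8_t program_verbose[] = {
    OP_MVI_A, 0x05, /* A = 5     */
    OP_MVI_B, 0x02, /* B = 2     */
    OP_ADD_B,       /* A = A + B */
    OP_HLT          /* stop      */
};

/* Returns 1 when both encodings describe the same bytes. */
int programs_match(void)
{
    return sizeof program_hex == sizeof program_verbose &&
           memcmp(program_hex, program_verbose, sizeof program_hex) == 0;
}
```

The verbose version is more lines of setup, but it adds no dependencies, which is the distinction between length and complexity.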
So, in the case of tlvm, how did this work? Not too great, to be totally honest. One of the issues was that my conceptual design changed a number of times. My first plan was to create a custom CPU instruction set which could be integrated with, for example, a game, much as DCPU-16 was intended to be used in 0x10c. When it became obvious that I didn’t understand enough about how processors worked (learning this was the main driving force behind this whole process), I decided to pick a processor and emulate it. I looked at a couple before deciding on the Intel 8080. It is still debatable whether I made the right choice. I believe I did.
I initially picked the 8080 for three reasons. Firstly, what little ASM I had done in the past had been x86 and ARM. As the 8080 is a direct ancestor of x86, this seemed like the one I might understand the most straight off. Secondly, the Zilog Z80 used an extended 8080 instruction set so, once the 8080 was working, I could extend it and get another chip supported. Between the 8080 and the Z80, a lot of computers, consoles and other machines could then be emulated. Finally, there was an operating system I had a passing familiarity with which ran on the 8080, CP/M, which I was hoping to get running with not an awful lot more than CPU emulation.
As it turned out, looking at CP/M was a great idea, as I will mention a little later.
One thing I did not do while writing tlvm was look at existing emulators which support the 8080, or any other 8-bit system. I did this for a simple reason: I wanted to learn things myself, and not just copy/paste what someone else had already written. My reference material was initially just any and all hardware documentation I could find. For other software projects, I think it would be important to have a design in mind, or ideally on paper, before investigating existing solutions. Existing solutions may be written by people with more experience, in a more efficient way, but once you have looked at them, they make it difficult to think in any other way.
One example: most emulators I’ve looked at since implement the instruction set of a given processor as a giant switch statement. If I had seen these before, I might well have implemented tlvm in this way, but I had not, and I am glad. I decided that for tlvm, I would have a processor definition struct, which contained an array of function pointers, one for each instruction. I doubt this is quicker than a simple switch statement, as there will almost certainly be far more jumping through the code, and it will definitely use more memory, although the processor definition is only created once and used for all subsequent CPUs of the same type. The benefits though? Adding a new instruction is trivial, and doesn’t require modifying a gigantic switch statement (although, to be fair, I have a huge registration function, but I could probably macro that up even more if I wanted and it would get smaller). Instructions are broken into small, self-contained units, which can be easily tested and verified. Also, in the case of something like the Z80, I can populate the 8080 instruction set, and then add the Z80 extensions on top without touching the 8080 code at all.
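A rough sketch of the function-pointer approach, assuming a heavily cut-down CPU, might look like the following. None of these names come from tlvm; the struct layout, the tiny 256-byte memory and the handful of instructions are all simplifications for illustration, and flag handling is left out entirely. The opcode values themselves are the real encodings (0x10 is DJNZ on the Z80 and has no documented meaning on the 8080).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not tlvm's actual code. */
typedef struct Cpu Cpu;
typedef void (*InstructionFn)(Cpu *cpu);

/* Created once per processor type and shared by every CPU of that type. */
typedef struct {
    InstructionFn instructions[256]; /* indexed by opcode byte */
} ProcessorDef;

struct Cpu {
    const ProcessorDef *def;
    uint8_t a, b;
    uint16_t pc;
    int halted;
    uint8_t memory[256]; /* tiny address space, enough for a demo */
};

static uint8_t fetch(Cpu *cpu) { return cpu->memory[cpu->pc++ & 0xFF]; }

/* Each instruction is a small, separately testable function. */
static void op_nop(Cpu *cpu)   { (void)cpu; }
static void op_mvi_a(Cpu *cpu) { cpu->a = fetch(cpu); }
static void op_mvi_b(Cpu *cpu) { cpu->b = fetch(cpu); }
static void op_add_b(Cpu *cpu) { cpu->a = (uint8_t)(cpu->a + cpu->b); }
static void op_hlt(Cpu *cpu)   { cpu->halted = 1; }

/* Z80 only: DJNZ e. Decrement B and jump relative while B is non-zero. */
static void op_djnz(Cpu *cpu)
{
    int8_t offset = (int8_t)fetch(cpu);
    cpu->b--;
    if (cpu->b != 0) cpu->pc = (uint16_t)(cpu->pc + offset);
}

void register_8080(ProcessorDef *def)
{
    assert(def != NULL);
    /* Unregistered opcodes fall back to a no-op in this sketch;
       a real emulator would more likely trap or report them. */
    for (size_t i = 0; i < 256; i++) def->instructions[i] = op_nop;
    def->instructions[0x06] = op_mvi_b;
    def->instructions[0x3E] = op_mvi_a;
    def->instructions[0x76] = op_hlt;
    def->instructions[0x80] = op_add_b;
}

/* The Z80 table is the 8080 table plus extensions; no 8080 code changes. */
void register_z80(ProcessorDef *def)
{
    register_8080(def);
    def->instructions[0x10] = op_djnz;
}

/* Reset the CPU, attach a processor definition and copy in a program. */
void cpu_load(Cpu *cpu, const ProcessorDef *def,
              const uint8_t *program, size_t length)
{
    assert(length <= sizeof cpu->memory);
    cpu->def = def;
    cpu->a = cpu->b = 0;
    cpu->pc = 0;
    cpu->halted = 0;
    for (size_t i = 0; i < sizeof cpu->memory; i++)
        cpu->memory[i] = i < length ? program[i] : 0;
}

/* Fetch one opcode and dispatch through the table. */
void cpu_step(Cpu *cpu)
{
    uint8_t opcode = fetch(cpu);
    cpu->def->instructions[opcode](cpu);
}

/* Step until HLT or until max_steps instructions have executed. */
void cpu_run(Cpu *cpu, size_t max_steps)
{
    for (size_t i = 0; i < max_steps && !cpu->halted; i++) cpu_step(cpu);
}
```

The dispatch in `cpu_step` is a single indexed call, and the only difference between the two processors is which registration function filled the table.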
One mistake I made early on was to not have any particularly reliable test cases. I had written a few based on my understanding of the processor, but these proved to be incorrect in a few instances. My first real test case came when I discovered the cpudiag utility from CP/M. It had no real dependencies apart from one syscall which printed text to the screen, and it tested most of the instructions that the 8080 had to offer. A quick bit of hand-assembly later and I had replicated the BDOS print call, and I could test out cpudiag. It turned out that a couple of the instructions I had implemented were incorrect in some way, but largely everything ‘just worked’. Now I had real evidence that the implementation was correct, at least for the instructions cpudiag exercises.
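For context, CP/M programs reach the BDOS by calling address 0x0005 with a function number in register C: function 2 prints the character in E, and function 9 prints the string pointed to by DE up to a terminating `$`. I replicated the call in hand-assembled 8080 code; the sketch below takes a different route and handles it on the host side in C, which is a common shortcut. The `Machine` struct and `bdos_call` function are hypothetical names, not part of tlvm.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down machine state: only what the BDOS call needs. */
typedef struct {
    uint8_t c, d, e;        /* 8080 registers C, D and E */
    uint8_t memory[65536];  /* full 64 KiB address space */
} Machine;

/*
 * Emulate the two BDOS console functions a diagnostic program is likely
 * to need. The host would call this when the emulated PC reaches 0x0005.
 * Output goes into `out` (always NUL terminated when out_size > 0);
 * the return value is the number of characters written.
 */
size_t bdos_call(const Machine *m, char *out, size_t out_size)
{
    size_t n = 0;
    if (out_size == 0) return 0;

    if (m->c == 2) {
        /* Function 2: console output, character in E. */
        if (out_size > 1) out[n++] = (char)m->e;
    } else if (m->c == 9) {
        /* Function 9: print string at DE, terminated by '$'. */
        uint16_t addr = (uint16_t)((m->d << 8) | m->e);
        while (m->memory[addr] != '$' && n + 1 < out_size)
            out[n++] = (char)m->memory[addr++];
    }
    /* Other function numbers are ignored in this sketch. */

    out[n] = '\0';
    return n;
}
```

Writing into a buffer rather than straight to the screen means the print call itself can be covered by a unit test.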
Once all the cpudiag tests passed, it was only a matter of time before I tried something slightly more stressful on my little emulator. I had come across a 4-part ROM for Space Invaders, which was originally built on a Taito 8080 board, built around the same Intel 8080 chip that I had emulated. It also had very few other components which would need to be implemented. I knocked up a quick display in SDL, hooked up some ports to give the values which Space Invaders seemed to be expecting, and fired it up. The first thing I noticed was that it was upside down. Whoops. I fixed that and got a working display, with one weird detail: the Y is upside down. After a little research I think I discovered what caused this. There is a shift register connected to one of the ports of the CPU, which is used to simulate the movement of the invaders across the screen. I’m pretty sure the 8080 could have managed this without the extra chip, so I can only assume that this is a form of early copy-protection, requiring anyone cloning the game to copy the board exactly to make it work. The reason the Y is upside down is that a space invader is meant to come across the screen, ‘pick up’ the Y and take it off screen, returning with one the correct way around. At this point, I still haven’t got Space Invaders fully working, but the fact that I can run it and display a screen makes me happy that the emulator works.
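For the curious, the shift register can be sketched in a few lines of C. This follows the behaviour as it is commonly documented for the Space Invaders hardware (writes to port 4 push a byte in, writes to port 2 set a 3-bit offset, reads from port 3 return the shifted result); I have not verified it against the real board, so treat the port numbers and the names below as assumptions.

```c
#include <stdint.h>

/* Hypothetical names; behaviour follows common hardware descriptions. */
typedef struct {
    uint16_t value;  /* 16-bit shift register contents */
    uint8_t offset;  /* 0..7, selects which 8-bit window is read back */
} ShiftRegister;

/* OUT 4: the new byte becomes the high byte, the old high byte moves low. */
void shift_write_data(ShiftRegister *s, uint8_t data)
{
    s->value = (uint16_t)((data << 8) | (s->value >> 8));
}

/* OUT 2: only the low three bits are used as the shift offset. */
void shift_write_offset(ShiftRegister *s, uint8_t offset)
{
    s->offset = offset & 0x07;
}

/* IN 3: read eight bits, starting `offset` bits down from the top. */
uint8_t shift_read(const ShiftRegister *s)
{
    return (uint8_t)((s->value >> (8 - s->offset)) & 0xFF);
}
```

Hooking this up means routing the emulated CPU's port writes and reads to these three functions, after which sprites shifted by a pixel offset should start drawing correctly.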
In subsequent bouts of fiddling with tlvm, I have also experimented with adding further CPUs to it, as well as things such as coprocessors which are sync’d off the same clock as the main processor. None of these are particularly complete, and some are barely more than started, but they highlight another main point about growing a system: allow flexibility. I have had to rewrite parts of tlvm a number of times, and restructure the files, the data and so on, to allow for more processors and different designs. The coherence and design of the code should not be compromised along the way but, however you work, you should be able to make those changes with as little pain as possible.
In my case, I’m publishing everything on GitHub, so I’m obviously using git. At work I use Perforce. I won’t get into a VCS debate here, but I’ll just say that I like git, and it does a number of things right when it comes to growing systems. The first is branching. Branching is quick and cheap, and unless you’re working on a very simple project, you should be using branches. When I started adding a new CPU, I branched. When I tried adding coprocessors, I branched. When I ended up doing a major redesign in a branch, such as moving files around to make more logical sense once the system supported multiple CPUs, I merged back to master, and made sure that any branches I then worked on contained the new changes. The second is being able to make micro-commits of partial features, which are not fully implemented and might break the system if committed to the main line as-is. These give me separate stages in the version control history to roll back to if I ever need to. Branches also help with this, of course.
To reiterate, I have found a number of things that I feel should be followed when growing a system, especially one whose full scope you cannot quite see.
- Plan just enough to get an idea about how things should fit together.
- Set up a test suite.
- Make sure you have solid test cases.
- Ensure you have systems in place that will allow organic growth and iterative design.
- Repeatedly check that you are sticking to the design and coding standards, and that things don’t get too messy.