Monday, August 22, 2016

Double-tap RESTORE to enter Hypervisor "freezer"

After a lot of little bug fixing, I finally got the keyboard based trap-to-Hypervisor function working.  While this might sound dull at first, it is actually super important, and enables a bunch of really interesting and fun features.

The best way to think of this, is as an integrated "freeze" button, that is triggered by tapping the RESTORE key twice in quick succession (between 50ms and 800ms apart).  When this occurs, it triggers a transparent trap to the Hypervisor. That is, the running program doesn't have the slightest idea that it has happened.  This allows the Hypervisor to run some arbitrary routine, before exiting back to the running program.  In other words, it really is just an integrated freeze function. Right now, that routine just toggles between slow and fast CPU speed, which is kind of fun, but not really that exciting.  Here is the current Hypervisor RESTORE trap routine:

double_restore_trap:
; For now we just want to toggle the CPU speed between 48MHz and
; 1MHz

; enable 48MHz for fast mode instead of 3.5MHz
lda $D054
eor #$40
sta $D054

; enable FAST mode,
lda $D031
ora #$40
sta $D031

; bump border colour so that we know something has happened
lda $D020
inc
and #$0f
sta $D020

; return from hypervisor
sta hypervisor_enterexit_trigger

There isn't anything really to exciting to see there: It just fiddles a couple of registers to toggle the CPU speed between normal and 48MHz, increments the border colour as a bit of a debug aid, and then exits from the hypervisor.

For the technically inclined, what you might be noticing what isn't there.  As I have talked about previously, the Hypervisor trap process is super efficient: The entire CPU state is saved to shadow registers, resulting in a 1 cycle entry and exit time from the Hypervisor, because you don't need to save or restore any CPU or memory mapping registers.  The CPU is automatically set to a known configuration on entry to the Hypervisor, and restores the running program's configuration on exit.  Thus, this entire Hypervisor trap takes only about 40 cycles, or about 850 nanoseconds.  Most modern desktop processors probably would have trouble beating that.  Indeed, as previously mentioned, a minimalistic Hypervisor trap can complete in under 200 nanoseconds.

Anyway, back to the story at hand...


Like on a freeze cartridge, we will implement a freeze menu, that will allow a number of useful operations.  The usual staples will be there, including memory monitor, the option to reset the machine, probably some poke finder type functions, the option to freeze the currently running program to disk, and so on.

However, the MEGA65 has been designed from the outset to do much more in the freeze menu.

First up, you will be able to switch tasks, by browsing through the list of tasks, complete with 80x50 pixel thumbnails that are drawn using the previously described hardware thumbnail generator, that continuously generates little screen captures of the running program.

Similarly, you will be able to delete tasks and start new ones.

So, for example, if you have a sudden need to show off your BASIC programming prowess half-way through a game of Ghosts and Goblins, you can just double-tap RESTORE, choose the menu option to create a new C64-mode task, demonstrate your elite status by typing something like:

10PRINT"I RULE!!! ";:GOTO10
RUN

and then when you have demonstrated your mastery over coding to whoever was doubting it, you can double-tap RESTORE again, and switch back to Ghosts and Goblins, which you can easily find from the thumbnails.

Similarly, when approaching a hard part of a game, you could freeze it, make a back up of the game where you are up to, and then go on to play that hard level, and reload the saved state until you can conquer it.  In this way, mere mortals should be able to get a score of at least 7 in Flappy Birds without too much trouble.

While these use-cases might be a bit simplistic and contrived, it is hopefully not too hard to see how the Hypervisor freeze menu will likely play a central role in the use and experience of the MEGA65 for many.  Thus it is really nice to have the hardware side of it implemented.  The next step is to start working on the menu program itself, the freeze/unfreeze routines, and getting saving to the SD card actually working, so that things can get saved.

Sunday, August 14, 2016

Tutorial Video for m65dbg

Gürçe who has been working on the very nice m65dbg symbolic debugger for the MEGA65 has released a nice video providing an introduction to the current feature-set of m65dbg:


Saturday, August 13, 2016

We can include GEOS with the MEGA65

Just a very quick but super exciting note to say that we have received permission to include GEOS with the MEGA65, provided that we do not charge any extra for doing so.  Since we are creating our software suite as open-source, this is no problem at all for us, and means that we can potentially use GEOS to make the MEGA65 configuration menus etc using GEOS, which would make them look nice, and probably be much quicker to write as well.

Sunday, August 7, 2016

Booting GEOS on the MEGA65

Ralph Egas, CEO of Abstraction Games has been helping to port GEOS to the MEGA65, using the disassembly of the GEOS 2.0 Kernal by Maciej Witkowiak.

This has been super exciting, because we have been wanting for some time to get GEOS running on the MEGA65, partly because we know that it should be VERY fast on the MEGA65, even without a RAM expander, because the SD card interface can transfer data faster than the REU on a C64 or C128 can.

However, we weren't sure that it would be easy to do, because GEOS is infamous for its horrible copy protection, which I hadn't realised JUST how horrible/clever it was until I read several pages at that link.

However, Maciej's disassembly of the GEOS kernal removes all such problems for us, and presents the disk drivers as nice discrete modules.  Thus, in theory, all that was needed, was to write a C65/MEGA65 disk routine.

For simplicity and speed of development, Ralph decided to make a version of GEOS that would access the floppy drive(s) via the normal CBM DOS routines,  without any fast loader. This allowed him to test that version under VICE, for very a rapid development cycle, especially since VICE could be run in warp mode, without having to exactly emulate the floppy drive, since it was only being accessed using the official C64 KERNAL routines.

Once Ralph had that working, the plan was to start implementing the MEGA65 SD card routines.  However, he decided to try this de-fast-loaded version on the MEGA65, and was pleasantly surprised to find that it worked:



This is because, like in VICE, by using only the official KERNAL disk routines, the C65's 1581 emulation DOS was able to service the sector reads. He only hit trouble at this point, when trying to write sectors, because the MEGA65's emulation of the C65's floppy controller currently has some problems with writing sectors to the SD card.  We'll fix that as soon as we get the chance to do so.

My first comment to Ralph, after congratulating him, of course, was how slow it was to load.  This was slightly tongue in cheek, because it clearly loads VERY fast.  However, it is still using the C65's 1581 DOS emulation routines, which context switch (very slowly!) on every byte read or written to the internal drive. This costs hundreds of cycles per byte, yielding a maximim disk speed of somewhere around 15 - 30KB/sec.  In contrast, the SD interface is capable (currently) of a theoretical maximum of 3MB/sec, and speeds in the 100s of KB/second are quite easy to achieve. Also, GEOS doesn't know about the MEGA65's DMA controller, and so memory fills are much slower than they could be*.  Thus, I think it should be possible to speed up the loading time by an order of magnitude or so, so as to seem instantaneous after hitting "return" after loading the program.

You can see the current state of the source code on github.  Ralph hopes to implement the native SD card routines soon, which would get us a fully working, and much faster booting GEOS.  He might then look into using MEGA65/C65 features, such as the extra RAM, DMA controller, and improved screen resolutions and colour depths.

* Probably "only" 1MB - 2MB/second using a typical 6502 memory copy routine.

Wednesday, August 3, 2016

We finally have the new disk menu mostly working

Finally we have the new disk menu (mostly) working on the MEGA65:


This is no small milestone, because it doesn't touch the SD card directly at all.

It uses Hypervisor calls to do everything: to check that it is running on a MEGA65, to list the files on the SD card, and then to ask the Hypervisor to mount the disk image, which in turn does some checks to make sure the disk image is fit for mounting. We can now proceed to implementing more Hypervisor calls with confidence.

Also, by using the Hypervisor, the program is able to be quite compact: less than 2KB, despite having a full screen browser to select the disk images, and a (currently disabled) sort routine to show the images in the right order. Indeed, almost 1/4 of the size is text messages.

This now sets the scene for us to progress all of the other previously blocked progress on the Hypervisor, to make the whole machine pleasant to use.

Monday, August 1, 2016

Revenge of the Decimal Flag

[Deutsch Übersetzen unten]

I first learned to hate the 6502's decimal flag when I was about 17.  I was still at school, and was offered a job of writing the software for a dual-6502 based 30' (8+m) industrial roll former (the machine that puts the corrugations into corrugated iron sheeting).  If was a bit older and wiser, I might have thought twice about it, but at the time it seemed like a great idea. Anyway, the process went surprisingly well except for one intermittent bug: Sometimes it wouldn't count the distances out correctly.

This caused a hair raising last 3 days before the blasted thing was due to be shipped off to Argentina while we tried to track down the source.  The cause turned out to be that the 6502, while specified in the data-sheet to start up with the decimal flag clear, starts up, in fact, with the decimal flag only usually clear.  Needless to say, one of the absolute first things that I did with the MEGA65 was make sure that its CPU always starts up with the decimal flag clear.  End of problem. Well, so I thought...

Some of you will be aware that I have been trying to track down the source of some nasty bugs in the Hypervisor, where making trap calls into the Hypervisor would sometimes fail for seemingly inexplicable reasons.  This came to a head when I tried to build the new Disk Menu program into the Hypervisor ROM, instead of the amazingly horrible diskchooser thing that I cobbled together back in the beginning.

With the Disk Menu built in (running in user mode, not in the Hypervisor, but installed automatically at $C000 by the Hypervisor), the trap bug was happening consistently.  Big problem.  Especially since months of looking at it intermittently failed to provide a simple explanation.

Today in the lab we had a breakthrough.  Ben pointed out that the Hypervisor checkpoint debug system was displaying a corrupted message when we tried to debug the Disk Menu program.  This gave us a clue that I started to follow, and with a bit of poking around and following the single-step trace output (Gurce, please feel free to make that program that can show the instruction disassembly for the serial monitor as soon as you like :), I realised that the Hypervisor was incorrectly calculating the skip address when stepping over the message for a checkpoint.  It was adding #$01 to #$1D and getting #$24 as the result.

I was about to start pulling my hair out and wonder exactly what had gone so badly to pot that the ALU was now not even able to compute a simple addition.  Then some little neuron in the back of my mind told me that it looked like the result of a Binary Coded Decimal (BCD) addition.  I then cast my eyes to the right on the trace output to look at the CPU status register, and sure enough, there was the evil "D" staring at me: The CPU was in decimal mode! In the Hypervisor!

It then occurred to me that when the CPU traps to the hypervisor, it preserves the D flag's value, as well as saving it in the hyper_p register for restoration on exit.  A single line change to the code there has fixed that (and I also added a CLD instruction to the main Hypervisor trap entry point because I am paranoid).  Then it was time to go home, so we will have to try it out in the morning.  Hopefully it will let us finally get the Disk Menu program working, which will be a very big step forward.

This still leaves the mystery of how the D flag got set in the processor flags to begin with.  It really shouldn't have, as the Disk Menu program doesn't use decimal mode, either.  That will have to wait until tomorrow as well to be investigated.

For now, I am just happy that we have finally found the main bug, and can hopefully start moving forward again.

--

Ich war nur 17, wenn ich erst gelernt das "Dezimal Modus" Fahne des 6502s zu hassen. Ich war noch in der Schule und bekam ein Job, das Software ein 8,7m lang Roll Former Maschine zu schreiben. Es benutzt ein doppel-6502 Platte und hatte nur 8KB RAM pro CPU. Wenn ich weiser war, dann würde ich es nicht akzeptiert haben.  Alles ging ziemlich gut. Dass ist, bis wir hatten nur drei Tage, es zu fertigstellen, bevor es an Bord ein Schiff nach Argentinien muss. Alles funktioniert gut, außer dass manchmal zählt es die Lange falsch. Endlicht entdeckt uns, dass nur meistens gestartet ein 6502 mit dem D-Fahne leer, obwohl das Data-Sheet sagt es immer ohne D-Fahne starten würde.

Natürlich mit dem MEGA65, einige die erste Sache, dass ich gemacht hatte, war die D-Fahne aus am Reset machen. Ich hatte gedacht, dass alles war am Ende mit der blöden D-Fahne. Aber das war nicht so.

Wir haben für ein paar Monaten ein böse Bug im Hypervisor gekriegt. Manchmal, wenn eine Programme ein Hypervisor-Trap gerufen, dann wird es falsch gehen.  Ab und zu hatten wir es untersucht ohne Erfolg. Dann Heute im unseren Labor hat Ben bemerkt, dass wann wir die neue Disk Menu Programme debuggt, dass eine von Checkpoint Nachrichten war immer beschädigt. Endlich hatten wir ein reproduzierbar Bug, dass wir könnten untersuchen.

Nur eine Stunde später hatte ich entdeckt, dass die blöde D-Fahne war auf. Im Hypervisor! Ich war gar nicht glücklich.  Aber wie bekam die D-Fahne auf?  Dann erkannte ich, dass ich der D-Fahne Wert konserviert, wenn ein Hypervisor Trap auftritt.  Es braucht nur Eine Linie zu reparieren, nach wir hatten das Problem entdeckt. Aber es war schon Feierabend. Deshalb müssen wir Morgen prüfen, wenn es das Problem wirklich repariert.

Thursday, April 21, 2016

On cycle count predictability and related things

Some folks have expressed their concern that this CPU redesign takes away from the genuine 8-bit computer feel of the MEGA65.  My feeling is that it doesn't which I will explain below, but at the same time I don't want to be dismissive of anyone's concerns. Our goal remains to make something that is authentic and enjoyable for a wide range of people to use and program.  So please poke me, either in the comments or elsewhere if you wish.

But for now, I will take a little time to explain how the CPU looks from the user-perspective, to hopefully provide some assurance that it is not really a great departure from what we already had.  Indeed, from what I understand, what we doing here is not greatly different from how the Chameleon's CPU operates, i.e., some more modern CPU construction techniques are used behind the scenes, to provide what is very much (in their case) a 6502.

The main difference is that we are being transparent how we are making the CPU behind the scenes, so that it gives the end result of being a 6502 and 4502 compatible CPU.  We're sorry if that spoils the "magic trick" for some, but we strongly believe that transparency is always best in the long run.

The out-of-order instruction retirement is just a fancy way of saying that the CPU takes and executes the instructions in order, but some can take longer to complete, for example if they need to read or write from memory.

What doesn't change, is if an instruction requires the value read from memory, that it can't be completed until the thing it depends on is complete.  That is, it still behaves exactly as one expects a 6502 to behave, for any given program.  This is quite similar in many ways to the way that the SuperCPU has a 1-byte write-through "cache."  We are just using a different mechanism (register renaming, or reservation slots, depending on how you want to look at it), but to achieve much the same goal.

So if we look at a simple loop:

l1: lda $1000,x
sta $2000,x 
inx              
bne l1         

The simulation of this loop for the new CPU (in its current unfinished form, so there might be some changes) below shows how a couple of loop iterations go through. Note that register contents are BEFORE the instruction is executed, just because of how the simulation outputs stuff.  i.e., it shows the CPU state just before it executes the instruction, instead of just after.

-- LDA / STA / INX / BNE instructions all execute on consecutive cycles, taking
-- a total of only 20ns
@450ns: PC $8104 A:00 X:01 Y:00 Z:00 B:00 SP:01FF --E--I--  :  BD 00 10
@455ns: PC $8107 A:00 X:01 Y:00 Z:00 B:00 SP:01FF --E--IZ-  :  9D 00 20
@460ns: PC $810A A:00 X:01 Y:00 Z:00 B:00 SP:01FF --E--IZ-  :  E8 D0 F7
@465ns: PC $810B A:00 X:02 Y:00 Z:00 B:00 SP:01FF --E--I--  :  D0 F7 4C
-- 60ns ( = 12 CPU cycles) elapse between the branch and the next instruction
@525ns: PC $8104 A:00 X:02 Y:00 Z:00 B:00 SP:01FF --E--I--  :  BD 00 10
@530ns: PC $8107 A:00 X:02 Y:00 Z:00 B:00 SP:01FF --E--IZ-  :  9D 00 20
@535ns: PC $810A A:00 X:02 Y:00 Z:00 B:00 SP:01FF --E--IZ-  :  E8 D0 F7
@540ns: PC $810B A:00 X:03 Y:00 Z:00 B:00 SP:01FF --E--I--  :  D0 F7 4C
-- 60ns ( = 12 CPU cycles) elapse between the branch and the next instruction
@600ns: PC $8104 A:00 X:03 Y:00 Z:00 B:00 SP:01FF --E--I--  :  BD 00 10

What can basically be seen above is that the non-branching instructions all take one cycle to run, whether or not they need a memory access, because all the out-of-order retirement and register renaming hides that. The result is that the timing is actually somewhat simpler and easier to predict for the most part than on a real 6502.  Note that we will still have a ~1MHz, ~2MHz and ~3.5MHz speed settings, where we will emulate the normal 6502 and 4502 timing of all instructions, and when we get time to do it, to make the memory access cycles also match that of a 6502 exactly, and naturally also the same for 3.5MHz 4502 mode. (One of the key reasons for reimplementing the CPU this time, is actually to make sure it has two "personalities", where in 6502 mode, all illegal opcodes work properly, and when in 4502 mode, all 4502 opcodes work properly, and can match the timing exactly -- so that we can have a real C64 mode and a real C65 mode, both of which are as compatible as possible.

The other obvious thing is that the branch instruction suffers a pretty big penalty, which is because the pipeline takes a bit of time to start feeding the new instructions.  However, because the clock speed is 4x, and the main pipeline is 4-stage, the end result is that the branch actually takes exactly the same amount of time as on our previous 48MHz CPU design.

It's also worth mentioning that most of the sources of timing uncertainty in modern PC processors etc don't actually come from the pipeline and other features that we are talking about here.  (In fact, the 6502 already pipelines between instructions a little). They come from the cache, from virtual memory, from the operating system that is hiding behind and pre-empting your process all the time and filling the cache with rubbish as a result. We are not having any of that stuff in the MEGA65: What you get is a 6502 or 4502 processor, that behaves how you expect.  We have just implemented it using some lessons learned over the past few decades of CPU implementation.

Otherwise, I think that this work has got some folks thinking about what makes a machine have character, instead of just being an 8-bit version of another wise soulless kind of PC, or FPGA-centric thing that people build.  For us, there are some key things, of which the following are a few.  Of course, we are thinking about many other things, such as C64 compatibility, but we take these simply for granted. 

First, the video generation MUST be rasterised, without a frame-buffer, just like on a real C64 or C65.  That is, the video chip needs to be deciding, cycle by cycle, what colour the next pixel will be, and allow the programmer to do horrible things to it that were never intended.  It is already possible, for example, on the VIC-IV in the MEGA65 to cause a monitor to totally lose sync, because you can trick it into moving the HSYNC pulses on a raster line.  I get back to this again below, but it is really a very important point.  In fact, I would say that what really makes the C64 interesting is the VIC-II and the SID. The CPU, while still important, is really secondary in many ways. It is the custom chips and overall combination that really define the "character" and "personality" of the C64.  The MEGA65 will of course have a its own personality, but we still feel that it will indeed have a personality, and that it will be a very strong one.

Second, it has to still be a simple bare-metal machine, where you have effectively full access to all the hardware when you are running on it.  The only piece we have outside that is the Hypervisor, which is best understood as an integrated freeze cartridge, so that you can easily load, save and switch what you are doing.

Third, the machine must still have fundamental limitations, that provide opportunity for programmers to try to stretch what the machine can do.  This is why we have the combination of CPU and resolution improvements together, for example, so that the relationship of CPU performance and the number of bits on screen at a time remain in reasonable relation.  The C64 has 64000 pixels from not more than 64KB = 512kbit on screen at a time, and 1x10^6 cycles or 3x10^5 instructions per second, so that there is approximately one instruction per bit of displayed graphics per second.  The MEGA65 has about 2x10^6 pixels, and is expected to have some multiple of 10^7 instructions per second.  Thus the instructions per pixel-bit is increased by an order of magnitude over the C64, so that it offers a nice bit of extra freedom, but without removing the limitation completely.  (Compare that with a modern PC, which instead has about 10^10 instructions per second, not counting the 10^12 or more GPU instructions per second). Moreover, the number of bits per pixel available from RAM is still in proportion: A C64 has about 8 bits per pixel available (64KB / 64000 pixels). The C65 actually has less, because while it has 128KB of RAM, it can do, for example 640x400 or 1280x400 resolutions. The MEGA65 goes further, having the same RAM as the C65, but with many more pixels, much more creativity will be required to find solutions to having full screen full-colour displays -- just as this presents special challenges and opportunities for ingenuity on the C64 (and C65). My point here is really that while the boundaries of what is "possible" on the MEGA65 are naturally different to those of the C64, we have retained this sense of a limited computer, so that it still has character, and will still require years of careful thought and experimentation to find its limits.

Finally, the specification has to be fixed for the long-term, like the C64's, so that people can program it with confidence, knowing that their code will "just work" on MEGA65s for years and decades to come, because otherwise the limits of the machine are not real.  This is actually why we want to get this CPU matter sorted out sooner rather than later, so that we can say with authority, "This is the CPU of the MEGA65. It shall be no faster."  Similarly, we want to pin down the last few points on the VIC-IV

Anyway, as I have said, we want this machine to be fun for the community, and something with a stable and fixed specification once we release it, so that it can have a long life, including so that stuff that you write with cycle-by-cycle timing will keep on working.  So please don't hesitate to let us know if you have concerns about our approach, or suggestions how we can do it better. This is one of the great things about an open-source project, that people can look and provide feedback and help to make sure that the end result is as good as possible. We can't guarantee that we can take everyone's requests and include them (partly because some of you ask for opposite things ;), but we do listen and think carefully about them all.