Why is this an argument? Is it really so upsetting to you that I thought this wa...

anyfoo · on Nov 29, 2022

> Chips in 1978 had bugs like this everywhere, and we all just dealt with it.

Like what?

> Motorola, when faced with a critical bug [...] (c.f. Apollo, who used two chips)

I am well aware of the Apollo two chip hack, but I was not aware that this was considered a "critical bug". I thought the original 68000 just did not support virtual memory with paging, and the 68010 introduced that. Given that you needed an external MMU if you wanted to do any kind of virtual memory, which most clients did not care about, and that the 8086 and many if not most contemporaries in that class did not support any kind of virtual memory either, that seemed like a reasonable limitation to me.

The kicker is, I actually own a Siemens PC-D, a marvelous machine that has an 80186 (the differences to an 8086 don't matter here), and an actual external MMU made up of discrete 74 series logic chips. It runs SINIX, and has full memory protection through that MMU.

And yet it does not support paging either, because by the time the external circuitry recognized a fault, the CPU had already moved on further. The only thing SINIX did was kill the faulting process.

> while forgetting the interrupt state bug that forced the introduction of the 68010

Are we actually talking about the same thing then? The Apollo hack I know was because after a bus error, not an interrupt, you could not restart the faulting instruction, presumably because by the time the fault was recognized, the CPU had already continued executing an indeterminate amount. This does not matter for regular interrupts, since they are not synchronized to running code and do not restart instructions. It would have been a pretty useless chip otherwise, and yet it thrived even after its descendant was introduced.

The point is, it seemed to me Apollo really wanted to use the 68000 for something it did not support (yet), and other users of the 68000 were not affected (most did not even have the MMU). It was a marvelous, unique hack, that I've actually never seen fully documented or confirmed anywhere (please let me know if you have). The PC would have been amongst the ones not affected, and the chip even spun off variants long after the 68010 was released, not having paged virtual memory just wasn't a dealbreaker for most.

> That wasn't the solution I proposed. I proposed just having the stack reserve an interrupt frame at the top and setting SP to zero before changing SS. The code I showed was just proof that even putative users[1] who couldn't change the OS would be able to work around it.

Yes, you proposed every single piece of code switching a stack, not just OS code, to do some careful dance whenever it switches the stack, on top of making sure that every stack in existence has the proper amount of unused space at its base to support that.

And then all interrupting code would have to immediately switch to its own stack (which, while required today, wasn't necessary back then, mainly because all code ran with all privileges and plenty of stack). While taking care that all registers can be properly restored on the way back, of course. Or only the NMI handler, but then you would have to additionally have to save and restore the interrupt flag around your stack switching.

But okay, let's agree to disagree then. You say there is a reasonable software workaround, I say there is not, for what to me was a pretty clear design bug. Albeit one that was also easily fixed. In my mind, Intel did the sane thing back then by simply patching it up. Can't say that it hurt them.

ajross · on Nov 29, 2022

This has gone completely off the rails, and I simply don't understand why you're being so combative given you seem to have a genuine interest in the same stuff I do. But in the interests of education:

> I thought the original 68000 just did not support virtual memory with paging, and the 68010 introduced that

The 68000 "did not support virtual memory" because instructions that caused a synchronous[1] bus exception could not be restarted. It was a bug with the state pushed on the stack (though I forget the details). In fact it was a bug[2] almost exactly like the one that produced the stack mess we're talking about. And Motorola never bothered to fix it. Intel could likewise have announced "you can only assign to SS when SP is zero" and refused, with little change in device success. And I find it surprising that they didn't.

[1] Because technically the bus isn't synchronous on a 68k, thus the obvious-in-hindsight design thinko. See [2].

[2] A design error following from the failure of the engineers to anticipate the interaction between two obvious-seeming features (processor use of the stack for interrupts, and stack segments in the former, hardware bus errors being used to report synchronous instruction results in the latter).

anyfoo · on Nov 29, 2022

I'm not sure I'm the one being combative, to be honest. So far Intel at the time seemed to think this was a design bug with significant enough consequences that they decided fixing it in some unused part of the mask. And I just agree with that.

I've been privy to such discussions myself, albeit in modern times. I'm familiar with silicon engineers and kernel engineers coming together to discuss a CPU bug or design mishap, and coming up with all sorts of crazy workarounds. Until it's eventually decided that this is just too much to inflict, and it's fixed.

Since you are interested in the 68000 thing, I think I do know the details and can explain, because I have before looked at the difference between the 68000 and the 68010 that makes paging possible. Let's look at the 68010 manual here: http://www.bitsavers.org/components/motorola/68000/MC68010_6...

(Notice how it already says "VIRTUAL MEMORY MICROPROCESSORS" on the front while the 68000 manual in the same directory does not, though I suppose it could be a good marketing move if there was an actual unintended bug, and Motorola decided to advertise the fix as a new CPU?)

Page 5-12: "Bus error exceptions occur when external logic terminates a bus cycle with a bus error signal. Whether the processor was doing instruction or exception processing, that processing is terminated, and the processor immediately begins exception processing."

It specifies external logic, which would be the MMU in the interesting cases. Everyone not using an MMU did not care, and bus errors were still valuable with or without MMU to signal illegal accesses and crash the program that caused it. The full problem is elaborated on the next page: "The value of the saved program counter does not necessarily point to the instruction that was executing when the bus error occurred, but may be advanced by up to five words. This is due to the prefetch mechanism on the MC68010 that always fetches a new instruction word as each previously fetched instruction word is used (see 7.1.2 Instruction Prefetch)."

The 68010 fixed it by adding enough (opaque, undocumented) internal state for the CPU on the stack to effectively undo what it did so far and restart the instruction, that state is shown (without detail, and subject to change between revisions) on the diagram.

The 68000 manual simply says: "Although this information is not sufficient in general to effect full recovery from the bus error, it does allow software diagnosis" (which should include a multitasking OS killing the task and proceeding), and none of the internal state exists on the stack.

But you may still well be right that Motorola did intend for bus errors to be restartable and failed. They would not say so explicitly in the manual of course.

But from what I can see this still only affected systems with an MMU, and then only if they wanted to do full virtual memory including paging. SINIX on the PC-D did not, they were content with swapping and merely detecting illegal accesses, like other UNIXes at the time. The 8086 problem affected everyone, including in application code.

If you know more than what I've cobbled together and can elaborate, I'd be extremely interested in it, because the Apollo DN100 hack always fascinated me to no end (as should be obvious).

anyfoo · on Nov 29, 2022

I might add, it's obvious that you are competent by the way. I think we just disagree on the severity here.