> However, the 386 uses a different approach—CMOS switches—that avoids a large AND/OR gate.
Standard cell libraries often implement multiplexers using transmission gates (CMOS switches) with inverters to buffer the input and restore the signal drive. This implementation has the advantage of eliminating static hazards (glitches) in the output that can occur with conventional gates.
Static hazards are most often dealt with by just adding some redundant logic (consensus terms) to the circuit. This can even be done automatically.
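As a tiny illustration (the signal names and the one-gate-delay model are my own, not from the article): in a conventional AND/OR mux, f = (a AND s) OR (b AND NOT s), when s falls while a = b = 1, the inverted copy of s lags by a gate delay and both product terms can momentarily be 0. The consensus term a·b covers exactly that window:

    # Sketch in Python; sbar is the inverter output, which lags s by
    # one gate delay during a transition.
    def mux(a, b, s, sbar):
        return (a and s) or (b and sbar)               # conventional AND/OR mux

    def mux_consensus(a, b, s, sbar):
        return (a and s) or (b and sbar) or (a and b)  # + redundant consensus term

    a = b = 1
    # s has just fallen 1 -> 0; sbar still holds its old value 0:
    print(mux(a, b, s=0, sbar=0))            # 0 -> static-1 hazard (glitch)
    print(mux_consensus(a, b, s=0, sbar=0))  # 1 -> output held steady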
There are two interesting articles here. Not only does Ken treat us to a great text, but hidden in footnote 1 is a second gem. Thanks for the early Christmas gift!
> Regenerating the cell layout was very costly, taking many hours on an IBM mainframe computer.
I would love to know more about this – how much info is publicly available on how Intel used mainframes to design the 386? Did they develop their own software, or use something off-the-shelf? And I'm somewhat surprised they used IBM mainframes, instead of something like a VAX.
Various papers describe the software, although they are hard to find. My earlier blog post goes into some detail: https://www.righto.com/2024/01/intel-386-standard-cells.html
The 386 used a placement program called Timberwolf (developed by a Berkeley grad student) and a proprietary routing tool.
Also see "Intel 386 Microprocessor Design and Development Oral History Panel" page 13. https://archive.computerhistory.org/resources/text/Oral_Hist...
"80386 Tapeout: Giving Birth to an Elephant" by Pat Gelsinger, Intel Technology Journal, Fall 1985, discusses how they used an Applicon system for layout and an IBM 3081 running UTS unix for chip assembly, faster than the VAX they used earlier. Timberwolf also ran on the 3081.
"Design And Test of the 80386" (https://doi.org/10.1109/MDT.1987.295165) describes some of the custom software they used, including a proprietary RTL simulator called Microsim, the Mossim switch-level simulator, and the Espresso PLA minimizer.
> Espresso PLA minimizer
You can still find the software for Espresso (I ran it a few years ago):
https://en.wikipedia.org/wiki/Espresso_heuristic_logic_minim...
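For anyone who wants to try it: Espresso reads the Berkeley PLA format. A minimal input file (the labels are my own) for the 2:1 mux discussed above would be:

    .i 3
    .o 1
    .ilb a b s
    .ob f
    1-1 1
    -10 1
    .e

Running "espresso mux.pla" prints the minimized cover. Note that this mux is already minimal; Espresso produces an irredundant cover, so it won't add the hazard-killing consensus term mentioned above, since it optimizes for area rather than hazard-freedom.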
VAXes were relatively small computers for the time. They grew upward in the late 80s, eventually rivalling the mainframes for speed (and cost), but in the early 80s IBM's high-end machines were an entire order of magnitude larger.
Top of the line VAX in 1984 was the 8600 with a 12.5 MHz internal clock, doing about 2 million instructions per second.
IBM 3084 from 1984 - quad SMP (four processors) at 38 MHz internal clock, about 7 million instructions per second, per processor.
Though the VAX was about $50K and the mainframe about $3 million.
There's not a lot of "off the shelf" in terms of mainframes. You're usually buying some type of contract. In that case I would expect a lot of direct support for customer-created modules that took an existing software library and turned it into the specific application they required.
> Did they develop their own software
Knowing Intel software, and given that the project was successful, I really doubt it.
> But in the end, the 386 finished ahead of schedule, an almost unheard-of accomplishment.
Does that schedule include all the revisions they did too? The first few were almost uselessly buggy:
https://www.pcjs.org/documents/manuals/intel/80386/
According to "Design and Test of the 80386", the processor was completed ahead of its 50-man-year schedule from architecture to first production units, and set an Intel record for tapeout to mask fabricator.
Except for the first stepping, A0, whose bug list is unknown (it also implemented a few extra instructions that were dropped in later revisions rather than having their bugs fixed), the other steppings have errata lists that are not significantly worse than those of recent Intel or AMD CPUs, which also have long bug lists, with workarounds in most cases at the hardware or operating-system level.
> (Note 4) But to write a value into the latch, the switch is enabled and its output overpowers the weak inverter.
This implementation is sometimes called a "jam latch" (the new value is "jammed" into the inverter loop).
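A toy model of the idea (the names and the two-level strength model are mine, not from the article): the storage node is normally held by the weak feedback inverter, and a write wins simply because the enabled pass gate drives the node harder:

    WEAK, STRONG = 1, 2

    def latch_step(q, write_enable, d):
        qbar = 1 - q                      # forward inverter
        drivers = [(WEAK, 1 - qbar)]      # weak feedback inverter re-drives q
        if write_enable:
            drivers.append((STRONG, d))   # enabled pass gate "jams" in d
        return max(drivers)[1]            # strongest driver wins the node

    q = 0
    for we, d in [(0, 1), (1, 1), (0, 0)]:
        q = latch_step(q, we, d)          # holds 0, writes 1, then holds 1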
If I remember correctly, the 386 didn't have branch prediction. As a thought experiment, how would a 386 built at today's feature sizes (~9nm) fare against the other chips?
It would lose by a country mile: a 386 can handle about one instruction every three or four clocks, while a modern desktop core can do as many as four or five ops PER clock.
It's not just the lack of branch prediction, but the primitive pipeline, no register renaming, and of course it's integer only.
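A back-of-envelope comparison under generous assumptions (all numbers below are rough guesses from the figures above, and this ignores memory stalls, which would widen the gap further):

    ipc_386    = 1 / 3.5   # ~1 instruction per 3-4 clocks
    ipc_modern = 4.5       # ~4-5 ops per clock
    ghz        = 5.0       # assume the shrunken 386 even reaches modern clocks

    print(ghz * ipc_386)    # ~1.4 billion ops/s
    print(ghz * ipc_modern) # ~22.5 billion ops/s -> roughly 16x faster, per core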
A Pentium Pro with modern design size would at least be on the same playing field as today's cores. Slower by far, but recognisably doing the same job - you could see traces of the P6 design in modern Intel CPUs until quite recently, in the same way as the Super Hornet has traces of predecessors going back to the 1950s F-5. The CPUs in most battery chargers and earbuds would run rings around a 386.
A 386 was a beast against a 286, a 16-bit CPU. It was the minimum to run Linux with 4MB of RAM, but a 486 with an FPU destroyed it, and not just in FP performance.
Bear in mind that with a 386 you can barely decode an MP2 file, while with a 486 DX you can play most MP3 files at least in mono, and maybe run Quake at the lowest settings if you own a 100 MHz one. A 166 MHz Pentium can at least multitask a little while playing your favourite songs.
Also, under Linux, a 386 would manage relatively well with just terminal and SVGAlib tools (now framebuffer) and 8MB of RAM. With a 486 and 16MB of RAM, you can run X at sane speeds, even FVWM in wireframe mode to avoid window repaints when moving/resizing windows.
Next, TLS/SSL. With a 486 DX you can use dropbear/bearssl and even Dillo happily, with just a light lag upon handshaking, good enough for TLS 1.2. That's on a 486, a roughly 30-35 year old CPU: IRC over TLS, SSH with RSA256 and similar methods, web browsing/Gemini under Dillo with TLS. Doable; I did it in a VM and it worked, even email and NNTP over TLS with a LibreSSL fork built against BearSSL.
With a 386, in order to keep your sanity, you can have plain HTTP, IRC, Gopher, and plain email/Usenet, but no MP3 audio; with a 486 you could at least read news over Gopher (even today) while multitasking, if you forced yourself into a terminal environment (not as hard as it sounds).
If you emulate an old i440FX-based PC under QEMU, switching between a 386 and a 486 with the -cpu flag gives clear results. Just set one up with the Cirrus VGA and 16MB and you'll understand upon firing up X.
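For example (the disk image name is a placeholder; note that mainline QEMU's oldest x86 -cpu model is the 486, so the 386 side may need a build that provides one):

    qemu-system-i386 -M pc -cpu 486 -m 16 -vga cirrus -hda old-linux.img

Here -M pc selects the i440FX machine; swap the -cpu value to compare.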
This is a great old distro for testing how well 386s and 486s behaved:
https://delicate-linux.net/
Yep, we had a few later-generation 486s in college. They would run Windows NT4 with full GUI - not especially well, but they'd run it. And they'd do SSL stuff adequately for the time.
ISTR the cheap "Pentium clones" at the time - Cyrix, early AMDs before the K5/K6 and Athlon - were basically souped-up 486 designs.
(As an aside, it's very noticeable how much innovation happened from one CPU generation to the next at that time, compared to today, even if some of them were buggy or had performance regressions. 5x86 to K5 was a complete redesign, and the same again from K6 to K7.)
I ran X and emacs and gcc on a 386DX with 5MB of RAM circa 1993, and while not pleasant it was workable. The upgrade to 16MB (that cost me £600!) made a big difference.
Ten years before that I saved up for ages and spent £25 on 16KB of RAM. I could have bought a house for the cost of 16MB. It's amazing how quickly it changed.
ZX81 rampack, right?
You could run Linux with 2MB of RAM with kernels from before 1994, AFAIK, using a.out binaries instead of ELF.
Nowadays I think it's still doable in theory, but the Linux kernel has some kind of hard-coded 4MB limit (something to do with memory paging size).
Yep, but badly. Read the 4MB Laptop HOWTO. Nowadays, if I had a Pentium/K5 laptop I'd just fit a 64MB SIMM in it and keep everything TTY/framebuffer with NetBSD and most of the unneeded daemons disabled. For a 486: Delicate Linux plus a custom build queue for bearssl, libressl on top (there's a fork out there), plus BearSSL-linked lynx, mutt, slrn, mpg123, libtls and hurl.
Modern CPUs are more or less built around the memory hierarchy, so it would be really hard to compare those two. A 386 in a modern process might be able to run at the same clock speed or even faster, but with only a few KB of memory available; as soon as you connect a large memory, it will spend most of its time idling (and then of course there is the problem of power dissipation density).
Amazing and very informative work. Thank you!
I'm curious which model, speed, voltage, stepping, and package marking the evaluated sample(s) had, because there isn't just one 386. i386DX, I assume, but it doesn't specify whether it was one with the buggy 32-bit multiply, or "ΣΣ", or newer.
"Showing one's work" would need details that are verifiable and reproducible.
I've looked at a bunch of 386 dies, see: https://www.righto.com/2023/10/intel-386-die-versions.html I typically use an earlier 1.5µm chip since it's easier to study under the microscope than a 1µm chip and I use "ΣΣ" because they are more obtainable. Typical steppings are S40362 or S40344, whatever is cheapest on eBay.
Great work and pleasant reading!