LJ article: Intel not to fix hardware flaws in future CPUs


There was an interesting line in the August 2018 Linux Journal article “What’s New in Kernel Development” by Zack Brown:
“Linus Torvalds has interpreted some of Intel’s statements to mean that Intel does not intend to fix some of the hardware flaws in future generations of CPUs, which would force kernel developers, and developers of other operating systems, to work around those flaws for the foreseeable future.”

This likely would’ve come from Intel’s evaluation that fixing these flaws at the hardware level would cost too much money. So, they will update their EULAs to kick the issue down the supply chain (EULAs let manufacturers get away with too much, but that’s a different topic).

Is Intel’s decision a little like a pacemaker manufacturer saying “…yeah, we know that sometimes it won’t put out a voltage pulse, but we’re gonna leave that issue to be fixed in software by the hospital before surgery”? Is this a business model we want to allow?


I think that statement paints an inaccurate picture regarding “future generations” and the “foreseeable future”. It was clear that Intel would not be able to fix all of its designs immediately. Skylake was just out the door, and everything we are seeing now is basically a refinement of Skylake or even older generations. It takes years to get a new CPU generation production-ready, and changing low-level internals like branch prediction or cache control essentially sends you back to the start, because all the validation and optimization has to be done again. Also, some of the additional flaws we know about now were only discovered after the initial Spectre/Meltdown publications.

As an employee of a large enterprise customer I can tell you that Intel is working hard to fix all known flaws. Cascade Lake, due to ship in Q4’18/Q1’19, will have the first wave of hardware mitigations against side channel attacks.


Obviously this isn’t ideal, but hardware bugs have been worked around in software since time immemorial, right? The original Xboxes booted at radically different speeds because Microsoft bought all the RAM it could get its hands on, some of which didn’t even work, then tested it at bootup and disabled the broken chips, just to get consoles built and shipped as fast as possible. Pentium chips had the F00F bug, and OSes worked around it. The next chip revision drops the flaw while software works around it in the current one; it’s not a great way for things to operate, but the software has to do it anyway, since the chips already in the field can’t be fixed and can’t be made to disappear.


Yes, you don’t want to know how much broken hardware is out there and how many quirks people have to build into their drivers. The string “quirk” appears about 11,000 times in the Linux kernel sources.
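For the curious, a ballpark figure like that can be reproduced with a quick grep over a kernel checkout; the exact count depends on the kernel version (the pattern and file filter below are just one reasonable way to slice it):

```shell
# Count occurrences of "quirk" across the C sources and headers of a
# Linux kernel tree (run from the root of the checkout).
# grep -o prints each match on its own line, so wc -l counts matches
# rather than matching lines; -i also catches QUIRK/Quirk variants.
grep -rio 'quirk' --include='*.[ch]' . | wc -l
```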

I think the point is that

  1. Usually you try to fix the next design, and Intel is (wrongfully) accused of simply having said “Nope”, which would have been a major departure from how the world worked before.

  2. Usually broken hardware requires a small fix somewhere. The F00F bug patch was ~140 lines, and a non-trivial part of that was formatting and printing strings. But Spectre/Meltdown led to things like the introduction of full-blown Kernel Page Table Isolation, which weighs in at hundreds or even thousands of lines of code.

Intel couldn’t keep these issues around even if it wanted to. Its public image is bad enough as it is, and its partners are having a hard time explaining to enterprise customers why they should buy CPUs that will not perform as expected due to necessary security mitigations, especially while AMD is busy selling more EPYCs every day.


And the Cell CPUs that went into IBM POWER servers or PlayStation 3s, depending on how many SPE cores QA marked as defective.


…that’s standard practice, it’s called “binning”.


I know; that was my point [I used to work for a fabless semiconductor company].