First off, I would say that is some pretty awesome work by this guy to chase this down, including his work with the manufacturer to help them reliably recreate the issue.
Second, I would say that over the course of my 10-year career managing developers, I've heard many, many times that the bug was in the kernel, or in the hardware, or in the compiler, or in some other lower-level thing the developer had no control over. This has been the correct diagnosis exactly once, which, if I had to guess, is about 5% of the times I've heard it.
I have been the first to trigger two CPU bugs and came across a third a few days after it was discovered, before it was published. Once errata are published, software workarounds are usually put in place quickly, and tripping over them is rare.
Compiler bugs are another story entirely. I have found dozens of them (confirmed), and I can find more whenever I feel like it.
Out of curiosity, if someone paid you to find compiler bugs for a day, how would you go about it?
(I've found several missed-optimization bugs in gcc, but I found them while working on a project where I examine assembly frequently; I have no idea how I'd go about looking for a compiler bug).
One way of actively looking for compiler bugs is using a tool like Csmith[1]. Another is to compile some known-difficult code (e.g. Libav[2]) with various combinations of (optimisation) flags until the test suite fails. Most of the bugs I've found were during routine testing of Libav.
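A minimal Python sketch of the second approach: sweep combinations of flags and record every set for which the test suite fails. The build_and_test hook is hypothetical; it stands in for whatever rebuilds the project (e.g. Libav) with the given flags and runs its tests, and the flag names are only examples.

```python
import itertools
from typing import Callable, Iterable, List, Sequence, Tuple


def flag_combinations(opt_levels: Sequence[str],
                      extra_flags: Sequence[str]) -> Iterable[Tuple[str, ...]]:
    """Yield each optimisation level combined with every subset of the extra flags."""
    for level in opt_levels:
        for r in range(len(extra_flags) + 1):
            for subset in itertools.combinations(extra_flags, r):
                yield (level, *subset)


def find_failing_flag_sets(
    opt_levels: Sequence[str],
    extra_flags: Sequence[str],
    build_and_test: Callable[[Tuple[str, ...]], bool],
) -> List[Tuple[str, ...]]:
    """Return every flag set for which build_and_test reports a failure.

    build_and_test is a hypothetical hook: it should rebuild the project with
    the given flags and return True if the test suite passes.
    """
    return [flags for flags in flag_combinations(opt_levels, extra_flags)
            if not build_and_test(flags)]


def fake_build_and_test(flags: Tuple[str, ...]) -> bool:
    # Stand-in for a real build: pretend the suite only breaks with -O3 plus unrolling.
    return not ("-O3" in flags and "-funroll-loops" in flags)


print(find_failing_flag_sets(["-O2", "-O3"], ["-funroll-loops", "-ffast-math"],
                             fake_build_and_test))
```

A failure only marks a candidate. Flags such as -ffast-math deliberately relax floating-point semantics, so a test failing under them is not by itself evidence of a compiler bug; the case still has to be reduced and checked against the language standard.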
While I don't consider missed optimisations bugs as such, they are easy to find. Simply compile some non-trivial function and look at the output. There's usually something that could be done better, especially if some exotic instruction can be used.
> While I don't consider missed optimisations bugs as such, they are easy to find. Simply compile some non-trivial function and look at the output.
Perhaps you'll give me a little credit :) if I mention that I found missed optimization bugs in extremely trivial functions. One of them involved gcc generating several completely useless stores even at -O3: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
I've been a developer for over 30 years, from mainframes to micros, and a hardware problem has been responsible for a bug in my code exactly zero times.
This is because OS devs (and compiler devs) have suffered them for you. I've run into many x86 bugs, both documented and undocumented. Have you done much assembly?
Usually bugs would involve unlikely sequences of operations or operations in unexpected states. But there have been very serious bugs involving wrong math (Intel Pentium) or cache failures leading to complete crashes (AMD Phenom). These two made it to production and were show-stoppers because OS devs could do very little about them (in the Phenom bug, they could, but with a noticeable performance hit). I don't think I've seen any production CISC chip completely free of bugs. OS devs have to do the testing and the circumventing.
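For the Pentium case, the widely circulated check was a single division. A small Python sketch (Python floats are IEEE 754 doubles, so on correct hardware the quotient times the divisor rounds back to the dividend for these operands; affected Pentiums were widely reported to give 256 instead):

```python
def fdiv_check(x: float = 4195835.0, y: float = 3145727.0) -> float:
    """Classic Pentium FDIV test: x - (x / y) * y.

    With correctly rounded double-precision division this is 0.0 for the
    default operands; a flawed FDIV unit returned a wrong quotient, so the
    subtraction no longer cancelled.
    """
    return x - (x / y) * y


print(fdiv_check())  # 0.0 on a correct FPU
```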
I mean... typically x86 chips have DOZENS of documented bugs.
It's interesting how with significantly worse (or at the very least, comparable) complexity to manage, and uniformly horrific costs for repairing production bugs, the ASIC design industry and Intel/AMD in particular have managed to scrape by with something like <20 bugs between them in the past decade.
Perhaps we need to incentivize software developers with fear of execution, or something.
A recent x86 processor model has at least 20 "errata" that they'll tell you about; in the last decade they must have had hundreds. But most of them are just worked around so they don't affect you.
Modern x86 CPUs can trap arbitrary instructions to microcode. This means that most hardware bugs can be fixed with a firmware update that just slows down the cpu somewhat when it encounters the offending instruction.
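On Linux x86 systems the loaded microcode revision is reported per CPU in /proc/cpuinfo, which is one way to see whether such an update is in place. A small sketch that assumes that format; the field may be absent on other architectures or platforms:

```python
from typing import Set


def microcode_revisions(cpuinfo_text: str) -> Set[str]:
    """Collect the 'microcode' revision values found in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        # Lines look like "microcode\t: 0xf0"; split on the first colon only.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(microcode_revisions(f.read()) or "no microcode field reported")
    except OSError:
        print("/proc/cpuinfo not available on this system")
```

More than one distinct revision across cores would be unusual and worth a closer look.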
There certainly are a lot of hardware bugs in cpus -- it's just that most of them get fixed before anyone outside the cpu company ever sees them.
The amount of effort spent on what is generally called "functional verification" is much higher for hardware than for software. Also, the specifications tend to be clearer and the source code size is smaller than you might imagine.
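As a rough software analogue of what functional verification commonly involves (driving random and corner-case stimulus into a design and comparing its outputs against a reference model), here is a Python sketch. The ripple-carry adder is a hypothetical stand-in for the design under test, not anything from a real CPU:

```python
import random

MASK32 = 0xFFFFFFFF


def ripple_carry_add(a: int, b: int) -> int:
    """Hypothetical design under test: a 32-bit adder built from full adders."""
    result, carry = 0, 0
    for i in range(32):
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        result |= (abit ^ bbit ^ carry) << i
        carry = (abit & bbit) | (carry & (abit ^ bbit))
    return result  # the final carry-out is dropped, so the sum wraps at 32 bits


def reference_add(a: int, b: int) -> int:
    """Reference model: the specification, stated as directly as possible."""
    return (a + b) & MASK32


def random_verify(dut, ref, trials: int = 10_000, seed: int = 0):
    """Compare dut against ref on corner cases plus random inputs.

    Returns the first mismatching (a, b) pair, or None if none was found.
    """
    rng = random.Random(seed)
    corners = [0, 1, MASK32, 0x80000000, 0x7FFFFFFF]
    stimulus = [(a, b) for a in corners for b in corners]
    stimulus += [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(trials)]
    for a, b in stimulus:
        if dut(a, b) != ref(a, b):
            return (a, b)
    return None


print(random_verify(ripple_carry_add, reference_add))  # None: no mismatch found
```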
But really, I have only ever experienced one bug in a compiler (that I hadn't written), and it was such an odd experience, like the patient actually having lupus.
Yeah, when I find myself drifting towards the 'maybe it is a bug in the compiler/OS/debugger' territory I know it is time to take a break from the debugging as it is rarely true[1]. Nice to see it occasionally happens :)
[1] I work on Visual Studio so I have found compiler/debugger bugs as I am generally using 'in development' bits, but far more often than not the bug turns out to be mine and mine alone :)
I've had a few bugs traced down to the kernel on Windows. ALL of them were in 3rd-party antivirus packages. In fact, if a machine blue-screened after installing our stuff, you could rest assured it had antivirus installed, and that it was Kaspersky.
Didn't Raymond Chen once write that most Windows crashes (in some version, during some year) were due to beta nvidia drivers installed by gamers to increase FPS?
Disagree. In fact, were I to come up with a rule of thumb, I'd say the opposite is true.
Want to find bugs in Sun's Java 6 compiler for x64 Linux? Use annotations (yeah, I found one in their V30 release last week). Want to find bugs in MS' C++ compiler? Write your own templates (this was a few years ago; maybe it's better now?). The best programmers push the limits of their tools because they know what's "supposed to happen".
Poor programmers hit something that doesn't work and just try something else, 'cause, well, they're just trying shit. I would go so far as to say that poor programmers are, in fact, unable to find compiler, optimizer, OS, or hardware bugs, because, by definition, they probably don't have a firm handle on what's "supposed to happen".
I think what VBprogrammer meant is that thinking you've found a bug in a compiler / OS / CPU is often a warning sign you're a poor programmer. Often times a beginner will have a bug in their code that is too subtle for them to identify, so they end up attributing it to some external factor. Actually finding a bug in a compiler / OS / CPU is as you suggest likely a sign you're doing something advanced or unusual and therefore are perhaps more knowledgeable than most.
Yeah, that's exactly what I meant. Sorry if the sarcasm didn't quite carry.
I know these things can and do happen. I've come across one or two of these strange ones before, but too often I've seen people jump to the conclusion that someone or something else was to blame, without any real evidence other than that they had exhausted their shallow pool of talent.
I remember scripting on a MUD that used a customized version of the standard MPROG[1] patch for ROM-based MUDs. Whoever originally "documented" it just grabbed some docs from some other ROM-based MUD that used their own customized version.
The documentation had been completely wrong for several years before I started programming there. Once I realized that the documentation was lying to me, I started methodically examining how things actually worked by writing lots of very simple test programs and documenting the actual behavior.
The others didn't care why it was broken. Most of them were just trying to build cool areas, they weren't really programmers at all. They would just tweak things until it appeared to work or they were frustrated enough to give up.
[1] I'm sure a lot of HN knows this, but to save the non-gamers the hassle of looking it up, a MUD is a type of text-based online game and an MPROG is a script used to control the actions of the characters in the game.
Way too complicated to describe concisely, and I'm trying to release right now; in the end, the wrong object was returned by a casting operator (a very, very wrong object). I added and called a named method to do the cast (with the very same implementation), and all was well.
Depends on the compiler. When I took my "hardware for CS students" class as an undergrad, our big project for the semester was to write a CPU simulator in C, and the campus labs had just rolled out the upgrade of gcc from 2.x to 3.x. I had a bug in my program that I just couldn't isolate, and after about eight hours of chasing it, I realized that the compiler had allocated space for an integer variable right in the middle of an already-allocated array, so the two variables were stomping on each other. I changed the name of the integer variable and my problems went away.
Apparently I wasn't the only one, because within a week all the labs were back on GCC 2.95.
Actually, that sounds a lot like a bug in your build system. Did you have a custom Makefile (either hand written, or provided by a teacher)? If you didn't keep track of dependencies very carefully so that you always recompile all the .c files that depended on shared .h files when the .h files change, you can wind up with situations where different object files disagree on the layout of structures -- it could cause exactly the sort of problem you describe. Changing the name of the variable could force the file to be recompiled, thus appearing to solve the problem.
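The rule described above (an object file is stale if its .c file or any header it includes is newer than it) can be sketched in Python. The file names and timestamps below are hypothetical:

```python
from typing import Dict, List


def stale_objects(mtimes: Dict[str, float], deps: Dict[str, List[str]]) -> List[str]:
    """Return object files that are older than any of their inputs.

    mtimes maps file names to modification times; deps maps each .o file to
    the .c file and .h files it was built from. A missing object counts as
    stale. Every input listed in deps is assumed to exist in mtimes.
    """
    stale = []
    for obj, inputs in deps.items():
        obj_time = mtimes.get(obj)
        if obj_time is None or any(mtimes[src] > obj_time for src in inputs):
            stale.append(obj)
    return stale


# cpu.h was edited (time 30) after both objects were built (time 20).
mtimes = {"cpu.h": 30.0, "cpu.c": 10.0, "alu.c": 10.0, "cpu.o": 20.0, "alu.o": 20.0}

# Complete dependency lists: both objects include cpu.h, so both get rebuilt.
print(stale_objects(mtimes, {"cpu.o": ["cpu.c", "cpu.h"], "alu.o": ["alu.c", "cpu.h"]}))
# ['cpu.o', 'alu.o']

# A hand-written rule that forgot alu.o's header: alu.o is wrongly kept, and
# the two objects can now disagree about the layout of structs in cpu.h.
print(stale_objects(mtimes, {"cpu.o": ["cpu.c", "cpu.h"], "alu.o": ["alu.c"]}))
# ['cpu.o']
```

In practice gcc can emit these dependency lists for make automatically with -MMD -MP, which avoids this class of mistake.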
To further support your point, the rollback to gcc 2.95 happened across many Linux distributions because of incompatible language changes introduced in gcc 3.0.
Many distributions rolled back so that the default "stable" compiler matched the one they had to use to build the packages - i.e. common sense.
Once the packages were updated to deal with the gcc 3.x language changes, the compiler and packages started appearing together.
My first programming job was doing VB programming in Access 2 programs that had to run on Windows 3.1. (Yes, this was in the last millennium.) I kept on running into bugs that I could demonstrate were in Access, not in my code. It was very frustrating.
My next job was in Perl. I went several years before I found an actual bug in the language. It then went unfixed for years because someone might be relying on it, despite the fact that every significant Perl code base I've seen since has real bugs nobody has noticed that trace back to the bug I found. Why do you ask whether I am bitter?
So your suggestion failed glaringly for me when I was using VB, but it has worked much better since.
It did not use Visual Basic for Applications (VBA), but it did use Access Basic, which was a dialect of Visual Basic.
Access 95 had the ability to upgrade from Access 2, and that included the ability to migrate from Access Basic to VBA. The tool was not flawless (very little from Microsoft is), but mostly worked pretty well.
"Finding" them is a warning signal of a poor programmer if and only if the scare quotes mean that they have not actually researched the problem sufficiently to prove that the problem is in one of those areas.
These legitimate bugs do exist, and some of us have a talent for finding them with annoying frequency.
Less experienced programmers often "want" to find bugs in the compiler/OS/whatever because that way it's not their fault, and they lack the skill to track down difficult problems in their own code. More experienced programmers realize that finding bugs outside of your own code is often a disaster because frequently there's nothing you can do to fix it.
Nah. Finding a bug in the dev stack is a sign that you're running on the edge. I've encountered one or two myself. I sent one in and the company wrote back and said "yep, it's a bug, fixed next build".
Now, blaming without reproduction on the dev stack is a sign of a lamer. :-)
If that were true then there would be no need to ever release new versions of these things!
I have personally found bugs in Linux (kernel, libc), Oracle, various JVMs, etc, usually cases in which algorithms optimized for "normal" loads became pathological under extreme load. It's much more common than perhaps you'd think.
When I'm working, it's always a bug in the compiler, kernel or hardware. The semicolon was implied. Just give me a minute to work around the compiler bug.
I have only twice thought one of my problems was due to a compiler bug, and I was right one of those times (and that was because my company was stuck with a 4-year-old version of the compiler; the bug had already been fixed in the latest version).