Shared posts

11 Jun 13:40

Seagate Fires Salvo in Hybrid HDD Campaign

by Tom Coughlin

Just a few days after Seagate Technology said that it would discontinue its 7,200 RPM 2.5-inch HDDs, the company announced three new solid state hybrid drives (SSHD, Seagate's name for hybrid HDDs that contain flash memory as well as magnetic disk storage). These are a new 7-mm high (thin), 500 GB, 2.5-inch, 5,400 RPM SSHD; a 9.5-mm (regular thickness), 1 TB, 2.5-inch, 5,400 RPM SSHD; and the industry's first announced desktop SSHD, with a magnetic storage capacity of 2 TB. The announced SSHDs all have 8 GB of flash memory in addition to their magnetic storage capacity.

It is interesting that the new 2.5-inch and 3.5-inch SSHDs from Seagate also have 64 MB of DRAM cache; 32 MB of cache or less was standard in prior generations of Seagate SSHDs. The Seagate SSHDs use an adaptive memory caching technology to determine which data goes in the flash memory and which stays on the magnetic storage. The new generation of adaptive memory appears to cache some host writes in addition to the frequently accessed read data cached in past SSHDs. These cached writes improve system responsiveness and somewhat reduce energy use. Cached writes are always written to the magnetic storage as well, so the data isn't lost if power is suddenly turned off.

According to an article in The Tech Report (http://techreport.com/news/24454/seagate-sshds-cache-writes-in-dual-mode-nand), write caching is done in allocated single-level cell (SLC) flash memory, which has greater write endurance than multi-level cell (MLC, actually two bits per cell) flash. The flash memory used for read caching is MLC flash, the type common in SSDs and other flash memory consumer products. The cached writes and boot data are stored in the SLC flash memory. Providing this write caching apparently raises overall computer system performance above that of the prior generation of SSHD products.

Write caching was one of the features that was supposed to be enabled in the first generation of hybrid HDDs in 2008, with support from the Vista operating system. That dependence on the operating system to enable hybrid HDD capabilities was largely responsible for the problems with those early hybrid HDDs. The current generation of hybrid HDDs, such as the Seagate SSHDs, does not depend on the host operating system for its operation.

Western Digital and Toshiba, the other HDD manufacturers, have hybrid HDDs as well. It will be interesting to see how they respond to Seagate's new SSHD introductions, including the desktop product. Both Seagate and Western Digital were showing hybrid HDDs with 36 GB and 24 GB of flash memory capacity in January 2013, so it is possible that we could see higher-flash-capacity hybrid HDDs later this year.

Flash memory is playing a larger role in computer applications. Hybrid HDDs provide a less expensive way to boost overall computer performance, offering larger storage capacities at a more attractive price than a pure-SSD computer. Providing this hybrid storage functionality in thin notebook computers, and as a system upgrade for desktop computers, is probably one of the best chances for improving the sales of HDDs in computers in 2013, and ultimately it may be one of the keys to renewing computer growth amid widespread tablet adoption.

08 Jun 13:04

Beautiful hand-carved woodcut print of The Moon

by David Pescovitz

Tugboat Printshop hand-carved this lovely drawing of "The Moon" in 3/4" birch plywood and then pulled 200 limited edition 33.5" x 30.5" prints. You can read more about the intricate process at their site. "The Moon" (Thanks, Jason Tester!)

07 Jun 14:13

Schneier: what we need the whistleblowers to tell us about America's surveillance apparatus

by Cory Doctorow

Bruce Schneier writes in The Atlantic to comment on the leaked court order showing that the NSA has been secretly engaged in bulk domestic surveillance, recording who everyone is talking to, when, for how long, and where they are when they do. Schneier points out -- as many have -- that this is the tip of the iceberg, and lays out a set of government secrets that we need whistleblowers to disclose in order to grasp the full scope of the new, total surveillance state:

We need details on the full extent of the FBI's spying capabilities. We don't know what information it routinely collects on American citizens, what extra information it collects on those on various watch lists, and what legal justifications it invokes for its actions. We don't know its plans for future data collection. We don't know what scandals and illegal actions -- either past or present -- are currently being covered up.

We also need information about what data the NSA gathers, either domestically or internationally. We don't know how much it collects surreptitiously, and how much it relies on arrangements with various companies. We don't know how much it uses password cracking to get at encrypted data, and how much it exploits existing system vulnerabilities. We don't know whether it deliberately inserts backdoors into systems it wants to monitor, either with or without the permission of the communications-system vendors.

And we need details about the sorts of analysis the organizations perform. We don't know what they quickly cull at the point of collection, and what they store for later analysis -- and how long they store it. We don't know what sort of database profiling they do, how extensive their CCTV and surveillance-drone analysis is, how much they perform behavioral analysis, or how extensively they trace friends of people on their watch lists.

We don't know how big the U.S. surveillance apparatus is today, either in terms of money and people or in terms of how many people are monitored or how much data is collected. Modern technology makes it possible to monitor vastly more people -- yesterday's NSA revelations demonstrate that they could easily surveil everyone -- than could ever be done manually.

What We Don't Know About Spying on Citizens: Scarier Than What We Know

07 Jun 13:58

Promise or problem? A debate on nuclear power

[Image caption: The association of plutonium with nuclear weapons often leads to a greatly exaggerated perception of its risks. Here, a disk of plutonium, which emits low-energy alpha particles, is held in the palm of a hand. (Image: bubblews)]

Yesterday's post about nuclear power sparked a minor but enlightening exchange on Twitter, whose culmination is a thoughtful perspective by Nicholas Evans (Twitter handle @neva9257), a research associate at Charles Sturt University in Canberra. Earlier I had posted Nicholas's response in this space, but he indicated that he was more comfortable posting it on his own blog, since he thought my editorial comments implied more agreement with his perspective than he believes there is. Below is my response to his post. [More]
07 Jun 13:38

What "Worse is Better vs The Right Thing" is really about

by Yossi Kreinin

I thought about this one for a couple of years, then wrote it up, and left it untouched for another couple of years.

What prompted me to publish it now - at least the first, relatively finished part - is Steve Yegge's post, an analogy between the "liberals vs conservatives" debate in politics and some dichotomies in the professional worldviews of software developers. The core of his analogy is risk aversion: conservatives are more risk averse than liberals, both in politics and in software.

I want to draw a similar type of analogy, but from a somewhat different angle. My angle is, in politics, one thing that people view rather differently is the role of markets and competition. Some view them as mostly good and others as mostly evil. This is loosely aligned with the "right" and the "left" (with the caveat that the political right and left are very overloaded terms).

So what does this have to do with software? I will try to show that the disagreement about markets is at the core of the conflict presented in the classic essay, The Rise of Worse is Better. The essay presents two opposing design styles: Worse Is Better and The Right Thing.

I'll claim that the view of economic evolution is what underlies the Worse Is Better vs The Right Thing opposition - and not the trade-off between design simplicity and other considerations as the essay states.

So the essay says one thing, and I'll show you it really says something else. Seriously, I will.

And then I'll tell you why it's important to me, and why - in Yegge's words - "this conceptual framework became one of the most important tools in my toolkit" (though of course each of us is talking about his own analogy).

Specifically, I came to think that you can be for evolution or against it, and I'm naturally inclined to be against it, and once I got that, I've been trying hard to not overdo it.

***

Much of the work on technology is done in a market context. I mean "market" in a relatively broad sense - not just proprietary for-profit developments, but situations of competition. Programs compete for users, specs compete for implementers, etc.

Markets and competition have a way of evoking strong and polar opinions in people. The technology market and technical people are no exception, including the most famous and highly regarded people. Here's what Linus Torvalds has to say about competition:

Don't underestimate the power of survival of the fittest. And don't ever make the mistake that you can design something better than what you get from ruthless massively parallel trial-and-error with a feedback cycle. That's giving your intelligence much too much credit.

And here's what Alan Kay has to say:

…if there’s a big idea and you have deadlines and you have expedience and you have competitors, very likely what you’ll do is take a low-pass filter on that idea and implement one part of it and miss what has to be done next. This happens over and over again.

Linus Torvalds thus views competition as a source of progress more important than anyone's ability to come up with bright ideas. Alan Kay, on the contrary, perceives market constraints as a stumbling block insurmountable for the brightest idea.

(The fact that Linux is vastly more successful than Smalltalk in "the market", whatever market one considers, is thus fully aligned with the creators' values.)

Incidentally, Linux was derived from Unix, and Smalltalk was greatly influenced by Lisp. At one point, Lisp and Unix - the cultures and the actual software - clashed in a battle for survival. The battle apparently followed a somewhat one-sided, Bambi meets Godzilla scenario: cheap Unix boxes quickly replaced sophisticated Lisp-based workstations, which became collectible items.

The aftermath is bitterly documented in The UNIX-HATERS Handbook, groundbreaking in its invention of satirical technical writing as a genre. The book's take on the role of evolution under market constraints is similar to Alan Kay's and the opposite of Linus Torvalds':

Literature avers that Unix succeeded because of its technical superiority. This is not true. Unix was evolutionarily superior to its competitors, but not technically superior. Unix became a commercial success because it was a virus. Its sole evolutionary advantage was its small size, simple design, and resulting portability.

The "Unix Haters" see evolutionary superiority as very different from technical superiority - and unlikely to coincide with it. The authors' disdain for the products of evolution isn't limited to development driven by economic factors, but extends to natural selection:

Once the human genome is fully mapped, we may discover that only a few percent of it actually describes functioning humans; the rest describes orangutans, new mutants, televangelists, and used computer sellers.

Contrast that to Linus' admiration of the human genome:

we humans have never been able to replicate  something more complicated than what we ourselves are, yet natural selection did it without even thinking.

The UNIX-HATERS Handbook presents in an appendix Richard P. Gabriel's famous essay, The Rise of Worse Is Better. The essay presents what it calls two opposing software philosophies. It gives them names - The Right Thing for the philosophy underlying Lisp, and Worse Is Better for the one behind Unix - names I believe to be perfectly fitting.

The essay also attempts to capture the key characteristics of these philosophies - but in my opinion, it focuses on non-inherent embodiments of these philosophies rather than their core. The essay claims it's about the degree of importance that different designers assign to simplicity. I claim that it's ultimately not about simplicity at all.

I thus claim that the essay discusses real things and gives them the right names, but the wrong definitions - a claim somewhat hard to defend. Here's my attempt to defend it.

Worse is Better - because it's simpler?

Richard Gabriel defines "Worse Is Better" as a design style focused on simplicity, at the expense of completeness, consistency and even correctness. "The Right Thing" is outlined as the exact opposite: completeness, consistency and correctness all trump simplicity.

First, "is it real"? Does a conflict between two philosophies really exist - and not just a conflict between Lisp and Unix? I think it does exist - that's why the essay strikes a chord with people who don't care much about Lisp or Unix. For example, Jeff Atwood

…was blown away by The Rise of "Worse is Better", because it touches on a theme I've noticed emerging in my blog entries: rejection of complexity, even when complexity is the more theoretically correct approach.

This comment acknowledges the conflict is real outside the original context. It also defines it as a conflict between simplicity and complexity, similarly to the essay's definition - and contrary to my claim that "it's not about simplicity".

But then examples are given, examples of "winners" at the Worse Is Better side - and suddenly x86 shows up:

The x86 architecture that you're probably reading this webpage on is widely regarded as total piece of crap. And it is. But it's a piece of crap honed to an incredibly sharp edge.

x86 implementations starting with the out-of-order implementations from the 90s are indeed "honed to an incredibly sharp edge". But x86 is never criticized because of its simplicity - quite the contrary, it's criticized precisely because an efficient implementation can not be simple. This is why the multi-billion-dollar "honing" is necessary in the first place.

Is x86 an example of simplicity? No.

Is it a winner at the Worse is Better side? A winner - definitely. At the "Worse is Better" side - yes, I think I can show that.

But not if Worse Is Better is understood as "simplicity trumps everything", as the original essay frames it.

Worse is Better - because it's more compatible?

Unlike Unix and C, the original examples of "Worse Is Better", x86 is not easy to implement efficiently - it is its competitors, RISC and VLIW, that are easy to implement efficiently.

But despite that, we feel that x86 is "just like Unix". Not because it's simple, but because it's the winner despite being the worse competitor. Because the cleaner RISC and VLIW ought to be The Right Thing in this one.

And because x86 is winning by betting on evolutionary pressures.

Bob Colwell, Pentium's chief architect, was a design engineer at Multiflow - an early VLIW company which was failing, prompting him to join Intel to create their out-of-order x86 implementation, P6. In The Pentium Chronicles, he gives simplicity two thumbs up, acknowledges complexity as a disadvantage of x86 - and then explains why he bet on it anyway:

Throughout the 1980s, the RISC/CISC debate was boiling. RISC's general premise was that computer instruction sets … had become increasingly complicated and counterproductively large and arcane. In engineering, all other things being equal, simpler is always better, and sometimes much better.

…Some of my engineering friends thought I was either masochistic or irrational. Having just swum ashore from the sinking Multiflow ship, I immediately signed on to a "doomed" x86 design project. In their eyes, no matter how clever my design team was, we were inevitably going to be swept aside by superior technology. But … we could, in fact, import nearly all of RISC's technical advantages to a CISC design. The rest we could overcome with extra engineering, a somewhat larger die size, and the sheer economics of large product shipment volume. Although larger die sizes … imply higher production cost and higher power dissipation, in the early 1990s … easy cooling solutions were adequate. And although production costs were a factor of die size, they were much, much more dependent on volume being shipped, and in that arena, CISCs had an enormous advantage over their RISC challengers.

…because of having more users ready to buy them to run their existing software faster.

x86 is worse - as it's quite clear now when, in cell phones and tablets, easy cooling solutions are not adequate, and the RISC processor ARM wins big. But in the 1990s, because of compatibility issues, x86 was better.

Worse is Better, even if it isn't simpler - when The Right Thing is right technically, but not economically.

Worse is Better - because it's quicker?

Interestingly, Jamie Zawinski, who first spread the Worse is Better essay, followed a path somewhat similar to Colwell's. He "swum ashore" from Richard Gabriel's Lucid Inc., where he worked on what would become XEmacs, to join Netscape (named Mosaic at the time) and develop their very successful web browser. Here's what he said about the situation at Mosaic:

We were so focused on deadline it was like religion. We were shipping a finished product in six months or we were going to die trying. …we looked around the rest of the world and decided, if we're not done in six months, someone's going to beat us to it so we're going to be done in six months.

They didn't have to bootstrap the program on a small machine as in the Unix case. They didn't have to be compatible with an all-too-complicated previous version as in the x86 case. But they had to do it fast.

Yet another kind of economic constraint meaning that something else has to give. "We stripped features, definitely". And the resulting code was, according to jwz - not simple, but, plainly, not very good:

It's not so much that I was proud of the code; just that it was done. In a lot of ways the code wasn't very good because it was done very fast. But it got the job done. We shipped - that was the bottom line.

Worse code is Better than not shipping on time - Worse is Better in its plainest form. And nothing about simplicity.

Here's what jwz says about the Worse is Better essay - and, like Jeff Atwood, he gives a summary that doesn't summarize the actual text - but summarizes "what he feels it should have been":

…you should read it. It explains why mediocrity has better survival characteristics than perfection…

The essay doesn't explain that - the essay's text explains why simple-but-wrong has better survival characteristics than right-but-complex.

But as evidenced by jwz's and Atwood's comments, people want it to explain something else - something about perfection (The Right Thing) versus less than perfection (Worse is Better).

Worse is Better evolutionary

And it seems that invariably, what forces you to be less than perfect, what elects worse-than-perfect solutions, what "thinks" they're better, is economic, evolutionary constraints.

Economic constraints are what may happen to select for simplicity (Unix), compatibility (x86), development speed (Netscape) - or any other quality that might result in an otherwise worse product.

Just like Alan Kay said - but contrary to the belief of Linus Torvalds, the belief that ultimately, the result of evolution is actually better than anything that could have been achieved through design without the feedback of evolutionary pressure.

From this viewpoint, Worse Is Better ends up actually better than any real alternative - whereas from Alan Kay's viewpoint, Worse Is Better is actually worse than what's achievable.

(A bit convoluted, not? In fact, Richard Gabriel wrote several follow-ups, not being able to decide if Worse Is Better was actually better, or actually worse. I'm not trying to help decide that - just to show what makes one think it's actually better or worse.)

***

That's the first part - I hope to have shown that your view of evolution has a great effect on your design style.

If evolution is in the center of your worldview, if you think about viability as more important than perfection in any area, then you'll tend to design in a Worse Is Better style.

If you think of evolutionary pressure as an obstacle, an ultimately unimportant, harmful distraction on the road to perfection, then you'll prefer designs in The Right Thing style.

But why do people have a different view of evolution in the first place? Is there some more basic assumption underlying this difference? I think I have more to say about this, though it's not in nearly as finished form as the first part, and I might write about it all in the future.

Meanwhile, I want to conclude this first part with some thoughts on why it all matters personally to me.

I'm a perfectionist, by nature, and compromise is hard for me. Like many developers good enough to be able to implement much of their own ambitious ideas, I turned my professional life into a struggle for perfection. I wasn't completely devoid of common sense, but I did things that make me shiver today.

I wrote heuristic C++ parsers. I did 96-bit integer arithmetic in assembly. I implemented some perverted form of thread migration on bare metal, without any kind of OS or thread support. I did many other things that I'm too ashamed to admit.

None of it was really needed, not if you asked me today. It was "needed" in the sense of being a step towards a too-good-for-my-own-good, "perfect" solution. Today I'd realize that this type of perfection is not viable anyway (in fact, none of these monstrosities survived in the long run.) I'd choose a completely different path that wouldn't require any such complications in the first place.

But my stuff shipped. I was able to make it work. You don't learn until you fail - at least I didn't. Perfectionists are stubborn.

Then at one point I failed. I had to throw out months' worth of code, having realized that it wasn't going to fly.

And it so happened that I was reading Unix-Haters, and I was loving it, because I'm precisely the type of perfectionist that these people are, or close enough to identify with them. And there was this essay there about Worse Is Better vs The Right Thing.

And I was reading it when I wrote the code soon to be thrown out, and I was reading it when I decided to throw it out and afterwards.

And I suddenly started thinking, "This is not going to work, this type of thing. With this attitude, if you want it all, consistency, completeness, correctness - you'll get nothing, because you will fail, completely. You're too dumb, I mean I am, also not enough time. You have to choose, you're not going to get it all so you better decide what you want the most and aim at that."

If you read the Unix-Haters, you'll notice a lot of moral outrage - perfectionists have that, moral outrage at something imperfect. Especially at someone who knowingly chooses to aim at less than perfection. Especially if it's due to the ulterior motive of wanting to succeed.

And I felt a counter-outrage, for the first time. "What do you got to show, you got nothing. What good are your ideals if you end up dead? Dead bodies smell bad to us for a reason. Technical superiority without evolutionary superiority? Evolutionary inferiority means "dead". How can "dead" be technically superior? What have the dead ever done for us?"

It was huge, for me. I mean, it took a few years to truly sink in, but that was the start. I've never done anything Right since. And I've been professionally happy ever after. I guess it's a kind of "having swum ashore".

07 Jun 13:35

"It's done in hardware so it's cheap"

by Yossi Kreinin

It isn't.

This is one of those things that look very obvious to me, to the point where it seems not worth discussing. However, I've heard the idea that "hardware magically makes things cheap" from several PhDs over the years. So apparently, if you aren't into hardware, it's not obvious at all.

So why doesn't "hardware support" automatically translate to "low cost"/"efficiency"? The short answer is, hardware is an electric circuit and you can't do magic with that, there are rules. So what are the rules? We know that hardware support does help at times. When does it, and when doesn't it?

To see the limitations of hardware support, let's first look at what hardware can do to speed things up. Roughly, you can really only do two things:

  1. Specialization - save dispatching costs in speed and energy.
  2. Parallelization - save time, but not energy, by throwing more hardware at the job.

Let's briefly look at examples of these two speed-up methods - and then some examples where hardware support does nothing for you, because neither method helps. We'll only consider run time and energy per operation, ignoring silicon area (considering a third variable just makes it too hairy).

I'll also discuss the difference between real costs of operations and the price you pay for these operations, and argue that in the long run, costs are more stable and more important than prices.

Specialization: cheaper dispatching

If you want to extract bits 3 to 7 of a 32-bit word and then multiply them by 13 - let's say an encryption algorithm requires this - you can have an instruction doing just that. That will be faster and use less energy than, say, using bitwise AND, shift & multiplication instructions.

Why - what costs were cut out? The costs of dispatching individual operations - circuitry controlling which operation is executed, where the inputs come from and where the outputs go.
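To make the contrast concrete, here's the generic-instruction version in C (a minimal sketch; the function name is made up, and the fused "extract and multiply" operation would be a specialized instruction, not something expressible in portable C):

  #include <stdint.h>

  /* Generic version: three separately dispatched operations - shift,
     mask, multiply - each paying its own fetch/decode/register-file
     overhead. A specialized instruction would do all of this in a
     single dispatch. */
  uint32_t extract_mul13(uint32_t word)
  {
      uint32_t bits = (word >> 3) & 0x1F;  /* bits 3..7: a 5-bit field */
      return bits * 13;
  }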

Specialization can be taken to an extreme. For instance, if you want a piece of hardware doing nothing but JPEG decoding, you can bring dispatching costs close to zero by having a single "instruction" - "decode a JPEG image". Then you have no flexibility - and none of the "overhead" circuitry found in more flexible machines (memory for storing instructions, logic for decoding these instructions, multiplexers choosing the registers that inputs come from based on these instructions, etc.)

Before moving on, let's look a little closer at why we won here:

  • We got a speed-up because the operations were fast to begin with - so dispatching costs dominated. With specialization, we need 5 wires connected directly to bits 3 to 7, with tiny physical delay - just the time it takes the signal to travel to a nearby multiplier-by-13. Without specialization, we'd use a shifter shifting by a configurable amount of bits - 3 in our case but not always - which is a bunch of gates introducing a much larger delay. On top of that, since we'd be using several such circuits communicating through registers (let's say we're on a RISC CPU), we'd have delays due to reading and writing registers, delays due to selecting registers from a large register file, etc. With all this taken out by having a specialized instruction, no wonder we're seeing a big speed-up.
  • Likewise, we'll see lower energy consumption because the operations didn't require a lot of energy to begin with. Roughly, most of the energy is consumed when a signal value changes from 1 to 0 or back. When we use general-purpose instructions, most of the gate inputs & outputs and most flip-flops changing their values are those implementing the dispatching. When we use a specialized instruction, most of the switching is gone.

This means that, unsurprisingly, there's a limit to efficiency - the fundamental cost of the operations we need to do, which can't be cut.

When the operations themselves are costly enough - for instance, memory access or floating point operations - then their cost dominates the cost of dispatching. So specialized instructions that cut dispatching costs will give us little or nothing.

Parallelization: throwing more hardware at the job

What to do when specialization doesn't help? We can simply have N processors instead of one. For the parts that can be parallelized, we'll cut the run time by N - but spend the same amount of energy. So things got faster but not necessarily cheaper. A fixed power budget limits parallelization - as does a fixed budget of, well, money (the price of a 1000-CPU rack is still not trivial today).

[Why have multicore chips if it saves no energy? Because a multicore chip is cheaper than many single core ones, and because, above a certain frequency, many low-frequency cores use less energy than few high-frequency ones.]

We can combine parallelization with specialization - in fact, it's done very frequently. The JPEG decoder mentioned above would do that - a lot of its specialized circuits would execute in parallel.

Another example is how SIMD or SIMT processors broadcast a single instruction to multiple execution units. This way, we get only a speed-up, but no energy savings at the execution unit level: instead of one floating point ALU, we now have 4, or 32, etc. We do, however, get energy savings at the dispatching level - we save on program memory and decoding logic. As always with specialization, we pay in flexibility - we can't have our ALUs do different things at the same time, as some programs might want to.
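As a concrete sketch (using x86 SSE intrinsics purely for illustration; any SIMD instruction set makes the same point):

  #include <xmmintrin.h>  /* SSE intrinsics */

  /* One decoded instruction drives 4 single-precision additions: the
     dispatching cost (fetch, decode, register selection) is amortized
     over 4 execution units, but each addition still pays its full ALU
     energy cost. Remainder elements (n % 4) are omitted for brevity. */
  void add_arrays(float *a, const float *b, int n)
  {
      for (int i = 0; i + 4 <= n; i += 4) {
          __m128 va = _mm_loadu_ps(&a[i]);
          __m128 vb = _mm_loadu_ps(&b[i]);
          _mm_storeu_ps(&a[i], _mm_add_ps(va, vb));
      }
  }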

Why do we see more single-precision floating point SIMD than double-precision SIMD? Because the higher the raw cost of operations, the less we save by specialization, and SIMD is a sort of specialization. If we have to pay for double-precision ALUs, why not put each in a full-blown CPU core? That way, at least we get the most flexibility, which means more opportunities to actually use the hardware rather than keeping it idle.

(It's really more complicated than that because SIMD can actually be a more useful programming model than multiple threads or processes in some cases, but we won't dwell on that.)

What can't be done

Now that we know what can be done - and there really isn't anything else - we basically already also know what can't be done. Let's look at some examples.

Precision costs are forever

8-bit integers are fundamentally more efficient than 32-bit floating point, and no hardware support for any sort of floating point operations can change this.

For one thing, multiplier circuit size (and energy consumption) is roughly quadratic in the size of inputs. IEEE 32b floating point numbers have 23b mantissas, so multiplying them means a ~9x larger circuit than an 8×8-bit multiplier with the same throughput. Another cost, linear in size, is that you need more memory, flip-flops and wires to store and transfer a float than an int8.

(People are more often aware of this one because SIMD instruction sets usually have fixed-sized registers which can be used to keep, say, 4 floats or 16 uint8s. However, this makes people underestimate the overhead of floating point as 4x - when it's more like 9x if you look at multiplying mantissas, not to mention handling exponents. Even int16 is 4x more costly to multiply than int8, not 2x as the storage space difference makes one guess.)
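One way to arrive at that ~9x figure (a rough approximation: multiplier cost taken as quadratic in operand width, with the 23-bit stored mantissa treated as a 24-bit multiply once the implicit leading bit is counted):

  (24 × 24) / (8 × 8) = 576 / 64 = 9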

We design our own chips, and occasionally people say that it'd be nice to have a chip with, say, 256 floating point ALUs. This sounds economically nonsensical - sure it's nice and it's also quite obvious, so if nobody makes such chips at a budget similar to ours, it must be impossible, so why ask?

But actually it's a rather sensible suggestion, in that you can make a chip with 256 ALUs that is more efficient than anything on the market for what you do, but not flexible enough to be marketed as a general-purpose computer. That's precisely what specialization does.

However, specialization only helps with operations which are cheap enough to begin with compared to the cost of dispatching. So this can work with low-precision ALUs, but not with high-precision ALUs. With high-precision ALUs, the raw cost of operations would exceed our power budget, even if dispatching costs were zero.

Memory indirection costs are forever

I mentioned this in my old needlessly combative write-up about "high-level CPUs". There's this idea that we can have a machine that makes "high-level languages" run fast, and that they're really only slow because we're running on "C machines" as opposed to Lisp machines/Ruby machines/etc.

Leaving aside the question of what "high-level language" means (I really don't find it obvious at all, but never mind), object-orientation and dynamic typing frequently result in indirection: pointers instead of values and pointers to pointers instead of pointers. Sometimes it's done for no apparent reason - for instance, Erlang strings that are kept as linked lists of ints. (Why do people even like linked lists as "the" data structure and head/tail recursion as "the" control structure? But I digress.)

This kind of thing can never be sped up by specialization, because memory access fundamentally takes quite a lot of time and energy, and when you do p->a, you need one such access, and when you do p->q->a, you need two, hence you'll spend twice the time. Having a single "LOAD_LOAD" instruction instead of two - LOAD followed by a LOAD - does nothing for you.
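In C terms (the struct names are made up, just to make the extra indirection visible):

  struct inner { int a; };
  struct outer { struct inner *q; };

  /* one memory access */
  int direct(struct inner *p)   { return p->a; }

  /* two memory accesses - and the second LOAD can't even begin
     until the first one returns, so they're inherently serial */
  int indirect(struct outer *p) { return p->q->a; }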

All you can do is parallelization - throw more hardware at the problem, N processors instead of one. You can, alternatively, combine parallelization with specialization, similarly to N-way floating point SIMD that's somewhat cheaper than having N full-blown processors. For example, you could have several load-store units, several cache banks and a multiple-issue processor. Then, if you had to run p1->q1->a and, somewhere near that, p2->q2->b, and the pointers pointed into different banks, some of the 4 LOADs would end up running in parallel, without having several processors.

But, similarly to low-precision math being cheaper whatever the merits of floating point SIMD, one memory access is always half the cost of two, despite the merits of cache banking and multiple issue. Specifically, doubling the memory access throughput roughly doubles the energy cost. This can sometimes be better than simply using two processors, but it's a non-trivial cost and always will be.

A note about latency

We could discuss other examples but these two are among the most popular - floating point support is a favorite among math geeks, and memory indirection support is a favorite among language geeks. So we'll move on to a general conclusion - but first, we should mention the difference between latency costs and throughput costs.

In our two examples, we only discussed throughput costs. A floating point ALU with a given throughput uses more energy than an int8 ALU. Two memory banks with a given throughput use about twice the energy of a single memory bank with half the throughput. This, together with the relatively high costs of these operations compared to the costs of dispatching them, made us conclude that there's nothing we can do.

In reality, the high latency of such heavyweight operations can be a bigger problem than our inability to increase their throughput without paying a high price in energy. For example, consider the instruction sequence:

c = FIRST(a,b)
e = SECOND(c,d)

If FIRST has a low latency, then we'll quickly proceed to SECOND. If FIRST has a high latency, then SECOND will have to wait for that amount of time, even if FIRST has excellent throughput. Say, if FIRST is a LOAD, being able to issue a LOAD every cycle doesn't help if SECOND depends on the result of that LOAD and the LOAD latency is 5 cycles.
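Concretely, for the LOAD case (a minimal C sketch; the function name and the 5-cycle figure are assumptions for illustration):

  /* SECOND depends on FIRST's result: even if the machine can issue
     a LOAD every cycle, the addition stalls for the full load latency
     (say, 5 cycles), because it can't start without c. */
  int second_after_first(const int *p, int d)
  {
      int c = *p;     /* FIRST: a LOAD */
      int e = c + d;  /* SECOND: waits for the LOAD's result */
      return e;
  }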

A large part of computer architecture is various methods for dealing with these latencies - VLIW, out-of-order, barrel processors/SIMT, etc. These are all forms of parallelization - finding something to do in parallel with the high-latency instruction. A barrel processor helps when you have many threads. An out-of-order processor helps when you have nearby independent instructions in the same thread. And so on.

Just like having N processors, all these parallelization methods don't lower dispatching costs - in fact, they raise them (more registers, higher issue bandwidth, tricky dispatching logic, etc.) The processor doesn't become more energy efficient - you get more done per unit of time but not per unit of energy. A simple processor would be stuck at the FIRST instruction, while a more clever one would find something to do - and spend the energy to do it.

So latency is a very important problem with fundamentally heavyweight operations, and machinery for hiding this latency is extremely consequential for execution speed. But fighting latency using any of the available methods is just a special case of parallelization, and in this sense not fundamentally different from simply having many cores in terms of energy consumed.

The upshot is that parallelization, whether it's having many cores or having single-core latency-hiding circuitry, can help you with execution speed - throughput per cycle - but not with energy efficiency - throughput per watt.

The latency of heavyweight stuff is important and not hopeless; its throughput is important and hopeless.

Cost vs price

"But on my GPU, floating point operations are actually as fast as int8 operations! How about that?"

Well, a bus ticket can be cheaper than the price of getting to the same place in a taxi. The bus ticket will be cheaper even if you're the only passenger, in which case the real cost of getting from A to B in a bus is surely higher than the cost of getting from A to B in a taxi. Moreover, a bus might take you there more quickly if there are lanes reserved for buses that taxis are not allowed to use.

It's basically a cost vs price thing - math and physics vs economics and marketing. The fundamentals only say that a hardware vendor can always make int8 cheaper than float - but they can have good reasons not to. It's not that they made floats as cheap as int8 - actually, they made int8 as expensive as floats in terms of real costs.

Just like you going alone in a bus designed to carry dozens of people is an inefficient use of a bus, using float ALUs to process what could be int8 numbers is an inefficient use of float ALUs. Similarly, just like transport regulations can make lanes available for buses but not cars, an instruction set can make fetching a float easy but make fetching a single byte hard (no load byte/load byte with sign extension instructions). But cars could use those lanes - and loading bytes could be made easy.

As a passenger, of course you will use the bus and not the taxi, because economics and/or marketing and/or regulations made it the cheaper option in terms of price. Perhaps it's so because the bus is cheaper overall, with all the passengers it carries during rush hours. Perhaps it's so because the bus is a part of the contract with your employer - it's a bus carrying employees towards a nearby something. And perhaps it's so because the bus is subsidized by the government. Whatever the reason, you go ahead and use the cheaper bus.

Likewise, as a programmer, if you're handed a platform where floating point is not more expensive or even cheaper than int8, it is perhaps wise to use floating point everywhere. The only things to note are, the vendor could have given you better int8 performance; and, at some point, a platform might emerge that you want to target and where int8 is much more efficient than float.

The upshot is that it's possible to lower the price of floating point relative to int8, but not the cost.

What's more "important" - prices or costs?

Prices have many nice properties that real costs don't have. For instance, all prices can be compared - just convert them all to your currency of choice. Real costs are hard to compare without prices: is 2x less time for 3x more energy better or worse?

In any discussion about "fundamental real costs", there tend to be hidden assumptions about prices. For example, I chose to ignore area in this discussion under the assumption that area is usually less important than power. What makes this assumption true - or false - is the prices fabs charge for silicon production, the sort of cooling solutions that are marketable today (a desktop fan could be used to cool a cell phone but you couldn't sell that phone), etc. It's really hard to separate costs from prices.

Here's a computer architect's argument to the effect of "look at prices, not costs":

While technical metrics like performance, power, and programmer effort make up for nice fuzzy debates, it is pivotal for every computer guy to understand that “Dollar” is the one metric that rules them all. The other metrics are just sub-metrics derived from the dollar: Performance matters because that’s what customers pay for; power matters because it allows OEMs to put cheaper, smaller batteries and reduce people’s electricity bills; and programmer effort matters because it reduces the cost of making software.

I have two objections: that prices are the effect, not the cause, and that prices are too volatile to commit to memory as a "fundamental".

Prices are the effect in the sense that, customers pay for performance because it matters, not "performance matters because customers pay for it". Or, more precisely - customers pay for performance because it matters to them. As a result - because customers pay for it - performance matters to vendors. Ultimately, the first cause is that performance matters, not that it sells.

The other thing about prices is that they're rather jittery. Even a price index designed for stability, such as the S&P 500, jumps up and down like crazy. In a changing world, knowledge about costs has a longer shelf life than knowledge about prices.

For instance, power is considered cheap for desktops but expensive for servers, and really expensive for mobile devices. In reality, desktops collectively likely consume more power than servers, there being more desktops than servers. So the real costs are not like the prices - and prices change; the rise of mobile computing means rising prices for power-hungry architectures.

It seems to me that, taking the long view, the following makes sense:

  • It's best to reason in costs and project them to the relevant prices - not forget the underlying costs and "think in prices", so as to not get into habits that will become outdated when prices change.
  • If you see a high real cost "hidden" by contemporary prices, it's a good bet to assume that at some point in the future, prices will shift so that the real cost will rear its ugly head.

For example, any RISC architecture - ARM, MIPS, PowerPC, etc. - is fundamentally cheaper than, specifically, x86, in at least two ways: hardware costs - area & power - and the costs of developing said hardware. [At least so I believe; let's say that it's not as significant in my view as my other, more basic examples, and I might be wrong - I'm only using this as an illustration.]

In the long run, this spells doom for the x86, whatever momentum it otherwise has at any point in time - software compatibility costs, Intel's manufacturing capabilities vs competitors' capabilities, etc. Mathematically or physically fundamental costs will, in the long run, trump everything else.

In the long run, there is no x86, no ARM, no Windows, no iPhone, etc. There are just ideas. We remember ideas originating in ancient Greece and Rome, but no products. Every product is eventually outsold by another product. Old software is forgotten and old fabs rot. But fundamentals are forever. An idea that is sufficiently more costly fundamentally than a competing idea can not survive.

This is why I disagree with the following quote by Bob Colwell - the chief architect of the Pentium Pro (BTW, I love the interview and intend to publish a summary of the entire 160-something page document):

…you might say that CISC only stayed viable because Intel was able to throw a lot of money and people at it, and die size, bigger chips and so on.

In that sense, RISC still was better, which is what was claimed all along. And I said you know, there's point to be made there. I agree with you that Intel had more to do to stay competitive. They were starting a race from far behind the start line. But if you can throw money at a problem then, it's not really so fundamental technologically, is it? We look for more deep things than that, so if all the RISC/CISC thing amounted to was, you had a slight advantage economically, well, that's not as profound as it seemed back in the 80s was it?

Well, here's my counter-argument, and it's not technical. The technical argument would be: CISC is worse, to the point where Intel's 32nm Medfield merely performs about as well as ARM-based 40nm chips in a space where power matters. Which can be countered with an economic argument - so what, Intel does have better manufacturing ability, so who cares, they still compete.

But my non-technical argument is, sure, you can be extremely savvy business-wise, and perhaps, if Intel realized early on how big mobile is going to be, they'd make a good enough x86-based offering back then and then everyone would have been locked out due to software compatibility issues and they'd reign like they reign in the desktop market.

But you can't do that forever. Every company is going to lose to some company at some point or other because you only need one big mistake and you'll make it, you'll ignore a single emerging market and that will be the end. Or, someone will outperform you technically - build a better fab, etc. If an idea is only ("only"?) being dragged into the future kicking and screaming by a very business-savvy and technically excellent company, then the idea has no chance.

The idea that will win is the idea that every new product will use. New products always beat old products - always have.

And nobody, nobody at all has made a new CISC architecture in ages. Intel will lose to a company or companies making RISC CPUs because nobody makes anything else - and it has to lose to someone. Right now it seems like it's ARM but it doesn't matter how it comes out in this round. It will happen at some point or other.

And if ARM beats x86, it won't be, straightforwardly, "because RISC is better" - x86 will have lost for business reasons, and it could have gone the other way for business reasons. But the fact that it will have lost to a RISC - that will be because RISC is technically better. That's why there's no CISC competitor to lose to.

Or, if you dismiss this with the sensible "in the long run, we're all dead" - then, well, if you're alive right now and you're designing hardware, you are not making a CISC processor, are you? QED, not?

Getting back to our subject - based on the assumption that real costs matter, I believe that ugly, specialized hardware is forever. It doesn't matter how much money is poured into general-purpose computing, by whom and why. You will always have sufficiently important tasks that can be accomplished 10x or 100x more cheaply by using fundamentally cheap operations, and it will pay off for someone to make the ugly hardware and write the ugly, low-level code doing low-precision arithmetic to make it work.

And, on the other hand, the market for general-purpose hardware is always going to be huge, in particular, because there are so many things that must be done where specialization fundamentally doesn't help at all.

Conclusion

Hardware can only deliver "efficiency miracles" for operations that are fundamentally cheap to begin with. This is done by lowering dispatching costs and so increasing throughput per unit of energy. The price paid is reduced flexibility.

Some operations, such as high-precision arithmetic and memory access, are fundamentally expensive in terms of energy consumed to reach a given throughput. With these, hardware can still give you more speed through parallelization, but at an energy cost that may be prohibitive.