Shared posts

12 Sep 13:35

It’s Bayes All The Way Up

by Scott Alexander

[Epistemic status: Very speculative. I am not a neuroscientist and apologize for any misinterpretation of the papers involved. Thanks to the people who posted these papers in r/slatestarcodex. See also Mysticism and Pattern-Matching and Bayes For Schizophrenics]

Bayes’ Theorem is an equation for calculating certain kinds of conditional probabilities. For something so obscure, it’s attracted a surprisingly wide fanbase, including doctors, environmental scientists, economists, bodybuilders, fen-dwellers, and international smugglers. Eventually the hype reached the point where there was both a Bayesian cabaret and a Bayesian choir, popular books using Bayes’ Theorem to prove both the existence and the nonexistence of God, and even Bayesian dating advice. In time everyone agreed to dial down their exuberance a little, and accept that Bayes’ Theorem might not literally explain absolutely everything.

So – did you know that the neurotransmitters in the brain might represent different terms in Bayes’ Theorem?

First things first: Bayes’ Theorem is a mathematical framework for integrating new evidence with prior beliefs. For example, suppose you’re sitting in your quiet suburban home and you hear something that sounds like a lion roaring. You have some prior beliefs that lions are unlikely to be near your house, so you figure that it’s probably not a lion. Probably it’s some weird machine of your neighbor’s that just happens to sound like a lion, or some kids pranking you by playing lion noises, or something. You end up believing that there’s probably no lion nearby, but you do have a slightly higher probability of there being a lion nearby than you had before you heard the roaring noise. Bayes’ Theorem is just this kind of reasoning converted to math. You can find the long version here.
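The lion example can be run through Bayes’ Theorem directly. A minimal numeric sketch, with probabilities invented purely for illustration (none of these numbers come from the post):

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' Theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    p_evidence = (p_evidence_given_h * prior
                  + p_evidence_given_not_h * (1 - prior))
    return p_evidence_given_h * prior / p_evidence

prior_lion = 1e-6          # lions are very unlikely near a suburban home
p_roar_if_lion = 0.9       # a nearby lion would probably produce roaring
p_roar_if_no_lion = 1e-3   # machines, pranking kids, etc. occasionally sound like roaring

p = posterior(prior_lion, p_roar_if_lion, p_roar_if_no_lion)
# The posterior is still tiny -- probably no lion -- but it is
# hundreds of times larger than the prior was before the roar.
```

This is exactly the verbal reasoning above: you still believe there’s probably no lion, but the roaring noise has shifted your probability upward.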

This is what the brain does too: integrate new evidence with prior beliefs. Here are some examples I’ve used on this blog before:

All three of these are examples of top-down processing. Bottom-up processing is when you build perceptions into a model of the the world. Top-down processing is when you let your models of the world influence your perceptions. In the first image, you view the center letter of the the first word as an H and the second as an A, even though they’re the the same character; your model of the world tells you that THE CAT is more likely than TAE CHT. In the second image, you read “PARIS IN THE SPRINGTIME”, skimming over the duplication of the word “the”; your model of the world tells you that the phrase should probably only have one “the” in it (just as you’ve probably skimmed over it the three times I’ve duplicated “the” in this paragraph alone!). The third image might look meaningless until you realize it’s a cow’s head; once you see the cow’s head your model of the world informs your perception and it’s almost impossible to see it as anything else.

(Teh fcat taht you can siltl raed wrods wtih all the itroneir ltretrs rgraneanrd is ahonter empxlae of top-dwon pssirocneg mkinag nsioy btotom-up dtaa sanp itno pacle)

But top-down processing is much more omnipresent than even these examples would suggest. Even something as simple as looking out the window and seeing a tree requires top-down processing; it may be too dark or foggy to see the tree one hundred percent clearly, the exact pattern of light and darkness on the tree might be something you’ve never seen before – but because you know what trees are and expect them to be around, the image “snaps” into the schema “tree” and you see a tree there. As usual, this process is most obvious when it goes wrong; for example, when random patterns on a wall or ceiling “snap” into the image of a face, or when the whistling of the wind “snaps” into a voice calling your name.

Corlett, Frith & Fletcher (2009) (henceforth CFF) expand on this idea and speculate on the biochemical substrates of each part of the process. They view perception as a “handshake” between top-down and bottom-up processing. Top-down models predict what we’re going to see, bottom-up models perceive the real world, then they meet in the middle and compare notes to calculate a prediction error. When the prediction error is low enough, it gets smoothed over into a consensus view of reality. When the prediction error is too high, it registers as salience/surprise, and we focus our attention on the stimulus involved to try to reconcile the models. If it turns out that bottom-up was right and top-down was wrong, then we adjust our priors (ie the models used by the top-down systems) and so learning occurs.
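One toy way to picture this handshake (my own gloss, not CFF’s actual math): treat the top-down model as a running prediction that is compared against bottom-up input; small prediction errors get smoothed over, and large ones register as salient and update the prior.

```python
def perceive(prediction, sensory_input, threshold=1.0, learning_rate=0.5):
    """Compare a top-down prediction against bottom-up input.

    Returns (updated_prediction, salient). Threshold and learning rate
    are arbitrary illustrative values, not anything from the paper.
    """
    error = sensory_input - prediction
    if abs(error) <= threshold:
        # Handshake succeeds: smoothed into a consensus view, no learning.
        return prediction, False
    # Handshake fails: stimulus registers as salient; adjust the prior
    # toward the bottom-up data (i.e. learning occurs).
    return prediction + learning_rate * error, True

model = 0.0                            # top-down model predicts silence
model, salient = perceive(model, 0.2)  # faint noise: smoothed over
model, salient = perceive(model, 5.0)  # loud beep: salient, prior updates
```

In this caricature, the disruptions CFF discuss correspond to turning the knobs: noisier bottom-up input means larger errors, and weaker top-down priors mean less smoothing.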

In their model, bottom-up sensory processing involves glutamate via the AMPA receptor, and top-down sensory processing involves glutamate via the NMDA receptor. Dopamine codes for prediction error, and seems to represent the level of certainty or the “confidence interval” of a given prediction or perception. Serotonin, acetylcholine, and the others seem to modulate these systems, where “modulate” is a generic neuroscientist weasel word. They provide a lot of neurological and radiologic evidence for these correspondences, for which I highly recommend reading the paper but which I’m not going to get into here. What I found interesting was their attempts to match this system to known pharmacological and psychological processes.

CFF discuss a couple of possible disruptions of their system. Consider increased AMPA signaling combined with decreased NMDA signaling. Bottom-up processing would become more powerful, unrestrained by top-down models. The world would seem to become “noisier”, as sensory inputs took on a life of their own and failed to snap into existing categories. In extreme cases, the “handshake” between exuberant bottom-up processes and overly timid top-down processes would fail completely, which would take the form of the sudden assignment of salience to a random stimulus.

Schizophrenics are famous for “delusions of reference”, where they think a random object or phrase is deeply important for reasons they have trouble explaining. Wikipedia gives as examples:

– A feeling that people on television or radio are talking about or talking directly to them

– Believing that headlines or stories in newspapers are written especially for them

– Seeing objects or events as being set up deliberately to convey a special or particular meaning to themselves

– Thinking ‘that the slightest careless movement on the part of another person had great personal meaning…increased significance’

In CFF, these are perceptual handshake failures; even though “there’s a story about the economy in today’s newspaper” should be perfectly predictable, noisy AMPA signaling registers it as an extreme prediction failure, and it fails its perceptual handshake with overly-weak priors. Then it gets flagged as shocking and deeply important. If you’re unlucky enough to have your brain flag a random newspaper article as shocking and deeply important, maybe phenomenologically that feels like it’s a secret message for you.

And this pattern – increased AMPA signaling combined with decreased NMDA signaling – is pretty much the effect profile of the drug ketamine, and ketamine does cause a paranoid psychosis mixed with delusions of reference.

Organic psychoses like schizophrenia might involve a similar process. There’s a test called the binocular depth inversion illusion, which looks like this:


The mask in the picture is concave, ie the nose is furthest away from the camera. But most viewers interpret it as convex, with the nose closest to the camera. This makes sense in terms of Bayesian perception; we see right-side-out faces a whole lot more often than inside-out faces.

Schizophrenics (and people stoned on marijuana!) are more likely to properly identify the face as concave than everyone else. In CFF’s system, something about schizophrenia and marijuana messes with NMDA, impairs priors, and reduces the power of top-down processing. This predicts that schizophrenics and potheads would both have paranoia and delusions of reference, which seems about right.

Consider a slightly different distortion: increased AMPA signaling combined with increased NMDA signaling. You’ve still got a lot of sensory noise. But you’ve also got stronger priors to try to make sense of it. CFF argue these are the perfect conditions to create hallucinations. The increase in sensory noise means there’s a lot of data to be explained; the increased top-down pattern-matching means that the brain is very keen to fit all of it into some grand narrative. The result is vivid, convincing hallucinations of things that are totally not there at all.

LSD is mostly serotonergic, but most things that happen in the brain bottom out in glutamate eventually, and LSD bottoms out in exactly the pattern of increased AMPA and increased NMDA that we would expect to produce hallucinations. CFF don’t mention this, but I would also like to add my theory of pattern-matching based mysticism. Make the top-down prior-using NMDA system strong enough, and the entire world collapses into a single narrative, a divine grand plan in which everything makes sense and you understand all of it. This is also something I associate with LSD.

If dopamine represents a confidence interval, then increased dopaminergic signaling should mean narrowed confidence intervals and increased certainty. Perceptually, this would correspond to increased sensory acuity. More abstractly, it might increase “self-confidence” as usually described. Amphetamines, which act as dopamine agonists, do both. Amphetamine users report increased visual acuity (weirdly, they also report blurred vision sometimes; I don’t understand exactly what’s going on here). They also create an elevated mood and grandiose delusions, making users more sure of themselves and making them feel like they can do anything.

(something I remain confused about: elevated mood and grandiose delusions are also typical of bipolar mania. People on amphetamines and other dopamine agonists act pretty much exactly like manic people. Antidopaminergic drugs like olanzapine are very effective acute antimanics. But people don’t generally think of mania as primarily dopaminergic. Why not?)

CFF end their paper with a discussion of sensory deprivation. If perception is a handshake between bottom-up sense-data and top-down priors, what happens when we turn the sense-data off entirely? Psychologists note that most people go a little crazy when placed in total sensory deprivation, but that schizophrenics actually seem to do better under sense-deprivation conditions. Why?

The brain filters sense-data to adjust for ambient conditions. For example, when it’s very dark, your eyes gradually adjust until you can see by whatever light is present. When it’s perfectly silent, you can hear the proverbial pin drop. In a state of total sensory deprivation, any attempt to adjust to a threshold where you can detect the nonexistent signal is actually just going to bring you down below the point where you’re picking up noise. As with LSD, when there’s too much noise the top-down systems do their best to impose structure on it, leading to hallucinations; when they fail, you get delusions. If schizophrenics have inherently noisy perceptual systems, such that all perception comes with noise the same way a bad microphone gives off bursts of static whenever anyone tries to speak into it, then their brains will actually become less noisy as sense-data disappears.

(this might be a good time to remember that no congenitally blind people ever develop schizophrenia and no one knows why)


Lawson, Rees, and Friston (2014) offer a Bayesian link to autism.

(there are probably a lot of links between Bayesians and autism, but this is the only one that needs a journal article)

They argue that autism is a form of aberrant precision. That is, confidence intervals are too low; bottom-up sense-data cannot handshake with top-down models unless they’re almost-exactly the same. Since they rarely are, top-down models lose their ability to “smooth over” bottom-up information. The world is full of random noise that fails to cohere into any more general plan.

Right now I’m sitting in a room writing on a computer. A white noise machine produces white noise. A fluorescent lamp flickers overhead. My body is doing all sorts of body stuff like digesting food and pumping blood. There are a few things I need to concentrate on: this essay I’m writing, my pager if it goes off, any sorts of sudden dramatic pains in my body that might indicate a life-threatening illness. But I don’t need to worry about the feeling of my back against the back of the chair, or the occasional flickers of the fluorescent light, or the feeling of my shirt on my skin.

A well-functioning perceptual system gates out those things I don’t need to worry about. Since my shirt always feels more or less similar on my skin, my top-down model learns to predict that feeling. When the top-down model predicts the shirt on my skin, and my bottom-up sensation reports the shirt on my skin, they handshake and agree that all is well. Even if a slight change in posture makes a different part of my shirt brush against my skin than usual, the confidence intervals are wide: it is still an instance of the class “shirt on skin”, it “snaps” into my shirt-on-skin schema, the perceptual handshake goes off successfully, and all remains well. If something dramatic happens – for example my pager starts beeping really loudly – then my top-down model, which has thus far predicted silence, is rudely surprised by the sudden burst of noise. The perceptual handshake fails, and I am startled, upset, and instantly stop writing my essay as I try to figure out what to do next (hopefully answer my pager). The system works.

The autistic version works differently. The top-down model tries to predict the feeling of the shirt on my skin, but tiny changes in the position of the shirt change the feeling somewhat; bottom-up data does not quite match top-down prediction. In a neurotypical with wide confidence intervals, the brain would shrug off such a tiny difference, declare it good enough for government work, and (correctly) ignore it. In an autistic person, the confidence intervals are very narrow; the top-down systems expect the feeling of shirt-on-skin, but the bottom-up systems report a slightly different feeling of shirt-on-skin. These fail to snap together, the perceptual handshake fails, and the brain flags it as important; the autistic person is startled, upset, and feels like stopping what they’re doing in order to attend to it.
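The neurotypical/autistic contrast above can be caricatured as the same handshake run with different confidence-interval widths (my illustration, not anything from LRF; all the numbers are made up):

```python
def handshake_ok(prediction, sensory_input, ci_width):
    """True if the bottom-up input falls inside the top-down
    prediction's confidence interval, i.e. the handshake succeeds."""
    return abs(sensory_input - prediction) <= ci_width

shirt_feeling = 1.0
shifted_shirt = 1.1   # a posture change moves the fabric slightly

wide_ci = 0.5         # wide interval: "good enough for government work"
narrow_ci = 0.05      # narrow interval: tiny deviations get flagged

handshake_ok(shirt_feeling, shifted_shirt, wide_ci)    # handshake succeeds: ignored
handshake_ok(shirt_feeling, shifted_shirt, narrow_ci)  # handshake fails: salient
```

Identical sensory fluctuation, opposite outcomes; the only parameter that changed is the precision.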

(in fact, I think the paper might be claiming that “attention” just means a localized narrowing of confidence intervals in a certain direction; for example, if I pay attention to the feeling of my shirt on my skin, then I can feel every little fold and micromovement. This seems like an important point with a lot of implications.)

Such handshake failures match some of the sensory symptoms of autism pretty well. Autistic people dislike environments that are (literally or metaphorically) noisy. Small sensory imperfections bother them. They literally get annoyed by scratchy clothing. They tend to seek routine, make sure everything is maximally predictable, and act as if even tiny deviations from normal are worthy of alarm.

They also stim. LRF interpret stimming as an attempt to control the sensory predictive environment. If you’re moving your arms in a rhythmic motion, the overwhelming majority of sensory input from your arm is from that rhythmic motion; tiny deviations get lost in the larger signal, the same way a firefly would disappear when seen against the blaze of a searchlight. The rhythmic signal which you yourself are creating and keeping maximally rhythmic is the most predictable thing possible. Even something like head-banging serves to create extremely strong sensory data – sensory data whose production the head-banger is themselves in complete control of. If the brain is in some sense minimizing predictive error, and there’s no reasonable way to minimize prediction error because your predictive system is messed up and registering everything as a dangerous error – then sometimes you have to take things into your own hands, bang your head against a metal wall, and say “I totally predicted all that pain”.

(the paper doesn’t mention this, but it wouldn’t surprise me if weighted blankets work the same way. A bunch of weights placed on top of you will predictably stay there; if they’re heavy enough this is one of the strongest sensory signals you’re receiving and it might “raise your average” in terms of having low predictive error)

What about all the non-sensory-gating-related symptoms of autism? LRF think that autistic people dislike social interaction because it’s “the greatest uncertainty”; other people are the hardest-to-predict things we encounter. Neurotypical people are able to smooth social interaction into general categories: this person seems friendly, that person probably doesn’t like me. Autistic people get the same bottom-up data: an eye-twitch here, a weird half-smile there – but it never snaps into recognizable models; it just stays weird uninterpretable clues. So:

This provides a simple explanation for the pronounced social-communication difficulties in autism; given that other agents are arguably the most difficult things to predict. In the complex world of social interactions, the many-to-one mappings between causes and sensory input are dramatically increased and difficult to learn; especially if one cannot contextualize the prediction errors that drive that learning.

They don’t really address differences between autists and neurotypicals in terms of personality or skills. But a lot of people have come up with stories about how autistic people are better at tasks that require a lot of precision and less good at tasks that require central coherence, which seems like sort of what this theory would predict.

LRF end by discussing biochemical bases. They agree with CFF that top-down processing is probably related to NMDA receptors, and so suspect this is damaged in autism. Transgenic mice who lack an important NMDA receptor component seem to behave kind of like autistic humans, which they take as support for their model – although obviously a lot more research is needed. They agree that acetylcholine “modulates” all of this and suggest it might be a promising pathway for future research. They agree with CFF that dopamine may represent precision/confidence, but despite their whole spiel being that precision/confidence is messed up in autism, they don’t have much to say about dopamine except that it probably modulates something, just like everything else.


All of this is fascinating and elegant. But is it elegant enough?

I notice that I am confused about the relative role of NMDA and AMPA in producing hallucinations and delusions. CFF say that enhanced NMDA signaling results in hallucinations as the brain tries to add excess order to experience and “overfits” the visual data. Fine. So maybe you get a tiny bit of visual noise and think you’re seeing the Devil. But shouldn’t NMDA and top-down processing also be the system that tells you there is a high prior against the Devil being in any particular visual region?

Also, once psychotics develop a delusion, that delusion usually sticks around. It might be that a stray word in a newspaper makes someone think that the FBI is after them, but once they think the FBI is after them, they fit everything into this new paradigm – for example, they might think their psychiatrist is an FBI agent sent to poison them. This sounds a lot like a new, very strong prior! Their doctor presumably isn’t doing much that seems FBI-agent-ish, but because they’re working off a narrative of the FBI coming to get them, they fit everything, including their doctor, into that story. But if psychosis is a case of attenuated priors, why should that be?

(maybe they would answer that because psychotic people also have increased dopamine, they believe in the FBI with absolute certainty? But then how come most psychotics don’t seem to be manic – that is, why aren’t they overconfident in anything except their delusions?)

LRF discuss prediction error in terms of mild surprise and annoyance; you didn’t expect a beeping noise, the beeping noise happened, so you become startled. CFF discuss prediction error as sudden surprising salience, but then say that the attribution of salience to an odd stimulus creates a delusion of reference, a belief that it’s somehow pregnant with secret messages. These are two very different views of prediction error; an autist wearing uncomfortable clothes might be constantly focusing on the itchiness rather than on whatever she’s trying to do at the time, but she’s not going to start thinking the itchy clothes are a sign from God. What’s the difference?

Finally, although they highlighted a selection of drugs that make sense within their model, others seem not to. For example, there’s some discussion of ampakines for schizophrenia. But this is the opposite of what you’d want if psychosis involved overactive AMPA signaling! I’m not saying that the ampakines for schizophrenia definitely work, but they don’t seem to make the schizophrenia noticeably worse either.

Probably this will end the same way most things in psychiatry end – hopelessly bogged down in complexity. Probably AMPA does one thing in one part of the brain, the opposite in other parts of the brain, and it’s all nonlinear and different amounts of AMPA will have totally different effects and maybe downregulate itself somewhere else.

Still, it’s neat to have at least a vague high-level overview of what might be going on.

11 Sep 05:00

Mini political reform changes a lot in the right direction

A PEC (proposed constitutional amendment) for a mini political reform is making its way through the Senate. Authored by Ricardo Ferraço, PSDB senator from Espírito Santo, it has Aloysio Nunes Ferreira, PSDB senator from São Paulo, as its rapporteur. The bill establishes a performance threshold and bans coalitions in proportional elections.

The current intense political crisis has countless causes. High party fragmentation aggravates the problem: it raises the Executive’s cost of exercising its agenda-setting power in the National Congress, and thus makes governing harder.

It is hard to govern with 32 parties and another 35 in line awaiting registration.

The institutional design, with proportional voting in large districts (each state is one district), generates a high number of parties.

Recent management choices by the PT government itself, which tried to drain the PMDB by creating several “little PMDBs”, and mistaken decisions by the STF have made the problem worse.

There is a consensus in Brazilian political science that the high degree of fragmentation has not produced representation gains for minority groups in society. It is an industry that serves only the interests of a few: grabbing a share of the R$ 800 million Party Fund and selling television time.

The PEC’s performance threshold determines that “those [parties] that obtain, in elections for the Chamber of Deputies, at least 3% of all valid votes, distributed across at least 14 units of the Federation, with a minimum of 2% of the valid votes in each of them, shall be entitled to parliamentary standing”.

The following paragraph establishes that “only political parties with parliamentary standing shall be entitled to their own staff and facilities in the legislative houses, shall share in the distribution of Party Fund resources, and shall have free access to radio and television, as provided by law”.

Deputies elected by parties that fail to meet the performance threshold do not lose their seats. They may remain in the party without parliamentary standing, living with the resulting limitations, or they may switch, at no great cost, to another party.

Responding to the wishes of small ideological parties, the PEC allows coalitions in proportional elections in the form of party federations. For all practical purposes, a party federation operates, throughout its term, as if it were a single party. In particular, federated parties must take part in the electoral process and act jointly not only in the Senate and the Federal Chamber but also in the state and Federal District assemblies, as well as in the municipal councils.

As a transition rule, the PEC sets a lighter threshold in 2018, of only 2% of all valid votes; and the ban on coalitions in proportional elections will take effect only from 2022.

The great virtue of Senator Ferraço’s initiative is that the proposed reform is incremental and attacks the source of one of the biggest problems of our political system: excessive party fragmentation.

Instead of changing everything so that everything stays the same, it reforms almost nothing in order to change a great deal in the right direction.


14 Sep 02:44

Some Context For That NYT Sugar Article

by Scott Alexander

Imagine a political historian discovers that Lyndon Johnson accepted a campaign contribution from a big Wall Street bank. Since Johnson’s policies helped shape the modern Democratic Party, everyone agrees the Democrats are built on a foundation of lies. “Republicans Vindicated; Small-Government Conservatism Was Right All Along”, say the headlines of all the major newspapers.

This is kind of how I feel about the reaction to the latest New York Times article.

How The Sugar Industry Shifted Blame To Fat describes new historical research that finds that the sugar industry sponsored a study showing that fat (and not sugar) was the major risk factor for cardiovascular disease. They tie this into a bigger narrative about how sugar is the real dietary villain, and it’s only the sugar industry’s successful bribery work that made us suspect fat for so long:

The revelations are important because the debate about the relative harms of sugar and saturated fat continues today, Dr. Glantz said. For many decades, health officials encouraged Americans to reduce their fat intake, which led many people to consume low-fat, high-sugar foods that some experts now blame for fueling the obesity crisis.

“It was a very smart thing the sugar industry did, because review papers, especially if you get them published in a very prominent journal, tend to shape the overall scientific discussion,” he said […]

I’m glad researchers have discovered this. But treating it as a smoking gun which exonerates fat and blames sugar is like the political example above. Yes, it’s sketchy for LBJ to take Wall Street money. But this kind of low-level corruption is so universal that concentrating on any one example is likely to lead to overcorrection.

Yes, the sugar lobby sponsors some research, but the fat lobby has researchers of its own. They tend to be associated with the dairy and meat industries, both of which are high in saturated fat and both of which are very involved in nutrition research. For example, Siri-Tarino et al’s Meta-analysis of prospective cohort studies evaluating the association of saturated fat with cardiovascular disease finds that saturated fat does not increase heart disease risk, but it has a little footnote saying that it’s supported by the National Dairy Council. Modulation of Replacement Nutrients, which finds that replacing dietary fat with dietary sugar doesn’t help and may worsen heart disease, includes two authors affiliated with the National Dairy Council and one affiliated with the National Cattleman’s Beef Association.

Mother Jones does a dairy industry expose and finds:

[Industry] ties can sometimes be hard to avoid, since much of the research on dairy is funded by a constellation of industry-backed institutes, including the Nestlé Nutrition Institute, the Dannon Institute, and the Dairy Research Institute, which spends $19 million a year “to establish the health benefits of dairy products and ingredients.” Even Willett acknowledges that he has received a “very small” dairy industry grant. Dairy companies also donate heavily to the American Society for Nutrition, which publishes the influential American Journal of Clinical Nutrition, and the Academy of Nutrition and Dietetics, “the world’s largest organization of food and nutrition professionals.”

Then there are the industry’s donations to politicians. Dairy companies spent nearly $63 million on federal lobbying and gave $24 million to candidates between 2004 and 2014.

As Jim Babcock points out in the comments, some of the agendas are more complicated than I’m making them sound. Dairy was pretty okay with the low-fat craze for a while, because it let them market low-fat milk. But they do seem to be behind a lot of the pro-saturated-fat research going on right now, and their website does promote pro-saturated-fat articles (1, 2, 3). Overall they seem to be taking a low-key approach where they roll with some studies and push back on others.

In any case, claims that the sugar industry sponsored one study back in the 1960s, and this means everything we’ve ever thought is wrong and biased against fat and in favor of sugar, miss the point (especially since there are probably problems with both sugar and fat). Whatever study the New York Times has dredged up was one volley in an eternal clandestine war of Big Fat against Big Sugar, and figuring out who’s distorted the science more is the sort of project that’s going to take more than one article.

16 Sep 21:30

When it gets out of hand, by @Medievalico

12 Sep 14:17

How we all love to troll Mom

16 Sep 17:48

Why Do Famous People Get Paid $250,000 to Give a Speech?


Photo by Lorie Shaull; adapted by Priceonomics


In 2013 and 2014, Goldman Sachs, Morgan Stanley, and Deutsche Bank each paid Hillary Clinton $225,000 to give a speech. 

“If you're going to give a speech for $225,000, it must be a hell of a good speech," Bernie Sanders said of Clinton’s speaking fees when he ran for president. 

Sanders was insinuating that bankers had bought Clinton’s loyalty. After all, how could a speech be worth $225,000?

But that’s like asking why anyone would pay Beyoncé or the Beatles a million dollars just to play a few songs. Organizations routinely pay huge sums to have famous people say a few words or attend a conference. Paid speeches are how politicians profit when they leave office and how book authors make good money. It’s a safe bet that Rio Olympians are working with speech coaches right now. 

As one speaking agent puts it, “Hillary Clinton is just the tip of the iceberg.” 

Speaking Fee of Select Public Speakers

Fees are gathered mainly from the websites of speaking agencies. Some fee ranges may be outdated or inflated.

Selecting for speakers based on fame exaggerates the high end of the market (most speakers are lesser known business gurus or motivational speakers whose fee is under $20,000) and skews the list of speakers to be more male than it otherwise would be (largely due to the larger number of male athletes).

Anderson Cooper, Malcolm Gladwell, and co. don’t charge this much every time they speak. They make exceptions for high schools and, often, for commencement addresses. (Students and administrators tend to choose speakers with a personal connection to the college.) And a minority donate all their speaking fees to charity. 

But people with a great story, a famous name, or business acumen can profit enormously by flying to San Francisco for a Salesforce conference. Or to Orlando for the National Association of Realtors Conference and Expo.  

It’s all part of an industry of agencies that represent big-name speakers and connect them with event organizers—usually of corporate conferences—willing to pay $10,000 to $1 million for a good speech. The industry offers speakers a means to share a message or promote a book, and, above all, turn fame into money.  

The Great American Speech

People have always had a complex about speeches.

As Jerry Seinfeld once quipped, “According to most studies, people’s number one fear is public speaking. Number two is death. This means, to the average person, if you have to go to a funeral, you’re better off in the casket than doing the eulogy.” 

But for people who excel at giving speeches, the rewards have always been great. In Ancient Rome, Cicero’s great oratory made him one of the most powerful people in the world. More recently, Barack Obama’s 2004 Democratic Convention keynote about “the audacity of hope” made him an overnight star.

The birth of the modern speakers industry can be traced to the sixties and seventies, when Harry Walker, the founder of a leading agency that represents big-name speakers, pitched businesses on hiring public figures for “mind stretching programs.”

As his son explained when Harry Walker died in 2002, “That meant instead of talking to Coca-Cola executives about bottling—which is what they used to hear about—talk to them about what's happening in the world.” (Entrepreneurial agents in other countries did the same, signing up luminaries to charge for speeches they once gave for free.)

According to Bloomberg, Gerald Ford became the first former president to give paid speeches in 1977. His $25,000 appearances went largely unnoticed; the New York Times archives contain no mention of them. But when Ronald Reagan gave many prominent, paid speeches in his retirement, he boosted the industry and cemented the current tradition of politicians, who are banned from giving paid speeches while in office, cashing in once they leave. 

Lucrative speaking appearances, however, are not a modern invention. They’re 150 years old.

The practice arose in pre-Civil War New England in response to Bostonians’ clamor to hear lectures by abolitionists like Henry Ward Beecher. A literary man named James Redpath began acting as matchmaker between anti-slavery orators and interested audiences; soon he was a de facto agent, organizing speaking tours for famous Americans. 

The result was America’s “Lyceum Movement.” The speakers broadened their scope to address the arts, politics, and science, and the Lyceum Movement became known as “the people’s college.” Mark Twain read his literary work; Abraham Lincoln warned of the corrupting influences of slavery; and Elizabeth Cady Stanton lectured on the importance of women’s suffrage. 

“The lyceum invaded the cities and towns of the country,” The American Educational Review observed in 1910, “and during these years, its apostles have been heard in school, in church, [and] wherever sufficient numbers could be banded together to warrant bringing to their town the great and the near-great.” A subsequent movement called Chautauqua organized speaking events around the country—with equal success—in the early 1900s.

It wasn’t cheap to get Ralph Waldo Emerson to come talk about self-reliance. He charged $300, apparently setting a high price so he’d never be asked to speak in small markets like Cincinnati. “We found after a few years,” a Lyceum organizer wrote, “that the drift was quite toward making the lyceum a money-making affair, without regard to any higher end.” 

The lecture circuit became the primary way intellectuals supported themselves. Susan B. Anthony gave up to 100 speeches a year to fund her activism, and Mark Twain organized lecture tours with entrepreneurial zeal.

Lyceum fees ranged from $50 for a popular instructor in a small town to several hundred dollars (or up to $1,000 in New York City) for luminaries like Frederick Douglass. Adjusting for inflation, this means big-name speakers in 19th century America earned $10,000 to $25,000 on the lecture circuit. 

The Economics of a $40,000 Speech

A $40,000 speech is a lot less puzzling when you think of it like a concert. If an event organizer sells out a 6,000-seat auditorium, a $10 ticket should more than cover the costs. 

For speakers like Malcolm Gladwell and Jane Goodall, ticketed events hosted by universities, libraries, local Chambers of Commerce, and theatrical promoters make up part of the lecture circuit. 

But the growth of the speakers industry since the 1970s—and the existence of six-figure speaking fees for people like Hillary Clinton—relies on the logic of hiring celebrities to speak at corporate events.

Simply put, if you want a particular group of people to attend an event, hiring a famous person is the way to go. “Everyone wants to say, ‘I had lunch with Michael Lewis yesterday,’” Don Epstein, who represented the best-selling author, told Bloomberg in 2014. “It might be you and 500 other people, but it still happened.”

“For some organizations, the speech is almost secondary,” says Jim Keppler, the president and founder of Keppler Speakers. “They are looking to bring in a VIP to schmooze at receptions, pose for pictures, and sign autographs.”

The rewards to attracting the right people in the corporate world can more than justify a six-figure speech. Hedge funds often invite potential clients to events featuring prominent speakers. As one hedge fund manager has explained, if just one client “decides to invest $10 million… the firm will snag a 2 percent management fee—which works out to $200,000” per year.

For companies like Google and Goldman Sachs, hiring a famous speaker to address their employees is a perk like free lunch. But professional associations are a more common employer of highly paid speakers. Hillary Clinton, for example, did not just speak at Morgan Stanley. She was hired by the National Association of Realtors, International Deli-Dairy-Bakery Association, American Society of Travel Agents, American Camping Association, and Institute of Scrap Recycling Industries. And they each paid her nearly a quarter of a million dollars. 


Why did a bunch of recyclers pay Hillary Clinton $225,000 for a speech? Because they know famous speakers attract a crowd. 

“We’ve essentially had every former president since Ronald Reagan,” says Chuck Carr, the Vice President for Convention, Education & Training at the Institute of Scrap Recycling Industries (ISRI), “and most of the secretaries of state.” As a professional association, ISRI not only wants to sell tickets to its annual conference. It wants good attendance from recycling professionals so they benefit from the networking opportunities. And people like Stanley McChrystal and Bill Clinton help them do that. 

At some conferences, speeches by authors and former presidents are highbrow entertainment, slotted into the schedule like a performance by Green Day. At a gathering of the American Camping Association, Hillary Clinton discussed her career and only addressed camping by joking that Congress might benefit from attending a bipartisan summer camp. 

A $20,000 to $200,000 check may or may not buy a customized speech. According to Jim Keppler, some motivational speakers are “one man shows,” telling each audience a practiced story of overcoming adversity. Malcolm Gladwell, on the other hand, has explained that if he addresses a group of IT specialists, then he’d “like to say something intelligent about IT.”

But highly paid speeches are not always pure entertainment.

In ISRI’s case, Chuck Carr says, the recyclers enjoyed hearing from secretaries of state about how difficult it is to do business in a particular country. Jim Keppler gives the example that after Obamacare became law, businesses wanted speeches by healthcare experts. “Healthcare is the biggest line item on a lot of corporate budgets,” he says, “and executives didn’t know how it would affect them.” 

This once meant companies wanted to hear from Republicans and Democrats. Polarization, however, has hit the speakers industry. “Now if you bring in a Republican or Democrat,” Keppler says, “all you do is piss off 50% of the audience.” It’s safer to hire an Apollo 13 astronaut for an inspirational speech.  

Or a branding expert. Because while the most famous mostly offer star power, plenty of businesses want specific advice. Sometimes that’s a former athlete telling a sales force about football practice as a metaphor for teamwork. More often it’s a consultant or retired executive sharing their expertise. (The chart at the top of this article features household names, but we could easily fill it with business gurus unknown to the average American.) 

The current trend, Keppler says, is speakers who can talk about how to create a better company culture, improve an organization’s leadership, or stand out in a crowded market. 


In the case of politicians like Hillary Clinton, many observers see a more sinister explanation for how a speech can be worth $225,000: the speech is buying political influence.

This concern isn’t unique to politicians. The Columbia Journalism Review has questioned whether reporters who cover Wall Street should accept $20,000 speaking gigs from big banks, and Nate Silver raised eyebrows this week by giving a paid, closed-door presentation to Republican donors.   

This concern motivates the law barring active politicians from giving paid speeches, and it is why most newspapers ban their beat reporters from getting paid to talk.

It’s impossible to say exactly how much the desire to influence politicians inflates their speaking fees. Critics tend to ask why anyone would pay Hillary Clinton, who acknowledges that she is “not a natural politician,” $225,000 for a speech. But speaking agents say that her status as one of the most famous people in the world justifies her fee, and a one-time speech is a poor way to get quid pro quo. 

Instead, certain organizations likely view paying a politician’s speaking fee the same way they view contributing money to his or her campaign: part of a larger lobbying effort. 

The Life of a Speaker

Earning $40,000 for a speech sounds like easy money. But the reality is that the speaking circuit can be grueling. 

Star power makes a career in speaking possible. But star power is not enough. When booking speakers for the recyclers association, Chuck Carr calls around to hear whether the previous speeches given by former presidents and bestselling authors were “dry or canned.” On the Tim Ferriss Show, Malcolm Gladwell explains that “the breakthrough for me in speaking came when I realized it took 10 times the time I was giving to it.”

Even veterans of countless press conferences find themselves working with coaches like Mary Gardner. “I’ve told some famous people, ‘Stop! Stop! You’re boring me to tears,’” says Gardner, who will fly cross-country to drill speakers in a hotel room. Especially for athletes who previously spent all their time practicing jump shots or a floor routine, launching a career takes time. Gardner coaches her clients on how to stand confidently and even to tilt their head the right way when meeting fans. 

Many speakers also struggle with imposter syndrome. “What worried me was, would anyone want to listen to this?” says Cindy Miller, a golf pro turned corporate trainer and speaker. “I had to realize that it’s not about me. It’s about the audience.” 


But even after a retired politician, former athlete, or prominent executive has earned a reputation for speaking, many will still turn down the majority of offers. Because while a speech seems like easy money, it’s really not. 

In 1895, a speaking agent said of Susan B. Anthony, she “has done more lecturing than any other person in America, and survived it.” Speakers no longer jump off moving trains to make it to a conference. But it’s still a lonely travel schedule. 

Jonah Lehrer is a once high-flying science writer who became one of the industry’s hottest speakers before scandals revealed he plagiarized and fabricated parts of his work. But even during his heyday, he acknowledged the downsides of the lifestyle. “You end up getting existentially sad,” Lehrer told The Observer in 2012, when “you look through your wallet and you realize you’ve got seven hotel keys.”

Like a political campaign, a speaking tour involves unglamorous travel and repetitive events. But just as campaigns create occasional moments of genuine candidate-voter connection and inspiring events like Barack Obama’s acceptance of the presidency in 2008, the speaking industry can remind listeners that public figures are normal people and give them chills when an Apollo 13 astronaut recalls an oxygen tank exploding. 

People will pay a lot for those moments. Even if it costs tens of thousands of dollars, people want to bask in the halo of fame and accomplishment. 




13 Sep 06:31

Starting Over

by Greg Ross

On Dec. 24, 2010, Lori Erica Ruff shot herself to death with a shotgun in Longview, Texas. After her death, her ex-husband’s family discovered a lockbox in her home that revealed that in May 1988 she had stolen the identity of Becky Sue Turner, a 2-year-old girl who had died in a fire in 1971. She had then changed her name to Lori Erica Kennedy and received a Social Security account, erasing all trace of her origins.

After this she had qualified for a GED and eventually graduated from the University of Texas with a degree in business administration. At a Bible study class she met Blake Ruff, who described her as extremely secretive. She told him that she was from Arizona, that her parents were dead, and that she had no siblings. The two married in 2003 and Lori gave birth to a girl, of whom she was “extremely protective.” The marriage broke down, Ruff divorced her, and she committed suicide.

The lockbox contained a note with the phrases “North Hollywood police,” “402 months,” and “Ben Perkins,” but none of these clues has led anywhere. No one knows the woman’s real identity, or her history before 1988. Social Security Administration investigator Joe Velling received the case in 2011. “My immediate reaction was, I’ll crack this pretty quickly,” he told the Seattle Times in 2013. It remains unsolved.

(Thanks, Tuvia.)

12 Sep 08:41

How I learned to program

Tavish Armstrong has a great document where he describes how and when he learned the programming skills he has. I like this idea because I’ve found that the paths that people take to get into programming are much more varied than stereotypes give credit for, and I think it’s useful to see that there are many possible paths into programming.

Personally, I spent a decade working as an electrical engineer before taking a programming job. When I talk to people about this, they often want to take away a smooth narrative of my history. Maybe it’s that my math background gives me tools I can apply to a lot of problems, maybe it’s that my hardware background gives me a good understanding of performance and testing, or maybe it’s that the combination makes me a great fit for hardware/software co-design problems. People like a good narrative. One narrative people seem to like is that I’m a good problem solver, and that problem solving ability is generalizable. But reality is messy. Electrical engineering seemed like the most natural thing in the world, and I picked it up without trying very hard. Programming was unnatural for me, and didn’t make any sense at all for years. If you believe in the common “you either have it or you don’t” narrative about programmers, I definitely don’t have it. And yet, I now make a living programming, and people seem to be pretty happy with the work I do.

How’d that happen? Well, if we go back to the beginning, before becoming a hardware engineer, I spent a fair amount of time doing failed kid-projects (e.g., writing a tic-tac-toe game and AI) and not really “getting” programming. I do sometimes get a lot of value out of my math or hardware skills, but I suspect I could teach someone the actually applicable math and hardware skills I have in less than a year. Spending five years in a school and a decade in industry to pick up those skills was a circuitous route to getting where I am. Amazingly, I’ve found that my path has been more direct than that of most of my co-workers, giving the lie to the narrative that most programmers are talented whiz kids who took to programming early.

And while I only use a small fraction of the technical skills I’ve learned on any given day, I find that I have a meta-skill set that I use all the time. There’s nothing profound about the meta-skill set, but because I often work in new (to me) problem domains, I find my meta-skill set to be more valuable than my actual skills. I don’t think that you can communicate the importance of meta-skills (like communication) by writing a blog post any more than you can explain what a monad is by saying that it’s like a burrito. That being said, I’m going to tell this story anyway.

Ineffective fumbling (1980s - 1996)

Many of my friends and I tried and failed multiple times to learn how to program. We tried BASIC, and could write some simple loops, use conditionals, and print to the screen, but never figured out how to do anything fun or useful.

We were exposed to some kind of lego-related programming, uhhh, thing in school, but none of us had any idea how to do anything beyond what was in the instructions. While it was fun, it was no more educational than a video game and had a similar impact.

One of us got a game programming book. We read it, tried to do a few things, and made no progress.

High school (1996 - 2000)

Our ineffective fumbling continued through high school. Due to an interest in gaming, I got interested in benchmarking, which eventually led to learning about CPUs and CPU microarchitecture. This was in the early days of Google, before Google Scholar, and before most CS/EE papers could be found online for free, so this was mostly material from enthusiast sites. Luckily, the internet was relatively young, as were the users on the sites I frequented. Much of the material on hardware was targeted (and even written by) people like me, which made it accessible. Unfortunately, a lot of the material on programming was written by and targeted at professional programmers, things like Paul Hsieh’s optimization guide. There were some beginner-friendly guides to programming out there, but my friends and I didn’t stumble across them.

We had programming classes in high school: an introductory class that covered Visual Basic and an AP class that taught C++. Both classes were taught by someone who didn’t really know how to program or how to teach programming. My class had a couple of kids who already knew how to program and were making good money doing programming competitions on topcoder, but they failed to test out of the intro class because that test included things like a screenshot of the VB6 IDE, where you got a point for correctly identifying what each button did. The class taught about as much as you’d expect from a class where the pre-test involved identifying UI elements from an IDE.

The AP class the year after was similarly effective. About halfway through the class, a couple of students organized an independent study group which worked through an alternate textbook because the class was clearly not preparing us for the AP exam. I passed the AP exam because it was one of those multiple choice tests that’s possible to pass without knowing the material.

Although I didn’t learn much, I wouldn’t have graduated high school if not for AP classes. I failed enough individual classes that I almost didn’t have enough credits to graduate. I got those necessary credits for two reasons: first, a lot of the teachers had a deal where, if you scored well on the AP exam, they would give you a passing grade in the class (usually an A, but sometimes a B). Even that wouldn’t have been enough if my chemistry teacher hadn’t also changed my grade to a passing grade when he found out I did well on the AP chemistry test1.

Other than not failing out of high school, I’m not sure I got much out of my AP classes. My AP CS class actually had a net negative effect on my learning to program because the AP test let me opt out of the first two intro CS classes in college (an introduction to programming and a data structures course). In retrospect, I should have taken the intro classes, but I didn’t, which left me with huge holes in my knowledge that I didn’t really fill in for nearly a decade.

College (2000 - 2003)

Because I’d nearly failed out of high school, there was no reasonable way I could have gotten into a “good” college. Luckily, I grew up in Wisconsin, a state with a “good” school that used a formula to determine who would automatically get admitted: the GPA cutoff depended on standardized test scores, and anyone with standardized test scores above a certain mark was admitted regardless of GPA. During orientation, I talked to someone who did admissions and found out that my year was the last year they used the formula.

I majored in computer engineering and math for reasons that seem quite bad in retrospect. I had no idea what I really wanted to study. I settled on either computer engineering or engineering mechanics because both of those sounded “hard”.

I made a number of attempts to come up with better criteria for choosing a major. The most serious was when I spent a week talking to professors in an attempt to find out what day-to-day life in different fields was like. That approach had two key flaws. First, most professors don’t know what it’s like to work in industry; now that I work in industry and talk to folks in academia, I see that most academics who haven’t done stints in industry have a lot of misconceptions about what it’s like. Second, even if I managed to get accurate descriptions of different fields, it turns out that there’s a wide body of research that indicates that humans are basically hopeless at predicting which activities they’ll enjoy. Ultimately, I decided by coin flip.


I wasn’t planning on majoring in math, but my freshman intro calculus course was so much fun that I ended up adding a math major. That only happened because a high-school friend of mine passed me the application form for the honors calculus sequence because he thought I might be interested in it (he’d already taken the entire calculus sequence as well as linear algebra). The professor for the class covered the material at an unusually fast pace: he finished what was supposed to be a year-long calculus textbook partway through the semester and then lectured on his research for the rest of the semester. The class was theorem-proof oriented and didn’t involve any of that yucky memorization that I’d previously associated with math. That was the first time I’d found school engaging in my entire life and it made me really look forward to going to math classes. I later found out that non-honors calculus involved a lot of memorization when the engineering school required me to go back and take calculus II, which I’d skipped because I’d already covered the material in the intro calculus course.

If I hadn’t had a friend drop the application for honors calculus in my lap, I probably wouldn’t have majored in math and it’s possible I never would have found any classes that seemed worth attending. Even as it was, all of the most engaging undergrad professors I had were math professors2 and I mostly skipped my other classes. I don’t know how much of that was because my math classes were much smaller, and therefore much more customized to the people in the class (computer engineering was very trendy at the time, and classes were overflowing), and how much was because these professors were really great teachers.

Although I occasionally get some use out of the math that I learned, most of the value was in becoming confident that I can learn and work through the math I need to solve any particular problem.


In my engineering classes, I learned how to debug and how computers work down to the transistor level. I spent a fair amount of time skipping classes and reading about topics of interest in the library, which included things like computer arithmetic and circuit design. I still have fond memories of Koren’s Computer Arithmetic Algorithms and Chandrakasan et al.’s Design of High-Performance Microprocessor Circuits. I also started reading papers; I spent a lot of time in libraries reading physics and engineering papers that mostly didn’t make sense to me. The notable exception was systems papers, which I found to be easy reading. I distinctly remember reading the Dynamo paper (this was HP’s paper on JITs, not the more recent Amazon work of the same name), but I can’t recall any other papers I read back then.


I had two internships, one at Micron where I “worked on” flash memory, and another at IBM where I worked on the POWER6. The Micron internship was a textbook example of a bad internship. When I showed up, my manager was surprised that he was getting an intern and had nothing for me to do. After a while (perhaps a day), he found an assignment for me: press buttons on a phone. He’d managed to find a phone that used Micron flash chips; he handed it to me, told me to test it, and walked off.

After poking at the phone for an hour or two and not being able to find any obvious bugs, I walked around and found people who had tasks I could do. Most of them were only slightly less manual than “testing” a phone by mashing buttons, but I did one not-totally-uninteresting task, which was to verify that a flash chip’s controller behaved correctly. Unlike my other tasks, this was amenable to automation and I was able to write a perl script to do the testing for me.

I chose perl because someone had a perl book on their desk that I could borrow, which seemed like as good a reason as any at the time. I called up a friend of mine to tell him about this great “new” language and we implemented Age of Renaissance, a boardgame we’d played in high school. We didn’t finish, but perl was easy enough to use that we felt like we could write a program that actually did something interesting.

Besides learning perl, I learned that I could ask people for books and read them, and I spent most of the rest of my internship half keeping an eye on a manual task while reading the books people had lying around. Most of the books had to do with either analog circuit design or flash memory, so that’s what I learned. None of the specifics have really been useful to me in my career, but I learned two meta-items that were useful.

First, no one’s going to stop you from spending time reading at work or spending time learning (on most teams). Micron did its best to keep interns from learning by having a default policy of blocking interns from having internet access (managers could override the policy, but mine didn’t), but no one will go out of their way to prevent an intern from reading books when their other task is to randomly push buttons on a phone.

Second, I learned that there are a lot of engineering problems we can solve without anyone knowing why. One of the books I read was a survey of then-current research on flash memory. At the time, flash memory relied on some behaviors that were well characterized but not really understood. There were theories about how the underlying physical mechanisms might work, but determining which theory was correct was still an open question.

The next year, I had a much more educational internship at IBM. I was attached to a logic design team on the POWER6, and since they didn’t really know what to do with me, they had me do verification on the logic they were writing. They had a relatively new tool called SixthSense, which you can think of as a souped-up quickcheck. The obvious skill I learned was how to write tests using a fancy testing framework, but the meta-thing I learned which has been even more useful is the fact that writing a test-case generator and a checker is often much more productive than the manual test-case writing that passes for automated testing in most places.
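The generator-and-checker idea is easy to sketch. Below is a hypothetical, minimal version in Python (not SixthSense, which was an internal IBM tool): random inputs are generated, fed to the implementation under test, and compared against a slow but obviously correct reference model.

```python
import random

def reference_popcount(x):
    """Trusted model: count set bits the slow, obvious way."""
    return bin(x).count("1")

def fast_popcount(x):
    """Implementation under test: Kernighan's bit-clearing trick."""
    count = 0
    while x:
        x &= x - 1  # clear the lowest set bit
        count += 1
    return count

def run_property_test(trials=10_000, seed=0):
    """Generate random inputs and check the implementation against the model."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.getrandbits(rng.randint(1, 64))
        assert fast_popcount(x) == reference_popcount(x), f"mismatch at {x}"
    return trials

print(run_property_test())
```

One loop like this exercises thousands of cases nobody would write by hand, and any failing input it finds is a ready-made regression test.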

The other thing I encountered for the first time at IBM was version control (CVS, unfortunately). Looking back, I find it a bit surprising that not only did I never use version control in any of my classes, but I’d never met any other students who were using version control. My IBM internship was between undergrad and grad school, so I managed to get a B.S. degree without ever using or seeing anyone use version control.

Computer Science

I took a couple of CS classes. The first was algorithms, which was poorly taught and, as a result, so heavily curved that I got an A despite not learning anything at all. The course involved no programming and while I could have done some implementation in my free time, I was much more interested in engineering and didn’t try to apply any of the material.

The second course was databases. There were a couple of programming projects, but they were all projects where you got some scaffolding and only had to implement a few key methods to make things work, so it was possible to do ok without having any idea how to program. I got involved in a competition to see who could attend the fewest possible classes, didn’t learn anything, and scraped by with a B.

Grad school (2003 - 2005)

After undergrad, I decided to go to grad school for a couple of silly reasons. One was a combination of “why not?” and the argument that most of my professors gave, which was that you’ll never go if you don’t go immediately after undergrad because it’s really hard to go back to school later. But the reason that people don’t go back later is because they have more information (they know what both school and work are like), and they almost always choose work! The other major reason was that I thought I’d get a more interesting job with a master’s degree. That’s not obviously wrong, but it appears to be untrue in general for people going into electrical engineering and programming.

I don’t know that I learned anything that I use today, either in the direct sense or in a meta sense. I had some great professors3 and I made some good friends, but I think that this wasn’t a good use of time because of two bad decisions I made at the age of 19 or 20. Rather than attending a school that had a lot of people working in an area I was interested in, I went with a school that gave me a fellowship but had only one person working in an area I was really interested in. That person left just before I started.

I ended up studying optics, and while learning a new field was a lot of fun, the experience was of no particular value to me, and I could have had fun studying something I had more of an interest in.

While I was officially studying optics, I still spent a lot of time learning unrelated things. At one point, I decided I should learn Lisp or Haskell, probably because of something Paul Graham wrote. I couldn’t find a Lisp textbook in the library, but I found a Haskell textbook. After I worked through the exercises, I had no idea how to accomplish anything practical. But I did learn about list comprehensions and got in the habit of using higher-order functions.
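Those two habits carry over directly to everyday programming. A minimal illustration in Python (my examples, not from the Haskell textbook):

```python
# List comprehension: build a list declaratively instead of with an explicit loop.
even_squares = [n * n for n in range(10) if n % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# Higher-order function: a function that takes functions and returns a new one.
def compose(f, g):
    """Return a new function that applies g first, then f."""
    return lambda x: f(g(x))

add_one = lambda x: x + 1
double = lambda x: x * 2
print(compose(double, add_one)(5))  # (5 + 1) * 2 = 12
```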

Based on internet comments and advice, I had the idea learning more languages would teach me how to be a good programmer so I worked through introductory books on Python and Ruby. As far as I can tell, this taught me basically nothing useful and I would have been much better off learning about a specific area (like algorithms or networking) than learning lots of languages.

First real job (2005 - 2013)

Towards the end of grad school, I mostly looked for, and found, electrical/computer engineering jobs. The one notable exception was Google, which called me up in order to fly me out to Mountain View for an interview. I told them that they probably had the wrong person because they hadn’t even done a phone screen, so they offered to do a phone interview instead. I took the phone interview expecting to fail because I didn’t have any CS background, and I failed as expected. In retrospect, I should have asked to interview for a hardware position, but at the time I didn’t know they had hardware positions, even though they’d been putting together their own servers and designing some of their own hardware for years.

Anyway, I ended up at a little chip company called Centaur. I was hesitant about taking the job because the interview was the easiest interview I had at any company4, which made me wonder if they had a low hiring bar, and therefore relatively weak engineers. It turns out that that was, on average, the best group of people I’ve ever worked with. I didn’t realize it at the time, but this would later teach me that companies that claim to have brilliant engineers because they have super hard interviews are full of it, and that the interview difficulty one-upmanship a lot of companies promote is more of a prestige play than anything else.

But I’m getting ahead of myself – my first role was something they call “regression debug”, which included debugging test failures for both newly generated tests as well as regression tests. The main goal of this job was to teach new employees the ins-and-outs of the x86 architecture. At the time, Centaur’s testing was very heavily based on chip-level testing done by injecting real instructions, interrupts, etc., onto the bus, so debugging test failures taught new employees everything there is to know about x86.

The Intel x86 manual is thousands of pages long and it isn’t sufficient to implement a compatible x86 chip. When Centaur made its first x86 chip, they followed the Intel manual in perfect detail; the manual leaves all instances of undefined behavior up to individual implementors, so Centaur made its own choices in those cases. When they got their first chip back and tried it, they found that some compilers produced code that relied on behavior that’s technically undefined on x86, but happened to always be the same on Intel chips. While that’s technically a compiler bug, you can’t ship a chip that isn’t compatible with actually existing software, and ever since then, Centaur has implemented x86 chips by making sure that the chips match the exact behavior of Intel chips, down to matching officially undefined behavior5.

For years afterwards, I had encyclopedic knowledge of x86 and could set bits in control registers and MSRs from memory. I didn’t have a use for any of that knowledge at any future job, but the meta-skill of not being afraid of low-level hardware comes in handy pretty often, especially when I run into compiler or chip bugs. People look at you like you’re a crackpot if you say you’ve found a hardware bug, but because we were so careful about characterizing the exact behavior of Intel chips, we would regularly find bugs and then have discussions about whether we should match the bug or match the spec (the Intel manual).

The other thing I took away from the regression debug experience was a lifelong love of automation. Debugging often involves a large number of mechanical steps. After I learned enough about x86 that debugging became boring, I started automating debugging. At that point, I knew how to write simple scripts but didn’t really know how to program, so I wasn’t able to totally automate the process. However, I was able to automate enough that, for 99% of failures, I just had to glance at a quick summary to figure out what the bug was, rather than spend what might be hours debugging. That turned what was previously a full-time job into something that took maybe 30-60 minutes a day (excluding days when I’d hit a bug that involved some obscure corner of x86 I wasn’t already familiar with, or some bug that my script couldn’t give a useful summary of).
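To give a flavor of what that automation looked like, here is a heavily simplified Python sketch (the log format and the script are invented for illustration; the real tooling was Centaur-specific):

```python
# Hypothetical sketch of the kind of triage script described above:
# scan a failure log, bucket failures by signature, and print a
# one-line summary per bucket so most failures need only a glance.
import re
from collections import Counter

# Assumed log format (invented for illustration): lines like
# "FAIL test_0042: #GP fault at rip=0xdeadbeef"
FAIL_RE = re.compile(r"FAIL\s+(\S+):\s+(.*)")

def summarize(log_lines):
    buckets = Counter()
    for line in log_lines:
        m = FAIL_RE.search(line)
        if m:
            # Bucket by failure message with addresses masked, so the
            # same bug at different addresses groups into one bucket.
            signature = re.sub(r"0x[0-9a-f]+", "0xXXXX", m.group(2))
            buckets[signature] += 1
    return buckets.most_common()

log = [
    "FAIL test_0042: #GP fault at rip=0xdeadbeef",
    "FAIL test_0099: #GP fault at rip=0xcafebabe",
    "FAIL test_0123: wrong flags after DAA",
]
for signature, count in summarize(log):
    print(f"{count:4d}  {signature}")
```

The real version knew a lot more about x86 than this, but the shape is the same: turn hours of mechanical log-reading into a summary you can glance at.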

At that point, I did two things that I’d previously learned in internships. First, I started reading at work. I began with online commentary about programming, but there wasn’t much of that, so I asked if I could expense books and read them at work. This seemed perfectly normal because a lot of other people did the same thing, and there were at least two people who averaged more than one technical book per week, including one person who averaged a technical book every 2 or 3 days.

I settled in at a pace of somewhere between a book a week and a book a month. I read a lot of engineering books that imparted some knowledge that I no longer use, now that I spend most of my time writing software; some “big idea” software engineering books like Design Patterns and Refactoring, which I didn’t really appreciate because I was just writing scripts; and a ton of books on different programming languages, which doesn’t seem to have had any impact on me.

The only book I read back then that changed how I write software in a way that’s obvious to me was The Design of Everyday Things. The core idea of the book is that while people beat themselves up for failing to use hard-to-understand interfaces, we should blame designers for designing poor interfaces, not users for failing to use them.

If you ever run into a door that you incorrectly try to pull instead of push (or vice versa) and have some spare time, try watching how other people use the door. Whenever I do this, I’ll see something like half the people who try the door use it incorrectly. That’s a design flaw!

The Design of Everyday Things has made me a lot more receptive to API and UX feedback, and a lot less tolerant of programmers who say things like “it’s fine – everyone knows that the arguments to foo and bar just have to be given in the opposite order” or “Duh! Everyone knows that you just need to click on the menu X, select Y, navigate to tab Z, open AA, go to tab AB, and then slide the setting to AC.”

I don’t think all of that reading was a waste of time, exactly, but I would have been better off picking a few sub-fields in CS or EE and learning about them, rather than reading the sorts of books O’Reilly and Manning produce.

It’s not that these books aren’t useful, it’s that almost all of them are written to make sense without any particular background beyond what any random programmer might have, and you can only get so much out of reading your 50th book targeted at random programmers. IMO, most non-academic conferences have the same problem. As a speaker, you want to give a talk that works for everyone in the audience, but a side effect of that is that many talks have relatively little educational value to experienced programmers who have been to a few conferences.

I think I got positive things out of all that reading as well, but I don’t know yet how to figure out what those things are.

As a result of my reading, I also did two things that were, in retrospect, quite harmful.

One was that I really got into functional programming and used a functional style everywhere I could. Immutability, higher-order X for any possible value of X, etc. The result was code that I could write and modify quickly that was incomprehensible to anyone but a couple of coworkers who were also into functional programming.

The second big negative was that I became convinced that perl was causing us a lot of problems. We had perl scripts that were hard to understand and modify. They’d often be thousands of lines of code with only one or two functions and no tests, and they used every obscure perl feature you could think of. State! Magic sigils! Implicit everything! You name it, we used it. For me, the last straw was when I inserted a new function between two functions that didn’t explicitly pass any arguments or return values, and broke the script: one of the functions was returning a value into an implicit variable that was then read by the next function, so putting another function between the two closely coupled functions broke the implicit hand-off.
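The real code was Perl, but the failure mode exists in any language with implicit shared state. Here is a contrived Python analogue (invented for illustration):

```python
# Contrived analogue of the failure mode described above: two functions
# coupled through an implicit shared variable rather than through
# explicit arguments and return values.

_last_result = None  # the implicit channel between functions

def parse_line(line):
    global _last_result
    _last_result = line.split()

def count_fields():
    # Silently depends on parse_line having run immediately before.
    return len(_last_result)

def log_progress():
    global _last_result
    _last_result = "logged"  # clobbers the implicit channel

parse_line("mov eax, ebx")
print(count_fields())  # 3 -- works, by accident of call order

parse_line("mov eax, ebx")
log_progress()         # an innocent-looking function inserted in between...
print(count_fields())  # 6 -- now measures the string "logged", not the fields
```

Note that nothing crashes; the script just silently computes the wrong answer, which is exactly what makes this style so hard to modify safely.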

After that, I convinced a bunch of people to use Ruby and started using it myself. The problem was that I only managed to convince half of my team to do this. The other half kept using Perl, which resulted in language fragmentation. Worse yet, another group also got fed up with Perl, but started using Python, resulting in the company having code in Perl, Python, and Ruby.

Centaur has an explicit policy of not telling people how to do anything, which precludes having team-wide or company-wide standards. Given the environment, using a “better” language seemed like a natural thing to do, but I didn’t recognize the cost of fragmentation until, later in my career, I saw a company that uses standardization to good effect.

Anyway, while I was causing horrific fragmentation, I also automated away most of my regression debug job. I got bored of spending 80% of my time at work reading and I started poking around for other things to do, which is something I continued for my entire time at Centaur. I like learning new things, so I did almost everything you can do related to chip design. The only things I didn’t do were circuit design (the TL of circuit design didn’t want a non-specialist interfering in his area) and a few roles where I was told “Dan, you can do that if you really want to, but we pay you too much to have you do it full-time.”

If I hadn’t interviewed regularly (about once a year, even though I was happy with my job), I probably would’ve wondered if I was stunting my career by doing so many different things, because the big chip companies produce specialists pretty much exclusively. But in interviews I found that my experience was valued because it was something they couldn’t get in-house. The irony is that every single role I was offered would have turned me into a specialist. Big chip companies talk about wanting their employees to move around and try different things, but when you dig into what that means, it’s that they like to have people work one very narrow role for two or three years before moving on to their next very narrow role.

For a while, I wondered if I was doomed to either eventually move to a big company and pick up a hyper-specialized role, or stay at Centaur for my entire career (not a bad fate – Centaur has, by far, the lowest attrition rate of any place I’ve worked because people like it so much). But I later found that software companies building hardware accelerators actually have generalist roles for hardware engineers, and that software companies have generalist roles for programmers, although that might be a moot point since most software folks would probably consider me an extremely niche specialist.

Regardless of whether spending a lot of time in different hardware-related roles makes you think of me as a generalist or a specialist, I picked up a lot of skills which came in handy when I worked on hardware accelerators, but that don’t really generalize to the pure software project I’m working on today. A lot of the meta-skills I learned transfer over pretty well, though.

If I had to pick the three most useful meta-skills I learned back then, I’d say they were debugging, bug tracking, and figuring out how to approach hard problems.

Debugging is a funny skill to claim to have because everyone thinks they know how to debug. For me, I wouldn’t even say that I learned how to debug at Centaur, but that I learned how to be persistent. Non-deterministic hardware bugs are so much worse than non-deterministic software bugs that I always believe I can track down software bugs. In the absolute worst case, when there’s a bug that isn’t caught in logs and can’t be caught in a debugger, I can always add tracing information until the bug becomes obvious. The same thing’s true in hardware, but “recompiling” to add tracing information takes 3 months per “recompile”; compared to that experience, tracking down a software bug that takes three months to figure out feels downright pleasant.
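As a toy illustration of that last resort, here is a Python sketch of adding tracing with a decorator (my own construction, not any particular tool I used):

```python
# Minimal sketch of "add tracing until the bug becomes obvious": wrap
# the functions you suspect so that every call and return is logged,
# then read the failing sequence straight out of the log.
import functools

trace_log = []

def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        trace_log.append(f"call {fn.__name__}{args}")
        result = fn(*args, **kwargs)
        trace_log.append(f"ret  {fn.__name__} -> {result!r}")
        return result
    return wrapper

@traced
def checksum(data):
    return sum(data) % 256

@traced
def frame(data):
    return data + [checksum(data)]

frame([1, 2, 3])
print("\n".join(trace_log))
```

In hardware, each round of "add more tracing" costs a silicon respin; in software, it costs a recompile, which is why software debugging always feels tractable by comparison.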

Bug tracking is another meta-skill that everyone thinks they have, but when I look at most projects I find that they literally don’t know what bugs they have and they lose bugs all the time due to a failure to triage bugs effectively. I didn’t even know that I’d developed this skill until after I left Centaur and saw teams that don’t know how to track bugs. At Centaur, depending on the phase of the project, we’d have between zero and a thousand open bugs. The people I worked with most closely kept a mental model of what bugs were open; this seemed totally normal at the time, and the fact that a bunch of people did this made it easy for people to be on the same page about the state of the project and which areas were ahead of schedule and which were behind.

Outside of Centaur, I find that I’m lucky to even find one person who’s tracking what the major outstanding bugs are. Until I’ve been on the team for a while, people are often uncomfortable with the idea of taking a major problem and putting it into a bug instead of fixing it immediately because they’re so used to bugs getting forgotten that they don’t trust bugs. But that’s what bug tracking is for! I view this as analogous to teams whose test coverage is so low and staging system is so flaky that they don’t trust themselves to make changes because they don’t have confidence that issues will be caught before hitting production. It’s a huge drag on productivity, but people don’t really see it until they’ve seen the alternative.

Perhaps the most important meta-skill I picked up was learning how to solve large problems. When I joined Centaur, I saw people solving problems I didn’t even know how to approach. There were folks like Glenn Henry, a fellow from IBM back when IBM was at the forefront of computing, and Terry Parks, who Glenn called the best engineer he knew at IBM. It wasn’t that they were 10x engineers; they didn’t just work faster. In fact, I can probably type 10x as quickly as Glenn (a hunt and peck typist) and could solve trivial problems that are limited by typing speed more quickly than him. But Glenn, Terry, and some of the other wizards knew how to approach problems that I couldn’t even get started on.

I can’t cite any particular a-ha moment. It was just eight years of work. When I went looking for problems to solve, Glenn would often hand me a problem that was slightly harder than I thought possible for me. I’d tell him that I didn’t think I could solve the problem, he’d tell me to try anyway, and maybe 80% of the time I’d solve the problem. We repeated that for maybe five or six years before I stopped telling Glenn that I didn’t think I could solve the problem. Even though I don’t know when it happened, I know that I eventually started thinking of myself as someone who could solve any open problem that we had.

Grad school, again (2008 - 2010)

At some point during my tenure at Centaur, I switched to being part-time and did a stint taking classes and doing a bit of research at the local university. For reasons which I can’t recall, I split my time between software engineering and CS theory.

I read a lot of software engineering papers and came to the conclusion that we know very little about what makes teams (or even individuals) productive, and that the field is unlikely to have actionable answers in the near future. I also got my name on a couple of papers that I don’t think made meaningful contributions to the state of human knowledge.

On the CS theory side of things, I took some graduate level theory classes. That was genuinely educational and I really “got” algorithms for the first time in my life, as well as complexity theory, etc. I could have gotten my name on a paper that I didn’t think made a meaningful contribution to the state of human knowledge, but my would-be co-author felt the same way and we didn’t write it up.

I originally tried grad school again because I was considering getting a PhD, but I didn’t find the work I was doing to be any more “interesting” than the work I had at Centaur, and after seeing the job outcomes of people in the program, I decided there was less than 1% chance that a PhD would provide any real value to me and went back to Centaur full time.

RC (Spring 2013)

After eight years at Centaur, I wanted to do something besides microprocessors. I had enough friends at other hardware companies to know that I’d be downgrading in basically every dimension except name recognition if I switched to another hardware company, so I started applying to software jobs.

While I was applying to jobs, I heard about RC. It sounded great, maybe even too great: when I showed my friends what people were saying about it, they thought the comments were fake. It was a great experience, and I can see why so many people raved about it, to the point where real comments sound impossibly positive. It was transformative for a lot of people; I heard a lot of exclamations like “I learned more here in 3 months here than in N years of school” or “I was totally burnt out and this was the first time I’ve been productive in a year”. It wasn’t transformative for me, but it was as fun a 3 month period as I’ve ever had, and I even learned a thing or two.

From a learning standpoint, the one major thing I got out of RC was feedback from Marek, whom I worked with for about two months. While the freedom and lack of oversight at Centaur was great for letting me develop my ability to work independently, I basically didn’t get any feedback on my work6 since they didn’t do code review while I was there, and I never really got any actionable feedback in performance reviews.

Marek is really great at giving feedback while pair programming, and working with him broke me of a number of bad habits as well as teaching me some new approaches for solving problems. At a meta level, RC is relatively more focused on pair programming than most places and it got me to pair program for the first time. I hadn’t realized how effective pair programming with someone is in terms of learning how they operate and what makes them effective. Since then, I’ve asked a number of super productive programmers to pair program and I’ve gotten something out of it every time.

Second real job (2013 - 2014)

I was in the right place at the right time to land on a project that was just transitioning from Andy Phelps’ pet 20% time project into what would later be called the Google TPU.

As far as I can tell, it was pure luck that I was the second engineer on the project as opposed to the fifth or the tenth. I got to see what it looks like to take a project from its conception and turn it into something real. There was a sense in which I got that at Centaur, but every project I worked on was either part of a CPU, or a tool whose goal was to make CPU development better. This was the first time I worked on a non-trivial project from its inception, where I wasn’t just working on part of the project but the whole thing.

That would have been educational regardless of the methodology used, but it was a particularly great learning experience because of how the design was done. We started with a lengthy discussion on what core algorithm we were going to use. After we figured out an algorithm that would give us acceptable performance, we wrote up design docs for every major module before getting serious about implementation.

Many people consider writing design docs to be a waste of time nowadays, but going through this process, which took months, had a couple of big advantages. The first is that working through a design collaboratively teaches everyone on the team everyone else’s tricks. It’s a lot like the kind of skill transfer you get with pair programming, but applied to design. This was great for me, because as someone with only a decade of experience, I was one of the least experienced people in the room.

The second is that the iteration speed is much faster in the design phase, where throwing away a design just means erasing a whiteboard. Once you start coding, iterating on the design can mean throwing away code; for infrastructure projects, that can easily be person-years or even tens of person-years of work. Since working on the TPU project, I’ve seen a couple of teams on projects of similar scope insist on getting “working” code as soon as possible. In every single case, that resulted in massive delays as huge chunks of code had to be re-written, and in a few cases the project was fundamentally flawed in a way that required the team to start over from scratch.

I get that on product-y projects, where you can’t tell how much traction you’re going to get from something, you might want to get an MVP out the door and iterate, but for pure infrastructure, it’s often possible to predict how useful something will be in the design phase.

The other big thing I got out of the job was a better understanding of what’s possible when a company makes a real effort to make engineers productive. Something I’d seen repeatedly at Centaur was that someone would come in, take a look around, find the tooling to be a huge productivity sink, and then make a bunch of improvements. They’d then feel satisfied that they’d improved things a lot and move on to other problems. Then the next new hire would come in, have the same reaction, and do the same thing. The result was tools that improved a lot while I was there, but not to the point where someone coming in would be satisfied with them. Google was the only place I’d worked where a lot of the tools seem like magic compared to what exists in the outside world7. Sure, people complain that a lot of the tooling is falling over, that there isn’t enough documentation, and that a lot of it is out of date. All true. But the situation is much better than it’s been at any other company I’ve worked at. That doesn’t seem to actually be a competitive advantage for Google’s business, but it makes the development experience really pleasant.

Third real job (2015 - Present)

It’s hard for me to tell what I’ve learned until I’ve had a chance to apply it elsewhere, so this section is a TODO until I move onto another role. I feel like I’m learning a lot right now, but I’ve noticed that feeling like I’m learning a lot at the time is weakly correlated to whether or not I learn skills that are useful in the long run. Unless I get re-org’d or someone makes me an offer I can’t refuse, it seems unlikely that I’d move on until my current project is finished, which seems likely to be at least another 6-12 months.

What about the bad stuff?

When I think about my career, it seems to me that it’s been one lucky event after the next. I’ve been unlucky a few times, but I don’t really know what to take away from the times I’ve been unlucky.

For example, I’d consider my upbringing to be mildly abusive. I remember having nights where I couldn’t sleep because I’d have nightmares about my father every time I fell asleep. Being awake during the day wasn’t a great experience, either. That’s obviously not good, and in retrospect it seems pretty directly related to the academic problems I had until I moved out, but I don’t know that I could give useful advice to a younger version of myself. Don’t be born into an abusive family? That’s something people would already do if they had any control over the matter.

Or to pick a more recent example, I once joined a team that scored a 1 on the Joel Test. The Joel Test is now considered to be obsolete because it awards points for things like “Do you have testers?” and “Do you fix bugs before writing new code?”, which aren’t considered best practices by most devs today. Of the items that aren’t controversial, many seem so obvious that they’re not worth asking about, things like:

  • Can you make a build in one step?
  • Do you make daily builds?
  • Do you have a bug database?
  • Do new candidates write code during their interview?

For anyone who cares about this kind of thing, it’s clearly not a great idea to join a team that does, at most, 1 item off of Joel’s checklist. Getting first-hand experience on a team that scored a 1 didn’t give me any new information that would make me reconsider my opinion.

You might say that I should have asked about those things. It’s true! I should have, and I probably will in the future. However, when I was hired, the TL who was against version control and other forms of automation hadn’t been hired yet, so I wouldn’t have found out about this if I’d asked. Furthermore, even if he’d already been hired, I’m still not sure I would have found out about it – this is the only time I’ve joined a team and then found that most of the factual statements made during the recruiting process were untrue. When I was on that team, every day featured a running joke between team members about how false the recruiting pitch was.

I could try to prevent similar problems in the future by asking for concrete evidence of factual claims (e.g., if someone claims the attrition rate is X, I could ask for access to the HR database to verify), but considering that I have a finite amount of time and the relatively low probability of being told outright falsehoods, I think I’m going to continue to prioritize finding out other information when I’m considering a job and just accept that there’s a tiny probability I’ll end up in a similar situation in the future.

When I look at the bad career-related stuff I’ve experienced, almost all of it falls into one of two categories: something obviously bad that was basically unavoidable, or something obviously bad that I don’t know how to reasonably avoid, given limited resources. I don’t see much to learn from that. That’s not to say that I haven’t made and learned from mistakes. I’ve made a lot of mistakes and do a lot of things differently as a result of mistakes! But my worst experiences have come out of things that I don’t know how to prevent in any reasonable way.

This also seems to be true for most people I know. For example, something I’ve seen a lot is that a friend of mine will end up with a manager whose view is that managers are people who dole out rewards and punishments (as opposed to someone who believes that managers should make the team as effective as possible, or someone who believes that managers should help people grow). When you have a manager like that, a common failure mode is that you’re given work that’s a bad fit, and then maybe you don’t do a great job because the work is a bad fit. If you ask for something that’s a better fit, that’s refused (why should you be rewarded with doing something you want when you’re not doing good work? Instead, you should be punished by having to do more of this thing you don’t like), which causes a spiral that ends in the person leaving or getting fired. In the most recent case I saw, the firing was a surprise to both the person getting fired and their closest co-workers: my friend had managed to find a role that was a good fit despite the best efforts of management, and when management decided to fire my friend, they didn’t bother to consult the co-workers on the new project, who thought that my friend was doing great and had been doing great for months!

I hear a lot of stories like that, and I’m happy to listen because I like stories, but I don’t know that there’s anything actionable here. Avoid managers who prefer doling out punishments to helping their employees? Obvious but not actionable.


The most common sort of career advice I see is “you should do what I did because I’m successful”. It’s usually phrased differently, but that’s the gist of it. That basically never works. When I compare notes with friends and acquaintances, it’s pretty clear that my career has been unusual in a number of ways, but it’s not really clear why.

Just for example, I’ve almost always had a supportive manager who’s willing to not only let me learn whatever I want on my own, but who’s willing to expend substantial time and effort to help me improve as an engineer. Most folks I’ve talked to have never had that. Why the difference? I have no idea.

One story might be: the two times I had unsupportive managers, I quickly found other positions, whereas a lot of friends of mine will stay in roles that are a bad fit for years. Maybe I could spin it to make it sound like the moral of the story is that you should leave roles sooner than you think, but both of the bad situations I ended up in, I only ended up in because I left a role sooner than I should have, so the advice can’t be “prefer to leave roles sooner than you think”. Maybe the moral of the story should be “leave bad roles more quickly and stay in good roles longer”, but that’s so obvious that it’s not even worth stating. Every strategy that I can think of is either incorrect in the general case, or so obvious that there’s no reason to talk about it.

Another story might be: I’ve learned a lot of meta-skills that are valuable, so you should learn these skills. But you probably shouldn’t. The particular set of meta-skills I’ve picked have been great for me because they’re skills I could easily pick up in places I worked (often because I had a great mentor) and because they’re things I really strongly believe in doing. Your circumstances and core beliefs are probably different from mine and you have to figure out for yourself what it makes sense to learn.

Yet another story might be: while a lot of opportunities come from serendipity, I’ve had a lot of opportunities because I spend a lot of time generating possible opportunities. When I passed around the draft of this post to some friends, basically everyone told me that I emphasized luck too much in my narrative and that all of my lucky breaks came from a combination of hard work and trying to create opportunities. While there’s a sense in which that’s true, many of my opportunities also came out of making outright bad decisions.

For example, I ended up at Centaur because I turned down the chance to work at IBM for a terrible reason! At the end of my internship, my manager made an attempt to convince me to stay on as a full-time employee, but I declined because I was going to grad school. But I was only going to grad school because I wanted to get a microprocessor logic design position, something I thought I couldn’t get with just a bachelor’s degree. But I could have gotten that position if I hadn’t turned my manager down! I’d just forgotten the reason that I’d decided to go to grad school and incorrectly used the cached decision as a reason to turn down the job. By sheer luck, that happened to work out well and I got better opportunities than anyone I know from my intern cohort who decided to take a job at IBM. Have I “mostly” been lucky or prepared? Hard to say; maybe even impossible.

Careers don’t have the logging infrastructure you’d need to determine the impact of individual decisions. Careers in programming, anyway. Many sports now track play-by-play data in a way that makes it possible to try to determine how much of success in any particular game or any particular season was luck and how much was skill.

Take baseball, which is one of the better understood sports. If we look at the statistical understanding we have of performance today, it’s clear that almost no one had a good idea about what factors made players successful 20 years ago. One thing I find particularly interesting is that we now have a much better understanding of which factors are fundamental and which factors come down to luck, and it’s not at all what almost anyone would have thought 20 years ago. We can now look at a pitcher and say something like “they’ve gotten unlucky this season, but their foo, bar, and baz rates are all great, so it appears to be bad luck on balls in play as opposed to any sort of decline in skill”, and we can also make statements like “they’ve done well this season but their fundamental stats haven’t moved, so it’s likely that their future performance will be no better than their past performance before this season”. We couldn’t have made a statement like that 20 years ago. And this is a sport that’s had play-by-play video available going back what seems like forever, where play-by-play stats have been kept for a century, etc.

In this sport where everything is measured, it wasn’t until relatively recently that we could disambiguate between fluctuations in performance due to luck and fluctuations due to changes in skill. And then there’s programming, where it’s generally believed to be impossible to measure people’s performance and the state of the art in grading people’s performance is that you ask five people for their comments on someone and then aggregate the comments. If we’re only just now able to make comments on what’s attributable to luck and what’s attributable to skill in a sport where every last detail of someone’s work is available, how could we possibly be anywhere close to making claims about what comes down to luck vs. other factors in something as nebulous as a programming career?

In conclusion, life is messy and I don’t have any advice.

Appendix A: meta-skills I’d like to learn


I once worked with Jared Davis, a documentation wizard whose documentation was so good that I’d go to him to understand how a module worked before I talked to the owner of the module. As far as I could tell, he wrote documentation on things he was trying to understand to make life easier for himself, but his documentation was so good that it was a force multiplier for the entire company.

Later, at Google, I noticed a curiously strong correlation between the quality of initial design docs and the success of projects. Since then, I’ve tried to write solid design docs and documentation for my projects, but I still have a ways to go.

Fixing totally broken situations

So far, I’ve only landed on teams where things are much better than average and on teams where things are much worse than average. You might think that, because there’s so much low hanging fruit on teams that are much worse than average, it should be easier to improve things on teams that are terrible, but it’s just the opposite. The places that have a lot of problems have problems because something makes it hard to fix the problems.

When I joined the team that scored a 1 on the Joel Test, it took months of campaigning just to get everyone to use version control.

I’ve never seen an environment go from “bad” to “good” and I’d be curious to know what that looks like and how it happens. Yossi Kreinin’s thesis is that only management can fix broken situations. That might be true, but I’m not quite ready to believe it just yet, even though I don’t have any evidence to the contrary.

Appendix B: other “how I became a programmer” stories

Kragen. Describes 27 years of learning to program. Heavy emphasis on conceptual phases of development (e.g., understanding how to use provided functions vs. understanding that you can write arbitrary functions)

Julia Evans. Started programming on a TI-83 in 2004. Dabbled in programming until college (2006-2011) and has been working as a professional programmer ever since. Some emphasis on the “journey” and how long it takes to improve.

Tavish Armstrong. 4th grade through college. Emphasis on particular technologies (e.g., LaTeX or Python).

Caitie McCaffrey. Started programming in AP computer science. Emphasis on how interests led to a career in programming.

Matt DeBoard. Spent 12 weeks learning Django with the help of a mentor. Emphasis on the fact that it’s possible to become a programmer without programming background.

Kristina Chodorow. Started in college. Emphasis on alternatives (math, grad school).

Michael Bernstein. Story of learning Haskell over the course of years. Emphasis on how long it took to become even minimally proficient.

Thanks to Leah Hanson, Lindsey Kuper, Kelley Eskridge, Jeshua Smith, Tejas Sapre, Adrien Lamarque, Maggie Zhou, Lisa Neigut, Steve McCarthy, Darius Bacon, Kaylyn Gibilterra, and Sarah Ransohoff for comments/criticism/discussion.

  1. If you happen to have contact information for Mr. Swanson, I’d love to be able to send a note saying thanks. [return]
  2. Wayne Dickey, Richard Brualdi, Andreas Seeger, and a visiting professor whose name escapes me. [return]
  3. I strongly recommend Andy Weiner for any class, as well as the guy who taught mathematical physics when I sat in on it, but I don’t remember who that was or if that’s even the exact name of the class. [return]
  4. with the exception of one government lab, which gave me an offer on the strength of a non-technical on-campus interview. I believe that was literally the first interview I did when I was looking for work, but they didn’t get back to me until well after interview season was over and I’d already accepted an offer. I wonder if that’s because they went down the list of candidates in some order and only got to me after N people turned them down or if they just had a six month latency on offers. [return]
  5. Because Intel sees no reason to keep its competitors informed about what it’s doing, this results in a substantial latency when matching new features. They usually announce enough information that you can implement the basic functionality, but behavior on edge cases may vary. We once had a bug (noticed and fixed well before we shipped, but still problematic) where we bought an engineering sample off of ebay and implemented some new features based on the engineering sample. This resulted in an MWAIT bug that caused Windows to hang; Intel had changed the behavior of MWAIT between shipping the engineering sample and shipping the final version.

    I recently saw a post that claims that you can get great performance per dollar by buying some engineering samples off of ebay. Don’t do this. Engineering samples regularly have bugs. Sometimes those bugs are actual bugs, and sometimes it’s just that Intel changed their minds. Either way, you really don’t want to run production systems off of engineering samples.

  6. I occasionally got feedback by taking a problem I’d solved to someone and asking them if they had any better ideas, but that’s much less in depth than the kind of feedback I’m talking about here. [return]
  7. To pick one arbitrary concrete example, look at version control at Microsoft from someone who worked on Windows Vista:

    In small programming projects, there’s a central repository of code. Builds are produced, generally daily, from this central repository. Programmers add their changes to this central repository as they go, so the daily build is a pretty good snapshot of the current state of the product.

    In Windows, this model breaks down simply because there are far too many developers to access one central repository. So Windows has a tree of repositories: developers check in to the nodes, and periodically the changes in the nodes are integrated up one level in the hierarchy. At a different periodicity, changes are integrated down the tree from the root to the nodes. In Windows, the node I was working on was 4 levels removed from the root. The periodicity of integration decayed exponentially and unpredictably as you approached the root so it ended up that it took between 1 and 3 months for my code to get to the root node, and some multiple of that for it to reach the other nodes. It should be noted too that the only common ancestor that my team, the shell team, and the kernel team shared was the root.
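    As a rough sketch of why that hierarchy produced months of latency, here’s a toy model (parameters invented; the post only says the node was 4 levels from the root and that integration periodicity decayed exponentially toward the root): a change waits, on average, about half an integration period at each level on its way up.

```python
# Toy model of one-way integration latency in a repository hierarchy.
# All parameters are invented for illustration.
PERIOD_DAYS = 2  # leaf-level nodes integrate upward every ~2 days
GROWTH = 3       # each level toward the root integrates 3x less often
LEVELS = 4       # the node in the post was 4 levels below the root

def expected_latency_days():
    # On average, a change waits half an integration period at each level.
    return sum(PERIOD_DAYS * GROWTH**k / 2 for k in range(LEVELS))

print(f"expected one-way latency: {expected_latency_days():.0f} days")
```

    With these made-up numbers the expected one-way latency is 40 days, on the order of the 1-3 months described above; because the per-level period grows geometrically, the levels nearest the root dominate the total.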

    Google and Microsoft both maintained their own forks of perforce because that was the most scalable source control system available at the time. Google would go on to build piper, a distributed version control system (in the distributed systems sense, not in the git sense) that solved the scaling problem while providing a dev experience that wasn't nearly as painful. But that option wasn't really on the table at Microsoft. In the comments to the post quoted above, a then-manager at Microsoft commented that the possible options were:

    1. federate out the source tree, and pay the forward and reverse integration taxes (primarily delay in finding build breaks), or…
    2. remove a large number of the unnecessary dependencies between the various parts of Windows, especially the circular dependencies.
    3. Both 1&2

    #1 was the winning solution in large part because it could be executed by a small team over a defined period of time. #2 would have required herding all the Windows developers (and PMs, managers, UI designers…), and is potentially an unbounded problem.

    Someone else commented, to me, that they were on an offshoot team that got the one-way latency down from months to weeks. That’s certainly an improvement, but why didn’t anyone build a system like piper? I asked that question of people who were at Microsoft at the time, and I got answers like “when we started using perforce, it was so much faster than what we’d previously had that it didn’t occur to people that we could do much better” and “perforce was so much faster than xcopy that it seemed like magic”.

    This general phenomenon, where people don’t attempt to make a major improvement because the current system is already such a huge improvement over the previous system, is something I’d seen before and even something I’d done before. This example happens to use Microsoft and Google, but please don’t read too much into that. There are systems where things are flipped around and the system at Google is curiously unwieldy compared to the same system at Microsoft.

12 Sep 13:55

How I got a CS degree without learning what a system call is

Yesterday I wrote that I have 2 CS degrees but didn't know what a system call was when I graduated. Some people think this is surprising and a failure of CS education.

I don't have any opinions really about what a CS education should be, but, to explain how this happened, I wrote down a while ago all the classes I took in my joint math/CS undergrad. The math & CS theory classes at my university were extremely good, so I just took all of them.

This is just to say that "a CS degree" can represent a lot of different educations, and personally I think that's totally fine. I know people who mostly did electrical engineering and human computer interaction! Or who took a ton of biology classes because they were studying bioinformatics!

But more importantly -- it's ok to not know things. I knew practically nothing about a lot of really important programming concepts when I got out of grad school. Even though I'd started learning to program 8 years before! Now I know those things! I learned them.

I have friends who are amazing programmers who sometimes feel bad because they don't have a CS degree and sometimes don't know algorithms/CS theory stuff that other people know. They've learned the things they needed to know! They are great.

Here's the list.


MATH 235 Algebra 1
MATH 242 Analysis 1
MATH 248 Honours Advanced Calculus
MATH 325 Honours ODE's
MATH 251 Honours Algebra 2
MATH 255 Honours Analysis 2
MATH 377 Honours Number Theory
MATH 354 Honours Analysis 3
MATH 356 Honours Probability
MATH 366 Honours Complex Analysis
MATH 370 Honours Algebra 3
MATH 371 Honours Algebra 4
MATH 355 Honours Analysis 4

MATH 350 Graph Theory and Combinatorics
COMP 250 Intro to Computer Science
COMP 252 Algorithms and Data Structures
COMP 506 Advanced Analysis of Algorithms
COMP 567 Discrete Optimization 2
MATH 552 Combinatorial Optimization
MATH 560 Continuous Optimization
COMP 690 Probabilistic Analysis of Algorithms

COMP 273 Intro to Computer Systems
COMP 206 Intro to Software Systems

COMP 302 Programming Languages & Paradigms
COMP 524 Theoretical Foundations of Programming Languages
COMP 330 Theoretical Aspects of Comp Sci

COMP 761 Quantum Information Theory
COMP 462 Computational Biology Methods
COMP 520 Compiler Design

grad school

I had to take 6 classes during my master's. They were:

  • Higher Algebra 1
  • Geometry & Topology 1
  • Geometry & Topology 2
  • Machine Learning
  • Topics in Computer Science (lie algebras)
  • Advanced Topics Theory 2 (I don't remember the topic right now)

My master's thesis.

14 Sep 10:06

Someone Is Learning How to Take Down the Internet - Lawfare

by brandizzi

Over the past year or two, someone has been probing the defenses of the companies that run critical pieces of the Internet. These probes take the form of precisely calibrated attacks designed to determine exactly how well these companies can defend themselves, and what would be required to take them down. We don't know who is doing this, but it feels like a large nation state. China and Russia would be my first guesses.

First, a little background. If you want to take a network off the Internet, the easiest way to do it is with a distributed denial-of-service attack (DDoS). Like the name says, this is an attack designed to prevent legitimate users from getting to the site. There are subtleties, but basically it means blasting so much data at the site that it's overwhelmed. These attacks are not new: hackers do this to sites they don't like, and criminals have done it as a method of extortion. There is an entire industry, with an arsenal of technologies, devoted to DDoS defense. But largely it's a matter of bandwidth. If the attacker has a bigger fire hose of data than the defender has, the attacker wins.

Recently, some of the major companies that provide the basic infrastructure that makes the Internet work have seen an increase in DDoS attacks against them. Moreover, they have seen a certain profile of attacks. These attacks are significantly larger than the ones they're used to seeing. They last longer. They're more sophisticated. And they look like probing. One week, the attack would start at a particular level of attack and slowly ramp up before stopping. The next week, it would start at that higher point and continue. And so on, along those lines, as if the attacker were looking for the exact point of failure.

The attacks are also configured in such a way as to see what the company's total defenses are. There are many different ways to launch a DDoS attacks. The more attack vectors you employ simultaneously, the more different defenses the defender has to counter with. These companies are seeing more attacks using three or four different vectors. This means that the companies have to use everything they've got to defend themselves. They can't hold anything back. They're forced to demonstrate their defense capabilities for the attacker.

I am unable to give details, because these companies spoke with me under condition of anonymity. But this all is consistent with what Verisign is reporting. Verisign is the registrar for many popular top-level Internet domains, like .com and .net. If it goes down, there's a global blackout of all websites and e-mail addresses in the most common top-level domains. Every quarter, Verisign publishes a DDoS trends report. While its publication doesn't have the level of detail I heard from the companies I spoke with, the trends are the same: "in Q2 2016, attacks continued to become more frequent, persistent, and complex."

There's more. One company told me about a variety of probing attacks in addition to the DDoS attacks: testing the ability to manipulate Internet addresses and routes, seeing how long it takes the defenders to respond, and so on. Someone is extensively testing the core defensive capabilities of the companies that provide critical Internet services.

Who would do this? It doesn't seem like something an activist, criminal, or researcher would do. Profiling core infrastructure is common practice in espionage and intelligence gathering. It's not normal for companies to do that. Furthermore, the size and scale of these probes—and especially their persistence—points to state actors. It feels like a nation's military cybercommand trying to calibrate its weaponry in the case of cyberwar. It reminds me of the U.S.'s Cold War program of flying high-altitude planes over the Soviet Union to force their air-defense systems to turn on, to map their capabilities.

What can we do about this? Nothing, really. We don't know where the attacks come from. The data I see suggests China, an assessment shared by the people I spoke with. On the other hand, it's possible to disguise the country of origin for these sorts of attacks. The NSA, which has more surveillance in the Internet backbone than everyone else combined, probably has a better idea, but unless the U.S. decides to make an international incident over this, we won't see any attribution.

But this is happening. And people should know. 


13 Sep 18:28

Adblock Plus now sells ads | The Verge

by brandizzi

Adblock Plus is launching a new service that... uh, puts more ads on your screen.

Rather than stripping all ads from the internet forever, Adblock Plus is hoping to replace the bad ads — anything it deems too big, too ugly, or too intrusive — with good ads, ones that are smaller, subtler, and theoretically much less annoying.

It’ll begin doing that through an ad marketplace, which will allow blogs and other website operators to pick out so-called “acceptable” ads and place them on their pages. If a visitor using Adblock Plus comes to the page, they’ll be shown those “acceptable ads,” instead of whatever ads the site would normally run.

“It allows you to treat the two different ecosystems completely differently and monetize each one,” says Ben Williams, Adblock Plus’ operations and communications director. “And crucially, monetize the ad blockers on their own terms.”

The marketplace is an extension of the Acceptable Ads program that Adblock Plus has been running since 2011. Since then, the ad blocker has defaulted to “whitelisting” approved ads, so that they show up even when users have the blocker turned on. But the program has been fairly limited in scope, since publishers and ad networks need to specifically work with (and pay) Adblock Plus to have their ads deemed acceptable. It’s a time-consuming process, Williams emphasized, which limits how many websites can sign up to display ads to would-be blockers.

Adblock Plus hopes that, through this new marketplace, there’ll be a big expansion in the usage of Acceptable Ads. Because they’re already picked out and ready to go, any publisher will be able to sign up, plug some code into their website, and start running whitelisted ads. None of the ads are able to track visitors from site to site, and they’ll all be limited to certain dimensions and page locations, as defined by Adblock Plus’ guidelines.

The program is meant to be friendly to publishers — it is, after all, letting them display some ads instead of none whatsoever. But there’s still obvious reason for publishers to be unhappy. Acceptable ads are likely to be less valuable than the ads a publisher could otherwise display, limiting what a website can earn. And in setting up its own marketplace, Adblock Plus continues to position itself as a gatekeeper charging a toll to get through a gate of its own making.

Publishers will get to keep 80 percent of all ad revenue from marketplace ads, with the remaining 20 percent being divided between various other parties involved with serving the ads. Adblock Plus will receive 6 percent of total revenue.
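The split described above works out as follows for a hypothetical $1,000 of marketplace revenue (the dollar amount is invented for illustration; the percentages are the ones reported in the article):

```python
# Worked example of the reported revenue split for a hypothetical
# $1,000 of marketplace ad revenue.
total = 1000.00

publisher_share = 0.80  # publishers keep 80% of ad revenue
adblock_share = 0.06    # Adblock Plus receives 6% of the total
# The remaining cut goes to the other parties involved in serving the ads.
other_share = 1.0 - publisher_share - adblock_share

publisher = total * publisher_share   # $800.00
adblock_plus = total * adblock_share  # $60.00
others = total * other_share          # $140.00

print(f"publisher:     ${publisher:,.2f}")
print(f"Adblock Plus:  ${adblock_plus:,.2f}")
print(f"other parties: ${others:,.2f}")
```

So of the 20 percent that publishers give up, Adblock Plus takes a bit under a third, and the rest goes to the other intermediaries.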

Williams says he can see why publishers might be unhappy with this arrangement at first, but he says the Acceptable Ads program is meant to solve a problem that would exist no matter what. “Ad blocking would have happened with or without us,” says Williams. “What we were able to do is try and reverse the spread of 100 percent black-and-white ad blocking, blocking everything ... Acceptable ads was a pivot toward what we think is better.”

The ad marketplace is launching in beta today and is supposed to launch in full later this year. At the same time, Adblock Plus is working toward setting up a committee of publishers, privacy advocates, and advertisers to figure out the future of what its Acceptable Ad guidelines should look like. That too is supposed to get nailed down sometime later this year, with committee meetings beginning next year.


12 Sep 12:42

Before You Spend $26,000 on Weight-Loss Surgery, Do This -

by brandizzi

Earlier this year, the Food and Drug Administration approved a new weight-loss procedure in which a thin tube, implanted in the stomach, ejects food from the body before all the calories can be absorbed.

Some have called it “medically sanctioned bulimia,” and it is the latest in a desperate search for new ways to stem the rising tides of obesity and Type 2 diabetes. Roughly one-third of adult Americans are now obese; two-thirds are overweight; and diabetes afflicts some 29 million. Another 86 million Americans have a condition called pre-diabetes. None of the proposed solutions have made a dent in these epidemics.

Recently, 45 international medical and scientific societies, including the American Diabetes Association, called for bariatric surgery to become a standard option for diabetes treatment. The procedure, until now seen as a last resort, involves stapling, binding or removing part of the stomach to help people shed weight. It costs $11,500 to $26,000, which many insurance plans won’t pay and which doesn’t include the costs of office visits for maintenance or postoperative complications. And up to 17 percent of patients will have complications, which can include nutrient deficiencies, infections and intestinal blockages.

It is nonsensical that we’re expected to prescribe these techniques to our patients while the medical guidelines don’t include another better, safer and far cheaper method: a diet low in carbohydrates.

The low-carb diet, once dismissed as a fad, has now had its safety and efficacy verified in more than 40 clinical trials on thousands of subjects. Given that the government projects that one in three Americans (and one in two of those of Hispanic origin) will be given a diagnosis of diabetes by 2050, it’s time to give this diet a closer look.

When someone has diabetes, he can no longer produce sufficient insulin to process glucose (sugar) in the blood. To lower glucose levels, diabetics need to increase insulin, either by taking medication that increases their own endogenous production or by injecting insulin directly. A patient with diabetes can be on four or five different medications to control blood glucose, with an annual price tag of thousands of dollars.

Yet there’s another, more effective way to lower glucose levels: Eat less of it.

Glucose is the breakdown product of carbohydrates, which are found principally in wheat, rice, corn, potatoes, fruit and sugars. Restricting these foods keeps blood glucose low. Moreover, replacing those carbohydrates with healthy protein and fats, the most naturally satiating of foods, often eliminates hunger. People can lose weight without starving themselves, or even counting calories.

Most doctors — and the diabetes associations — portray diabetes as an incurable disease, presaging a steady decline that may include kidney failure, amputations and blindness, as well as life-threatening heart attacks and stroke. Yet the literature on low-carbohydrate intervention for diabetes tells another story. For instance, a two-week study of 10 obese patients with Type 2 diabetes found that their glucose levels normalized and insulin sensitivity was improved by 75 percent after they went on a low-carb diet.

At our obesity clinics, we’ve seen hundreds of patients who, after cutting down on carbohydrates, lose weight and get off their medications. One patient in his 50s was a brick worker so impaired by diabetes that he had retired from his job. He came to see one of us last winter, 100 pounds overweight and panicking. He’d been taking insulin prescribed by a doctor who said he would need to take it for the rest of his life. Yet even with insurance coverage, his drugs cost hundreds of dollars a month, which he knew he couldn’t afford, any more than he could bariatric surgery.

Instead, we advised him to stop eating most of his meals out of boxes packed with processed flour and grains, replacing them with meat, eggs, nuts and even butter. Within five months, his blood-sugar levels had normalized, and he was back to working part-time. Today, he no longer needs to take insulin.

Another patient, in her 60s, had been suffering from Type 2 diabetes for 12 years. She lost 35 pounds in a year on a low-carb diet, and was able to stop taking her three medications, which included more than 100 units of insulin daily.

One small trial found that 44 percent of low-carb dieters were able to stop taking one or more diabetes medications after only a few months, compared with 11 percent of a control group following a moderate-carb, lower-fat, calorie-restricted diet. A similarly small trial reported those numbers as 31 percent versus 0 percent. And in these as well as another, larger, trial, hemoglobin A1C, which is the primary marker for a diabetes diagnosis, improved significantly more on the low-carb diet than on a low-fat or low-calorie diet. Of course, the results are dependent on patients’ ability to adhere to low-carb diets, which is why some studies have shown that the positive effects weaken over time.

A low-carbohydrate diet was in fact standard treatment for diabetes throughout most of the 20th century, when the condition was recognized as one in which “the normal utilization of carbohydrate is impaired,” according to a 1923 medical text. When pharmaceutical insulin became available in 1922, the advice changed, allowing moderate amounts of carbohydrates in the diet.

Yet in the late 1970s, several organizations, including the Department of Agriculture and the diabetes association, began recommending a high-carb, low-fat diet, in line with the then growing (yet now refuted) concern that dietary fat causes coronary artery disease. That advice has continued for people with diabetes despite more than a dozen peer-reviewed clinical trials over the past 15 years showing that a diet low in carbohydrates is more effective than one low in fat for reducing both blood sugar and most cardiovascular risk factors.

The diabetes association has yet to acknowledge this sizable body of scientific evidence. Its current guidelines find “no conclusive evidence” to recommend a specific carbohydrate limit. The organization even tells people with diabetes to maintain carbohydrate consumption, so that patients on insulin don’t see their blood sugar fall too low. That condition, known as hypoglycemia, is indeed dangerous, yet it can better be avoided by restricting carbs and eliminating the need for excess insulin in the first place. Encouraging patients with diabetes to eat a high-carb diet is effectively a prescription for ensuring a lifelong dependence on medication.

At the annual diabetes association convention in New Orleans this summer, there wasn’t a single prominent reference to low-carb treatment among the hundreds of lectures and posters publicizing cutting-edge research. Instead, we saw scores of presentations on expensive medications for blood sugar, obesity and liver problems, as well as new medical procedures, including that stomach-draining system, temptingly named AspireAssist, and another involving “mucosal resurfacing” of the digestive tract by burning the inside of the duodenum with a hot balloon.


We owe our patients with diabetes more than a lifetime of insulin injections and risky surgical procedures. To combat diabetes and spare a great deal of suffering, as well as the $322 billion in diabetes-related costs incurred by the nation each year, doctors should follow a version of that timeworn advice against doing unnecessary harm — and counsel their patients to first, do low carbs.


17 Sep 05:13

Comic for September 17, 2016

by Scott Adams
16 Sep 20:14

More Cluster Fudge HERE


16 Sep 19:08


by Will Tirando


16 Sep 16:06

Ghost Detective

by Reza


16 Sep 07:00

Whomp! - Pig Kahuna


New comic!

15 Sep 19:59

Já pensou?

by Will Tirando



15 Sep 07:01

The Pirate and the Ent

by Doug
15 Sep 07:43

Comic for September 15, 2016

by Scott Adams
14 Sep 16:40

Holding Someone’s Baby

by Brian
13 Sep 10:59


by Dorktoes


Hurray! Vacation comics! ALSO AN ANIMATED GIF. Pepijn bought a go-pro and we filmed lotsa underwater stuff

14 Sep 14:46

Saturday Morning Breakfast Cereal - Wishes


Screw it. I'm gonna go steal some souls from kids playing D&D.

New comic!
14 Sep 01:00

How to Talk Behind Someone's Back

by Scott Meyer

Once, a long, long time ago, I got into a fight with one of my older brother’s friends. I won’t get into the reasons. I was 14, he was 17, none of us handled the situation particularly well. My brother ended up taking his friend’s side and there was quite a bit of yelling.

Later that evening, when things had cooled down a bit, my brother (who, I remind you, was 17 at the time, an age at which most of us say a lot of stupid things) told me, “I don’t like your friends either, but at least I have the guts to insult them when they’re not around.”

The guy I’d been arguing with and I looked at each other, then started laughing so hard that any ill will left from our argument completely dissipated.


You can comment on this comic on Facebook.

As always, thanks for using my Amazon Affiliate links (US, UK, Canada).

13 Sep 14:28

Saturday Morning Breakfast Cereal - You're Off This Case



New comic!
Today's News:

Just a few days left to submit your BAHFest West proposal, for a chance to share the stage with Adam Savage, Mary Roach, and Phil Plait!

13 Sep 07:35

Photo Album

by Doug

Photo Album

Dedicated to long-time reader Sarah, who is celebrating a milestone birthday this year – happy birthday to you! :)

And here’s more language.

13 Sep 05:00

Comic for 2016.09.13

by Rob DenBleyker
12 Sep 17:16

Spaceship Earth

by Reza


12 Sep 00:00

Earth Temperature Timeline

[After setting your car on fire] Listen, your car's temperature has changed before.
12 Sep 14:26

Atitude positiva

by ricardo coimbra
Click on the image to enlarge it