Shared posts

08 Mar 22:09

Why Yammer believes the traditional engineering organizational structure is dead

by Kris Gale
Claus.dahl

I think this is a pretty clever response to "the bandwidth problem" of software design, i.e. the growing coordination costs of larger teams. I don't think, however, that you can get around the fundamentals of complex systems. If it's complex, it's going to have costs. My preference is productization of subsystems, but old-school project control isn't nearly as bad as they tell you when done well.

Structure

Kris Gale, VP of Engineering at Yammer, argues the key to building fast at scale lies in small teams. He explores intricacies of organizational design in engineering and explains how making intentional decisions throughout growth ensures that you don’t lose the things that are unique and special about being a startup. At the end of the day, it’s your job as a CTO to think about how engineering can be organized and optimized.

Small teams ship faster

In the early days, the Yammer team took a very heads-down approach: by focusing much more on the product itself, they didn’t really consider hiring in the context of building a larger organization. As the company continued to expand, they realized that the marginal increase in productivity for every new engineer decreases over time because of greater overhead.

Simultaneously, the rest of the world recognized the fact that Yammer was doing big things; startups were mimicking their product left and right, and even the big companies were launching products to directly compete with theirs. Yammer felt strongly that there could only be a single dominant player in the space and if they weren’t nimble enough it wouldn’t be them. Modern web applications need to be nimble and need to change.

Yammer employed a small teams approach in order to ship faster. However, it’s more than just organizing your company into small teams. If those teams are in any way restricted from shipping code into production, then they’re useless. They need to be free to get stuff done outside of the larger organization.

Specialization in small teams

In the very early days of Yammer, when the team first launched a feature, they’d split it up amongst the three engineers by way of specialization. It wasn’t terribly rigid, so if there was a ton of Rails work, Gale would still help out even though Rails wasn’t his key strength. Your goal should be to create similar-styled groups, ones that are small and specialized but not rigidly siloed because different problems will demand different expertise (and you should have different experience).

Serial processes and big tech companies

Even though this team approach worked really well with three engineers, as the Yammer team grew they moved to a more traditional model of engineering organizational design, one where the team was broken up by functional expertise. So there was a back-end team, front-end team, mobile team, etc. By the end of 2010 the company was up from three engineers working on features to roughly thirty. But were they ten times faster? Nope.

Per Amdahl’s law of parallel computing, the speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program. For example, if you have a computation (that you can only parallelize half of), you could throw 100 processors at it, and you’d only get a twofold increase in speed. If you throw another 100 processors at it, you’re just going to get slightly closer to 2x. You spend a fixed amount of time in the serial half of that computation.
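The arithmetic behind that example can be checked with a few lines of Python. This is only a minimal sketch of the standard Amdahl's law formula, speedup = 1 / ((1 - p) + p / n); the function name `amdahl_speedup` and its parameter names are chosen here for illustration and do not come from the article.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's law.

    parallel_fraction: share of the work that can be parallelized (0.0 to 1.0).
    processors: number of processors applied to the parallel part (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part always takes the same time; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Half the computation is parallelizable, as in the example above.
    for n in (1, 10, 100, 200):
        print(f"{n:>4} processors -> {amdahl_speedup(0.5, n):.3f}x")
```

With half the work serial, the result creeps toward 2x as processors are added but never reaches it, which is the point the analogy makes about adding engineers to a process with a large serial component.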

Too many people make the incorrect assumption that software engineering is writing code to a spec…but it’s not. What also matters are the engineering decisions you make, not just the code you write – and no one builds engineering organizations with this in mind. If you think that making engineering decisions is the job of engineering management, or of the engineers who have “leveled up” in the ranks, you’re probably making a really expensive mistake.

If you imagine a typical medium to large engineering organization chart, you might have a front-end team, a back-end team, and potentially a “middleware” team. Those teams own their code bases; each one has its own manager and those managers report up to some boss. The point is that the organization usually matches the code’s architecture. When it comes time to do work, the managers at the top have to make decisions about what’s being built before they can break up and delegate work. You then have questions like “What is the back-end team going to do?” and “How does that interface with the front-end team?”

Ultimately, if you’ve broken up work in the way described above, where the top-level managers have to divide tasks and then delegate them, you’re doing it wrong. Think about it: If the individual who’s actually implementing the code spots something that’s wrong with the spec, he or she has to propose a change all the way up the ladder which then has to filter back down. It’s a blocking process and will bring product development to a halt. Meanwhile the other engineers in different parts of the organization will see this as churn since they’re not working closely with the engineer who proposed the change. They won’t understand the rationale behind the revision itself.

At Yammer, they said screw all of that.

Management should not make engineering decisions

People will always tell you to hire individuals who are better and smarter than you. If you take them up on their advice, then shouldn’t you be able to trust these people to make decisions that would normally fall on you? Ultimately, it’s your job as an engineering leader to build and nurture an organization. Your transition from coding to focusing exclusively on the organization probably needs to happen sooner than you think.

I don’t think you should be building a product. I think you should be building an organization that builds a product.

Be very wary of only trusting managers with engineering decisions; in fact, you should delegate these all the way down to individual contributors. If managers are the only ones making decisions as you grow past thirty to forty people, this should be a red flag.

So how do you actually build features with this in mind?

When Yammer builds features, they aim to improve one of their three core metrics:

  • Virality
  • Engagement
  • Monetization

On a more basic level, Yammer aims to build features that attract, retain, and sell to customers. While your core metrics might not be the same, you should definitely start with a few key goals that are communicated throughout the organization. Otherwise, you can’t enable everyone at your company to make good decisions.

In a traditional organization, PMs will come up with an idea and write a spec for it; that’s sort of a misnomer at Yammer. Rather than build a rigid specification of what needs to be built, a spec is viewed as a starting point for a cross-functional team to fully flesh it out. If there’s too much text or too much already prescribed, be wary. You should want your engineers who are involved to understand the decisions that went into the feature so they can implement the code in the most efficient and effective way possible.

The “2 and 10” rule

Yammer’s biggest rule of thumb is “2 to 10 people, 2 to 10 weeks,” which means they generally don’t do projects that are larger or more complicated. There is a non-linear relationship between the complexity of a project and the wrap-up integration phase at the end. If you go anywhere beyond ten weeks, the percentage of time in the wrap-up phase becomes disproportionate.

If you employ the “2 to 10” rule, it’ll also force you to release often, test your assumptions, and not over-invest in mistakes. It’s sort of the lean startup mentality, and if you’re going to try and do that you have to codify it in your organization.

You must develop a sense of urgency. Often very long projects cause engineers to lose track of the end goal. Think of it in terms of hiking: you start off feeling really fresh, you’re super excited about the hike ahead and you’re moving quickly. As you make progress though, your body starts to get tired and you can’t see where you began or where you’re going. If you’re at that point, it takes a lot of mental will to force yourself to keep moving; unfortunately, a lot of organizations have put engineers in that middle phase for the majority of their jobs.

But, near the end, you can start to make out the end of your trail and you get excited; every step you take is clearly moving towards bringing the goal closer to you. It’s important to keep your engineers in this state, one where they can measure progress and see it visually. It’s the only way to maintain urgency and morale.

Code ownership

Beware of creating code ownership – designing your organization in such a way that people own code bases can create a lot of perverse incentives that you’ll want to avoid. At Yammer, the organization is broken into areas of expertise. Ultimately, engineers are really smart and self-motivated and if you get them aligned with your business’s goals they’ll do amazing things (even autonomously).

Determining what the team that actually does the work will look like

Before Yammer puts together a product team around a feature, they examine a spec in better detail; specifically, they try and estimate the amount of effort that needs to go into a given feature.

Take Product X: It’s an imaginary feature that’s going to increase virality by providing a way to invite your friends during the signup flow. In this particular example, it’s fair to say that it’s front-end heavy so you might need two UI people. And since you’d probably need to change some of the sign-up flow, you’ll need someone from your Rails team to code these new endpoints.

Once your product team agrees on a priority for the project, all that’s left is waiting for the engineers (in this case two front-end and a Rails guy) to free up. At Yammer, they actually have a big physical whiteboard called the “Big Board” that has a sweeping grid overlay. On one side the board lists the projects; on another, there’s a list of all of the engineers who work on features. There’s an obvious physical constraint where an engineer can only be assigned to one project at a time. The big board also helps to provide transparency about priorities. Every single engineering development resource is accounted for, and at any point the CEO can walk by and say, “Oh, so this is what engineering is doing.”

If you’re able to guarantee complete focus on a single project, it will speed up your company’s velocity in a big way. While everybody knows how expensive context switching is, it’s staggering that nobody builds that into their organization as a constraint. With total focus, you build one thing, ship it, and then are able to move onto something else.

But then who’s left to fix the bugs?

With everyone working on features, who’s available to handle bug squashing? At Yammer they just build more cross-functional teams to handle this responsibility. They take a few people from the Rails team, the front-end team, the mobile team, etc. and say, “Your job is to handle all the incoming bugs and work down our list.” It’s a temporary gig (like all the project-oriented groups), and people rotate on and off of it. This organizational structure has allowed them to handle support in a way that doesn’t block feature engineering.

It also doesn’t create a second class of engineers. As opposed to simply telling a bunch of junior engineers to fix a bunch of bugs, they involve the senior guys too. This is very intentional: when you fix bugs you want people to be able to fix the root of the cause, not just for a ticket to go away. Senior people should feel empowered to refactor code if necessary.

This system also creates a feedback loop between the people building features and those fixing the bugs. When people see the frequency and types of bugs, it helps to inform decisions about product engineering moving forward.

[Image Credit: Photoctor on Flickr]

Kris Gale

Kris oversees all product development engineering at Yammer. He was an original member of Yammer’s engineering team, serving as Director of Infrastructure, where he focused on performance and scalability from the company’s public launch through its first three years of rapid growth.


08 Mar 22:04

The Oreo separator machine

by Jason Kottke
Claus.dahl

That's how it's done

And the TED Prize ("awarded to an extraordinary individual with a creative and bold vision to spark global change") this year goes to this guy, who invented a machine for separating Oreos:

Congratulations! (thx, brad)

Tags: Oreo   video
08 Mar 21:57

The professor and the bikini model

by Jason Kottke
Claus.dahl

Revelation: physicists have willies too

Paul Frampton is a 69-year-old theoretical particle physicist who has co-authored papers with Nobel laureates. In late 2011, the absentminded professor met a Czech bikini model online. Over email and Yahoo chat, they became romantically involved and she sent him a plane ticket to come meet her at a photo shoot in Bolivia. Then she asked him to bring a bag of hers with him on his flight.

While in Bolivia, Frampton corresponded with an old friend, John Dixon, a physicist and lawyer who lives in Ontario. When Frampton explained what he was up to, Dixon became alarmed. His warnings to Frampton were unequivocal, Dixon told me not long ago, still clearly upset: "I said: 'Well, inside that suitcase sewn into the lining will be cocaine. You're in big trouble.' Paul said, 'I'll be careful, I'll make sure there isn't cocaine in there and if there is, I'll ask them to remove it.' I thought they were probably going to kidnap him and torture him to get his money. I didn't know he didn't have money. I said, 'Well, you're going to be killed, Paul, so whom should I contact when you disappear?' And he said, 'You can contact my brother and my former wife.' " Frampton later told me that he shrugged off Dixon's warnings about drugs as melodramatic, adding that he rarely pays attention to the opinions of others.

On the evening of Jan. 20, nine days after he arrived in Bolivia, a man Frampton describes as Hispanic but whom he didn't get a good look at handed him a bag out on the dark street in front of his hotel. Frampton was expecting to be given an Hermès or a Louis Vuitton, but the bag was an utterly commonplace black cloth suitcase with wheels. Once he was back in his room, he opened it. It was empty. He wrote to Milani, asking why this particular suitcase was so important. She told him it had "sentimental value." The next morning, he filled it with his dirty laundry and headed to the airport.

Crazy story. (via @stevenstrogatz)

Tags: crime   drugs   Paul Frampton   physics   science   travel
08 Mar 14:29

"I’m acting healthier. I walk my ten thousand steps, I pass up my son’s offer of pink..."

Claus.dahl

My Mood Panda. If I could @-mention Hønissen here, I would.

“I’m acting healthier. I walk my ten thousand steps, I pass up my son’s offer of pink ice-cream-filled Oreos.

And yet, sometimes my Mood Panda drops to 3. I feel like I’m getting a preview of a dystopia worthy of a young-adult novel. When we all start extreme recording, we’ll all have to censor ourselves. We’ll all be as careful as politicians, knowing that we risk making our own version of Romney’s 47 percent remark. We’ll all have to worry more about hackers and Big Brother poaching our data. It will be a world with a lot less mystery, which might mean a lot less fun. How do you plan a surprise party when all your friends know exactly where you are at all times?

And yes, you’ll have a full record of life — but will it be the record of a lesser life? Because that’s the problem with reality — it’s not really life. Reality is messy, nuanced, repetitive, and dull.”

- Nine Weeks with a Camera - AJ Jacobs I Am a Camera - Esquire
08 Mar 13:59

Four short links: 6 March 2013

by Nat Torkington
Claus.dahl

This is some geeky shit

  1. High Performance Networking in Google Chrome — far more than you ever wanted to know about how Chrome is so damn fast.
  2. Tactical Chat: how the military uses IRC to wage war.
  3. http-console — a REPL loop for HTTP.
  4. Inductive Charger for Magic Mouse — my biggest bugbear with Bluetooth devices is the incessant appetite for batteries. Huzzah!
07 Mar 10:17

PGP

Claus.dahl

I've always liked that the tickets in DSB's ticket app have a *hologram*, that is, an animated GIF. It gives that little bit of extra security.

If you want to be extra safe, check that there's a big block of jumbled characters at the bottom.
06 Mar 22:21

Beethoven's pulsing dance beats

by Jason Kottke
Claus.dahl

The fast-talking style of the programme is a bit annoying, but it's a fun fact, and man, I'd love to hear B5 played like that

Late in his life, just after the invention of the metronome and after completely losing his hearing, Beethoven went back and adjusted the tempos of his symphonies to much faster than you might expect. Radiolab investigates.

Tags: Ludwig van Beethoven   music   Radiolab
06 Mar 21:32

Finally! More devices using Android 4 than older versions

by Kevin C. Tofel
Claus.dahl

It also has to do with Froyo being a great leap in functionality way back when.

It has taken since the introduction of Android 4.0 in Dec. 2011 until now, but there are finally more devices running Android 4.0 or better software than those that run older versions of Google’s platform. On its Android Developer Dashboard, Google notes that 45.1 percent of Androids hitting the Google Play store of late use Android 4.0 or better. That compares to the 44.2 percent that still use Android 2.3 Gingerbread software.

The uptake of Android 4.0 and its sub-versions of late has been quick. In October, I saw that 1 in 4 devices visiting Google Play used Android 4.0 or better. At that time, I suggested that we’d see half of all Androids use recent versions of software within four to six months. We’re not at the halfway mark yet, but it’s only been four months. With the acceleration of phones and tablets running newer software, I won’t be surprised to see us reaching the tipping point next month.

Android versions Feb 2013

Clearly helping this phenomenon is Android’s changing pace. It has slowed over the past year or so, and that’s a good thing. It means that Android is more on par with iOS and other platforms than ever before. That’s part of the reason some prominent long-time iPhone users are now checking out Android — listen to our latest podcast to hear more on that topic, because there are other reasons as well.

Hardware makers have also “caught” up to the software changes. Even after Android 4.0 arrived in late 2011, it took a good six months for phones to ship with a recent version of Android. By and large many of these now ship with Android 4.1 and not Android 4.2, but the differences between the versions aren’t that great. If the average consumer were to compare an Android 4.1 phone to one with Android 4.2, it’s safe to say they’d be hard pressed to tell the two apart.

The feature differences brought by distributed Android software updates have been a key target for iOS users when looking to criticize Android. These points have definitely had merit, particularly early on in Android’s life cycle. But I’d argue that Google’s issue has largely diminished and it’s really not that different on iOS; it’s just handled differently.

Some iOS features found in software aren’t applicable to older devices, and yet these are reported as having the same version of iOS as devices that can use the new features. The last three iPad models Apple has produced can run iOS 6, which includes Siri, for example, but only Apple’s third- and fourth-generation iPads can actually use Siri; different code is actually pushed by Apple to different devices, yet all have the same public version number.

Regardless of which platform you use, this should help Android developers target more devices for mobile apps. And they shouldn’t have to worry as much about version numbers or supported API levels as more Androids run newer versions of the platform.

This story was updated at 2:18 pm to correct the point about iOS 6 compatibility with iPads. Originally, the post incorrectly stated that all iPads can run it.



06 Mar 19:53

News from the internet 4

by Morten
Claus.dahl

Logarithm => Algorithm. As for this idea that "data solves the problem", fortunately it isn't true; the data are often false proxies. Just take Klout.....

1.
Back in 2009 I wrote a blog post, En tsunami af lort ("A tsunami of crap"). It described how Demand Media uses low-paid workers and an algorithm to produce a flood of bad content for the internet. The phrase came back to mind when I read an excellent blog post about how algorithms are used not just to produce “content” but also to produce physical products. T-shirts with jokes about rape and bizarre books become a kind of product spam. The post is full of good links and is the most interesting thing I have read on the web in 2013.

2.
Everyone on the internet is writing about Google Glass at the moment. I think the most interesting review I have read is the one from The Verge.

3.

One of the most interesting cultural phenomena on the web right now is Netflix’s excellent political intrigue series, House of Cards, starring Kevin Spacey. The reason it is interesting is that it came about partly on the basis of Netflix’s huge amounts of data about what its users watch. There was, for example, a large segment that liked both political TV series and Kevin Spacey… and David Fincher, so they made him the director.

In a way, it is fair enough to use data to create new connections. But it can also become immeasurably sad if data is used to remove from art everything that people don’t like: we can see that people zap away during this kind of scene…

More here, here and here.

4.

Data was also one of the core elements of the American presidential campaign. Here are a couple of really interesting articles about it: Technology Review and NY Times.

5.

My last technology-sceptical links are about how Tesco and Amazon employees wear special wristband computers that micromanage their actions. In this way other people are reduced to efficient robots who are not supposed to think for themselves.

06 Mar 19:47

Jim'll Paint It

Claus.dahl

Lovely, lovely collection, although I could just as well be the artist myself. Would appreciate it if it were just an algorithm.

finally, a drawing of Moby throwing ninja stars at a melancholic badger  
06 Mar 19:43

Your innovation is killing us

by Hamish McKenzie
Claus.dahl

The MAV video's people=perps is scary. There's nothing new in Ross Anderson's techno-paranoia, but it's amusingly presented.

BigDog

Entrepreneurs and technologists, be careful with just how hard you push your innovating. One day it’s a photo app, the next it’s a self-aware nanofactory that has the ability to self-replicate ad infinitum and supplant the human race.

For now, location-aware predictive search engines and wearable computers are all very cool, but in just a few generations all this technology could quickly turn against us. Take, for instance, the dangers outlined in Ross Anderson’s 8,000-word feature on existential threat, published in Aeon magazine and excerpted in a Kottke.org post headlined “Will technology help humans conquer the universe or kill us all?”

For a start, people, please cease your efforts to build super-intelligent but utterly unempathetic machines. As Anderson writes, such artificial intelligence (AI) is bound to act against human interest:

If its goal is to win at chess, an AI is going to model chess moves, make predictions about their success, and select its actions accordingly. It’s going to be ruthless in achieving its goal, but within a limited domain: the chessboard. But if your AI is choosing its actions in a larger domain, like the physical world, you need to be very specific about the goals you give it.

‘The basic problem is that the strong realisation of most motivations is incompatible with human existence,’ [Daniel] Dewey [a research fellow at the Future of Humanity Institute] told me. ‘An AI might want to do certain things with matter in order to achieve a goal, things like building giant computers, or other large-scale engineering projects. Those things might involve intermediary steps, like tearing apart the Earth to make huge solar panels. A superintelligence might not take our interests into consideration in those situations, just like we don’t take root systems or ant colonies into account when we go to construct a building.’

Also, you should forget about building machines that know the answer to every question. Again, refer to Anderson’s piece:

‘Let’s say you have an Oracle AI that makes predictions, or answers engineering questions, or something along those lines,’ Dewey told me. ‘And let’s say the Oracle AI has some goal it wants to achieve. Say you’ve designed it as a reinforcement learner, and you’ve put a button on the side of it, and when it gets an engineering problem right, you press the button and that’s its reward. Its goal is to maximise the number of button presses it receives over the entire future. See, this is the first step where things start to diverge a bit from human expectations. We might expect the Oracle AI to pursue button presses by answering engineering problems correctly. But it might think of other, more efficient ways of securing future button presses….

‘One day we might ask it how to cure a rare disease that we haven’t beaten yet. Maybe it would give us a gene sequence to print up, a virus designed to attack the disease without disturbing the rest of the body. And so we sequence it out and print it up, and it turns out it’s actually a special-purpose nanofactory that the Oracle AI controls acoustically. Now this thing is running on nanomachines and it can make any kind of technology it wants, so it quickly converts a large fraction of Earth into machines that protect its button, while pressing it as many times per second as possible. After that it’s going to make a list of possible threats to future button presses, a list that humans would likely be at the top of. Then it might take on the threat of potential asteroid impacts, or the eventual expansion of the Sun, both of which could affect its special button. You could see it pursuing this very rapid technology proliferation, where it sets itself up for an eternity of fully maximised button presses. You would have this thing that behaves really well, until it has enough power to create a technology that gives it a decisive advantage – and then it would take that advantage and start doing what it wants to in the world.’

Yeah, so enough of that. And, please, if you do insist on building such a super-smart machine, at the very least don’t put any buttons on it.

While we’re at it, I would like to implore Boston Dynamics to stop building robots that have the ability to hurl humans across rooms. For any intelligent machine, it will be just too tempting a proposition. (Thanks to John Biggs at TechCrunch for alerting us to this imminent threat.)

If you are a smart technologist who happens to be working with the US military, I’d ask you to abandon research on tiny bug-like drones that can fly into our windows and kill us. Can’t we just accept death the old-fashioned way, by unmanned aircraft flying so high that we can’t even see them?

Zen Robotics, your recycling robot? Wouldn’t be surprised if that Johnny Five-look-alike has secret plans to repurpose our internal organs. Scrap that, please.

And Google, for God’s sake, stop with the Glass project already, okay? It’s going to give us all cancer.

Hamish McKenzie

Hamish McKenzie is a Baltimore-based reporter for PandoDaily who covers media, politics, and international startups. His first name is pronounced "hey-mish" and you can follow him on Twitter.


06 Mar 19:29

"Companies are now able to search and analyse up to two years of Twitter updates for market research..."

Claus.dahl

So, it's now a business decision that Twitter-search is shit. And they are locking your data more and more down. Exit-heading time comes closer.

Companies are now able to search and analyse up to two years of Twitter updates for market research purposes.

Firms can search tweets back to January 2010 in order to plan marketing campaigns, target influential users or even try to predict certain events.

Until today, only the previous 30 days of tweets were available for companies to search. Regular users can access posts from the past seven days.



- BBC News - Twitter partners with Datasift to unlock tweet archive

Via Julian Oliver on Twitter: “Fellow data worker, I bring great news! Twitter has successfully sold 2 yrs of your tweets to an analytics company!”

(via new-aesthetic)

06 Mar 19:28

mermaid-sea-slut: when my grandma was in the hospital before...

Claus.dahl

Touching - and hard core



mermaid-sea-slut:

when my grandma was in the hospital before she died, she was too sick to speak, so she had to write everything down. i got her handwriting tattooed on me in memory of her.

06 Mar 19:00

The history of Hadoop: From 4 nodes to the future of data

by Derrick Harris
Claus.dahl

That's some thorough search history

Depending on how one defines its birth, Hadoop is now 10 years old. In that decade, Hadoop has gone from being the hopeful answer to Yahoo’s search-engine woes to a general-purpose computing platform that’s poised to be the foundation for the next generation of data-based applications.

Alone, Hadoop is a software market that IDC predicts will be worth $813 million in 2016 (although that number is likely very low), but it’s also driving a big data market the research firm predicts will hit more than $23 billion by 2016. Since Cloudera launched in 2008, Hadoop has spawned dozens of startups and spurred hundreds of millions in venture capital investment.

In this four-part series, we’ll explain everything anyone concerned with information technology needs to know about Hadoop. Part I is the history of Hadoop from the people who willed it into existence and took it mainstream. Part II is more graphic: a map of the now-large and complex ecosystem of companies selling Hadoop products. Part III is a look into the future of Hadoop that should serve as an opening salvo for much of the discussion at our Structure: Data conference March 20-21 in New York. Finally, Part IV will highlight some of the best Hadoop applications and seminal moments in Hadoop history, as reported by GigaOM over the years.

Wanted: A better search engine

Almost everywhere you go online now, Hadoop is there in some capacity. Facebook, eBay, Etsy, Yelp, Twitter, Salesforce.com: you name a popular web site or service, and the chances are it’s using Hadoop to analyze the mountains of data it’s generating about user behavior and even its own operations. Even in the physical world, forward-thinking companies in fields ranging from entertainment to energy management to satellite imagery are using Hadoop to analyze the unique types of data they’re collecting and generating.

Everyone involved with information technology at least knows what it is. Hadoop even serves as the foundation for new-school graph and NoSQL databases, as well as bigger, badder versions of relational databases that have been around for decades.

But it wasn’t always this way, and today’s uses are a long way off from the original vision of what Hadoop could be.

Doug Cutting

When the seeds of Hadoop were first planted in 2002, the world just wanted a better open-source search engine. So then-Internet Archive search director Doug Cutting and University of Washington graduate student Mike Cafarella set out to build it. They called their project Nutch and it was designed with that era’s web in mind.

Looking back on it today, early iterations of Nutch were kind of laughable. About a year into their work on it, Cutting and Cafarella thought things were going pretty well because Nutch was already able to crawl and index hundreds of millions of pages. “At the time, when we started, we were sort of thinking that a web search engine was around a billion pages,” Cutting explained to me, “so we were getting up there.”

There are now about 700 million web sites and, according to Wired’s Kevin Kelly, well over a trillion web pages.

But getting Nutch to work wasn’t easy. It could only run across a handful of machines, and someone had to watch it around the clock to make sure it didn’t fall down.

Mike Cafarella


“I remember working on it for several months, being quite proud of what we had been doing, and then the Google File System paper came out and I realized ‘Oh, that’s a much better way of doing it. We should do it that way,’” reminisced Cafarella. “Then, by the time we had a first working version, the MapReduce paper came out and that seemed like a pretty good idea, too.”

Google released the Google File System paper in October 2003 and the MapReduce paper in December 2004. The latter would prove especially revelatory to the two engineers building Nutch.

“What they spent a lot of time doing was generalizing this into a framework that automated all these steps that we were doing manually,” Cutting explained.

Raymie Stata, founder and CEO of Hadoop startup VertiCloud (and former Yahoo CTO), calls MapReduce “a fantastic kind of abstraction” over the distributed computing methods and algorithms most search companies were already using:

“Everyone had something that pretty much was like MapReduce because we were all solving the same problems. We were trying to handle literally billions of web pages on machines that are probably, if you go back and check, epsilon more powerful than today’s cell phones. … So there was no option but to latch hundreds to thousands of machines together to build the index. So it was out of desperation that MapReduce was invented.”

MapReduce diagram, from the Google paper

Parallel processing in MapReduce, from the Google paper
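To make the abstraction concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using word counting as the task. It is an illustration only: the function names are hypothetical, and it is neither Google's implementation nor Hadoop's API. On a real cluster the map and reduce calls would be spread across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each map_phase call is independent, so a framework could run them on
    # different machines; here they run one after another in one process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


pages = ["the web is big", "the index is bigger"]
counts = word_count(pages)
print(counts["the"], counts["is"], counts["web"])  # 2 2 1
```

The point of the framework is that only `map_phase` and `reduce_phase` are specific to the problem; the grouping and distribution in between is the part that used to be done by hand.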

Over the course of a few months, Cutting and Cafarella built up the underlying file systems and processing framework that would become Hadoop (in Java, notably, whereas Google’s MapReduce used C++) and ported Nutch on top of it. Now, instead of having one guy watch a handful of machines all day long, Cutting explained, they could just set it running on between 20 and 40 machines that he and Cafarella were able to scrape together from their employers.

Bringing Hadoop to life (but not in search)

Anyone vaguely familiar with the history of Hadoop can guess what happens next: In 2006, Cutting went to work with Yahoo, which was equally impressed by the Google File System and MapReduce papers and wanted to build open source technologies based on them. They spun out the storage and processing parts of Nutch to form Hadoop (named after Cutting’s son’s stuffed elephant) as an open-source Apache Software Foundation project and the Nutch web crawler remained its own separate project.

“This seemed like a perfect fit because I was looking for more people to work on it, and people who had thousands of computers to run it on,” Cutting said.

Cafarella, now an associate professor at the University of Michigan, opted to forgo a career in corporate IT and focus on his education. He’s happy as a professor — and currently working on a Hadoop-complementary project called RecordBreaker — but, he joked, “My dad calls me the Pete Best of the big data world.”

Ironically, though, the 2006-era Hadoop was nowhere near ready to handle production search workloads at webscale — the very task it was created to do. “The thing you gotta remember,” explained Hortonworks Co-founder and CEO Eric Baldeschwieler (who was previously VP of Hadoop software development at Yahoo), “is at the time we started adopting it, the aspiration was definitely to rebuild Yahoo’s web search infrastructure, but Hadoop only really worked on 5 to 20 nodes at that point, and it wasn’t very performant, either.”

Baldeschwieler at Hadoop Summit 2010. Source: Yodel Anecdotal


Stata recalls a “slow march” of horizontal scalability, growing Hadoop’s capabilities from the single digits of nodes into the tens of nodes and ultimately into the thousands. “It was just an ongoing slog … every factor of 2 or 1.5 even was serious engineering work,” he said. But Yahoo was determined to scale Hadoop as far as it needed to go, and it continued investing heavy resources into the project.

It actually took years for Yahoo to move its web index onto Hadoop, but in the meantime the company made what would be a fortuitous decision to set up what it called a “research grid” for the company’s data scientists, to use today’s parlance. It started with dozens of nodes and ultimately grew to hundreds as they added more and more data and Hadoop’s technology matured. What began life as a proof of concept fast became a whole lot more.

“This very quickly kind of exploded and became our core mission,” Baldeschwieler said, “because what happened is the data scientists not only got interesting research results — what we had anticipated — but they also prototyped new applications and demonstrated that those applications could substantially improve Yahoo’s search relevance or Yahoo’s advertising revenue.”

Shortly thereafter, Yahoo began rolling out Hadoop to power analytics for various production applications. Eventually, Stata explained, Hadoop had proven so effective that Yahoo merged its search and advertising into one unit so that Yahoo’s bread-and-butter sponsored search business could benefit from the new technology.

Cutting (center) flanked by Baldeschwieler and Om Malik at GigaOM's Hadoop Meetup in 2008.


And that’s exactly what happened, because although data scientists didn’t need things like service-level agreements, business leaders did. So, Stata said, Yahoo implemented some scheduling changes within Hadoop. And although data scientists didn’t need security, Securities and Exchange Commission requirements mandated a certain level of security when Yahoo moved its sponsored search data onto it.

“That drove a certain level of maturity,” Stata said. “… We ran all the money in Yahoo through it, eventually.”

The transformation into Hadoop being “behind every click” (or every batch process, technically) at Yahoo was pretty much complete by 2008, Baldeschwieler said. That meant doing everything from these line-of-business applications to spam filtering to personalized display decisions on the Yahoo front page. By the time Yahoo spun out Hortonworks into a separate, Hadoop-focused software company in 2011, Yahoo’s Hadoop infrastructure consisted of 42,000 nodes and hundreds of petabytes of storage.

From the classroom …

However, although Yahoo was responsible for the vast majority of development during its formative years, Hadoop didn’t exist in a bubble inside Yahoo’s headquarters. It was a full-on Apache project that attracted users and contributors from around the world. Guys like Tom White, a Welshman who actually wrote O’Reilly Media’s book Hadoop: The Definitive Guide despite being what Cutting describes as a guy who just liked software and played with Hadoop at night.

Up in Seattle in 2006, a young Google engineer named Christophe Bisciglia was using his 20 percent time to teach a computer science course at the University of Washington. Google wanted to hire new employees with experience working on webscale data, but its MapReduce code was proprietary, so it bought a rack of servers and used Hadoop as a proxy.

Go to page 2 (of 2) on GigaOM .



06 Mar 17:04

Four short links: 4 March 2013

by Nat Torkington
Claus.dahl

Steve Mann: the nerd who just couldn't get any respect. I wrote about him in UCMag, the one-off magazine my brother and I put out in 2001: www.classy.dk/ucmag.pdf

  1. Life Inside the Aaron Swartz Investigation: do hard things and risk failure. What else are we on this earth for?
  2. crossfilter — open source (Apache 2) JavaScript library for exploring large multivariate datasets in the browser. Crossfilter supports extremely fast (<30ms) interaction with coordinated views, even with datasets containing a million or more records.
  3. Steve Mann: My Augmediated Life (IEEE) — Until recently, most people tended to regard me and my work with mild curiosity and bemusement. Nobody really thought much about what this technology might mean for society at large. But increasingly, smartphone owners are using various sorts of augmented-reality apps. And just about all mobile-phone users have helped to make video and audio recording capabilities pervasive. Our laws and culture haven’t even caught up with that. Imagine if hundreds of thousands, maybe millions, of people had video cameras constantly poised on their heads. If that happens, my experiences should take on new relevance.
  4. The Google Glass Feature No-One Is Talking About: The most important Google Glass experience is not the user experience – it’s the experience of everyone else. The experience of being a citizen, in public, is about to change.
06 Mar 16:58

Retrotechtacular: Donner 3500 portable analog computer

by Mike Szczys
Claus.dahl

They still had a huge version of one of these at the physics department at the H.C. Ørsted Institute when I studied there. They are not dumb at all. Instead of *simulating* differential equations, you simply build the equations out of electronic components that happen to obey them, and the equation gets solved in real time. Right up into the 80s this was a perfectly good and fast way to get particular solutions, although the machine was of course mostly kept as a curiosity.
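For contrast, this is roughly what the digital alternative looks like: a minimal sketch that steps through a damped oscillator equation numerically. The equation, parameter names and step size are illustrative assumptions, not anything taken from the Donner 3500. An analog machine would wire up two integrators in a loop and let the voltages follow the same equation continuously.

```python
import math


def simulate_oscillator(x0, v0, omega, damping, dt, steps):
    """Integrate x'' = -omega**2 * x - damping * x' with semi-implicit Euler.

    Returns the list of positions, one per time step (including the start).
    """
    x, v = x0, v0
    positions = [x]
    for _ in range(steps):
        a = -omega ** 2 * x - damping * v   # acceleration from the equation
        v += a * dt                          # first "integrator": a -> v
        x += v * dt                          # second "integrator": v -> x
        positions.append(x)
    return positions


# Undamped case: the exact solution is x(t) = cos(omega * t).
dt, steps = 0.001, 1000
xs = simulate_oscillator(x0=1.0, v0=0.0, omega=2.0, damping=0.0, dt=dt, steps=steps)
exact = math.cos(2.0 * dt * steps)
print(abs(xs[-1] - exact) < 1e-2)
```

The two update lines in the loop play the role of the two integrator stages; the digital version has to take a thousand small steps to cover one second, where the analog circuit just runs.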


What if we told you we had a computer you can take with you? What if it only weighed 28 pounds? This is a pretty hard sell when today you can get a 1.5 GHz quad-core processor packing computer to carry in your pocket which weighs less than 5 ounces. But back in the day the Donner 3500 was something to raise an eyebrow at, especially for tinkerers like us.

The machine was unveiled in 1959 as an analog computer. Instead of accepting programs via a terminal, or punch cards, it worked more like a breadboard. The top of the case features a grid of connectors (they look like banana plugs to us but we’re not sure). The kit came with components which the user could plug into the top to make the machine function (or compute) in different ways.

We’re skeptical as to how portable this actually was. It used vacuum tubes which are not fans of being jostled. Still, coming during the days when most computers were taking up entire buildings we guess the marketing claim holds up. If you’d like to see a bit more about the machine’s internals check out this forum post.


Filed under: computer hacks
06 Mar 16:50

inkle » inklewriter

by clausd
Claus.dahl

Must try: interactive fiction is having a revival; there is also Waxy's nerdier http://playfic.com/

inklewriter is a free tool designed to allow anyone to write and publish interactive stories. It’s perfect for writers who want to try out interactivity, but also for teachers and students looking to mix computer skills and creative writing.
06 Mar 16:47

Mechanical Turk workers aren't anonymous

Claus.dahl

Universal ID: bad idea

in tests, looking up the profile pages of 30% of workers revealed real names  
06 Mar 15:06

Four short links: 5 March 2013

by Nat Torkington
Claus.dahl

Damn. The pulse-amplification videos totally give off cyborg-with-super-vision vibes... (link 1)

  1. Eulerian Video Magnification — papers and the MatLab source code for that amazing effect of exaggerating small changes in file. (*This work is patent pending)
  2. CopyrightX — MOOC on current law of copyright and the ongoing debates concerning how that law should be reformed. Through a combination of pre-recorded lectures, live webcasts, and weekly online seminars, participants in the course will examine and assess the ways in which law seeks to stimulate and regulate creative expression. (via BoingBoing)
  3. Cost Effectiveness for Open Access Journals: This plot reveals the prestige (Article Influence score) and publication charges for open access journals.
  4. Results of SANS SCADA Survey 2013 (PDF) — Unfortunately, at this time they seem unable to monitor the PLCs, terminal units and connections to field equipment due to lack of native security in the control systems themselves. (via InfoSecIsland)
06 Mar 14:19

The future of the internet is avatars and connected services (video)

by Stacey Higginbotham
Claus.dahl

Didn't know Kuniavsky was no longer independent.

There is no single internet of things, just a series of connected services and avatars, the physical hardware that connects to those services. This is what Mike Kuniavsky, a principal in the Innovation Services Group at PARC, explained as his vision for the internet of things in a talk last week at the GigaOM internet of things meetup.

The audio in this video is fuzzy, but Kuniavsky is worth listening to, from his definition of the internet of things to his vision for how we are going to have to change our thinking about software development in order to program it. At 19 minutes the video is the perfect length for watching during a lunch break. Check it out.

Watch this video for free on GigaOM

If you missed it, here is yesterday’s video from the same event: Video: Why you shouldn’t care about securing the Internet of things just yet



06 Mar 14:16

MeCam $49 flying camera concept follows you around, streams...

Claus.dahl

MeCam is a boring name. Paparazzo-copter or something like it would have been more fun.

06 Mar 13:56

Basis Lands $11.5M, Adds Esther Dyson & Deepak Chopra As Advisors As It Beefs Up Production Of Its Wearable Health Tracker

by Rip Empson
Claus.dahl

"adds Deepak Chopra as advisor". Nå, men farvel, så


Back in November, Basis officially took on Jawbone, Nike, Fitbit and the fast-expanding cohort of startups developing health tracking devices, releasing its long-awaited health-monitoring wristwatch. Capitalizing on the popularity of the Quantified Self Movement, the $200 sensor-happy Basis band and its accompanying web dashboard allow novices and fitness enthusiasts alike to easily track everything from their activity and skin temperature to heart rate, calories burned and sleep patterns.

The initial consumer demand for the mobile health tracker has been high, Basis CEO Jef Holove tells us. With the added publicity around its public launch in November, the startup’s pre-order wait list grew “well into the five-digit range,” he says, forcing Basis to shut down its “store” and tell customers it had sold out of its first batch, as it scrambled to fill its initial orders. Today, however, Basis is getting some help in that regard, announcing that it has raised $11.5 million of fresh funding to help it accelerate production, distribution and beef up its team.

The series B round was led by Mayfield Fund and included contributions from the startup’s existing investors, including DCM and Norwest Venture Partners, and brings its total investment to just over $20 million. As a result of the round, Mayfield Fund Managing Director Tim Chang (a long-standing Basis supporter who has led investments in companies like ngmoco and Playdom) will be joining the startup’s board of directors.

“Basis’ differentiated approach to health and wellness was what led me to invest in the company in the first place, and that remains true today,” Chang says. “No other health tracking device has been able to pack as many sensors into a wearable package, particularly its ability to track heart rate from the wrist, or created a healthy habits system that actually keeps people engaged over time.”

Over the last year, the market for health tracking gadgets has become increasingly active (read: crowded), as startups look to capitalize on the growing consumer interest in mobile health and wellness devices that help them lose a few pounds, burn more calories or just get a better night’s sleep. And, to the startups’ credit, the market is finally producing products that people are no longer embarrassed to be seen wearing in public.

Traditionally, wearable health tracking devices have been supported by a core group of rabid fitness enthusiasts and geeks, but as the cost of sensors (and the underlying technology as a whole) has come down, so has the price, which is beginning to give these devices real mainstream appeal. As the space heats up, Basis is looking to distinguish itself from the competition by not only offering a greater array of sensors and a deeper web analytics product than the next guy, but by seeking out the support of industry veterans and big-name advisors.

In conjunction with its new round of funding, Basis also announced today that it has added the well-known healthy living expert Deepak Chopra and long-time healthtech investor and analyst Esther Dyson to its advisory board.

For those unfamiliar, Chopra is one of the most well-known (and controversial) practitioners of alternative medicine in the U.S. He has published more than 65 books, and currently serves as a fellow of the American College of Physicians and as a professor at the Kellogg School of Management. Esther Dyson, in turn, is an active and prolific angel investor, entrepreneur and speaker, focusing particularly on healthcare and health technology. She was one of the early proponents of the Quantified Self Movement and has invested in companies like 23andMe, Genomera, HealthTap, HealthRally, Keas, Omada Health and Voxiva.

With its new capital and support from Dyson and Chopra, Holove says that Basis will look to beef up its supply chain, from ramping up production and development to expanding the number of products it ships each week. Beyond its sensors, the key to the Basis experience is its “Healthy Habits” web dashboard, which aims to help users move toward their goals incrementally, catering not just to fitness enthusiasts, but to busy people who don’t have time to run a marathon every day of the week. The idea is to enable people to break their health goals down into flexible, micro-benchmarks that make it easier to build momentum as they work towards their goals.

As we wrote at the time, the platform allows “users to choose the healthy habits that work best with their lifestyle,” offering ten different habits that “aim to gradually increase user activity (or sleep) … and automatically adjust to the frequency or target of a user’s chosen habit at the end of each week, basing that calibration on their performance from the week before.”

Holove says that he wants to leverage the Healthy Habits platform to do more, and that Basis is just scratching the surface both on this front and in terms of user engagement and interactivity, so the startup will be looking to ramp up hiring in its hardware, cloud service and software teams.

Another part of the startup’s roadmap, and really a critical element of creating a sticky user experience, is mobile. The startup is currently working on an Android app that is slated to launch later this month, with an iOS app to follow close behind. Adding mobile apps will allow users not only to see their health data and track habits on-the-go but allow the device to wirelessly sync with its dashboard and create more unique and engaging feedback loops.

Basis has been off to a great start, and while there are still some kinks to iron out and the device itself is still in serious need of more customization options, it’s got a lot to recommend it even this early in the game. That being said, with the recent news that Fitbit is raising $30 million at a $300+ million valuation, it’s not the only one capitalizing on the growth of the health tracking space. And the competition is only going to get more intense.

For more, find Basis at home here.


06 Mar 13:46

How rope was made the old fashioned way

by Jason Kottke

This is a clip from the BBC series Edwardian Farm that shows how rope was made in the olden days.

The entire series is available to watch online.

Tags: Edwardian Farm   TV   video
06 Mar 13:26

Bradley Manning Nominated For Nobel Peace Prize As People Begin Realizing How Damaging His Case Is To A Free Press

by Mike Masnick
Claus.dahl

The response to WikiLeaks is a low point in our democratic history

With Bradley Manning pleading guilty to some of the lesser charges against him, Harvard law professor Yochai Benkler -- who is a possible expert witness in the trial -- has an excellent and detailed post about why the entire case against him should be seen as a threat to the nature of whistleblowing and a free press. He notes that the US prides itself on its support of the First Amendment, even in uncomfortable situations, but this case could flip that around in a very damaging way.
A country's constitutional culture is made up of the stories we tell each other about the kind of nation we are. When we tell ourselves how strong our commitment to free speech is, we grit our teeth and tell of Nazis marching through Skokie. And when we think of how much we value our watchdog press, we tell the story of Daniel Ellsberg. Decades later, we sometimes forget that Ellsberg was prosecuted, smeared, and harassed. Instead, we express pride in a man's willingness to brave the odds, a newspaper’s willingness to take the risk of publishing, and a Supreme Court’s ability to tell an overbearing White House that no, you cannot shut up your opponents.
Yet, in the case of Manning, the government is going much, much, much further. It is trying to make leaking information to the press the equivalent of espionage and aiding the enemy -- a capital offense. If you want to create chilling effects on free speech and a free press, this is how you do it. If you believe in the stories above, about the fundamental respect for the First Amendment, then the nature of the prosecution should worry you a great deal.

As for those who claim that leaking to Wikileaks is not like the Pentagon Papers or leaking something to the press, Benkler's detailed analysis shows why that's bunk. Since Wikileaks released some of the material that Manning sent them, the organization has been painted as being this evil anti-American organization, and there's also been a big spotlight on Julian Assange, who is certainly not presented as a particularly likeable character. But, as Benkler points out, before Wikileaks got that material, it was regularly seen as an upstart media property, and a great place for whistleblowers to go to expose fraud and corruption. In other words, the idea that Manning chose to go to Wikileaks to harm the US seems quite unlikely. His story of exposing wrongdoing by the US and forcing a debate on how to have America live up to its principles has more credibility when you realize just how Wikileaks was portrayed prior to Manning's material being submitted:
The reputation that WikiLeaks has been given by most media outlets over the past two and a half years, though, obscures much of this—it just feels less like “the press” than the New York Times. This is actually the point on which I am expected to testify at the trial, based on research I did over the months following the first WikiLeaks disclosure in April 2010. When you read the hundreds of news stories and other materials published about WikiLeaks before early 2010, what you see is a young, exciting new media organization. The darker stories about Julian Assange and the dangers that the site poses developed only in the latter half of 2010, as the steady release of leaks about the U.S. triggered ever-more hyperbolic denouncements from the Administration (such as Joe Biden's calling Assange a “high-tech terrorist”), and as relations between Assange and his traditional media partners soured.

In early 2010, when Manning did his leaking, none of that had happened yet. WikiLeaks was still a new media phenom, an outfit originally known for releasing things like a Somali rebel leader’s decision to assassinate government officials in Somalia, or a major story exposing corruption in the government of Daniel Arap Moi in Kenya. Over the years WikiLeaks also exposed documents that shined a light on U.S. government practices, such as operating procedures in Camp Delta in Guantanamo or a draft of a secretly negotiated, highly controversial trade treaty called the Anti-Counterfeiting Trade Agreement. But that was not the primary focus. To name but a few examples, it published documents that sought to expose a Swiss Bank’s use of Cayman accounts to help rich clients avoid paying taxes, oil related corruption in Peru, banking abuses in Iceland, pharmaceutical company influence peddling at the World Health Organization, and extra-judicial killings in Kenya. For its work, WikiLeaks won Amnesty International's New Media award in 2009 and the Freedom of Expression Award from the British magazine, Index of Censorship, in 2008.
It's sometimes difficult to remember that, given everything that happened in the past two and a half years.

Benkler goes on to point out that the "precedents" that the US tries to rely on to argue that whistleblowing to the press is a form of aiding the enemy are ancient, obsolete and laughable. Many of the arguments go back to some Civil War-era precedents, and even then, when you look at the details you realize they were discussing something extremely different than what happened with Manning (i.e., the cases involved using the press to send coded messages about confidential info, not releasing the info to the public).

In the end, Benkler makes a powerful point:
If Bradley Manning is convicted of aiding the enemy, the introduction of a capital offense into the mix would dramatically elevate the threat to whistleblowers. The consequences for the ability of the press to perform its critical watchdog function in the national security arena will be dire. And then there is the principle of the thing. However technically defensible on the language of the statute, and however well-intentioned the individual prosecutors in this case may be, we have to look at ourselves in the mirror of this case and ask: Are we the America of Japanese Internment and Joseph McCarthy, or are we the America of Ida Tarbell and the Pentagon Papers? What kind of country makes communicating with the press for publication to the American public a death-eligible offense?

What a coup for Al Qaeda, to have maimed our constitutional spirit to the point where we might become that nation.
Given all of that, you can see why some have nominated Manning for the Nobel Peace Prize. While it is highly unlikely that Manning will be given serious consideration for the prize, the more you look at the case, the more you realize how dangerous the US government's own argument is here, and how much of an attack it is on fundamental principles we supposedly believe in and fight for here in the US.



06 Mar 13:24

EMC further embraces in-server flash storage with more memory cards

by Jordan Novet
Claus.dahl

Flash is on the rise, here at Firmafon too, by the way

Just a day after Violin Memory announced a new line of PCI-Express flash memory cards to provide quick-to-access storage inside servers, EMC said Tuesday it will offer new PCIe cards of its own.

EMC’s new XtremSF PCIe cards come in a few sizes. Enterprise multi-level-cell models with 550GB and 2.2TB capacities are now available, and 700GB and 1.4TB models will come in the second quarter of the year. More sizes will follow. Last year EMC introduced two PCIe cards (350GB and 700GB) under the name VFCache. Those two have joined the XtremSF line alongside the four new cards, said Barry Ader, general manager of EMC’s flash business unit.

EMC and others in the data center storage market have turned nearly 180 degrees from where they were just a few years ago, when they decried in-server flash memory from such companies as Fusion-io and sang the praises of separate flash memory arrays instead. Server-side storage eliminates the bottleneck between the processor and storage in separate boxes, decreasing latency.

It’s also a reaction to a fast-growing market, with webscale companies such as Facebook spending for fast-acting server-side flash storage. Fusion-io reported a 43 percent gain in year-to-year revenue for the fourth quarter of 2012, coming in at $120.5 million, according to figures on file with the U.S. Securities and Exchange Commission.

With more products coming into the flash PCIe market, prices could fall further, accelerating enterprise adoption. Then again, companies could still find a way to compete in this new market by offering different capabilities, which means other companies would need to jump in before prices race to the bottom.



06 Mar 13:16

Facebook kisses DRAM goodbye, builds memcached for flash

by Derrick Harris
Claus.dahl

A Library of Congress every 10 minutes

Q: What do you get when you mix Facebook’s extensive memcached usage with its strategy of “cold storage” for infrequently accessed data?

A: McDipper, a Facebook-built implementation of the popular memcached key-value store designed to run on flash memory rather than pricier DRAM.

Memcached, for the unfamiliar, is an open-source key-value store that caches frequently accessed data in memory so applications can access and serve it faster than if it were stored on hard disks. It’s a very popular component of many web application stacks, including at Facebook, where the company runs thousands of memcached servers to power its various applications.
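The caching idea described here can be sketched in a few lines. This is a toy look-aside cache with hypothetical names, where plain dictionaries stand in for both the memcached server and the slower backing store; it is not the memcached client API and says nothing about how McDipper is built.

```python
class CacheAside:
    """Toy look-aside cache: check fast memory first, fall back to a slow store."""

    def __init__(self, slow_store):
        self.slow_store = slow_store   # stands in for a database or disk
        self.cache = {}                # stands in for a memcached server
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.slow_store[key]   # the expensive lookup
        self.cache[key] = value        # keep it for the next request
        return value


store = {"user:1": "Ada", "user:2": "Grace"}
cache = CacheAside(store)
first = cache.get("user:1")    # miss: read from the slow store
second = cache.get("user:1")   # hit: served from the in-memory dict
print(first, second, cache.hits, cache.misses)  # Ada Ada 1 1
```

The trade-off the article goes on to discuss is about what the `cache` dictionary lives on: DRAM is fastest but most expensive per gigabyte.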

But DRAM is expensive, especially when you get to Facebook’s scale, and not all applications deserve that kind of performance. So, according to a Facebook Engineering post on Tuesday, the company designed McDipper to handle “working sets that had very large footprints but moderate to low request rates. … Compared with memory, flash provides up to 20 times the capacity per server and still supports tens of thousands of operations per second.”

Facebook has deployed McDipper for a handful of these workloads, the blog states, and has “reduced the total number of deployed servers in some pools by as much as 90% while still delivering more than 90% of get responses with sub-millisecond latencies.” It has been part of Facebook’s photo infrastructure for about a year and serves 150 gigabits of data per second — or “about one library of congress (10 TB) every 10 minutes” — over Facebook’s content-delivery network.
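The "Library of Congress every 10 minutes" figure is easy to sanity-check from the quoted 150 gigabits per second. A quick sketch, assuming decimal units (1 TB = 1,000 GB):

```python
GIGABITS_PER_SECOND = 150          # figure quoted from the Facebook post
BITS_PER_BYTE = 8
SECONDS = 10 * 60                  # ten minutes

gigabytes_per_second = GIGABITS_PER_SECOND / BITS_PER_BYTE
terabytes_in_window = gigabytes_per_second * SECONDS / 1000   # decimal TB

print(gigabytes_per_second)   # 18.75
print(terabytes_in_window)    # 11.25
```

That works out to a little over 11 TB per ten minutes, which is consistent with the quoted "about" 10 TB.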


How McDipper stores data

This is the same logic that drove Facebook to undertake its cold storage engineering effort for even more infrequently accessed data, which aims to find a middle ground between the inefficiency and latency of hard disks and the high cost of flash storage. To meet that goal, the company is getting creative by considering everything from lower-performance flash to Blu-ray — pretty much anything but tape — VP of Engineering Jay Parikh told me in January.

Building a tool like McDipper is just the tip of the iceberg, though, when it comes to managing the cost and efficiency of infrastructure at large web companies such as Facebook. On Tuesday, eBay released its Digital Service Efficiency report that lays out a methodology for assessing the effect that infrastructure (more than 52,000 servers in eBay’s case; Facebook has even more) has on larger corporate goals such as clean energy and the bottom line.

And later this month at our Structure: Data conference, data center executives from Facebook, Microsoft and Goldman Sachs will take the stage to discuss how smart analytics help them plan to meet capacity needs while keeping costs in check.

Feature image is Facebook’s new all-flash Dragonstone server design.


06 Mar 13:06

via Laughing Squid: Grand Theft Auto IV - Toy Story Sheriff...

Claus.dahl

Grand Theft Toy Story. Awesome.



via Laughing Squid:

Grand Theft Auto IV - Toy Story Sheriff Woody (MOD) HD (by taltigolt)

06 Mar 12:56

FARK.com: (7626911) Why are PC sales down? Moore's law

by (author unknown)
Claus.dahl

Moore's law was never about 'power', but about transistors/chip. It's become an increasingly difficult (software) problem to translate this into 'power'

That article seems to say the opposite - Moore's law is no longer holding. If processing power is only improving 10% per year, then the law has been broken.
www.fark.com/comments/.../Why-are-PC-sales-down-Moores-...
05 Mar 13:27

Corporations Are “Bad AI”

by Klint Finley
Claus.dahl

This is awesome - and very true. I wrote much the same thing here http://clausnotes.wikispaces.com/The+Singularity+was+already+here a few years ago

Tim Maly writes:

One of my favourite recurring tropes of AI speculation/singularitarian deep time thinking is meditations on how an evil AI or similar might destroy us. [...]

And all I can think is: we already have one of those. It is pretty clear to anyone who’s paying attention that 1. a marketplace regime of firms dedicated to maximizing profit has—broadly speaking—added a lot of value to the world 2. there are a lot of important cases where corporate profit maximization causes harm to humans 3. corporations are—broadly speaking—really good at ensuring that their needs are met.

I don’t think that it’s all that far fetched to suggest that maybe they’re getting better and better at ensuring their needs are met. Pretty much the only thing that the left and right in America can agree on is that moneyed influence has corrupted American politics and yet neither side seems able to do much of anything about it.

Full Story: Quiet Babylon: The Singularity Already Happened; We Got Corporations

See also: Yes, There is a Sub-Reddit Dedicated to Preventing SkyNet

05 Mar 13:25

On the Leakiness of Surveillance Culture, the Corporate Gaze, and What That Has To Do With the New Aesthetic

by Tim Maly
Claus.dahl

Why did I not properly link this up?

When David Rowe put a smart meter in his home, it wasn’t so that he could spy on Amy, his teenage daughter. But that’s what happened anyway.

800 km from home, with the same idle curiosity that has me popping open my Twitter feed when there’s a lull in activity, he decided to check on his Fluksometer. Noticing suspiciously high power usage, he did a little investigating and busted up her unauthorized New Year’s party.

Where Am I?
Creative Commons License photo credit: Sam Breach

It’s a cute story, but it illustrates a crucial point: surveillance culture is leaky. Primary measurements beget chains of reasoning and implication. Second and third order conclusions can be drawn by clever observers and unintended consequences are the order of the day. That’s how we end up with stories of Target outing pregnant teens to their parents through the ultra-empathetic medium of coupons.

So far, I’ve been telling you the story about this kind of surveillance that companies who market these services want you to hear. They may be strange and creepy, but they are creepily accurate.

From here, we get the fantasy that with the right mixture of surveillance and analysis, you too can make terrifyingly accurate predictions about your customer’s/children’s/terrorism suspect’s behaviour. It falls to you only to determine how to act on that information in an appropriate manner. And, surely, you know best how to do that.

Reviewing SmartMeter™ with homeowner
Creative Commons License photo credit: pgegreenenergy

The truth is illustrated by an infographic halfway through Wired’s scathing overview of Klout. It shows that Klout ranks Robert Scoble as more influential than RZA, Sarah Palin, and Craig Venter. (You can learn a lot about the blinkered nature of Klout by the fact that their official account proudly linked to the piece.)

Any sane marketing organization would look at these results and conclude that Klout’s metrics are utterly flawed. Instead, we learn that some companies are offering perks to people with high Klout scores in the hopes that they’ll spread the word about their VIP treatment. In turn, we learn about people (including the reporter) who find themselves altering their behaviour in the hopes of finding favour with this blind, demented judge.

Me socially?
Creative Commons License photo credit: elventear

What’s particularly insidious about Klout is that it’s an opt-out service. You get a Klout score unless you take the time to tell them to fuck off. In this way, it’s of a lineage with Girls Around Me, the app that scraped Foursquare check-ins and Facebook profiles to build up a stalker’s toolkit, and Please Rob Me, which listed empty burglable houses based on Twitter geo-data.

I’m coming around to Eben Moglen’s view that social networking, as currently designed, is an ecological disaster for the social environment. This isn’t, like, a new insight or anything. We are the product and all that. But sometimes it takes a turn of phrase to drive a point home. Here’s the line that tipped me over the edge: “Every time you tag anything or respond to anything or link to anything, you’re informing on your friends.”

Zandok: The Shape of Live that Never Came
Creative Commons License photo credit: pedroliveira

More to the point, you are informing on your friends so that a cadre of socially clueless dudes can get rich selling the output of broken algorithms to marketers, in the form of human lives sliced up in such a way as to make it easier to run database queries.

This is a situation that’s profoundly broken. It’s basically an open secret that it’s broken ethically, but it’s also broken empirically. To understand how broken, consider Alexis Madrigal’s attempt to work out how much user data is worth. The answer he comes up with is plus-or-minus 7 orders of magnitude. Half-a-penny or $1,200. You know. Depending.

All social-networking systems, as currently designed, demonstrably create social awkwardnesses that did not, and could not, exist before. All social-networking systems constrain, by design and intention, any expression of the full band of human relationship types to a very few crude options – and those static! A wiser response to them would be to recognize that, in the words of the old movie, “the only way to win is not to play.”

Adam Greenfield – Antisocial networking

“That’s all well and good,” I hear you ask, “but what does it have to do with The New Aesthetic?”

Designer Reverse-Engineers Face-Detection Tech to Develop Camouflage Makeup
Image from Adam Harvey’s CV Dazzle project.

Let’s start with Timo Arnall’s Robot Readable World.

He describes the video like this: “How do robots see the world? How do they gather meaning from our streets, cities, media and from us?”

“Robot Readable World” is a useful shorthand but using the video to ask “how do robots see the world?” is exactly wrong. The images in the video, compelling though they are, don’t depict robots seeing the world any more than the Terminator HUD depicts a realistic view of how a well-designed T-1000 would see the world.

Arnall’s video is actually a depiction of the debug output of machine vision, processed and formatted to be human-readable. It looks the way it does because programmers threw together a visualization to help them understand why the machines weren’t seeing what they were supposed to be seeing, or to confirm that they were seeing what they were supposed to be seeing when everything seemed to work. It’s an attempt to peer into the mind of an algorithm. Its aesthetic core comes from the same place as scrolling lines of program output in a VT-100 terminal or the bright orange of safety vests.

It’s the aesthetic of engineers and function. It’s the aesthetic of hacked-together monitors, using available crude rendering. It’s the same aesthetic as any debugger, and no more reflective of robot perception than the list of diagnostic println’s we used to use to trace crashes in a script is reflective of a game.

Got here.
Got here.
Got here.
Got here.
Fatal exception.
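
For readers who have never traced a crash this way, the output above is what breadcrumb-style print debugging produces. The sketch below is a hypothetical illustration in Python, not code from any real game; the step names and the failing step are invented.

```python
# Hypothetical sketch of breadcrumb-style print debugging.
# A "Got here." line is logged before each step runs, so the last
# breadcrumb shows how far the script got before it crashed.

def load_level():
    pass  # stand-in for work that succeeds


def spawn_player():
    pass  # stand-in for work that succeeds


def start_music():
    pass  # stand-in for work that succeeds


def render_frame():
    raise RuntimeError("renderer not initialised")  # the invented bug


def run(steps):
    """Run the steps in order and return the trace lines that were printed."""
    trace = []
    try:
        for step in steps:
            trace.append("Got here.")
            print(trace[-1])
            step()
    except Exception:
        trace.append("Fatal exception.")
        print(trace[-1])
    return trace


if __name__ == "__main__":
    run([load_level, spawn_player, start_music, render_frame])
```

The trace says where the script died and nothing about what the game looked like, which is the point of the comparison.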

As Matt Frost remarked, “we can expect to read sentences like ‘the motive of the algorithm is still unclear’ a lot in the coming years.”

Here’s some New Aesthetic. Consider HP’s much maligned webcam software, helplessly trying to find the person in the frame. You can almost hear the algorithms scream in anguish as they try to make sense of the cacophonous firehose of data, bad lighting, and unanticipated skin tone. There are no overlaid facial recognition squares, just the mute stubborn refusal to recognize Desi.

Ah, but now I’ve made the mistake that Bruce Sterling cautions about. I’ve given the robot a personality. I’ve tried to make it a friend.

We’re not going to be able to gloss over this gaping vacuity by “making the machines our friends.” Because they’re not our friends. Machines are never our friends, even if they’re intimates in our purses and pockets eighteen hours a day. They may very well be our algorithmic investors, but they’re certainly not our art critics, because at that, they suck even worse than they do at running our economy.

Bruce Sterling – An Essay on the New Aesthetic.

It seems to me that this mistake is unavoidable. It may even be at the core of The New Aesthetic, this multiplication of entities and agents. In James Bridle’s post-SxSW roundup, he seems to say as much.

One of the core themes of the New Aesthetic has been our collaboration with technology, whether that’s bots, digital cameras or satellites (and whether that collaboration is conscious or unconscious).

James Bridle – #sxaesthetic.

It’s a profoundly human action, to multiply entities. Perhaps it comes from the same root as pareidolia. We see faces in the clouds, we see personality in pets, we see collaboration in algorithms. Perhaps it’s all Pixar’s fault.

We are living inside a Cambrian explosion of entities of varying independence and varying physicality, some quite compact and individual, others smeared across great expanses of space and time. Some tied very much to a medium, others extruding parts of themselves into the biosphere, the noosphere, the memesphere, the digisphere.

I keep thinking about Aujik’s next nature Shintoist animism. They divide nature into the refined (robotics, artificial intelligence, nanotechnology, augmented reality, body enhancements) and the primitive (plants, soil, organisms, stones). Plenty of room in their cosmology for all sorts of new entities and hybrids.

Why stop there at the primitive and refined? There’s another class of entities to whom we have already granted personhood. I’m speaking, of course, about corporations. Immortal entities of terrifying inhuman thinking, capable of entering into contracts and incurring debts, and owed a subset of the rights which we accord to human persons.

Now we’re circling back to surveillance.

punch card 1974 EX
Creative Commons License photo credit: THE Holy Hand Grenade!

I’m interested in the aesthetics of the corporate readable world, and their truly alien gaze.

Corporations communicate to us through money, press-releases, and advertising, always advertising. For a glimpse of the corporate readable world, look to Twitter’s routinely useless “who to follow” panel, Klout’s laughable ideas about what you are influential about, Facebook’s clumsy attempts to get you to join a dating site, and Google’s demented, personalized, Gmail ads. You can see it in your credit rating, and your position on the actuarial tables. You can see it in Blackwater/Xe/Academi’s attempt to conceal itself by shedding names like a trickster god shedding skins.

These aren’t as visually appealing as most of the examples that show up on Bridle’s Tumblr but they’re an aesthetic nonetheless.

2500 Creative Commons Licenses
Creative Commons License photo credit: qthomasbower

Not long ago, Cory Doctorow delivered a talk about The Coming Civil War over General Purpose Computing (dig the New Aesthetic scrolling background on that page).

At its core is the argument that the forces that attempt to regulate computers for the purposes of protecting the current copyright regime are a precursor to a wider battle about general purpose computing. Eventually, he argues, everything will have computation inside it. And the same logic that led copyright holders to embed spyware on compact discs will lead regulators to make demands that engineers allow them to limit the capabilities of e.g. self-driving cars.

But there’s a problem. We don’t know how to make a computer that can run all the programs we can compile except for whichever one pisses off a regulator, or disrupts a business model, or abets a criminal.

Cory Doctorow – The Coming Civil War over General Purpose Computing.

The same forces that make copyright untenable make surveillance inevitable. Computers are copying machines. They make copies of everything, including every action that you take within their field of sensation.

Historically, that’s meant the things that happen online, with the main avenue of input being keystrokes. But as we wire up the rest of the planet with cameras, accelerometers, potentiometers, microphones, thermal sensors, pressure plates, and switches, that means the computer and corporate gaze will reach everything, everywhere, always.

David didn’t need to log in to his Fluksometer at the moment Amy’s party was in progress. I’m sure it keeps logs. He could have stumbled across the information at any old time.