Shared posts

16 Mar 14:54

Ultron’s Ultimate Weakness [Comic]

by Geeks are Sexy
12 Mar 10:00

Entering the BIOS

by sharhalakis

by @just_hank_moody, andreibkn and necessaryaegis

08 Mar 14:20

The Story of Grace Hopper: Pioneering Computer Scientist [Comic]

by Geeks are Sexy

This is a fantastic comic by artist Pablo Stanley illustrating the story of pioneering computer scientist Grace Hopper (1906-1992), also known as “Amazing Grace.”


[Source: Stanley Color | Udemy]

The post The Story of Grace Hopper: Pioneering Computer Scientist [Comic] appeared first on Geeks are Sexy Technology News.

24 Feb 18:00

172. ISAAC ASIMOV: A lifetime of learning

by Gav


Isaac Asimov (1920-1992) was a writer, known for his contribution to science fiction (including The Three Laws of Robotics, I, Robot and the Foundation series) and his staggering work in other genres and non-fiction.

Asimov had a formal education in chemistry, earning his PhD and working as a chemist for the Navy during WWII. He taught biochemistry and later became a professor at the Boston University School of Medicine, all while writing stories for fantasy magazines in his spare time. He finally left the university in 1958 to focus on writing. Asimov’s output was truly mind-blowing: he wrote over 500 (!!!) books and 90,000 letters. He said: “Writing is my only interest. Even speaking is an interruption.”

Asimov’s non-fiction books were mostly on astronomy, but his other titles covered general science, history, mathematics, physics, Shakespeare, the Bible and mythology. He was completely self-taught in these areas, and he was successful because he could take difficult scientific concepts and make them entertaining for the general public. He said he could “read a dozen dull books and make one interesting book out of them.” To get some idea of how vast Asimov’s knowledge was, his books appear in nine of the ten Dewey Decimal Classes.

The quotes used in this comic are taken from a fantastic interview Asimov did in 1988 (which you can watch on YouTube). In it, Asimov predicts that in the near future, personal computers will help anyone learn anything ‘that strikes their fancy’ in the privacy of their own home and at their own leisure. That prediction, of course, came true with the internet. The technology from The Matrix, where we could upload information directly into our brains and shout “I know kung-fu!”, isn’t available yet, but it has never been easier to learn whatever you want, no matter how niche. Thanks to reader Jenny for sending me the quote and the Brain Pickings article that featured the interview.

RELATED COMICS: Carl Sagan Pale Blue Dot. Richard Dawkins The Lucky Ones. Albert Einstein A Human Being is Part of the Whole. Jack London I Would Rather be Ashes Than Dust.

- I admit not having read any of Asimov’s books. Where should I start? The Foundation series? His story Nightfall was voted the best short science fiction story of all time, so maybe that?
- Asimov said that one of only two men he knew who were smarter than himself was his good friend Carl Sagan.

05 Mar 10:00

When you think you have it under control

by sharhalakis

by uaiHebert

04 Mar 10:00

How to respond about ongoing outages

by sharhalakis

by uaiHebert

24 Feb 10:00

Project handover

by sharhalakis

by uaiHebert

19 Feb 17:00

New Tattoo Removal Cream Exploits Immune System

by JLister


A university researcher says he’s developing a cream that could remove tattoos cheaply and without pain.

A tattoo doesn’t work simply by burying ink deep enough (in the dermis) that it stays in place as the skin regenerates. The needle doesn’t just deliver the pigment: its penetration prompts the body’s immune system to send protective cells known as macrophages to the wound. These macrophages soak up the pigment to protect the rest of the skin, and while some of them are carried to the lymph nodes, others stay lodged in place, with the dye remaining visible through the skin.

At the moment, most tattoo removal involves using lasers to break down the ink particles so that they are small enough to be absorbed into the body. In effect, it’s an attempt to speed up a natural process by which tattoos fade over time because of exposure to sunlight (just not quickly enough for the tattoo to disappear during an average lifespan). That can mean painful inflammation and even scarring.

Alec Falkenham, a PhD student at Dalhousie University in Halifax, Nova Scotia, has developed what he calls Bisphosphonate Liposomal Tattoo Removal cream. Although it doesn’t require puncturing the skin, it simulates the introduction of a foreign body that comes with tattooing. The idea is that this stimulates the body to deliver fresh macrophages, with the existing ones (containing the pigment) carried to the lymph nodes.

At the moment Falkenham has only tested the cream on mice. He’s looking to move to pigs next before eventual human tests. The estimated price of the cream would be just four and a half cents per square centimetre, though Falkenham isn’t yet sure how many treatments would be needed in humans.

Falkenham has four tattoos and, although perfectly happy with them, says the experience of getting them started him thinking about the relationship between tattooing and the immune system.

The post New Tattoo Removal Cream Exploits Immune System appeared first on Geeks are Sexy Technology News.

11 Feb 10:00

Forgot to push the commits

by sharhalakis

by uaiHebert

09 Feb 10:00

Lockess algorithm

by sharhalakis

by Lwahonen

(Correction: lockless)

04 Feb 08:01

Comic: Precautions

by tycho@penny-arcade.com (Tycho)
New Comic: Precautions
04 Feb 10:00

Before diving into the legacy code

by sharhalakis

by Ordon

03 Feb 15:00

The Ingredient for Life [Comic]

by Geeks are Sexy
30 Jan 10:01

Ops after a long night deploy

by sharhalakis

by uaiHebert

23 Jan 20:39

The ‘X-Files’ Is Getting a Reboot

by Remy Carreiro


Holy crap, The X-Files is getting a reboot. Though there has been talk for years about bringing this exciting show back to the masses, until now it was only speculation. As of yesterday, speculation crossed over into fact (like an actual X-Files episode). From Vanity Fair:

Over the weekend, Fox confirmed that they were in the logistical phase of rebooting the series. That is to say, they were checking to make sure they could get all the original players back and available at the same time because, Fox promised, they’re not doing The X-Files without Mulder and Scully.

I will admit, I am a huge fan of the original series, and watched it weekly without missing an episode. And even though it sounds like they’re doing it right by getting the original cast, I’m a tad bit hesitant. While I am very excited about the prospect of getting to explore the paranormal with agents Mulder and Scully again, can lightning strike twice for this prestigious series? Guess we will have to just wait and see. I have faith, though. I KNOW the truth is still out there, so their work isn’t done yet.

[Story from VanityFair | Image Source]

The post The ‘X-Files’ Is Getting a Reboot appeared first on Geeks are Sexy Technology News.

16 Jan 17:00

Too Darn Bright [Comic]

by Geeks are Sexy
21 Jan 13:14

The Truth About the Internet

by Zach Weinersmith

it will scare you

Continue reading on Medium »

20 Jan 09:10

Calvin and Hobbes for January 20, 2015

13 Jan 04:30

A fear submitted by liamkruger to deep dark fears



A fear submitted by liamkruger to deep dark fears. Check out prints and stuff at the deep dark fears store.

08 Jan 00:00

Lunar Swimming

by xkcd

Lunar Swimming

What if there was a lake on the Moon? What would it be like to swim in it? Presuming that it is sheltered in a regular atmosphere, in some giant dome or something.

Kim Holder

This would be so cool.

In fact, I honestly think it's cool enough that it gives us a pretty good reason to go to the Moon in the first place. At the very least, it's better than the one Kennedy gave.

Floating would feel about the same on the Moon as on Earth, since how high in the water you float depends only on your body's density compared to the water's, not the strength of gravity.

Swimming underwater would also feel pretty similar. The inertia of the water is the main source of drag when swimming, and inertia is a property of matter[1] (♬ BILL NYE THE SCIENCE GUY ♬) independent of gravity. The top speed of a submerged swimmer would be about the same on the Moon as here: about 2 meters/second.

Everything else would be different and way cooler. The waves would be bigger, the splash fights more intense, and swimmers would be able to jump out of the water like dolphins.

This[2] footnote[3] contains some detail on the math behind a dolphin jump. Calculating the height a swimmer can jump out of the water requires taking several different things into account, but the bottom line is that a normal swimmer on the Moon could probably launch themselves a full meter out of the water, and Michael Phelps may well be able to manage 2 or 3.

[2] Not this one. The other one.

[3] The simplest approach, which gives us an approximate answer, is to treat the swimmer as a simple projectile. The formula for the height of a projectile is:

\( \frac{\text{speed}^2}{2\times\text{gravity}} \)

... which tells us that a champion swimmer moving at 2 meters per second (4.5 mph) would only have enough kinetic energy to lift their body about 20 centimeters against gravity.

That's not totally accurate, although it's enough to tell us that dolphin jumps on Earth probably aren't in the cards for us. But to get a more accurate answer (and an equation we can apply to the Moon), we need to account for a few other things.

When a swimmer first breaks the surface, they don't have to lift their full weight; they're partially supported by buoyancy. As more of their body leaves the water, the force of buoyancy decreases, since their body is displacing less water. Since the force of gravity isn't changing, their net weight increases.

You can calculate how much potential energy is required to lift a body vertically through the surface to a certain height, but it's a complicated integral (you integrate the displacement of the submerged portion of their body over the vertical distance they travel) and depends on their body shape. For a human body moving fast enough to jump most of the way out of the water, this effect probably adds about half a torso-length to their final height, and less if they're not able to make it all the way out.

The other effect we have to account for is the fact that a swimmer can continue kicking as they start to leave the water. When a swimmer is submerged and moving at top speed, the drag from the water is equal to the thrust they generate by kicking and ... whatever the gerund form of the verb is for the things your arms do while swimming. My first thought was "stroking," but it's definitely not that.

Anyway, once the jumping swimmer breaks the surface, the drag almost vanishes, but they can keep kicking for a few moments. To figure out how much energy this adds, you can multiply the thrust from kicking by the distance over which they're kicking after breaking the surface, since energy equals force times distance. The distance is most of a body length, or 1 to 1.5 meters. As for the force from kicking, random Google results for a search for lifeguard qualifications suggest that good swimmers might be able to carry 10 lbs over their heads for a short distance, which means they're generating a little more than 10 pounds-force (50+ N) of kicking thrust.

We can combine all these together into a big ol' equation:

\[ \text{Jump height}=\left(\frac{\tfrac{1}{2}\times\text{body mass}\times\left(\text{top speed}\right)^2+\text{kick force}\times\text{torso length}}{\text{Earth gravity}\times\text{body mass}}\right)+\left(\text{buoyancy correction}\right) \]
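To get a feel for the numbers, here's a minimal sketch that plugs representative values into that equation. The body mass, top speed, kick force, torso length, and buoyancy correction are illustrative assumptions, not measurements from the post.

```python
# Rough jump-height estimate from the equation above.
# All input values are illustrative assumptions.

def jump_height(body_mass, top_speed, kick_force, torso_length,
                gravity, buoyancy_correction):
    """Height (in meters) a swimmer can launch themselves out of the water."""
    kinetic = 0.5 * body_mass * top_speed**2   # energy from forward speed
    kicking = kick_force * torso_length        # extra energy from kicking
    return (kinetic + kicking) / (gravity * body_mass) + buoyancy_correction

EARTH_G, MOON_G = 9.81, 1.62                   # m/s^2
swimmer = dict(body_mass=75,                   # kg (assumed)
               top_speed=2.0,                  # m/s, champion swimmer
               kick_force=50,                  # N, from the lifeguard estimate
               torso_length=1.2,               # m (assumed)
               buoyancy_correction=0.3)        # m, roughly half a torso

print(f"Earth: {jump_height(gravity=EARTH_G, **swimmer):.2f} m")
print(f"Moon:  {jump_height(gravity=MOON_G, **swimmer):.2f} m")
# Earth comes out well under a meter; the Moon comes out around 2 m,
# in the same ballpark as the estimates in the text.
```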

The numbers get even more exciting when we introduce fins.

Swimmers wearing fins can go substantially faster than regular swimmers without them (although the fastest swimmer wearing flippers will still lose to a runner, even if the runner is also wearing flippers and jumping over hurdles).

Champion finswimmers can go almost 3.2 m/s wearing a monofin, which is fast enough for some pretty impressive jumps, even on Earth. Data on swimfin top speeds and thrusts[4] suggest that on the Moon, a champion finswimmer could probably launch themselves as high as 4 or 5 meters into the air. ([4] This paper provides some sample data.) In other words, on the Moon, you could conceivably do a high dive in reverse.

But it gets even better. A 2012 paper in PLoS ONE, titled Humans Running in Place on Water at Simulated Reduced Gravity, concluded that while humans can't run on the surface of water on Earth,[5] they might just barely be able to do so on the Moon. ([5] They actually provide a citation for this statement, which is delightful.) I highly recommend reading their paper, if only for the hilarious experimental setup illustration on page 2.

Because of the reduced gravity on the Moon, the water would be launched upward more easily, just like the swimmers. The result would be larger waves and more flying droplets. In technical terms, a pool on the Moon would be more "splashy".[6]The SI unit of splashiness is the splashypant.

To avoid splashing all the water out, you'd want to design the deck so water drains quickly back into the pool. You could just make the rim higher, but then you'd spoil one of the key joys of a pool on the Moon—exiting via Slip 'N Slide:

I 100% support this idea. If we ever build a Moon base, I think we should absolutely build a big swimming pool there. Sure, sending a swimming pool's worth of water (135 horses) to the Moon's surface would be expensive.[7]If you decided to bundle a backyard pool into individual two-liter bottles, and sent them in 3,000 batches of 10 each via the startup Astrobotic, it would cost you $72 billion (according to their website's calculator). But on the other hand, this lunar base is going to have people on it, so you need to send some water anyway.[8]Sending a supply of water and a filter system is probably cheaper than sending a replacement astronaut every 3 or 4 days, although I encourage NASA to run the numbers on that to be sure.

And it's really not impossible. A large backyard swimming pool weighs about as much as four Apollo lunar landers. A next-generation[9](or, heck, previous-generation) heavy-lift rocket, like Boeing's NASA SLS or Elon Musk's SpaceX Falcon Heavy, would be able to deliver a good-sized pool to the Moon in not too many trips.

So maybe the next step, if you really want a swimming pool on the Moon, is to call Elon Musk and ask for a quote.

22 Jan 17:50

What The Old Reader Readers Are Reading

Do you know what’s popular on the web right now?

If you ignore search engines, social media, and shopping, the most popular content on the web is sports (espn.com), news (cnn.com, huffingtonpost.com, foxnews.com), and porn.

If you ignore celebrities like Katy Perry, the most popular stuff on Twitter is mainstream news sites (CNN, BBC). 

If you look at what’s popular among The Old Reader users, you get a much different picture. 

First off, you like comics. Really, really like comics. XKCD, Dilbert, and the Oatmeal dominate the list of most popular feeds on The Old Reader. 

image

After comics, the majority of feeds are tech blogs and tech news sites. Then comes lifestyle stuff like Lifehacker. There is also a lot of longer form content like TED Talks or in-depth magazine reporting. We also see national news sites like nytimes.com and what might be considered local news sites, like Boston.com.

Interestingly, there is very little sports in our feeds. That might be because our users are just not sports fans. Or it might be that sports is easy to consume on Twitter. 

Looking at all of the data, I’m starting to think that The Old Reader is like a newspaper. Our readers are using it to compile a single source of information, news, analysis, satire, and opinion. It’s a collection you would have to work really hard to assemble just by going online or using social media.

In fact, I think that the popularity of comics on our list supports my theory. It seems to me that just like in the days of the newspaper, comics are the one thing everyone can agree on. 

And as a comic fan, I’d like to point out that the comics you like are not childish entertainment. These comics are satire. Satire is only useful or interesting to people who have a good handle on what’s going on and are looking for a more subtle, sophisticated take: a way to make sense of all the other stuff they read.

On the Internet or social media, most people don’t read much beyond the headlines on mainstream news sites. But judging from our most popular feeds, The Old Reader makes it possible to consume a broader range of stuff, from comics and satire to news and analysis, to blogs and feature-length content.

Having information and being informed are not the same thing. Our users are looking to be informed. The paradox of our time is that you can have all of the information in the world available and learn less. There are more sources of information, but you need new literacy skills to decode messages in the way news and information are presented.

Most of us don’t have the time or mental energy to really analyze everything coming at us. But if you use it right, I really believe The Old Reader can help you get a better handle on a complicated world. 

13 Jan 10:00

sudo

by sharhalakis

by Mau

plus this older post

11 Nov 14:08

Why You Should Never Use MongoDB

by sarahmei
Renato Cerqueira

To my programmer friends: I’ve been reading about this subject for a project I’m developing. Does anyone have points for or against, corroborating or disagreeing with the friend who wrote the post?
Are there more posts on the subject? :)

By the way, it doesn’t have to be just about Mongo; it can be about anything “nosql” along these lines.

Disclaimer: I do not build database engines. I build web applications. I run 4-6 different projects every year, so I build a lot of web applications. I see apps with different requirements and different data storage needs. I’ve deployed most of the data stores you’ve heard about, and a few that you probably haven’t.

I’ve picked the wrong one a few times. This is a story about one of those times — why we picked it originally, how we discovered it was wrong, and how we recovered. It all happened on an open source project called Diaspora.

The project

Diaspora is a distributed social network with a long history. Waaaaay back in early 2010, four undergraduates from New York University made a Kickstarter video asking for $10,000 to spend the summer building a distributed alternative to Facebook. They sent it out to friends and family, and hoped for the best.

But they hit a nerve. There had just been another Facebook privacy scandal, and when the dust settled on their Kickstarter, they had raised over $200,000 from 6400 different people for a software project that didn’t yet have a single line of code written.

Diaspora was the first Kickstarter project to vastly overrun its goal. As a result, they got written up in the New York Times – which turned into a bit of a scandal, because the chalkboard in the backdrop of the team photo had a dirty joke written on it, and no one noticed until it was actually printed. In the NEW YORK TIMES. The fallout from that was actually how I first heard about the project.

As a result of their Kickstarter success, the guys left school and came out to San Francisco to start writing code. They ended up in my office. I was working at Pivotal Labs at the time, and one of the guys’ older brothers also worked there, so Pivotal offered them free desk space, internet, and, of course, access to the beer fridge. I worked with official clients during the day, then hung out with them after work and contributed code on weekends.

They ended up staying at Pivotal for more than two years. By the end of that first summer, though, they already had a minimal but working (for some definition) implementation of a distributed social network built in Ruby on Rails and backed by MongoDB.

That’s a lot of buzzwords. Let’s break it down.

“Distributed social network”

If you’ve seen the Social Network, you know everything you need to know about Facebook. It’s a web app, it runs on a single logical server, and it lets you stay in touch with people. Once you log in, Diaspora’s interface looks structurally similar to Facebook’s:

A screenshot of the Diaspora user interface

There’s a feed in the middle showing all your friends’ posts, and some other random stuff along the sides that no one has ever looked at. The main technical difference between Diaspora and Facebook is invisible to end users: it’s the “distributed” part.

The Diaspora infrastructure is not located behind a single web address. There are hundreds of independent Diaspora servers. The code is open source, so if you want to, you can stand up your own server. Each server, called a pod, has its own database and its own set of users, and will interoperate with all the other Diaspora pods that each have their own database and set of users.

The Diaspora Ecosystem

Pods of different sizes communicate with each other, without a central hub.

Each pod communicates with the others through an HTTP-based API. Once you set up an account on a pod, it’ll be pretty boring until you follow some other people. You can follow other users on your pod, and you can also follow people who are users on other pods. When someone you follow on another pod posts an update, here’s what happens:

1. The update goes into the author’s pod’s database.

2. Your pod is notified over the API.

3. The update is saved in your pod’s database.

4. You look at your activity feed and see that post mixed in with posts from the other people you follow.

Comments work the same way. On any single post, some comments might be from people on the same pod as the post’s author, and some might be from people on other pods. Everyone who has permission to see the post sees all the comments, just as you would expect if everyone were on a single logical server.
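As a toy illustration of those four steps, here is a minimal sketch. The /receive endpoint, payload shape, and in-memory “database” are all assumptions made for illustration; Diaspora’s real federation protocol is considerably more involved (signatures, retries, and so on).

```python
# Toy model of the four-step flow above. Endpoint names and payloads
# are illustrative assumptions, not Diaspora's actual protocol.
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)
local_posts = []                     # stands in for this pod's database

def publish(post, follower_pod_urls):
    """Author's pod: save locally (step 1), then notify follower pods (step 2)."""
    local_posts.append(post)
    for pod_url in follower_pod_urls:
        requests.post(f"{pod_url}/receive", json=post, timeout=5)

@app.route("/receive", methods=["POST"])
def receive():
    """Follower's pod: save the incoming update (step 3); it then shows up
    when a local user renders their activity feed (step 4)."""
    local_posts.append(request.get_json())
    return jsonify(status="ok")
```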

Who cares?

There are technical and legal advantages to this architecture. The main technical advantage is fault tolerance.

Here is a very important fault tolerant system that every office should have.

If any one of the pods goes down, it doesn’t bring the others down. The system survives, and even expects, network partitioning. There are some interesting political implications to that — for example, if you’re in a country that shuts down outgoing internet to prevent access to Facebook and Twitter, your pod running locally still connects you to other people within your country, even though nothing outside is accessible.

The main legal advantage is server independence. Each pod is a legally separate entity, governed by the laws of wherever it’s set up. Each pod also sets their own terms of service. On most of them, you can post content without giving up your rights to it, unlike on Facebook. Diaspora is free software both in the “gratis” and the “libre” sense of the term, and most of the people who run pods care deeply about that sort of thing.

So that’s the architecture of the system. Let’s look at the architecture within a single pod.

It’s a Rails app.

Each pod is a Ruby on Rails web application backed by a database, originally MongoDB. In some ways the codebase is a ‘typical’ Rails app — it has both a visual and programmatic UI, some Ruby code, and a database. But in other ways it is anything but typical.

The internal structure of one Diaspora pod

The visual UI is of course how website users interact with Diaspora. The API is used by various Diaspora mobile clients — that part’s pretty typical — but it’s also used for “federation,” which is the technical name for inter-pod communication. (I asked where the Romulans’ access point was once, and got a bunch of blank looks. Sigh.) So the distributed nature of the system adds layers to the codebase that aren’t present in a typical app.

And of course, MongoDB is an atypical choice for data storage. The vast majority of Rails applications are backed by PostgreSQL or (less often these days) MySQL.

So that’s the code. Let’s consider what kind of data we’re storing.

I Do Not Think That Word Means What You Think That Means

“Social data” is information about our network of friends, their friends, and their activity. Conceptually, we do think about it as a network — an undirected graph in which we are in the center, and our friends radiate out around us.

Photos all from rubyfriends.com. Thanks Matt Rogers, Steve Klabnik, Nell Shamrell, Katrina Owen, Sam Livingston-Grey, Josh Susser, Akshay Khole, Pradyumna Dandwate, and Hephzibah Watharkar for contributing to #rubyfriends!

When we store social data, we’re storing that graph topology, as well as the activity that moves along those edges.

For quite a few years now, the received wisdom has been that social data is not relational, and that if you store it in a relational database, you’re doing it wrong.

But what are the alternatives? Some folks say graph databases are more natural, but I’m not going to cover those here, since graph databases are too niche to be put into production. Other folks say that document databases are perfect for social data, and those are mainstream enough to actually be used. So let’s look at why people think social data fits more naturally in MongoDB than in PostgreSQL.

How MongoDB Stores Data

MongoDB is a document-oriented database. Instead of storing your data in tables made out of individual rows, like a relational database does, it stores your data in collections made out of individual documents. In MongoDB, a document is a big JSON blob with no particular format or schema.

Let’s say you have a set of relationships like this that you need to model. This is quite similar to a project that came through Pivotal that used MongoDB, and was the best use case I’ve ever seen for a document database.

At the root, we have a set of TV shows. Each show has many seasons, each season has many episodes, and each episode has many reviews and many cast members. When users come into this site, typically they go directly to the page for a particular TV show. On that page they see all the seasons and all the episodes and all the reviews and all the cast members from that show, all on one page. So from the application perspective, when the user visits a page, we want to retrieve all of the information connected to that TV show.

There are a number of ways you could model this data. In a typical relational store, each of these boxes would be a table. You’d have a tv_shows table, a seasons table with a foreign key into tv_shows, an episodes table with a foreign key into seasons, and reviews and cast_members tables with foreign keys into episodes. So to get all the information for a TV show, you’re looking at a five-table join.
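Written out (wrapped in Python here), that five-table join might look something like the sketch below. The table and column names are assumptions reconstructed from the description, not a real schema.

```python
# The five-table join described above, written out as one query.
# Table and column names are illustrative assumptions.
import psycopg2

conn = psycopg2.connect("dbname=tvshows")    # hypothetical database
show_id = 1                                  # hypothetical show

with conn.cursor() as cur:
    cur.execute("""
        SELECT s.title, se.number, e.title, r.body, cm.name
        FROM tv_shows s
        JOIN seasons       se ON se.tv_show_id = s.id
        JOIN episodes      e  ON e.season_id   = se.id
        LEFT JOIN reviews      r  ON r.episode_id  = e.id
        LEFT JOIN cast_members cm ON cm.episode_id = e.id
        WHERE s.id = %s
    """, (show_id,))
    rows = cur.fetchall()   # everything about one show, in one round trip
```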

We could also model this data as a set of nested hashes. The set of information about a particular TV show is one big nested key/value data structure. Inside a TV show, there’s an array of seasons, each of which is also a hash. Within each season, an array of episodes, each of which is a hash, and so on. This is how MongoDB models the data. Each TV show is a document that contains all the information we need for one show.

Here’s an example document for one TV show, Babylon 5.

It’s got some title metadata, and then it’s got an array of seasons. Each season is itself a hash with metadata and an array of episodes. In turn, each episode has some metadata and arrays for both reviews and cast members.
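The original post shows the document itself as an image. A stripped-down reconstruction of its shape, with illustrative field names and values, looks something like this:

```python
# A stripped-down reconstruction of the nested TV-show document
# described above; field names and values are illustrative.
show = {
    "title": "Babylon 5",
    "seasons": [
        {
            "number": 1,
            "episodes": [
                {
                    "number": 1,
                    "title": "Midnight on the Firing Line",
                    "reviews": [
                        {"rating": 9, "body": "A strong opener."},
                    ],
                    "cast_members": [
                        {"name": "Michael O'Hare", "role": "Jeffrey Sinclair"},
                    ],
                },
                # ... more episodes ...
            ],
        },
        # ... more seasons ...
    ],
}
```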

It’s basically a huge fractal data structure.

Sets of sets of sets of sets. Tasty fractals.

All of the data we need for a TV show is under one document, so it’s very fast to retrieve all this information at once, even if the document is very large. There’s a TV show here in the US called “General Hospital” that has aired over 12,000 episodes over the course of 50+ seasons. On my laptop, PostgreSQL takes about a minute to get denormalized data for 12,000 episodes, while retrieval of the equivalent document by ID in MongoDB takes a fraction of a second.

So in many ways, this application presented the ideal use case for a document store.

Ok. But what about social data?

Right. When you come to a social networking site, there’s only one important part of the page: your activity stream. The activity stream query gets all of the posts from the people you follow, ordered by most recent. Each of those posts has nested information within it, such as photos, likes, reshares, and comments.

The nested structure of activity stream data looks very similar to what we were looking at with the TV shows.

Users have friends, friends have posts, posts have comments and likes, each comment has one commenter and each like has one liker. Relationship-wise, it’s not a whole lot more complicated than TV shows. And just like with TV shows, we want to pull all this data at once, right after the user logs in. Furthermore, in a relational store, with the data fully normalized, it would be a seven-table join to get everything out.

Seven-table joins. Ugh. Suddenly storing each user’s activity stream as one big denormalized nested data structure, rather than doing that join every time, seems pretty attractive.

In 2010, when the Diaspora team was making this decision, Etsy’s articles about using document stores were quite influential, although they’ve since publicly moved away from MongoDB for data storage. Likewise, at the time, Facebook’s Cassandra was also stirring up a lot of conversation about leaving relational databases. Diaspora chose MongoDB for their social data in this zeitgeist. It was not an unreasonable choice at the time, given the information they had.

What could possibly go wrong?

There is a really important difference between Diaspora’s social data and the Mongo-ideal TV show data that no one noticed at first.

With TV shows, each box in the relationship diagram is a different type. TV shows are different from seasons are different from episodes are different from reviews are different from cast members. None of them is even a sub-type of another type.

But with social data, some of the boxes in the relationship diagram are the same type. In fact, all of these green boxes are the same type — they are all Diaspora users.

A user has friends, and each friend may themselves be a user. Or, they may not, because it’s a distributed system. (That’s a whole layer of complexity that I’m just skipping for today.) In the same way, commenters and likers may also be users.

This type duplication makes it way harder to denormalize an activity stream into a single document. That’s because in different places in your document, you may be referring to the same concept — in this case, the same user. The user who liked that post in your activity stream may also be the user who commented on a different post.

Duplicate data

We can represent this in MongoDB in a couple of different ways. Duplication is an easy option. All the information for that friend is copied and saved to the like on the first post, and then a separate copy is saved to the comment on the second post. The advantage is that all the data is present everywhere you need it, and you can still pull the whole activity stream back as a single document.

Here’s what this kind of fully denormalized stream document looks like.

Here we have copies of user data inlined. This is Joe’s stream, and it has a copy of his user data, including his name and URL, at the top level. His stream, just underneath, contains Jane’s post. Joe has liked Jane’s post, so under likes for Jane’s post, we have a separate copy of Joe’s data.
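The original shows this document as an image too. Reconstructed with illustrative names, the duplication looks like this; note that Joe’s user data appears in two independent places:

```python
# Fully denormalized activity stream; field names are illustrative.
stream = {
    "user": {"name": "Joe", "url": "https://pod.example/joe"},      # copy #1
    "posts": [
        {
            "author": {"name": "Jane", "url": "https://pod.example/jane"},
            "body": "Hello from my pod!",
            "likes": [
                {"name": "Joe", "url": "https://pod.example/joe"},  # copy #2
            ],
            "comments": [],
        },
    ],
}
```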

You can see why this is attractive: all the data you need is already located where you need it.

You can also see why this is dangerous. Updating a user’s data means walking through all the activity streams that they appear in to change the data in all those different places. This is very error-prone, and often leads to inconsistent data and mysterious errors, particularly when dealing with deletions.

Is there no hope?

There is another approach you can take to this problem in MongoDB, which will be more familiar if you have a relational background. Instead of duplicating user data, you can store references to users in the activity stream documents.

With this approach, instead of inlining this user data wherever you need it, you give each user an ID. Once users have IDs, we store the user’s ID every place that we were previously inlining data. New IDs are in green below.

MongoDB actually uses BSON IDs, which are strings sort of like GUIDs, but to make these samples easier to read I’m just using integers.

This eliminates our duplication problem. When user data changes, there’s only one document that gets rewritten. However, we’ve created a new problem for ourselves. Because we’ve moved some data out of the activity streams, we can no longer construct an activity stream from a single document. This is less efficient and more complex. Constructing an activity stream now requires us to 1) retrieve the stream document, and then 2) retrieve all the user documents to fill in names and avatars.

What’s missing from MongoDB is a SQL-style join operation, which is the ability to write one query that mashes together the activity stream and all the users that the stream references. Because MongoDB doesn’t have this ability, you end up manually doing that mashup in your application code, instead.
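That application-side mashup ends up looking something like this sketch, using pymongo; the collection names and the like_user_ids field are assumptions for illustration.

```python
# The manual application-side "join" described above, sketched with
# pymongo. Collection and field names are illustrative assumptions.
from pymongo import MongoClient

db = MongoClient().social

def load_stream(stream_id):
    stream = db.streams.find_one({"_id": stream_id})        # query 1: the stream
    user_ids = {uid for post in stream["posts"]
                    for uid in post["like_user_ids"]}
    users = {u["_id"]: u                                    # query 2: the users
             for u in db.users.find({"_id": {"$in": list(user_ids)}})}
    for post in stream["posts"]:                            # mash-up in app code
        post["likes"] = [users[uid] for uid in post["like_user_ids"]]
    return stream
```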

Simple Denormalized Data

Let’s return to TV shows for a second. The set of relationships for a TV show doesn’t have a lot of complexity. Because all the boxes in the relationship diagram are different entities, the entire query can be denormalized into one document with no duplication and no references. In this document database, there are no links between documents. It requires no joins.

On a social network, however, nothing is that self-contained. Any time you see something that looks like a name or a picture, you expect to be able to click on it and go see that user, their profile, and their posts. A TV show application doesn’t work that way. If you’re on season 1 episode 1 of Babylon 5, you don’t expect to be able to click through to season 1 episode 1 of General Hospital.

Don’t. Link. The. Documents.

Once we started doing ugly MongoDB joins manually in the Diaspora code, we knew it was the first sign of trouble. It was a sign that our data was actually relational, that there was value to that structure, and that we were going against the basic concept of a document data store.

Whether you’re duplicating critical data (ugh), or using references and doing joins in your application code (double ugh), when you have links between documents, you’ve outgrown MongoDB. When the MongoDB folks say “documents,” in many ways, they mean things you can print out on a piece of paper and hold. A document may have internal structure — headings and subheadings and paragraphs and footers — but it doesn’t link to other documents. It’s a self-contained piece of semi-structured data.

If your data looks like that, you’ve got documents. Congratulations! It’s a good use case for Mongo. But if there’s value in the links between documents, then you don’t actually have documents. MongoDB is not the right solution for you. It’s certainly not the right solution for social data, where links between documents are actually the most critical data in the system.

So social data isn’t document-oriented. Does that mean it’s actually…relational?

That Word Again

When people say “social data isn’t relational,” that’s not actually what they mean. They mean one of these two things:

1. “Conceptually, social data is more of a graph than a set of tables.”

This is absolutely true. But there are actually very few concepts in the world that are naturally modeled as normalized tables. We use that structure because it’s efficient, because it avoids duplication, and because when it does get slow, we know how to fix it.

2. “It’s faster to get all the data from a social query when it’s denormalized into a single document.”

This is also absolutely true. When your social data is in a relational store, you need a many-table join to extract the activity stream for a particular user, and that gets slow as your tables get bigger. However, we have a well-understood solution to this problem. It’s called caching.

At the All Your Base Conf in Oxford earlier this year, where I gave the talk version of this post, Neha Narula had a great talk about caching that I recommend you watch once it’s posted. In any case, caching in front of a normalized data store is a complex but well-understood problem. I’ve seen projects cache denormalized activity stream data into a document database like MongoDB, which makes retrieving that data much faster. The only problem they have then is cache invalidation.

“There are only two hard problems in computer science: cache invalidation and naming things.”

Phil Karlton

It turns out cache invalidation is actually pretty hard. Phil Karlton wrote most of SSL version 3, X11, and OpenGL, so he knows a thing or two about computer science.

Cache Invalidation As A Service

But what is cache invalidation, and why is it so hard?

Cache invalidation is just knowing when a piece of your cached data is out of date, and needs to be updated or replaced. Here’s a typical example that I see every day in web applications. We have a backing store, typically PostgreSQL or MySQL, and then in front of that we have a caching layer, typically Memcached or Redis. Requests to read a user’s activity stream go to the cache rather than the database directly, which makes them very fast.

Typical cache and backing store setup

Application writes are more complicated. Let’s say a user with two followers writes a new post. The first thing that happens (part 1) is that the post data is copied into the backing store. Once that completes, a background job (part 2) appends that post to the cached activity stream of both of the users who follow the author.
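Here is a minimal sketch of that write path, with Redis lists standing in for the cached streams and PostgreSQL as the backing store. The key names and schema are made up, and the fan-out runs inline rather than as a real background job.

```python
# Write path from the description above: durable write first, then
# fan-out to followers' cached streams. Key names and schema are
# illustrative assumptions.
import json
import redis
import psycopg2

cache = redis.Redis()
conn = psycopg2.connect("dbname=social")

def create_post(author_id, body, follower_ids):
    # Part 1: copy the post into the backing store.
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO posts (author_id, body) VALUES (%s, %s) RETURNING id",
            (author_id, body))
        post_id = cur.fetchone()[0]
    # Part 2: append to each follower's cached activity stream
    # (a real system would do this in a background job).
    for fid in follower_ids:
        cache.lpush(f"stream:{fid}",
                    json.dumps({"id": post_id, "author": author_id,
                                "body": body}))
```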

This pattern is quite common. Twitter holds recently-active users’ activity streams in an in-memory cache, which they append to when someone they follow posts something. Even smaller applications that employ some kind of activity stream will typically end up here (see: seven-table join).

Back to our example. When the author changes an existing post, the update process is essentially the same as for a create, except instead of appending to the cache, it updates an item that’s already there.

What happens if that step 2 background job fails partway through? Machines get rebooted, network cables get unplugged, applications restart. Instability is the only constant in our line of work. When that happens, you’ll end up with invalid data in your cache. Some copies of the post will have the old title, and some copies will have the new title. That’s a hard problem, but with a cache, there’s always the nuclear option.

Always an option >_<

You can always delete the entire activity stream record out of your cache and regenerate it from your consistent backing store. It may be slow, but at least it’s possible.
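Continuing in the same made-up schema, the nuclear option is one cache delete plus a rebuild from the consistent store:

```python
# The "nuclear option": throw the cached stream away and rebuild it
# from the backing store. Slow, but always possible, as long as the
# backing store exists. Schema is an illustrative assumption.
import json
import redis
import psycopg2

cache = redis.Redis()
conn = psycopg2.connect("dbname=social")

def rebuild_stream(user_id):
    cache.delete(f"stream:{user_id}")        # drop the possibly-corrupt cache
    with conn, conn.cursor() as cur:
        cur.execute("""
            SELECT p.id, p.author_id, p.body
            FROM posts p
            JOIN follows f ON f.followee_id = p.author_id
            WHERE f.follower_id = %s
            ORDER BY p.id DESC
        """, (user_id,))
        for post_id, author_id, body in cur.fetchall():
            cache.rpush(f"stream:{user_id}",
                        json.dumps({"id": post_id, "author": author_id,
                                    "body": body}))
```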

What if there is no backing store? What if you skip step 1? What if the cache is all you have?

When MongoDB is all you have, it’s a cache with no backing store behind it. It will become inconsistent. Not eventually consistent — just plain, flat-out inconsistent, for all time. At that point, you have no options. Not even a nuclear one. You have no way to regenerate the data in a consistent state.

When Diaspora decided to store social data in MongoDB, we were conflating a database with a cache. Databases and caches are very different things. They have very different ideas about permanence, transience, duplication, references, data integrity, and speed.

The Conversion

Once we figured out that we had accidentally chosen a cache for our database, what did we do about it?

Well, that’s the million dollar question. But I’ve already answered the billion-dollar question. In this post I’ve talked about how we used MongoDB vs. how it was designed to be used. I’ve talked about it as though all that information were obvious, and the Diaspora team just failed to research adequately before choosing.

But this stuff wasn’t obvious at all. The MongoDB docs tell you what it’s good at, without emphasizing what it’s not good at. That’s natural. All projects do that. But as a result, it took us about six months, a lot of user complaints, and a lot of investigation to figure out that we were using MongoDB the wrong way.

There was nothing to do but take the data out of MongoDB and move it to a relational store, dealing as best we could with the inconsistent data we uncovered along the way. The data conversion itself — export from MongoDB, import to MySQL — was straightforward. For the mechanical details, you can see my slides from All Your Base Conf 2013.

The Damage

We had eight months of production data, which turned into about 1.2 million rows in MySQL. We spent four pair-weeks developing the code for the conversion, and when we pulled the trigger, the main site had about two hours of downtime. That was more than acceptable for a project that was in pre-alpha. We could have reduced that downtime more, but we had budgeted for eight hours of downtime, so two actually seemed fantastic.

NOT BAD

Epilogue

Remember that TV show application? It was the perfect use case for MongoDB. Each show was one document, perfectly self-contained. No references to anything, no duplication, and no way for the data to become inconsistent.

About three months into development, it was still humming along nicely on MongoDB. One Monday, at the weekly planning meeting, the client told us about a new feature that one of their investors wanted: when they were looking at the actors in an episode of a show, they wanted to be able to click on an actor’s name and see that person’s entire television career. They wanted a chronological listing of all of the episodes of all the different shows that actor had ever been in.

We stored each show as a document in MongoDB containing all of its nested information, including cast members. If the same actor appeared in two different episodes, even of the same show, their information was stored in both places. We had no way to tell, aside from comparing the names, whether they were the same person. So to implement this feature, we needed to search through every document to find and de-duplicate instances of the actor that the user clicked on. Ugh. At a minimum, we needed to de-dup them once, and then maintain an external index of actor information, which would have the same invalidation issues as any other cache.
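In a sketch (pymongo again, reusing the illustrative document shape from earlier), that search is a full scan over every show, matching actors by name because a name is the only identity the documents contain:

```python
# The painful version of "show me this actor's career": walk every
# show document and match on name. Collection and field names are
# illustrative assumptions.
from pymongo import MongoClient

db = MongoClient().tv

def episodes_for_actor(actor_name):
    hits = []
    for show in db.shows.find():                       # full collection scan
        for season in show["seasons"]:
            for episode in season["episodes"]:
                for member in episode["cast_members"]:
                    if member["name"] == actor_name:   # name is all we have
                        hits.append((episode.get("air_date"), show["title"],
                                     season["number"], episode["number"]))
    return sorted(hits, key=lambda h: h[0] or "")      # chronological listing
```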

You See Where This Is Going

The client expected this feature to be trivial. If the data had been in a relational store, it would have been. As it was, we first tried to convince the PM they didn’t need it. After that failed, we offered some cheaper alternatives, such as linking to an IMDB search for the actor’s name. The company made money from advertising, though, so they wanted users to stay on their site rather than going off to IMDB.

This feature request eventually prompted the project’s conversion to PostgreSQL. After a lot more conversation with the client, we realized that the business saw lots of value in linking TV shows together. They envisioned seeing other shows a particular director had been involved with, and episodes of other shows that were released the same week this one was, among other things.

This was ultimately a communication problem rather than a technical problem. If these conversations had happened sooner, if we had taken the time to really understand how the client saw the data and what they wanted to do with it, we probably would have done the conversion earlier, when there was less data, and it was easier.

Always Be Learning

I learned something from that experience: MongoDB’s ideal use case is even narrower than our television data. The only thing it’s good at is storing arbitrary pieces of JSON. “Arbitrary,” in this context, means that you don’t care at all what’s inside that JSON. You don’t even look. There is no schema, not even an implicit schema, as there was in our TV show data. Each document is just a blob whose interior you make absolutely no assumptions about.

At RubyConf this weekend, I ran into Conrad Irwin, who suggested this use case. He’s used MongoDB to store arbitrary bits of JSON that come from customers through an API. That’s reasonable. The CAP theorem doesn’t matter when your data is meaningless. But in interesting applications, your data isn’t meaningless.

I’ve heard many people talk about dropping MongoDB in to their web application as a replacement for MySQL or PostgreSQL. There are no circumstances under which that is a good idea. Schema flexibility sounds like a great idea, but the only time it’s actually useful is when the structure of your data has no value. If you have an implicit schema — meaning, if there are things you are expecting in that JSON — then MongoDB is the wrong choice. I suggest taking a look at PostgreSQL’s hstore (now apparently faster than MongoDB anyway), and learning how to make schema changes. They really aren’t that hard, even in large tables.
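For the curious, here is a minimal sketch of that hstore suggestion using psycopg2; the table and keys are hypothetical.

```python
# Schemaless key/value rows inside PostgreSQL via hstore, the option
# suggested above. Table and key names are illustrative assumptions.
import psycopg2
import psycopg2.extras

conn = psycopg2.connect("dbname=social")
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS hstore")

psycopg2.extras.register_hstore(conn)    # maps Python dicts <-> hstore

with conn, conn.cursor() as cur:
    cur.execute("""CREATE TABLE IF NOT EXISTS events (
                       id   serial PRIMARY KEY,
                       data hstore)""")
    cur.execute("INSERT INTO events (data) VALUES (%s)",
                ({"kind": "like", "user": "joe"},))
    # Individual keys are still queryable (and indexable):
    cur.execute("SELECT data -> 'user' FROM events WHERE data -> 'kind' = 'like'")
    print(cur.fetchall())
```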

Find The Value

When you’re picking a data store, the most important thing to understand is where in your data — and where in its connections — the business value lies. If you don’t know yet, which is perfectly reasonable, then choose something that won’t paint you into a corner. Pushing arbitrary JSON into your database sounds flexible, but true flexibility is easily adding the features your business needs.

Make the valuable things easy.

The End.

Thanks for reading! Let me sum up how I feel about comments on this post:

09 Jan 10:00

When a recruiter asks for a devops certification

by sharhalakis

by @maximilienriehl

07 Jan 10:00

Hearing that your project failed, then realizing it was someone else's fault

by sharhalakis

by Craig

29 Dec 17:28

M&Ms Trick Your Brain When Positioned on a LEGO Checkerboard [Pics]

by Geeks are Sexy


Mary Coffelt, Briena Heller, and Michael McCamy created this fun take on Akiyoshi Kitaoka’s “Bulge” illusion at the Barrow Neurological Institute last summer. Sure, we all know that the lines on a LEGO checkerboard can’t be anything but straight, but when you strategically position M&Ms on it, your brain warps your vision and makes the whole thing look all funny.

You can check out this video to see what happens to a similar illusion when you make the “dots” disappear all at once.

[Source: SciAm | Via IO9]


The post M&Ms Trick Your Brain When Positioned on a LEGO Checkerboard [Pics] appeared first on Geeks are Sexy Technology News.

15 Dec 04:53

Pendrav de colocar chip

by ProgramadorREAL

Check out this gem that Douglas Junior sent me… Honestly, he must not do anything else with his life, the way he keeps sending me these things… :D


Person 1: Can anyone tell me where I can buy an unlocked laptop pendrav to put a chip in
Person 2: you mean a moldem I have an unlocked one from vivo
Person 1: It’s a pendraiv that takes a cell phone chip only the pendraiv has to be unlocked
Person 3: I have a tim one never used
Person 3: It’s not a pendrav it’s a tim molde
Person 1: But I want a pendrav that takes a cell phone chip…
Person 3: Yes it takes a cell phone chip
Person 1: Send me a picture
Person 4: Tim has them
Person 4: And it’s unlocked
Person 1: But I want a cheap one

The post Pendrav de colocar chip appeared first on Vida de Programador.

17 Dec 10:00

Thanks for the commit, but do not break the build again

by sharhalakis

by uaiHebert

16 Dec 10:00

Accidentally destroying the wrong VM

by sharhalakis

by torax

16 Dec 10:51

Mentirinhas #745

by Fábio Coala


Better than being hacked.

The post Mentirinhas #745 appeared first on Mentirinhas.