The Old Reader

07 Sep 14:54

Mirth and laughter elicited by electrical stimulation of the human anterior cingulate cortex.

by Caruana F, Avanzini P, Gozzo F, Francione S, Cardinale F, Rizzolatti G

Cortex. 2015 Aug 1;71:323-331

Abstract
Laughter is a complex motor behavior that typically expresses mirth. Despite its fundamental role in social life, knowledge about the neural basis of laughter is very limited and mostly based on a few electrical stimulation (ES) studies carried out in epileptic patients. In these studies, laughter was elicited from temporal areas, where it was accompanied by mirth, and from frontal areas plus an anterior cingulate case, where laughter without mirth was observed. On the basis of these findings, a dichotomy has been proposed between temporal lobe areas processing the emotional content of laughter and anterior cingulate cortex (ACC) and motor areas responsible for laughter production. The present study aims to understand the role of the ACC in laughter. We report the effects of stimulation of the rostral, pregenual ACC (pACC) in 10 patients in whom ES elicited laughter. In half of the patients ES elicited a clear burst of laughter with mirth, while in the other half mirth was not evident. This large dataset allows us to offer a more reliable picture of the functional contribution of this region to laughter, and to precisely localize it in the cingulate cortex. We conclude that the pACC is involved in both the motor and the affective components of emotions, and challenge the validity of a sharp dichotomy between motor and emotional centers for laughing. Finally, we suggest a possible anatomical network for the production of positive emotional expressions.

PMID: 26291664 [PubMed - as supplied by publisher]

25 Aug 15:40

On the Street…La Fortezza, Florence

by The Sartorialist

24 Aug 14:23

Same BMI, different body

by Nathan Yau

Body mass index is often used as a way to set weight classes of underweight to obese, but the measurement is likely too basic. From the New York Times:

The illustrations here were created from scans of six people, who were all 5 feet 9 inches tall and 172 pounds. This means that though their bodies look very different, they all have exactly the same body mass index, or B.M.I. At 25.4, technically each of them could be considered overweight.

And Dwayne “The Rock” Johnson is classified as obese.

Of course, this applies to most indexes and classifications. You try to get it to work for as many people, places, or things as you can, but there are always going to be exceptions and more information to be had.
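
The 25.4 figure in the quoted passage is easy to check. Below is a minimal sketch that uses the standard definition of BMI (weight in kilograms divided by height in meters squared). The function name and constant names are illustrative and do not come from the Times piece.

```python
LB_TO_KG = 0.45359237  # kilograms per pound (exact by definition)
IN_TO_M = 0.0254       # meters per inch (exact by definition)


def bmi(weight_lb: float, height_in: float) -> float:
    """Body mass index from imperial units, in kg / m^2."""
    weight_kg = weight_lb * LB_TO_KG
    height_m = height_in * IN_TO_M
    return weight_kg / height_m ** 2


# 5 feet 9 inches is 69 inches; all six people weighed 172 pounds.
value = bmi(172, 69)
print(round(value, 1))
```

Because the formula only sees height and weight, any two people who share those two numbers get the same index, whatever their build.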

Tags: health, New York Times, weight

24 Aug 14:16

Rockin the tabloids

by Andrew

Rick Gerkin points me to this opinion piece from a couple years ago by biologist Randy Schekman, titled “How journals like Nature, Cell and Science are damaging science” and subtitled “The incentives offered by top journals distort science, just as big bonuses distort banking.” Here’s Schekman:

The prevailing structures of personal reputation and career advancement [in biology] mean the biggest rewards often follow the flashiest work, not the best. . . .

We all know what distorting incentives have done to finance and banking. The incentives my colleagues face are not huge bonuses, but the professional rewards that accompany publication in prestigious journals – chiefly Nature, Cell and Science.

These luxury journals are supposed to be the epitome of quality, publishing only the best research. Because funding and appointment panels often use place of publication as a proxy for quality of science, appearing in these titles often leads to grants and professorships. But the big journals’ reputations are only partly warranted. . . .

These journals aggressively curate their brands, in ways more conducive to selling subscriptions than to stimulating the most important research. Like fashion designers who create limited-edition handbags or suits, they know scarcity stokes demand, so they artificially restrict the number of papers they accept. . . .

A paper can become highly cited because it is good science – or because it is eye-catching, provocative or wrong. Luxury-journal editors know this, so they accept papers that will make waves because they explore sexy subjects or make challenging claims. . . . It builds bubbles in fashionable fields where researchers can make the bold claims these journals want . . .

In extreme cases, the lure of the luxury journal can encourage the cutting of corners, and contribute to the escalating number of papers that are retracted as flawed or fraudulent. . . .

Sharif don’t like it.

The post Rockin the tabloids appeared first on Statistical Modeling, Causal Inference, and Social Science.

14 Aug 14:32

Saturday Morning Breakfast Cereal - Natural Selection

by admin@smbc-comics.com

Hovertext: Now, if you'll excuse me, I'm going to go kill off some weak baby gazelles.

13 Aug 13:57

Fitbit during sex

by Nathan Yau

Reddit user noveltysin wore a Fitbit during sex, and then posted a screenshot of her heart rate estimates.

So yeah. There that is. See the reddit thread for a mature and academic discussion of the data, including a line-by-line adult parody of Eminem's Lose Yourself.

Doesn't quite beat the marriage proposal heartbeat.

Tags: Fitbit, humor

13 Aug 13:56

Building the Next New York Times Recommendation Engine

by Alexander Spangher

The New York Times publishes over 300 articles, blog posts and interactive stories a day.

Refining the path our readers take through this content — personalizing the placement of articles on our apps and website — can help readers find information relevant to them, such as the right news at the right times, personalized supplements to major events and stories in their preferred multimedia format.

In this post, I’ll discuss our recent work revamping The New York Times’s article recommendation algorithm, which currently powers the Recommended for You section of NYTimes.com.

History

Content-based filtering

News recommendations must perform well on fresh content: breaking news that hasn’t been viewed by many readers yet. Thus, the article data available at publishing time can be useful: the topics, author, desk and associated keyword tags of each article.

Our first recommendation engine used these keyword tags to make recommendations. Using tags for articles and a user’s 30-day reading history, the algorithm recommends articles similar to those that have already been read.

Because this technique relies on a content model, it’s part of a broader class of content-based recommendation algorithms.

The approach has intuitive appeal: If a user read ten articles tagged with the word “Clinton,” they would probably like future “Clinton”-tagged articles. And this technique performs as well on fresh content as it does on older content, since it relies on data available at the time of publishing.

However, this method relies on a content model that, sometimes, has unintended effects. Because the algorithm weights tags by their rareness within a corpus, rare tags have a large effect. This works well most of the time, but occasionally degrades the user experience. For instance, one reader noted that while she was interested in same-sex pieces, occasionally in the Weddings section, she was being recommended wedding coverage about heterosexual couples. This is because a low-frequency tag, “Weddings and Engagements,” was in an article previously clicked, outweighing all other tags that may have been more applicable for that reader.
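
To see how a rare tag can outweigh more frequently clicked ones, here is a minimal sketch of rarity-weighted tag matching. The post only says that tags are weighted by their rareness, so the weight formula (log of corpus size over tag count), the corpus statistics and every name below are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical corpus statistics: how many of 1,000 articles carry each tag.
CORPUS_SIZE = 1000
TAG_COUNTS = {
    "Politics": 500,
    "Same-Sex Marriage": 40,
    "Weddings and Engagements": 5,
}


def tag_weight(tag):
    """Rarer tags get larger weights (an assumed IDF-style formula)."""
    return math.log(CORPUS_SIZE / TAG_COUNTS[tag])


def build_profile(history):
    """Sum tag weights over every article in a reading history."""
    profile = Counter()
    for tags in history:
        for tag in tags:
            profile[tag] += tag_weight(tag)
    return profile


def score(candidate_tags, profile):
    """Score a candidate article by the profile weight of its tags."""
    return sum(profile[tag] for tag in candidate_tags)


# Two politics clicks, one click on a piece that ran in the Weddings section.
history = [
    {"Politics"},
    {"Politics"},
    {"Same-Sex Marriage", "Weddings and Engagements"},
]
profile = build_profile(history)
politics_story = score({"Politics"}, profile)
wedding_story = score({"Weddings and Engagements"}, profile)
print(politics_story, wedding_story)
```

In this toy setup, a single click on the rare tag scores a new wedding story above a new politics story, even though the reader clicked on politics twice.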

Collaborative Filtering

To address the shortcomings of the previous method, we tested collaborative filtering. Collaborative filters surface articles based on what similar readers have read; in our case, similarity was determined by reading history.

This approach is also appealing: If one reader’s preferences are very similar to another reader’s, articles that the first reader reads might interest the second, and vice versa.

However, this approach fails at recommending newly-published, unexplored articles: articles that were relevant to groups of readers but hadn’t yet been read by any reader in that group. A collaborative filter might also, hypothetically, cluster reading patterns in narrow viewpoints.
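
A minimal sketch of a collaborative filter, and of its cold-start problem, might look like the following. It assumes Jaccard overlap between reading histories as the similarity measure; the post does not say which measure was used, and the reader names and article ids are made up.

```python
def jaccard(a, b):
    """Overlap between two reading histories, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, histories, k=1):
    """Suggest unread articles taken from the k most similar readers."""
    neighbors = sorted(
        ((jaccard(histories[target], h), name)
         for name, h in histories.items() if name != target),
        reverse=True,
    )
    scores = {}
    for similarity, name in neighbors[:k]:
        for article in histories[name] - histories[target]:
            scores[article] = scores.get(article, 0.0) + similarity
    return sorted(scores, key=lambda a: (-scores[a], a))


histories = {
    "ana": {"a1", "a2", "a3"},
    "ben": {"a1", "a2", "a4"},
    "cy": {"a5"},
}
suggestions = recommend("ana", histories)
print(suggestions)
```

A freshly published article, say "a6", appears in no one's history, so no amount of reader similarity can surface it. That is the weakness described above.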

Current Approach

It turns out that straddling both techniques can give us the best of both worlds. We built an algorithm inspired by a technique, Collaborative Topic Modeling (CTM), that (1) models content, (2) adjusts this model by viewing signals from readers, (3) models reader preference and (4) makes recommendations by similarity between preference and content.

Overview

Our algorithm starts by modeling each article as a mixture of the topics it addresses. We can think of a topic as an unobserved theme, like Politics or Environment, that affects the words observed in the article. For example, if an article is about the environment, we’d expect words like “tree” or “conservation.”

We model each reader based on their topic preferences. We can then recommend articles based on how closely their topics match a reader’s preferred topics.

As an example, we run our algorithm supposing that all New York Times articles published in the last month can be represented as a combination of two topics. Under these constraints, the algorithm identifies these topics, roughly, as Politics and Art. Our algorithm finds an article, America Deepens its Footprint in Iraq Once More, as 100% Politics, and a film review by A.O. Scott as 100% Art. It labels mixtures, too; for example, an article about art politics, Frick Museum Abandons Contested Renovation Plan, is labeled 50% Politics, 50% Art.

In this Politics–Art space, we might describe our articles in the following manner:

Next, suppose that one reader prefers to read about Art 60% of the time and Politics 40% of the time. We might represent that reader with the red x. The magical part is that they are spatially close to articles that align with their interests, even if they haven’t read them yet; we recommend articles that are closest to them in the space.
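
The two-topic example can be sketched in a few lines. The topic mixtures are the ones quoted above; measuring closeness with Euclidean distance is an assumption, since the post only says that the closest articles are recommended.

```python
import math

# (Politics, Art) mixtures from the example above.
articles = {
    "America Deepens its Footprint in Iraq Once More": (1.0, 0.0),
    "A.O. Scott film review": (0.0, 1.0),
    "Frick Museum Abandons Contested Renovation Plan": (0.5, 0.5),
}

# A reader who prefers Politics 40% of the time and Art 60% of the time.
reader = (0.4, 0.6)


def rank(reader_vector, article_vectors):
    """Order article titles from nearest to farthest in topic space."""
    return sorted(
        article_vectors,
        key=lambda title: math.dist(reader_vector, article_vectors[title]),
    )


ranking = rank(reader, articles)
print(ranking)
```

The mixed Frick Museum piece lands nearest to this reader, followed by the film review and then the Iraq article, whether or not the reader has seen any of them.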

There are further questions for us to answer. Can this topic space capture ambiguous word usage? And how do we best observe the preferences of our readers? Clicks are, after all, not robust: I’m sure that at some point, you have clicked on something you didn’t enjoy and missed something you would have found interesting.

We tested many options carefully; the algorithm we built brings us closer to addressing some of these questions, and gives us a powerful new way to understand The New York Times.

This is a three-part challenge:

Part 1: How to model an article based on its text.
Part 2: How to update the model based on audience reading patterns.
Part 3: How to describe readers based on their reading history.

Part 1: How to model an article based on its text.

First, our algorithm looks at the body of each article and applies Latent Dirichlet Allocation (LDA), a content modeling algorithm. LDA learns the mixture of “topics” in each article: A topic is formally defined as a distribution over a vocabulary. If a document has a certain topic weighted highly, the words seen in the article are more likely to be words weighted highly under this topic.
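
To make "a distribution over a vocabulary" concrete, here is a small sketch of how a document's topic mixture shapes its word probabilities. It does not fit LDA; the two topics, their word probabilities and the mixture are invented purely to illustrate the definition.

```python
# Each hypothetical topic is a probability distribution over a tiny vocabulary.
topics = {
    "Politics": {"senate": 0.5, "vote": 0.4, "tree": 0.1},
    "Environment": {"tree": 0.5, "conservation": 0.4, "vote": 0.1},
}


def word_probability(word, mixture):
    """Probability of a word given a document's topic mixture."""
    return sum(
        weight * topics[name].get(word, 0.0)
        for name, weight in mixture.items()
    )


# A document that is mostly about the environment.
mixture = {"Politics": 0.2, "Environment": 0.8}
p_tree = word_probability("tree", mixture)
p_senate = word_probability("senate", mixture)
print(p_tree, p_senate)
```

With Environment weighted highly, "tree" is far more likely to appear than "senate", which is the relationship LDA inverts when it infers topics from observed words.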

LDA is quick, accurate for our purposes and capable of online inference (or learning topics in real time as new articles are published). LDA topics tend to be broad (some examples are Middle East, Film and Healthcare), which allows us to relate pieces from differing viewpoints.

LDA is based on a graphical model, which can easily be extended to incorporate new assumptions and information. In our case, we extend our model to model not only the text of an article, but also the specific readers reading that article, described in the next section.

Part 2: How to update the model based on audience reading patterns.

LDA takes words as input, but words are often ambiguous: Context, style and voice can adjust their meaning. For example, if Gail Collins writes a piece with the words “dog,” “car” and “roof,” can we tell that she’s being allegorical, and not just writing about animals and automobiles?

Indeed, a purely LDA-based approach gives weight to travel, placing her piece at the blue point in the diagram below.

However, a large sample of readers reading the piece have also read pieces about Hillary Clinton and Ted Cruz (visualized for illustration at the red x’s), so we’d like our algorithm to adjust it to the green point in the Politics topic. By adding offsets to model topic error, as described in the CTM paper, our algorithm incorporates reading patterns on top of content modeling to create a hybrid approach.

The CTM algorithm works by iteratively adjusting offsets and then recalculating reader scores. It runs until there is little change in either. A randomly chosen subset of readers, called our training sample, gives us the information we need.
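
The offset idea can be illustrated with a deliberately simplified sketch. It is not the CTM algorithm itself: it holds reader vectors fixed and shows only the article half of the alternation, it replaces the probabilistic model with a plain squared-error objective, and the topic axes, reader vectors and parameter values are all assumptions.

```python
def fit_article_vector(theta, readers, reads, lam=1.0, lr=0.05, steps=200):
    """Gradient descent on sum_i (r_i - u_i . v)^2 + lam * ||v - theta||^2.

    theta:   content-only topic mixture of the article
    readers: fixed reader preference vectors u_i
    reads:   r_i = 1 if reader i read the article, else 0
    """
    v = list(theta)
    for _ in range(steps):
        # Pull toward the content model ...
        grad = [2 * lam * (v[k] - theta[k]) for k in range(len(v))]
        # ... and toward agreement with who actually read the piece.
        for u, r in zip(readers, reads):
            err = r - sum(uk * vk for uk, vk in zip(u, v))
            for k in range(len(v)):
                grad[k] -= 2 * err * u[k]
        v = [v[k] - lr * grad[k] for k in range(len(v))]
    return v


# Axes are (Politics, Travel). The text alone suggests a travel piece.
theta = [0.2, 0.8]
readers = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # two politics readers, one travel reader
reads = [1, 1, 0]                               # only the politics readers read it

adjusted = fit_article_vector(theta, readers, reads)
offset = [a - t for a, t in zip(adjusted, theta)]
print(adjusted, offset)
```

The resulting offset is positive on Politics and negative on Travel, moving the piece toward the readers who chose it while the `lam` term keeps it anchored to its text.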

We tested two methods for calculating offsets: (1) CTM and (2) an approach called Collaborative Poisson Factorization. In online A/B testing, we found CTM to perform better.

Part 3: How to describe readers based on their reading history.

The methods used for adjusting article topics also calculate reader preference, but they can’t scale to all our users. Therefore, we needed a quick way to calculate reader preferences, which can occur after finalizing article topics.

A simple approach would be to average together the topics of all articles you’ve read: If you’ve clicked on one article that is 40% Politics, 60% Art and another that is 60% Politics, 40% Art, you’re [.5, .5] in the Politics–Art topic space.

However, this assumes that clicks perfectly indicate preference. What if you clicked on an article you didn’t like, or missed one you would have? One way to approach this is to back off a bit, and say that you only “90% like” the articles you read, and “10% like” the ones you didn’t. This leaves room for mistaken clicks or missed gems.

In this diagram, the green dots represent articles a reader has read, and the red dots represent ones they haven’t. The black x might be the reader’s preferences calculated using an average of articles read, and the blue x would be using the back-off approach.

The back-off approach makes a more conservative estimate of preferences, allowing us to be more robust to noisy data. It also, we’ve noticed, brings readers out of niches and exposes them to different, “serendipitous” recommendations.
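
Both estimates can be written as one weighted average. The sketch below assumes that "90% like" and "10% like" act as weights on read and unread articles; the post does not spell out the formula, so treat this as one plausible reading. The third, unread article is hypothetical.

```python
def preference(articles, read_flags, like_read=1.0, like_unread=0.0):
    """Weighted average of article topic mixtures.

    The defaults reproduce the plain average over read articles;
    like_read=0.9 and like_unread=0.1 give the back-off variant.
    """
    weights = [like_read if was_read else like_unread for was_read in read_flags]
    total = sum(weights)
    n_topics = len(articles[0])
    return [
        sum(w * a[k] for w, a in zip(weights, articles)) / total
        for k in range(n_topics)
    ]


# (Politics, Art) mixtures: two read articles and one unread, all-Politics piece.
articles = [(0.4, 0.6), (0.6, 0.4), (1.0, 0.0)]
read_flags = [True, True, False]

plain = preference(articles, read_flags)
backed_off = preference(articles, read_flags, like_read=0.9, like_unread=0.1)
print(plain, backed_off)
```

The plain average reproduces the [.5, .5] example, while the back-off estimate is nudged slightly toward the unread article.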

By implementing some neat algorithmic speed-ups, we were able to calculate preferences in less than one millisecond per reader, enabling us to scale to all registered users.

Conclusion

By modeling article content and reader preferences with topics, then adjusting based on reading patterns, we’ve reconceptualized our recommendation engine. Our system is now a successful, large-scale implementation of cutting-edge research in collaborative topic modeling, and it provides significant performance increases when compared with previous algorithms used to make recommendations.

Recommendation systems, we hope, can enable a dynamic New York Times to serve interesting articles at opportune times. They also help shine light on the types of articles we’re writing, and who they appeal to.

The New York Times technology team is responsible for building, maintaining and improving NYTimes.com, our mobile apps and products, and the technology that powers them. We’re a group of over 250 people, delivering content from one of the largest news organizations in the world. If you’re interested in leveraging one of the deepest datasets in the news industry and helping us develop better products, make smarter business decisions and reach larger audiences through our personalization efforts, then we’d love to hear from you.

11 Aug 02:08

Disappearing Arctic reflected in National Geographic maps

by Nathan Yau

In the most recent update to their atlas coming in September, National Geographic explains the shrinking Arctic through the lens of previous atlas maps. It's not looking good.

As the ocean heats up due to global warming, Arctic sea ice has been locked in a downward spiral. Since the late 1970s, the ice has retreated by 12 percent per decade, worsening after 2007, according to NASA. May 2014 represented the third lowest extent of sea ice during that month in the satellite record, according to the National Snow and Ice Data Center (NSIDC).

Tags: Arctic, global warming, National Geographic

04 Aug 17:12

More Money For Better Open-Source Software

03 Aug 20:55

The Web We Have to Save

Edenovellis
Interesting thoughts on the current state of the web from an Iranian blogger who has been in jail for the last six years

29 Jul 14:41

Avoid busy times at local businesses with Google

by Nathan Yau

Waiting in line stinks. I purposely go to the grocery store during off-times with my son, so I don't have to deal with the long lines. Google, I think currently only on Android phones, now provides information on when people go to the businesses around you, using a similar logic to auto traffic on Google Maps. Nice.

Tags: Google, location, traffic

27 Jul 14:18

nytlabs

Edenovellis
Real time annotating and tagging using recurrent neural nets by the new york times

22 Jul 01:28

Swear maps

by Nathan Yau

Linguist Jack Grieve posted a bunch of maps that show swearing geographically, based on geotagged tweets. Above is the map for "gosh". The more red, the higher the relative usage in a county and the more blue, the less usage.

Here is the geographic distribution for "darn", which has a strong showing in the midwest:

Darn map

There are of course more four-letter words to look at, but I'll leave that to you.

Keep in mind that, because these maps are Twitter-based, they come with the usual caveat that the distributions show the Twitter population's language, which skews younger and more affluent than the general population. There's also some smoothing and clustering going on here. It's the same methodology Grieve used for previous maps that showed patterns for "bro", "dude", and "fella".

Tags: swearing, Twitter

16 Jul 15:39

What are Bloom filters?

Edenovellis
http://billmill.org/bloomfilter-tutorial/

Not apropos of anything....

14 Jul 16:14

I Let IBM’s Robot Chef Tell Me What to Cook for a Week — How We Get To Next — Medium

13 Jul 21:22

It’s 2015 — You’d Think We’d Have Figured Out How To Measure Web Traffic By Now

by Sam Dean

In May, a Vanity Fair article about Bill Simmons’s departure from ESPN said that Grantland had 6 million unique visitors in March but that “ESPN’s internal numbers … had the site reaching 10 million uniques in April.”

Late last year, The Wall Street Journal noted that Buzzfeed had 74.6 million monthly uniques, but that its “internal traffic numbers are far higher than the comScore figures … in striking distance of passing 200 million unique viewers per month.”

Last fall, Arianna Huffington wrote “100 Million Thank-Yous” to celebrate Huffington Post’s 115 million unique visitors in August but noted that their “internal numbers, at 368 million UVs, are much higher, of course.”

Not even as numerate an institution as FiveThirtyEight is immune.

Uniques are what most people mean when they talk about a website’s traffic. Show up once and you count as one unique visitor — show up again in the same month, or even visit the site every day in that month, and you still count as one unique visitor (or at least that’s the idea). Uniques are the big-picture number — the Nielsen rating, the Blue Book value, the GDP — that’s supposed to show how well a website is doing. People used to talk about pageviews, a simple count of how many pages were loaded over a certain amount of time. But uniques have taken over, because uniques measure people, not pages. Advertisers care about the former when they’re planning an ad buy.
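
The difference between the two counts is easy to show on a toy access log. This sketch assumes each log row already carries a reliable visitor id, which is exactly what real sites lack; the log entries and names are invented.

```python
from collections import defaultdict

# Hypothetical access log rows: (date, visitor id, page).
log = [
    ("2015-03-01", "ana", "/"),
    ("2015-03-02", "ana", "/nba"),
    ("2015-03-15", "ana", "/"),
    ("2015-03-20", "ben", "/"),
    ("2015-04-02", "ana", "/"),
]


def monthly_counts(rows):
    """Return {month: (pageviews, unique visitors)}."""
    pageviews = defaultdict(int)
    visitors = defaultdict(set)
    for date, visitor, _page in rows:
        month = date[:7]              # "YYYY-MM"
        pageviews[month] += 1         # every page load counts
        visitors[month].add(visitor)  # each visitor counts once per month
    return {m: (pageviews[m], len(visitors[m])) for m in pageviews}


counts = monthly_counts(log)
print(counts)
```

March shows four pageviews but only two uniques, because repeat visits within the month collapse into a single visitor.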

If uniques are people, how do 4 million, or 125 million, or 253 million people go missing? In an age when we assume our phones and laptops are tracking our every move, taking an actual head count of how many people go to a website is still almost impossible. There’s a blind spot at the center of the panopticon, and it’s roughly the size and shape of a cookie.

Lou Montulli invented “Web cookies” to give the Web a memory. On his blog, The Irregular Musings of Lou Montulli, he described surfing the pre-cookie Internet as “a bit like talking to someone with Alzheimer[’s] disease,” where “each interaction would result in having to introduce yourself again, and again, and again.”

Practically, this meant that every time you wanted to check your email, you had to re-enter your username and password. Shopping online was even harder: Getting all the way through the checkout process depended on clicking directly from page to page — if you happened to hit “back” or just closed your Outpost.com window by mistake, you’d have to start over from the beginning.

In 1994 Montulli noted all this while he was a programmer at Netscape, and he decided to fix it — he decided to make cookies to serve as little memory files for our online lives. After that, when you went to Outpost.com, your browser would download a cookie file to a folder on your hard drive. The next time you visited, the site would ask your browser to check whether you had an old Outpost.com cookie sitting around. If so, it would remember who you were, or that you had a left-handed Apple mouse in your virtual shopping cart, and you wouldn’t have to start from scratch.
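
The round trip described above can be sketched with Python's standard `http.cookies` module. This is a simplified illustration, not how any particular store worked: the cookie name `cart`, its value and the `server_response` helper are made up, and a real site would more likely store a session id than the cart contents.

```python
from http.cookies import SimpleCookie


def server_response(request_cookie_header):
    """Return (greeting, Set-Cookie value or None) for one page load."""
    jar = SimpleCookie()
    jar.load(request_cookie_header or "")
    if "cart" in jar:
        # The browser sent an old cookie back, so the site remembers the cart.
        return "Welcome back, your cart holds: " + jar["cart"].value, None
    # No cookie yet: hand the browser one to store for next time.
    fresh = SimpleCookie()
    fresh["cart"] = "left-handed-mouse"
    return "Hello, new visitor", fresh["cart"].OutputString()


# First visit: the browser has nothing stored for this site.
browser_cookie = None
greeting1, set_cookie = server_response(browser_cookie)
browser_cookie = set_cookie  # the browser saves the cookie

# Second visit: the browser sends the saved cookie along with the request.
greeting2, _ = server_response(browser_cookie)
print(greeting1)
print(greeting2)
```

All of the memory lives in the browser's stored cookie. If that cookie is deleted, the site is back to greeting a stranger.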

The simplest solution to the problem of a Web with no memory would have been to give every Web browser, or even every Web user, a unique ID code, a driver’s license for the information superhighway. But Montulli made sure that didn’t happen.

“I was very much against this concept,” Montulli writes, “because the unique identifier could be used to track a user at every website.” Cookies, in other words, were designed to thwart surveillance and the kind of broad-spectrum tracking that advertisers crave. Far from a driver’s license, cookies were just online loyalty cards, stamped by a website every time you stopped by.

Marketers soon realized that cookie technology, with a slight twist, could work for them in some ways. In addition to a website’s own “first-party” cookies, marketers started asking websites to serve up the marketer’s own “third-party” cookies, too. Then, when you visited two websites that had agreed to serve up the same marketer’s third-party cookie, the marketer’s server would register a match and know that you’d been on both sites — spread those matches far enough, and the marketer now has a good picture of your overall behavior. No need for a driver’s license if marketers can just slap a sign on your back when you aren’t looking.

This allowed marketers to build up usage profiles and then, more importantly, start serving up ads across a person’s Web experience — if you went to pools.com, they could see that you had visited mesotheliomalawyers.com earlier that week and serve up an ad asking if you need legal help with an asbestos-related disease. But thanks to the way cookies work, third-party cookies still couldn’t tell marketers how many real people went to a website. That is because cookies, whether first-party loyalty cards or third-party secret trackers, aren’t attached to people at all, but to individual browsers on particular computers.
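
Here is a minimal sketch of the third-party matching described above. The `Tracker` class, the `tracker_id` cookie name and the dictionaries standing in for each browser's cookie store are all hypothetical.

```python
import itertools
from collections import defaultdict


class Tracker:
    """A marketer's server whose cookie is served by many cooperating sites."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.sites_seen = defaultdict(list)  # cookie id -> sites visited

    def beacon(self, browser_cookies, site):
        """Called whenever a page on `site` loads the marketer's resource."""
        cookie_id = browser_cookies.get("tracker_id")
        if cookie_id is None:
            # First sighting of this browser: issue a new third-party cookie.
            cookie_id = next(self._next_id)
            browser_cookies["tracker_id"] = cookie_id
        self.sites_seen[cookie_id].append(site)
        return cookie_id


tracker = Tracker()
chrome = {}  # one person's Chrome cookie store
safari = {}  # the same person's Safari cookie store

tracker.beacon(chrome, "mesotheliomalawyers.com")
tracker.beacon(chrome, "pools.com")
tracker.beacon(safari, "pools.com")
print(dict(tracker.sites_seen))
```

The tracker links the two sites visited in Chrome, but it counts the Safari visit as a second, unrelated visitor: the cookie identifies a browser, not a person.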

If you use both Chrome and Safari in a day, week or month, then you, the person, are now represented by two separate cookies. If you use Chrome and Safari on both your work and home computers, then two cookies becomes four. If you also use a phone and a tablet, and use multiple browsers on those, four becomes eight. And if, at some point during the month in which these cookies are being tracked, you or your antivirus programs delete your cookie cache, then fresh cookies get served, and the numbers climb even higher.
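
The arithmetic in that paragraph can be written out directly. This sketch assumes every device runs the same number of browsers and that each cache clear affects one browser on one device; both are simplifications, and the function name is illustrative.

```python
def cookie_count(devices, browsers_per_device, cache_clears=0):
    """Cookies that one person generates on one site in a month.

    Each device/browser pair holds its own cookie, and each cache clear
    (assumed to hit a single pair) forces the site to issue a fresh one.
    """
    return devices * browsers_per_device + cache_clears


one_computer = cookie_count(devices=1, browsers_per_device=2)
work_and_home = cookie_count(devices=2, browsers_per_device=2)
plus_phone_and_tablet = cookie_count(devices=4, browsers_per_device=2)
after_one_clear = cookie_count(devices=4, browsers_per_device=2, cache_clears=1)
print(one_computer, work_and_home, plus_phone_and_tablet, after_one_clear)
```

One person can therefore show up as eight or more "visitors" in a raw cookie count.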

Those huge, parenthetical, internal traffic numbers are the raw cookie counts — the number of humans who visited a site, multiplied by all the browsers, machines and accidental deletions.

The lower numbers are just the cookies, crunched. ComScore, Quantcast, Nielsen and other measurement companies use proprietary models to estimate how many actual people went to a website over a given amount of time. There really is no way to directly measure uniques, but the companies’ estimates are much more accurate reflections of traffic reality.

ComScore was one of the first companies to get into the measurement game for the Web. I asked their chief research officer, Josh Chasin, how they come up with their numbers every month. Some background was required before he could answer.

“When comScore started out, we said we measured the Internet, but what we really measured was computer access to the Internet,” Chasin said. “At the time, those two were synonymous. But now measuring the Internet means measuring across multiple devices, notably smartphones and tablets, but also gaming consoles, Roku, Apple TV, and it’s probably also going to mean measuring watches.”

ComScore was one of the first businesses to take the approach Nielsen uses for TV and apply it to the Web. Nielsen comes up with TV ratings by tracking the viewing habits of its panel — those Nielsen families — and taking them as stand-ins for the population at large. Sometimes they track people with boxes that report what people watch; sometimes they mail them TV-watching diaries to fill out. ComScore gets people to install the comScore tracker onto their computers and then does the same thing.

Nielsen gets by with a panel of about 50,000 people as stand-ins for the entire American TV market. ComScore uses a panel of about 225,000 people to create their monthly Media Metrix numbers, Chasin said — the numbers have to be much higher because Internet usage is so much more particular to each user. The results are just estimates, but at least comScore knows basic demographic data about the people on its panel, and, crucial in the cookie economy, knows that they are actually people.
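
The basic logic of a panel estimate can be sketched as a simple extrapolation. This is not comScore's or Nielsen's model, which are proprietary and weight panelists by demographics. The sketch treats the panel as a simple random sample, and the online population and panel visitor count below are invented. Only the 225,000 panel size comes from the article.

```python
import math


def estimate_uniques(panel_size, panel_visitors, population):
    """Extrapolate a site's unique visitors from panel behavior.

    Returns (estimate, standard error) under simple random sampling.
    """
    share = panel_visitors / panel_size
    estimate = share * population
    std_err = population * math.sqrt(share * (1 - share) / panel_size)
    return estimate, std_err


estimate, std_err = estimate_uniques(
    panel_size=225_000,      # panel size quoted in the article
    panel_visitors=4_500,    # hypothetical: panelists who visited the site
    population=280_000_000,  # hypothetical online population
)
print(round(estimate), round(std_err))
```

The smaller the share of panelists who visit a site, the larger the standard error is relative to the estimate.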

As Chasin noted, though, the game has changed. Mobile users are more difficult to wrangle into statistically significant panels for a basic technical reason: Mobile apps don’t continue running at full capacity in the background when not in use, so comScore can’t collect the constant usage data that it relies on for its PC panel. So when more and more users started going mobile, comScore decided to mix things up.

“Before 2009, we were pretty staunchly in the panel camp, but then we realized they weren’t enough,” Chasin said. “We’re pretty clear on this now: good measurement requires the integration of panel measurement and site-centric measurement from tagging.”

Tagging works basically like third-party cookies. Websites that hire comScore or Quantcast or Nielsen to measure their sites embed little one-pixel “beacons” in each of their pages, which ping back to the measurement company’s servers each time they’re loaded, recording data such as users’ IP addresses, what time they loaded the page and what cookies they already have saved. The companies then combine the panel and tagging data, compare that to the raw internal cookies, and out pop the uniques.

ComScore produces the most widely referenced online audience measurement numbers in the business, but that doesn’t mean its numbers are the most accurate. “It’s probably fair to say right now that our mobile panel could be larger,” Chasin said. Using the server-side tagging system helps close that gap to some degree, but as the majority of Web traffic migrates to mobile, that leaves a huge potential hole in comScore’s numbers. On a less technical note, too, there’s the fundamental problem that all this modeling takes time. ComScore and its competitors come out with their top-level traffic rankings weeks or months after the period they’re measuring, leaving publishers and ad buyers to work with old data in an industry built on the premise of instantaneous communication.

Each measurement company comes up with different numbers each month, because they all have different proprietary models, and the data gets more tenuous when they start to break it out into age brackets or household income or spending habits, almost all of which is user-reported. (And I can’t be the only person who intentionally lies, extravagantly, on every online survey that I come across.)

In the end, though, just having a number that everyone can point to as an acceptable proxy of reality is more important than how accurate that number may be. The Nielsen TV rating is notoriously fuzzy, but companies bought $78 billion of TV ads in 2013 based on their faith that those ratings were good enough. ComScore could theoretically measure mobile better, and come out with real-time reporting, but money is as much a limiting factor as technology. Metrics are only ever as good as it is financially viable for them to be, and advertisers, publishers and agencies will pay for only as much accuracy as their own business will support. Right now, comScore leads the industry when it comes to online audience measurement, and comScore has to be only accurate enough to keep that lead.

So, unless you have a serious paywall, and therefore have users who are logged in 100 percent of the time (like the Financial Times), there is just no way to know for sure how many individual real-live people visit your site in a month, week or day.

And that’s assuming that real people are even visiting your site in the first place. A study published this year by a Web security company found that bots make up 56 percent of all traffic for larger websites, and up to 80 percent of all traffic for the mom-and-pop blogs out there. More than half of those bots are “good” bots, like the crawlers that Google uses to generate its search rankings, and are discounted from traffic number reports. But the rest are “bad” bots, many of which are designed to register as human users — that same report found that 22 percent of Web traffic was made up of these “impersonator” bots.

Given the size of this bot horde, an industry-funded regulatory agency called the Media Rating Council is moving to require all measurement services to include bot-detection and exclusion methods in their products in order to get their official stamp of approval. But even if all the bot traffic can be weeded out, that’s one more estimation that has to be folded into the estimates, all using another layer of proprietary methods, further widening the divide between what can be directly measured and what can be considered reality.

I asked David Coletti, ESPN’s VP of digital media research and analytics, how big the difference between the internal and external numbers tends to be across the sites (like this one) that he oversees for the company.

“We always see a delta of at least a couple million,” Coletti said, for the smaller sites under his aegis (again, like this one). But in his experience, “the more the site is visited, the bigger the discrepancy gets.”

At ESPN.com, the mothership of ESPN Web properties, Coletti says he’ll often see the internal numbers for monthly unique visitors running at three times the comScore numbers.

“If I were to go out and make the argument that the internal number is correct,” Coletti said, “I would be suggesting that every American visited ESPN in the past month, which would be wonderful, but unlikely.”

Traffic, as represented by unique visitors, will always be estimated under the current technological regime, and those parenthetical “internal numbers” that reporters drop in media stories bear little relation to how many actual people go to a given website. Or as Coletti puts it: “Neither numbers are right or wrong — they’re just counting in different ways, and it’s unsatisfying.”

Facebook is trying to change that.

The social media giant announced in May that it would begin hosting articles directly on its own servers, with no link out to the websites that created them. The content-creating websites (in the pilot program, that means outlets including The New York Times and BuzzFeed, but more are sure to come) justified this move as necessary to bring in high Web traffic. Hosting the articles on Facebook allows for flashier “read this” buttons and shorter loading times, which in turn, theoretically, makes more people read the articles, boosting traffic.

But for Facebook, and advertisers and the media companies themselves, this move also solves the cookie problem. Facebook doesn’t need cookies — it has faces, faces of real people, or at least accounts that correspond to real people, which means that it knows how many real people look at an article hosted on Facebook. And more than that, even, it knows their names, and their ages, and what they “like,” and probably where they live.

Apple and Google are in a position to break the cookie regime, too, with the possibility of persistent logins across browsers, devices, days and years, but Facebook is out front. In the current version of the future, knowing how many real people went to a given site will likely also mean knowing which real people went to a given site. No proxy, no guessing, just you.

The Internet has become the first fully paranoid mass medium. If we read, if we click, if we watch, we do so with the knowledge that we are being watched in turn. When ads adjust to what we type and feeds adjust to what we like, we have visual proof that the network is looking at us. When the watchers seem to get it wrong, and show us an ad for orthopedic surgery after we search for elbow macaroni, we get to experience the grim glee, once reserved for prisoners and test subjects, of hearing loud snores through the one-way mirror.

This wasn’t the purpose of the Internet when it first got going, but it quickly became its selling point. Advertisers dreamed of reaching “one to one,” a state of omniscience in which they could precisely target not only specific demographics but individual consumers with a particular ad. The Internet promised to make that dream come true.

Twenty years later, we take it as a given that we’re living in that dream. We are tracked, through our phones and our laptops, by a long list of companies, and assume that they probably know everything we do.

But the assumption has preceded the reality.

The cookie conundrum, the direct uncountability of how many people actually go to a given website, isn’t even considered a major issue in the online ad world — they have much bigger problems. Studies over the past couple of years have suggested that more than half of the ads on the Internet never even make it to the visible rectangle of someone’s screen. For 20 years, people have been paying for ads that, far from being shown to the one person most susceptible to their charms, have been shown to literally no one. Video ads, until very recently, qualified as “seen” even if they played in a hidden tab, with the sound off, or below the fold. The industry that we assume is watching us all the time has only just come up with a working definition for when an ad is “viewed.”

Right now, though, the industry is finally starting to catch up to its omniscient image. An association of online advertisers has declared 2015 the official “Year of Transition” as publishers and marketers try to figure this all out, but they will figure it out soon. The technology is in place to watch users as we assumed we’d been being watched all along. Chartbeat, for instance, runs lightweight JavaScript programs on its clients’ websites to record, every 15 seconds, where our cursor is on the screen, how often we scroll down the page and a host of other “engagement” metrics. The industry as a whole — publishers, marketers, advertisers and measurement companies — will presumably agree on the best way to use that tracking technology in the next couple of years, and start buying and selling ads based on the metrics it can measure. If that happens, the dream will get a lot closer to coming true.

The days of the cookie and its intentional privacy features (or tracking flaws) may be numbered, too. Right now, its fallibility makes us harder to count and harder to track, but it might become obsolete as browsers stop accepting third-party cookies, more and more users switch to the mobile Web, and persistent logins (like Facebook’s) become more widespread. Mobile devices have persistent identities — the generic MAC Address, Android_ID for Android devices, and the unsubtly named Identifier for Advertisers on Apple devices — which let marketers tie a single device to a single user, and Verizon Wireless has even been quietly inserting a “Unique Identifier Header” (essentially that online driver’s license) into the Web traffic of its subscribers for at least two years.

But for now, at least, Lou Montulli’s cookie is still doing its job, serving as a kind of passive privacy shield. Its virtue is its impermanence, giving us a small escape hatch out of the economy that it helped create. Third-party cookies have the air of the nefarious, but they’re grainy black-and-white security cameras stuck in a corner. We’re on the cusp of the HD era, about to enter the sci-fi surveillance world we thought we’d been living in all along.

Or at least, to ratchet down the paranoia, a world where we can say, for sure, how many people visited this page.


13 Jul 18:25

On the Street…La Fortezza, Florence

by The Sartorialist


13 Jul 18:25

Don’t do the Wilcoxon

by Andrew


The Wilcoxon test is a nonparametric rank-based test for comparing two groups. It’s a cool idea because, if data are continuous and there is no possibility of a tie, the reference distribution depends only on the sample size. There are no nuisance parameters, and the distribution can be tabulated. From a Bayesian point of view, however, this is no big deal, and I prefer to think of Wilcoxon as a procedure that throws away information (by reducing the data to ranks) to gain robustness.

Fine. But if you’re gonna do that, I’d recommend instead the following approach:

1. As in classical Wilcoxon, replace the data by their ranks: 1, 2, . . . N.

2. Translate these ranks into z-scores using the inverse-normal cdf applied to the values 1/(2*N), 3/(2*N), . . . , (2*N - 1)/(2*N).

3. Fit a normal model.

In simple examples this should work just about the same as Wilcoxon as it is based on the same general principle, which is to discard the numerical information in the data and just keep the ranks. The advantage of this new approach is that, by using the normal distribution, it allows you to plug in all the standard methods that you’re familiar with: regression, analysis of variance, multilevel models, measurement-error models, and so on.
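The three steps above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function names `rank_to_z` and `compare_groups` are made up for this example, the data are assumed to have no ties, and the "normal model" in step 3 is taken to be the simplest one available, a difference in group means with a Welch-style standard error. Any other normal-theory method (regression, multilevel models, and so on) could be applied to the z-scores instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def rank_to_z(values):
    """Steps 1 and 2: replace each value by the z-score of its rank.

    Assumes no ties. The rank r (1..N) maps to the probability
    (2*r - 1) / (2*N), which is then pushed through the
    inverse-normal cdf.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = NormalDist().inv_cdf((2 * rank - 1) / (2 * n))
    return z


def compare_groups(a, b):
    """Step 3 (simplest case): fit a normal model to the pooled z-scores.

    Ranks are computed over both groups together, then the groups are
    compared by the difference in mean z-score and its standard error.
    """
    z = rank_to_z(list(a) + list(b))
    za, zb = z[:len(a)], z[len(a):]
    diff = mean(za) - mean(zb)
    se = sqrt(stdev(za) ** 2 / len(za) + stdev(zb) ** 2 / len(zb))
    return diff, se


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    group_a = [1.2, 3.4, 2.2, 0.7]
    group_b = [4.1, 5.0, 2.9, 6.3]
    diff, se = compare_groups(group_a, group_b)
    print(f"difference in mean z-score: {diff:.3f} (s.e. {se:.3f})")
```

Because the output of `rank_to_z` is just a list of numbers on an approximately normal scale, it can be handed to whatever modeling tool you already use, which is the point of the approach.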

The trouble with Wilcoxon is that it’s a bit of a dead end: if you want to do anything more complicated than a simple comparison of two groups, you have to come up with new procedures and work out new reference distributions. With the transform-to-normal approach you can do pretty much anything you want.

The question arises: if my simple recommended approach indeed dominates Wilcoxon, how is it that Wilcoxon remains popular? I think much has to do with computation: the inverse-normal transformation is now trivial, but in the old days it would’ve added a lot of work to what, after all, is intended to be rapid and approximate.

Take-home message

I am not saying that the rank-then-inverse-normal-transform strategy is always or even often a good idea. What I’m saying is that, if you were planning to do a rank transformation before analyzing your data, I recommend this z-score approach rather than the classical Wilcoxon method.

The post Don’t do the Wilcoxon appeared first on Statistical Modeling, Causal Inference, and Social Science.

09 Jul 22:03

Misunderstanding the genome: A (polite) rant

by Jonathan M. Gitlin

A recent Ars feature story about genetic screening generated quite a lively debate in the discussion thread. However, it also underlined just how many misconceptions people have when it comes to genetics. Public perception hasn't been helped by scientists overhyping their findings or by inaccurate portrayals in the media (GATTACA, anyone?). So today, I'm going to try to clear up some common confusions.

Before moving recently to Ars full time, I spent six years working in the policy office of the National Human Genome Research Institute, the part of the National Institutes of Health responsible for the Human Genome Project (along with the UK's Wellcome Trust). The job gave me a front row seat to the challenge of explaining a horribly complex topic, one where common assumptions are often counterfactual.

Maria Delany's Ars article does a great job laying out how screening at-risk individuals for mutations in a pair of genes—BRCA1 and BRCA2—can spare people from developing cancer. Delany also explains why there isn't unanimity among clinicians about rolling out BRCA testing at the population level. At first glance, such testing seems like a no brainer, right? Testing right now is targeted to at-risk groups, like women with a family history of breast cancer, but studies have found those mutations in people with no family history of the disease. If testing people for BRCA mutations finds them before cancer does, where's the downside?



08 Jul 14:42

Starting from scratch: How do you build a world-class research lab?

by John Timmer

What does it cost to build a research center from scratch these days? Gerry Rubin, who runs the Howard Hughes Medical Institute's Janelia Research Campus in Virginia, estimated that his organization will spend a few billion dollars before it's clear if HHMI's research will work out. Ken Herd, who helped set up GE's new research center in Rio de Janeiro, said the building alone carried a $150 million bill.

But a steep price tag is merely the start. While securing funds is a massive initial barrier for any new facility, a modern world-class lab also needs the right combination of appeal for researchers, planning, and flexibility for when said planning doesn't work out. And on top of that, would-be lab builders had better start out with a lot of institutional support.

Supporting a new history

These days, many research centers are outgrowths of something that already exists. For example, in response to a state bioscience initiative, organizations like the Mayo Clinic, Scripps Research Institute, and the Max Planck Institute opened research centers in Florida. In these cases, there's already strong institutional support for research, and the organization is largely transplanting an existing research model to a new location.


08 Jul 14:32

In Russia, selfie takes you, prompts official “safety selfie” warning

by Sam Machkovech

Edenovellis
Mind yourself

On Tuesday, Russia's Ministry of Internal Affairs unveiled a new public health program to reduce deaths and injuries caused by people taking dangerous selfie photos. The "Safety Selfie" campaign warns Russian citizens that their "health and life" are worth more than "a million likes on social networks."

Included in the campaign is a series of cartoon warning signs that depict dangerous scenarios in which someone might take a selfie. Some of those, like the ones posted above, include: posing in front of oncoming cars and trains; taking a selfie while steering or riding in a vehicle; pulling off a sick boat maneuver; standing very close to an uncaged tiger; posing with a pistol; and falling down a cliff or some stairs.

Sadly, some of those cartoon images mirror actual selfie incidents in which Russians have gotten hurt, including a teen who died after falling from a bridge mid-selfie in 2014 and a woman who accidentally shot herself in the head in May. An Associated Press report on the Russian campaign alleged that "at least 10 Russians have been killed and 100 injured" due to a rise in people taking risky selfies, but it didn't cite where that statistic came from, and the Safety Selfie campaign doesn't include that figure.


02 Jul 14:14

Chicago Netflix customers: Your bill is about to go up 9 percent

by Cyrus Farivar

Starting Wednesday, the city of Chicago’s new "cloud tax" went into effect: it imposes a 9-percent tax on "patrons of amusement," including those services that are "delivered electronically."

In short: Netflix users in Chicago will be paying a little extra for their subscriptions pretty soon.

"We will be adding it to the cost we charge subscribers," Anne Marie Squeo, a Netflix spokeswoman, told Ars in a statement. "Jurisdictions around the world, including the US, are trying to figure out ways to tax online services. This is one approach."


30 Jun 16:45

Waxy O'Connor's Will Bring Its Brand of 'Modern Irish Bar' to Brookline This Fall

by Jacqueline Cain

There's also a Kingston outpost in the works.

The former Mission Cantina restaurant at 1032 Beacon St. in Brookline is finally getting a new restaurant: "modern Irish bar" Waxy O'Connor's will open an outpost there, hopefully by October, the company's director of operations Buzz Karnbach confirmed.

Boston Restaurant Talk noted that that was the rumor; a Chowhound user who was interested in the active construction there posted the news yesterday, and BRT noticed Waxy's website indicates a Brookline site is coming soon.

The blog also noted a Kingston Waxy's is on its way; Karnbach said that location is targeting a September debut, making the Brookline restaurant the eventual fifth Waxy's in Massachusetts. The Foxboro-based company currently has outposts in that town, as well as Woburn and Lexington. It also has branches in Plainville, Conn.; Keene, N.H.; San Antonio, Texas; and its original location in Ft. Lauderdale, Fla., which opened in 1997.

The existing Waxy's locations offer a typical Irish pub menu, with entrees like Guinness beef stew, lamb shepherd's pie, and baked haddock; appetizers including Irish nachos, calamari, and Irish sausage rolls; plus burgers, pizzas, and salads. The company website notes 2015 is a big year for Waxy O'Connor's, with the two new Massachusetts locations on the way as well as a second Connecticut outpost coming to Norwalk. The company adds it will "develop our new food and drink ranges across all of our bars" this year. Karnbach notes the Brookline and Kingston menus have yet to be confirmed.

22 Jun 14:40

Searching through the Years

Edenovellis
Turns out you can download your entire Google search history and analyze it....

19 Jun 20:34

The lessons of famous science frauds

19 Jun 14:18

Caffeine could limit damage of chronic stress

by Roheeni Saxena

Edenovellis
Drink more caffeine

During periods of chronic stress, we often up our caffeine consumption. This works better than you might expect—the increase can reduce some of the negative effects of long-term stress, including depression and memory deterioration. In a new study published in PNAS, researchers dug further into this finding, examining the signaling networks that caffeine influences within the brain. One of the proteins they identify is a potential treatment target for the symptoms of long-term stress.

Chronic unpredictable stress alters neural circuits in the hippocampus. It dampens mood, reduces memory performance, and increases an individual’s susceptibility to depression. The researchers studied this phenomenon in mice by exposing them to chronic, unpredictable, long-term stress in a variety of forms: cage-tilting, damp sawdust, predator sounds, placement in an empty cage, switching cages, and inversion of day/night light cycles. Just like humans experiencing chronic stress, the mice showed weight loss and memory deterioration. The mice also demonstrated helplessness and loss of interest in stimuli, which are markers of depression in mice.

After being chronically stressed, the mice were exposed to caffeine in their drinking water. As expected, caffeine reduced the mice’s depressive symptoms. Additionally, it improved the memory impairment in these mice, measured via recall of maze-based problem solving and object displacement.


18 Jun 20:58

Stepson of Stuxnet stalked Kaspersky for months, tapped Iran nuke talks

by Dan Goodin

Not long after blowing the lid off a National Security Agency-backed hacking group that operated in secret for 14 years, researchers at Moscow-based Kaspersky Lab returned home from February's annual security conference in Cancun, Mexico to an even more startling discovery. Since some time in the second half of 2014, a different state-sponsored group had been casing their corporate network using malware derived from Stuxnet, the highly sophisticated computer worm reportedly created by the US and Israel to sabotage Iran’s nuclear program.

Some of the malware's stealth capabilities were unlike anything Kaspersky researchers had ever seen, and in many respects, the malware was more advanced than the malicious programs developed by the NSA-tied Equation Group that Kaspersky had just exposed. More intriguing still, Kaspersky antivirus products showed the same malware had infected one or more venues that hosted recent diplomatic negotiations the US and five other countries have convened with Iran over its nuclear program. Also puzzling: among the other 100 or fewer estimated victims were parties involved in events commemorating the 70th anniversary of the liberation of the Auschwitz-Birkenau extermination camp.

Developers planted several false flags in the malware to give the appearance its origins were in Eastern Europe or China. But as the Kaspersky researchers delved further into the 100 modules that encompass the platform, they discovered it was an updated version of Duqu, the malware discovered in late 2011 with code directly derived from Stuxnet. Evidence later suggested Duqu was used to spy on Iran's efforts to develop nuclear material and keep tabs on the country's trade relationships. Duqu's precise relation to Stuxnet remained a mystery when the group behind it went dark in 2012. Now, not only was it back with updated Stuxnet-derived malware that spied on Iran, it was also escalating its campaign with a brazen strike on Kaspersky.


15 Jun 15:01

Because there is no observable certainty other than the existence of thought

by Andrew

Someone who is teaching a college philosophy class writes:

We discussed Descartes’ Meditations on First Philosophy last week — specifically, concerning the existence of God — and I had students write down their best proof for God’s existence in one minute, independent of their beliefs. Attached is a particularly funny response:

Another good one was the blank sheet of paper that a student handed in…

The post Because there is no observable certainty other than the existence of thought appeared first on Statistical Modeling, Causal Inference, and Social Science.

12 Jun 16:27

Nathalie Lawhead

Edenovellis
tetrageddon.com is a trip

Nathalie Lawhead

Who are you, and what do you do?

I started out as a net artist in the 90's. I remember when the World Wide Web was still a new thing, and I had to explain to people what that is, and why it's the future. Now I laugh, but I was told often that it's a niche, it will die, and I'm ruining my future. I love being right. :)

In those days I was making net art. I loved the experimental nature of the internet as art. I still do. I was fascinated by the fact that art can be intelligent, move, react, and anybody in the world can see my work. Museums, or galleries were no longer in charge of an artist's success.

Eventually people started calling what I do games. I absolutely hated the label. Not so much because of "snobby artist syndrome" but because I saw how people got confused. Often I would hear them complaining that "this game doesn't make sense", "what am I supposed to do?". "Game" meant that there was a goal, mission, or clear agenda. This is really the opposite of art. Where you are required to sit down and form your own conclusions about it. To me, art is about a conversation between the observer and art. It speaks to you, which requires a lot of contemplation.

The label "game" seemed to come with a lot of conservative expectations. It caused more confusion than it was worth. Looking back, I think a lot has changed in how people perceive games. People have often said that "games are art", but for something to be art we have to be willing to (as consumers) actually perceive it as art. Approach (or interact with) it the same way as we approach art. We have to be open minded.

I think it is amazing that we're at a point where games are exactly that, and this is thanks to players who are willing to experience something totally new, experimental, and creative. I'm also at the point where I'm very proud to call myself a "game designer" over an "artist".

You can see my work at tetrageddon.com

What hardware do you use?

I have your classic setup of desktop, and mobile. I've moved to an Apple setup, but I used to be all about PC. I miss it.

I'd love to actually say that I use all sorts of cool shit, and interface with satellites, and drop the word "cloud" a few times for extra credit, but my setup has always been very basic.

I think with enough determination you can make amazing work on, or with, anything. There's this really cool GDC talk about "A Year of Constraints", by MsMinotaur, and it's about how limitations improve development, and motivate you. I couldn't agree more! Money is always an issue, as any indie dev knows, and I often used outdated software or old machines to work from. I ended up creating some really cool visual styles from working around all those constraints. I think the biggest asset you can have is creative determination. Restrictions encourage out of the box thinking, and you have to try something different. It makes it hard to copy trends.

So, I think, high-end tools are really the last thing on the list for making something good.

The last PC I had was a really old one that used to overheat so I took the case apart and had a room fan blowing on it so it wouldn't constantly shut itself off or crash. It still did, but I got into the habit of saving my work every thirty seconds. Good times!

Now I have a nice iMac, but I still miss that machine. Not that I'm against high-end gadgetry. I want the latest and greatest just as much as anyone else, but it's not necessary.

And what software?

Xcode, Final Cut Pro, Adobe Creative Cloud (Edge, Photoshop, Flash, After Effects), and lots of open source stuff. I use AIR a lot. I love HTML5, and love Flash even more. They work really great together. I use FL Studio for music and sound.

I enjoy working with anything AI. I've used Program E, AIML, a lot for silly little extras. I'm hoping to have time to set up Caffe (deep learning framework) and incorporate that somehow.

Things I use/do almost daily are PHP, JavaScript, MySQL, HTML, ActionScript. It's all so fun. I love it.

What would be your dream setup?

Dream would be some way of zapping ideas straight from my brain into the computer. Like I think of it and the game is just done. I have so many ideas but all this menial "getting it done" stuff takes a lot of time.

Ok. Something less sci-fi..

I really need to get back to PC so I can develop for it a lot better. To make games, you also have to play them, so a high end PC is on my list of things to get when I become rich.

I've also been trying to get into the PlayStation developer program. It would be amazing if I could get my hands on their devkit.

Also, would like to develop for VR..

Now that I'm thinking about it, it would be great if someone just came along and dumped a whole bunch of hardware on me (including the PlayStation devkit).

Although, what I have now is pretty good too. :)

11 Jun 15:21

Elon Musk’s Hyperloop is actually being built in California next year

by Sebastian Anthony

Hyperloop concept art from HTT


It beggars belief, but it appears that Elon Musk's Hyperloop is actually going to be built. The first test track will only be five miles long, and it won't operate at the supersonic speeds that Musk envisioned, but still, it's coming—Musk's "cross between a Concorde, railgun, and an air hockey table" really is coming.

Back in January, Elon Musk said that he planned to build a Hyperloop test track "soon" and that Texas was "the leading candidate." Curiously, nothing more has been said by Musk on the matter since. Then, in February, Hyperloop Transportation Technologies (HTT), an organisation that is unaffiliated with Musk, said that it had struck a deal to build a five-mile Hyperloop in California.

HTT is a research company that was founded soon after Musk's original Hyperloop thesis was published in 2013. The structure of HTT is somewhat interesting: it has employees, but it also uses crowdsourced engineering talent from across the US that is being paid in stock options. The CEO is a guy called Dirk Ahlborn, who founded JumpStartFund, an online platform that facilitates building crowd-powered projects; basically, he took his own service and used it to build HTT.

