Shared posts

21 Apr 02:07

Girl Groups

This is a reaction to parts of three evenings watching Coachella 2023 and motivated by the fact that more or less all of the music that grabbed me featured females. This notably included Blackpink, the biggest Girl Group in the world. But they weren’t the best women artists there, not even close. Spoiler: boygenius was.

Part of the problem was that every time I switched to a Coachella channel with male names on it, the males were either bobbing their heads over a DJ deck, or banging out the hip-hop. I.e. few musical instruments or vocal melodies were involved. I have no patience whatever for EDM and maybe one hip-hop track in ten grabs me. Usually that track turns out to be Drill which, I read, is bad music by bad people and I shouldn’t like it. Oops.

Please don’t be objecting to the term Girl Group around me. You wanna diss the Marvelettes? The Supremes? The Ronettes? Labelle? Destiny’s Child? To the extent that a business as corrupt as Music can have proud traditions, Girl Groups are one of those.

Not all men

OK, a few weren’t terrible. FKJ was OK; his set featured a big sofa that he and the accompanists relaxed on, and it was that kind of music. The tunes were fairly generic but the playing was good and the arrangements were surprising and often excellent.

And then there was Tukker of Sofi Tukker; he’s the less-interesting half of that outfit but he’s still pretty interesting. Plus their music was good, the instrumentation was surprising, and they have lots of charisma; I’d go see them. They were an example of a distinct Coachella Thing this year: black-and-white outfits, in particular flowing white outfits, an angelic aesthetic.

Weyes Blood, new to me, was also definitely leaning into the angelic thing. The music had no sharp corners but took some surprising curves, and was pretty, which there’s nothing wrong with.

Coachella was willing to take big chances, including bands that I thought were at best marginally competent, in terms of being in tune and in sync. I’m totally OK with bands going Outside The Lines when driven by passion or musical invention but this wasn’t that, it was just basic rock or whatever played at medium speed, badly. I think this was a conscious low-rent aesthetic? Not naming names. Not gender specific.

More women I hadn’t heard of

Ashnikko (apparently erupted outta TikTok) puts on a good show, loads of charisma.

Christine and the Queens is French and extremely intense in ways I liked.

When I switched over to Kali Uchis she was rapping and had the usual complement of twerky dancers — is it just me or do they all have the same choreographer? I was about to switch away and then she switched to singing and wow, she’s really good.

Saturday night

Why I’m writing this is, I watched the boygenius and Blackpink sets and was left shaken. Let’s flip the order and start with Blackpink. These are screen-caps.

Blackpink on stage at Coachella 2023

I had never previously managed to watch a whole K-pop set because it’s so formulaic and boring. But Blackpink kept my attention if not entirely my affection. The choreography and attention to detail is awesome, mesmerising. The music is meh. The beauty is crushingly conventional but also crushing in its intensity. I felt like a Beauty Beam was coming off the screen, bending back all the retinas it impacted.

You don’t have to look very hard to read terribly sad stories about the life of a K-pop star. They sign ten+ year contracts with the Big Company (those years starting when they get big) by which the company gets 80% of the revenue and the band splits the rest, after paying back the BigCo for their training and promotion. And there were a couple of moments when one of the four was in a choreography dead zone and for just an instant wore an expression of infinite fatigue.

To be fair, their technique was dazzling. They had an actual band with actual musicians, although the intermittent lip-syncing wasn’t subtle. And when they stopped to chat with the crowd (in fluent English) they seemed like real people. Just not when singing and dancing.

You know what’s weird? I’m a heterosexual male and they were dressed in these “daring” suits with lots of bare flesh showing, but even with all that beauty, they weren’t sexy at all.

Anyhow, I ended up respecting them. They delivered. But still, that’s enough K-Pop for another decade or so.

The Boys Are Back In Town

That’s a dumb old rock song by Thin Lizzy, and it was the soundtrack for boygenius’s walk onstage. You see, they’re smart and not a second of the set, start to end, was in the slightest disposable. There’s a lyric from their song Without You Without Them: I want you to hear my story and be a part of it. They mean it.

boygenius on stage at Coachella 2023

Their Coachella set was messy, chaotic, and, I thought, magnificent. The mix wasn’t all that great and some of the lyrics were lost, a real pity with this band. But out of the chaos there kept coming bursts of extremely beautiful melody, exquisite harmony, and lyric fragments that grab your brain and won’t let go.

The songs are about love mostly and are romantic and arrogant and pathetic and many visit violence, emotional and physical too: When you fell down the stairs / It looked like it hurt and I wasn't sorry / I should've left you right there / With your hostages, my heart and my car keys — that’s from Letter To An Old Poet.

Their faces aren’t conventional at all, but so alive; I couldn’t stop watching them. Julien Baker in particular, when she digs into a song, becomes scary, ferocious. But they each inhabit each song completely.

Also memorable was their excellent Fuck-Ron-DeSantis rant.

Anyhow, at the end of the set, they were off their feet, rolling around together on the stage while Ms Baker shredded, just unbelievably intense. Always an angel, never a God they sing, over and over, but there were no flowing white garments — they were wearing mock-schoolboy outfits with ties — and something divine seemed in progress.

Back story

If you haven’t heard it, drop by their Wikipedia article and catch up; it’s interesting. Particularly the gender-related stuff.

My own back story was, I liked Phoebe Bridgers but hadn’t really picked up on boygenius. Then earlier this year, my 16-year-old daughter had a friend with a spare concert ticket and could Dad cough up the money for it? I’m an indulgent parent when it comes to music so she got her ticket.

Pretty soon thereafter I noticed the boygenius buzz and tried to get a ticket for myself but they were long-sold-out by then. Just not hip enough.

Oh well, I won’t forget that Coachella show any time soon.

“Girl Group”?

Per Wikipedia, it means “a music act featuring several female singers who generally harmonize together”. By that metric boygenius is way ahead of Blackpink, who do harmonize a bit but it’s not central to their delivery. On the other hand, if we traverse the Wikipedia taxonomy we arrive at “dance-pop girl groups” and Blackpink is definitely one of those, they dance like hell.

Look, boygenius obviously are Women with a capital “W”. But they’re subversive too, I bet if you asked ’em, they might gleefully wave the Girl Group banner.

21 Apr 02:03

AirPods Pro 2: Perfection

by Volker Weber

I was in a small group of people who checked out the AirPods Pro before they were officially announced on October 29, 2019 at 13.00 CET. A year later I was once again blown away by the AirPods Max. Those are still the best headphones I have.

The AirPods Pro became the headphones of choice for lots of people, rightfully so. But when the second generation AirPods Pro came out, I was lagging. After three years, my original AirPods Pro started rattling and Apple replaced the two earbuds while sending me back the original case. I was good to go for another three years, and besides, could Apple really improve on the AirPods Pro that much? As it turns out, I was wrong.

Fast forward to 2023 and I have now used the AirPods Pro 2 for two months, and they have noticeably improved on all fronts. The transparency mode is even better than on the already excellent original AirPods Pro, they filter out more ambient noise, they sound fuller, and they have little benefits the original lacked. You can finally set the volume without speaking “Hey Siri, louder” into thin air. The case beeps when it wants to provide feedback or when you are searching for it, and finally, it warns you through the Find My app when you leave them behind, much like the AirPods Max.

I have many headphones, but if I could only have one, this would be it. With voice isolation, you can even use them for phone calls.

21 Apr 02:02

Once in Perugia, always in Perugia

by Gregor Aisch
Photo: Luca Vanzella, CC BY-SA 2.0

Hi, this is Gregor, co-founder and head of the data visualization team at Datawrapper, with a Weekly Chart written on a night train somewhere between Austria and Italy.

After a long pause, I am very happy to be returning to Perugia for the International Journalism Festival (IJF). On the train ride to Italy, I was browsing this year’s festival program and saw many new speakers and faces I recognized from previous years!

This got me wondering how much fun it would be to see who spoke most at the conference in the past. Fortunately, the festival website lists all speakers since 2012. So without further ado, here’s the list of the most frequent speakers at the IJF:

You may be wondering why this list only includes international speakers. It’s a choice the festival made for the program as well, which initially excludes Italian speakers on its English-language website. You can click the link above the table to switch to a version that includes Italian speakers.

While there are only two women among the ten most invited speakers at the conference, it’s worth noting that the overall diversity of the IJF seems to have gotten a lot better in recent years, with almost 60% of this year’s speakers being women compared to 24% in 2012!

At Datawrapper, we’re big fans of the festival, not just for the talks and panels and the amazingly beautiful location, but also for the chance to meet so many of our users. If you’re around, drop us a note or leave a comment if you’d like to say hi.


See you in Perugia — or next week for the first Weekly Chart from our product specialist, Guillermina.

21 Apr 02:02

Google Fi Is Now Google Fi Wireless: New Benefits, Free eSIM Trial

by Ronil
Google refreshes Fi with a new name, logo, and perks. The company’s wireless carrier started as Project Fi, then became Google Fi. And today, it’s getting another name change, becoming Google Fi Wireless.  The logo didn’t get a complete overhaul and still has the aesthetics of the old Google Fi logo. Instead of four bars […]
21 Apr 02:01

Twitter Favorites: [tomhawthorn] @sillygwailo Too soon.

Tom Hawthorn 🇺🇦🇨🇦 @tomhawthorn
@sillygwailo Too soon.
21 Apr 01:57

Designing The Community-Powered Customer Support Hub

by Richard Millington

The famous flaw in most enterprise communities is that the majority of people don’t visit to join a community; they visit to solve a problem.

This is why the majority of contributors to a community have made precisely one post.

They asked a question, received an answer (or didn’t), and then left never to return. The vast majority of attempts to shift these numbers have failed miserably. This audience simply doesn’t want to connect, share, and ‘join the conversation’. They want to solve their problem and be on their way.

The challenge facing most enterprise community professionals is less about trying to nurture a sense of community amongst members and more about building the right support experience around the community – where community plays a critical role in supporting every other support channel.

Enter the community-powered customer support hub.


The Five Layers Of The Customer Support Hub


The trend in recent years has been fairly clear: organisations are creating integrated support hubs which include knowledge bases, federated search, virtual agents, community, customer support tickets, and product suggestions.

This is slightly different from a customer portal which handles a broader range of use cases than support (e.g. it might include the academy, events, product updates etc…)

To get the support hub right, you need to execute extremely well on five distinct layers. These are shown here.

Five layers of the support hub are knowledge base, virtual agents/search, community/social media, customer support, and product suggestions.

We can tackle each layer in turn.


The Foundation Layer: The Knowledge Base


Pretty much every organisation has a knowledge base for customers to browse. The purpose of this knowledge base is to preempt the majority of questions customers are likely to have and help customers follow best practices in the setup and usage of products.

Whenever a customer searches for an answer to a question, the knowledge base is often the first place they visit. A great knowledge base will have articles showing up in search results, in chatbot recommendations, in responses to community questions, and be referenced by support staff.

The cheapest way to solve a customer query is in the knowledge base. It can deflect 80% or more of the questions that would otherwise reach downstream channels.

(As an aside, this is also a problem with measuring activity levels in a community: the community should inform the knowledge base, which in turn should reduce the number of questions in the community.)

However, the knowledge base has to perform a tricky juggling act. It should be comprehensive enough to resolve the 20% of queries which represent 80% of the volume of questions. But it shouldn’t aim to be so comprehensive that it tries to tackle every query.

This becomes impossible to maintain and is overwhelming for members. The knowledge hub needs to have well-maintained content that’s refreshed regularly. It also needs to archive out-of-date articles.

A great knowledge base aims to maintain a smaller number of articles up to an incredibly high standard. For other queries, there are other channels.

An excellent example of this is the Asana knowledge base.

Notice the clean interface, great use of white space, clear search bar, and collapsible navigation menus on the left-hand side so as not to be overwhelming.

The tabs are well categorised by likely challenges and by problems members will potentially encounter. All of this makes navigation simple.

Asana seems to have taken a less-is-more approach – maintaining a smaller number of high-quality articles instead of tackling every eventuality. Developer articles have also been smartly separated from other articles.

The knowledge base should always be receptive to what’s happening in a community (and other channels). If the same question repeatedly appears in the community, it needs to be tackled as a knowledge hub article (don’t forget to update community discussions with the link to the knowledge article). Likewise, if an article in the knowledge base isn’t getting much traffic, it might be time to archive the article.

Ultimately, the knowledge base is the foundation layer of support. If it’s well-designed and implemented, it makes the entire support experience much better for everyone.


The Search Layer

This is the layer where people are on the site and search for the information they want to find.

This comes in the form of search bars and chatbots.

Unified Search 

Traditionally this was handled by a search bar which, like Google, would retrieve the result that best matched the query.

The biggest problem with this is that organisations frequently use different platforms for the knowledge base, documentation, academy, community, etc., but the search tool was often limited to retrieving information from a single database. In recent years, organisations have shifted to using unified (federated) and cognitive search tools.

These tools can retrieve results from multiple databases and enable the organisation to assign weightings to different results to skew where customers are most likely to visit as needed. This means the results can be retrieved from the corporate site, dev community, webinars, roadmap, pricing, knowledge base and more. This has a big impact on reducing knowledge silos within the community.
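The weighting idea is easier to see in code. Below is a minimal, hypothetical sketch (my own illustration, not any vendor’s API) of merging results from several backends and ranking them by a per-source weight; the Result shape, the source names, and the weights are all invented for the example.

    from dataclasses import dataclass

    @dataclass
    class Result:
        source: str   # e.g. "knowledge_base", "documentation", "community"
        title: str
        score: float  # relevance score returned by that source's own search backend

    # Hypothetical per-source weightings an organisation might tune.
    SOURCE_WEIGHTS = {"knowledge_base": 1.5, "documentation": 1.2, "community": 1.0}

    def merge_results(per_source: dict[str, list[Result]], limit: int = 10) -> list[Result]:
        # Flatten all sources into one list, then rank by weighted relevance.
        merged = [r for results in per_source.values() for r in results]
        merged.sort(key=lambda r: r.score * SOURCE_WEIGHTS.get(r.source, 1.0), reverse=True)
        return merged[:limit]

Tuning those weights is what lets an organisation nudge customers toward, say, the knowledge base before the community, without hiding either.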

A good example of this is the Logitech support centre.

A screenshot of the Logitech Support Center

You can see here that relevant results are retrieved from the knowledge base, product documentation, downloads, and the community. When it works well, it gives members a single box to enter a question and find what they want.

At the moment, many organisations still don’t deploy a federated search tool – this has a hugely negative impact on the customer experience, as customers must then visit multiple places to find the answers they need. Note: documentation is not the same as a knowledge base.


Chatbots

In the past, chatbots were rudimentary tools that operated on decision trees and relied heavily on keywords. By following a set process, the chatbot would either provide you with the information you were seeking or guide you to the next step in your journey.

An example of this in practice is the Logitech Chat Bot below:

A screenshot of the logitech chat bot in action

You can see above that it’s still following a fairly basic decision-tree format to try and guide the individual to the right answer. It’s becoming increasingly common for chatbots to act as a screener to solve a problem before redirecting the question to a virtual support agent (skipping the community entirely).

More recent incarnations (up to 2023) are far more advanced, using natural language processing to ask questions, check what has been attempted before, and try to guide someone to the right solution.

The biggest question is how quickly ChatGPT (or something ChatGPT-esque) will be incorporated. This should enable much higher-quality interactions and let members get detailed answers specific to their questions. If it works well, it should significantly reduce the number of challenges which drop through to the next level.

A community also supports this process by providing a huge amount of training data to process. The more community questions there are, the better the AI bot will be able to surface the right answer for members. Over time this should lead to fewer and fewer questions reaching the community.

As we’ve written before, ChatGPT and similar tools thrive when customers either don’t know what terms to search for or can’t browse mountains of information to find what they need. They fail, however, when the customer needs the best solution to a problem, needs a workaround (edge case), or is looking for personal experiences (e.g. would you recommend [vendor]?).


The Community Layer


The community and social media layer is where customers go to ask the in-between questions.

These questions aren’t so easy that they can be solved by existing documentation, but they don’t require the customer to reveal personal information to get a resolution.

Generally speaking, the success of a community hangs upon how many in-between questions people have.

One of two things happens at this layer.

First, people ask the question in a search engine and they land on a question which has already been asked in the community. This typically accounts for the majority of traffic in most communities.

Second, if they don’t find the answer to their question, they might ask the question themselves in the community.

By community, we’re not just referring to a hosted community but any place where members can interact with other members to get an answer. This includes social media and third-party platforms (Reddit, StackExchange, YouTube, Twitch etc…).

The community layer should resolve as many of those in-between questions as possible. It should also be used to inform the other layers: it should highlight questions which should be tackled by knowledge articles, provide on-site search and chatbots with answers to surface, and provide support agents with ideas they can try to resolve the customer issue.

Atlassian (a FeverBee client) is probably one of the best examples of this today.

A screenshot of the Atlassian community

This doesn’t mean a community is exclusively for in-between questions; there are plenty of people who simply prefer a community to filing a support ticket. A community helps reinforce the innate desire for self-service.

There are also plenty of use cases a community offers which don’t involve Q&A (user groups, events and activities etc…).


The Customer Support Agent Layer


The next layer is where a human support agent is involved.

In my experience, organisations can take one of two approaches.

The first is they want to reduce the number of calls which reach support agents as much as possible. Sometimes the cost of resolving an issue can reach hundreds of dollars per call. This means there is a huge benefit from resolving these issues in the community.

The second is they want to have as much time with customers as possible. In this approach, the goal isn’t to get customers off the phone as quickly as possible but to use the opportunity to build a better relationship with them and understand their needs.

In my experience, the former is more common than the latter – but some truly customer-centric organisations excel here. If your scaled support system is implemented correctly, your customer support team should only be tackling questions which:

    1. Require a member to share personal data to be able to resolve.
    2. Are too complex or rare for the community to resolve (edge cases).
    3. Are from the highest value customers paying for a dedicated support agent.
    4. Are from customers who have a natural preference for support agents (often older customers).


Whenever community support questions require Personally Identifiable Information (PII) from the customer and are moved to a private support ticket for resolution, it’s critical to also provide an update on the original community thread.

This prevents the interaction from appearing to have halted abruptly when it moves to a private support channel. Otherwise the community is left feeling uninformed, which probably results in more support tickets about the topic and is counterproductive to the self-support model.

Ultimately, whether by support ticket or by phone call, the goal is to route the question to a person with the right answer as quickly as possible. The customer support layer is primarily the catch-all for issues which can’t be resolved anywhere else. You want your paid staff to be tackling the toughest questions for the highest value customers. 

Ideally, the customer will be able to file a support ticket via the same destination as they can search for information, ask the community, engage with a chatbot etc…


The Product Ideas Layer


The final layer is the product suggestion (or ideas) layer.

It’s relatively common for customer support staff to recommend that a customer post any problem which can’t be resolved as a suggestion for a product enhancement. This is a common route to suggesting an idea, but it’s not the only one.

Often customers will simply want to suggest an idea without having to begin with an initial problem. Regardless of the route, the product ideas layer comes last: it aims to be the final place that offers customers hope for a solution when every other channel fails.

A good example of this is the DigitalOcean community (hosted by Pendo).

A list of ideas posted on the DigitalOcean Community

You can see here it’s easy to add the idea and track its progress over time. The big danger of having an ideation area, however, is when the ideas aren’t utilised. 

This quickly becomes seen as a dumping ground which upsets members. You should only implement an ideas or suggestions layer when there is a product team eager to receive, filter, and provide feedback on the ideas. If that’s not in place, don’t offer ideas.


Build Your Community-Driven Support System


Customers are like water following the path of least resistance. Whichever pathway seems like the easiest (or quickest) way to resolve their challenge is the one they will take.

If that means asking a question in a community, they will happily do so. If that means searching for an answer in the search bar, they will do it. If that means filing a support ticket, they will do that too.

But here’s the rub: customers don’t want to visit five different places to do those things. They don’t want to fuss around logging into different systems. They want all of these things to be accessible in a single destination. They want to achieve their goals with the least amount of effort. They don’t want to ask the same question across multiple channels to get an answer.

The challenge for community and support professionals is to build a seamless support hub.

A common mistake by community professionals right now is to focus on improving less important areas of the community experience (e.g. gamification, groups, etc.) at the expense of the areas which would immediately add more value to people looking for answers to problems (better integrations, search optimization, question flows).

The future of enterprise communities isn’t to build the most perfect standalone destination but to integrate the community as deeply into the support experience as possible.

21 Apr 01:54

Data analysis with SQLite and Python for PyCon 2023

I'm at PyCon 2023 in Salt Lake City this week.

Yesterday afternoon I presented a three hour tutorial on Data Analysis with SQLite and Python. I think it went well!

I covered the basics of using SQLite in Python through the sqlite3 module in the standard library, then expanded that to demonstrate sqlite-utils and Datasette, and even spent a bit of time on Datasette Lite.
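For flavour, here's roughly the kind of standard-library sqlite3 warm-up that part of the workshop walks through; the table and rows below are my own illustration rather than the workshop's actual examples.

    import sqlite3

    # An in-memory database; pass a filename instead to persist it to disk.
    conn = sqlite3.connect(":memory:")

    # Hypothetical table, just to show the create/insert/query cycle.
    conn.execute("CREATE TABLE talks (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO talks (title, year) VALUES (?, ?)",
        [("Data analysis with SQLite and Python", 2023), ("Introduction to Datasette", 2022)],
    )
    conn.commit()

    for title, year in conn.execute("SELECT title, year FROM talks ORDER BY year"):
        print(year, title)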

One of the things I learned from the Carpentries teacher training a while ago is that a really great way to run a workshop like this is to have detailed, extensive notes available and then to work through those, slowly, at the front of the room.

I don't know if I've quite nailed the "slowly" part, but I do find that having an extensive pre-prepared handout really helps keep things on track. It also gives attendees a chance to work at their own pace.

You can find the full 9-page workshop handout I prepared here:

sqlite-tutorial-pycon-2023.readthedocs.io

Screenshot of the handout. Data analysis with SQLite and Python, PyCon 2023

    What you’ll need
        python3 and pip
        Optional: GitHub Codespaces
    Introduction to SQLite
        Why SQLite?
        First steps with Python
        Creating a table
        Inserting some data
        UPDATE and DELETE
        SQLite column types
        Transactions
    Exploring data with Datasette
        Installing Datasette locally
        Try a database: legislators.db
        Install some plugins
        Learning SQL with Datasette

I built the handout site using Sphinx and Markdown, with myst-parser and sphinx_rtd_theme and hosted on Read the Docs. The underlying GitHub repository is here:

github.com/simonw/sqlite-tutorial-pycon-2023

I'm hoping to recycle some of the material from the tutorial to extend Datasette's official tutorial series - I find that presenting workshops is an excellent opportunity to bulk up Datasette's own documentation.

The Advanced SQL section in particular would benefit from being extended. It covers aggregations, subqueries, CTEs, SQLite's JSON features and window functions - each of which could easily be expanded into its own full tutorial.
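As a taste of that material, here's a small sketch (mine, not the handout's) of a CTE feeding a window function, run through the standard-library sqlite3 module; the sales table and its columns are invented, and window functions need SQLite 3.25 or newer.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10.0), ("north", 30.0), ("south", 5.0), ("south", 25.0)],
    )

    # A CTE that ranks each sale within its region using a window function.
    query = """
    WITH ranked AS (
        SELECT region, amount,
               RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
        FROM sales
    )
    SELECT region, amount, rank_in_region FROM ranked ORDER BY region, rank_in_region
    """
    for row in conn.execute(query):
        print(row)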

21 Apr 01:52

Proposed policy clarifies what's appropriate swimwear at Vancouver's public pools | CBC News

mkalus shared this story :
"Despite a B.C. Supreme Court decision that backed women's right to bear their breasts in public, Digby believes it would be "excessive" in a pool setting." I see the 1950s NPA is back in action. If they push that through I hope they get sued.


The Vancouver Park Board is set to vote on a city staff report aimed at tackling inappropriate swimwear at public pools by defining what can and cannot be worn.

Park board commissioner says policy aims to create an inclusive environment for families

The report follows concern from staff at the city's aquatic centres who have asked for a clear policy to help them navigate situations where patrons have, according to the report, "presented in attire that has had cause for attention, due to various levels of tolerance by both staff and members of the public as to what is acceptable attire for swimming in public aquatic facilities."

City staff say the policy will address safety concerns about swimming outfits that present a risk, adding that swimwear should allow the body to move freely, should not impede buoyancy and should not increase the safety risk to the swimmer or a lifeguard.

In the report, appropriate swimming attire is listed as:

  • bathing suit;
  • swim trunks or board shorts; 
  • T-shirts and shorts; 
  • burkini;
  • swim hijab, leggings and tunic; 
  • rash guard; 
  • and wet suit.

Unacceptable attire, according to the report, includes items designed for sexual or intimate purposes, clothing that absorbs water and becomes heavy, like jeans and sweatpants, and long, flowing fabrics. Swimwear must also fully cover the genitals, the report says.

It defines appropriate swimwear as "what other Canadians find as an acceptable level of tolerance in a family public swimming environment." 

Bare breasts 'excessive' at pools: commissioner

The park board will discuss and vote on the report on April 24.

Commissioner Tom Digby says he's leaning toward voting in favour of the policy.

"It's a complex question of social equity in the city," he said. 

"Because for every person who wants to wear a string bikini, there could be 10 families from some conservative community… that won't go to the swimming pool because they're afraid of confronting a string bikini in the change room, which is a very reasonable concern."

Digby said the city is trying to create an environment that is welcoming to all families.

"There's a lot of communities [that] have fairly conservative standards. There are many cultures here that won't tolerate a lot of exposure," he said.

Despite a B.C. Supreme Court decision that backed women's right to bare their breasts in public, Digby believes it would be "excessive" in a pool setting.

The city of Edmonton amended its topless policy in February, clarifying that all patrons are allowed to swim and lounge at the city's pools without a top on, regardless of their gender identity.

With files from the Early Edition

21 Apr 01:51

On the shift to oat and the milk hysteresis curve

We appear to be at a tipping point to oat milk for coffee, and it’s an interesting case study in what change means and feels like.

I always specify “dairy” when I get my daily coffee, wherever I am. “Dairy flat white” is the usual order.

The reason being that several years ago, when alt milks were becoming a thing, I was asked what milk I wanted and I said “normal” – at which point I got scowled at because what is normal anyway.

And that made sense to me. And while I believe rationally that being vegan is probably the way of the future, personally I quite like meat and milk, so the minimum viable way for me to sit on the fence is to always specify dairy but refuse to normalise it. So that’s what I’ve done since. My bit for the cause.

(My life is littered with these absurd and invisible solitary commitments. Another one: I will always write the date as “3 April” instead of “April 3” because humanity may one day live on a planet with a really long year and we may want to have multiple Aprils, so better not be ambiguous.)

Anyway, I’m used to the conversation going either like this:

  • Dairy flat white please
  • Ok great

Or:

  • Dairy flat white please
  • What flat white?
  • Dairy
  • We have oat or soy
  • No like cow’s milk
  • Like just normal? Regular milk?
  • Yes
  • Ok right. Flat white then

Rarely - ok just once - I was told off by a shop for specifying “dairy” every day because nobody has oat and, well, they see me every day and they remember what I want.

But that was about 18 months ago.

Recently the pushback has decreased, quite a lot and quite suddenly.

So I’ve been idly asking coffee places what their proportion of dairy milk vs oat milk is, when I get my daily coffee, wherever it is.

Near me, in south London, one of my local places is 60-70% oat over dairy (factoring out coffees without milk). Another is 50/50, probably with oat leading by a nose.

That’s the general picture round here.

I asked for a dairy flat white in north London and got the old familiar bafflement. Apparently east London is more alt milk again. There’s a neighbourhood thing going on.


I’ve asked why (at the majority-oat places) and nobody really knows. Fashion (one place suggested); all alt milks are now oat; general awareness. I’ve noticed that places rarely charge extra for alt milk now, which reduces friction.

And then there’s a shift that prevents backsliding:

My (previously) favourite coffee place now tastes too bitter for me. Now, oat milk is sweeter than dairy milk. To keep the flavour profile, you’ll need to make the base coffee itself less sweet. So I swear they’ve changed their blend.

This is interesting, right? We were in a perfectly fine status quo, and it took some energy to change the majority milk, but now that the underlying coffee has changed, we’re in a new status quo and it’ll take the same energy again to shift back. A hysteresis loop for milk.

So that’s the new normal, yet people still say “regular milk” to mean dairy milk.

“Regular” does not mean, from the perspective of the coffee shop, the majority of their milk-based coffees.

“Regular” means, from the perspective of the customer, the majority of their consumed milk from their lifetime drinking coffee. Which is obviously biased to the past.

So “regular” is a term of conservatism.

Not a right wing or libertarian or fundamentalist conservatism. But a kind of “the default is what we did in the past” conservative. (Which would be a fine position to have, by the way, because I don’t think we give enough respect to wisdom that takes many generations to arrive at, and our current - and sadly necessary - anti-conservatism - because of everything else with which it is currently allied - undermines that position somewhat.)


Anyway so this is how we get old and conservative, I guess, by taking as our yardstick our cumulative individual experience rather than a broader and changing society.

And I could switch to oat milk too, I suppose, given dairy is tasting worse now, but I’m trapped in my own habits, and I like the idea that, over the coming decades, I’ll ascend into a kind of relative savagery, the final person consuming “normal” milk while the world changes around me.

21 Apr 01:50

Saturday Morning Breakfast Cereal - Die on It

by Zach Weinersmith
mkalus shared this story from Saturday Morning Breakfast Cereal.



Click here to go see the bonus panel!

Hovertext:
Later, the robot learns to nod its head and keep the truth inside.


Today's News:
21 Apr 01:38

Re-AI Music

by bob
You got a lot right here. It’s just sad that the gimmicky sound-alikes are what people’s first impression of AI is. A few examples of tech’s disruption in music: 1. Beatles/Abbey Road and the 8-track, and then mellotron (first sampler IMO) 2. Herbie Hancock breaking the rules and using synths in jazz mid 70s with […]
03 Apr 02:55

Rolling-mill Oops

The world being what it is, feels like a little humor is in order. Here’s a story from my misspent youth, when I was a co-op student at a steel mill and had a Very Bad Day.

This story was originally published as a Tweet thread in response to the following:

Tweet from @ElleArmageddo

That thread will probably go down with the Twitter ship and I’d like to save it, so here goes.

During the summer of 1980, my last before graduation, I had a co-op job at Dofasco, a steel mill in Hamilton, Ontario. It wasn’t great, the tasks were mostly make-work, although I kind of liked learning RSX-11M Plus running on PDP-11s. Sure, it was primitive by modern standards (I already knew V6 Unix) but it got the job — controlling the sampling apparatus in a basic-oxygen blast furnace — done.

So, there was this Rolling Mill that turned big thick non-uniform slabs of steel into smooth precise strips rolled onto a central core; this is the form in which steel is usually bought by people who make cars or refrigerators or whatever. The factory had lots of rolling mills but this one was special for some reason, was by itself in a big space away from the others. It was huge, the size of a couple of small trucks.

The problem was, it had started overheating and they didn’t know why. The specifications said the maximum amperage was 400A RMS, where “RMS” stands for Root Mean Square. The idea is you sample the current every so often and square the measurements and average the squares and take the square root of that. I believe I learned in college why this is a good idea.

Um, 400A is a whole lot of juice. Misapplied, it could turn a substantial chunk of the factory to smoking ashes.

The mill had an amperage data trap, into which they plugged an HP “data recorder” with reel-to-reel tape that sampled every little while, and left it there until the overheat light started flashing. Then they needed to compute the RMS.

Fortunately

They had a PDP-11/10, about the smallest 16-bit computer that DEC ever made, with something like 32K of core memory. It was in a 6-foot-rack but only occupied two or three slots. It had a device you could plug the data recorder into and read the values off the tape out of an absolute memory address. And it had a FORTRAN compiler. It was running RT-11.

Who knew FORTRAN? Me, the snot-nosed hippie co-op student! So I wrote a program that read the data points, accumulated the sum of squares, and computed the RMS. I seem to recall there was some sort of screen editor and the experience wasn’t terrible. (I kind of wish I remembered how you dereference an absolute memory address in FORTRAN, but I digress.) The readings were pretty variable, between 200 and 500, which the machine specs said was expected.

Anyhow, I ran the program, which took hours, since the data recorder only had one speed, in or out. The output showed that the RMS amperage started worryingly high but declined after a bit to well below 400A, presumably after the machine had warmed up. My supervisor looked at the arithmetic and it was right. The report went to the mill’s General Foreman, a God-like creature. So they told the machine operator not to worry.

Unfortunately

I had stored the sum of squares in a FORTRAN “REAL” variable, which in FORTRAN (at least that version) meant 32-bit floating point. Which has only 24 bits of precision.

Can you see the problem? 400² is 160,000 and you don’t have to add up that many of those before it gets big enough that new values vanish into the rounding error and make no changes to the sum. And thus the average declines. So the RMS I reported was way low.
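To make the failure mode concrete, here's a tiny sketch using NumPy's float32 to stand in for that FORTRAN REAL; the magnitudes are illustrative, not the mill's actual numbers.

    import numpy as np

    # RMS = sqrt(mean of the squares); the bug was in accumulating the sum of squares.
    sample_sq = np.float32(400.0) ** 2          # one squared reading: 160,000
    acc32 = np.float32(1.0e13)                  # a large 32-bit sum of squares (illustrative)
    acc64 = np.float64(1.0e13)                  # the DOUBLE PRECISION equivalent

    # With a 24-bit mantissa, the spacing between representable floats near 1e13
    # is about a million, so adding 160,000 rounds straight back to the old value.
    print(acc32 + sample_sq == acc32)               # True: the new reading vanishes
    print(acc64 + np.float64(sample_sq) == acc64)   # False: 64-bit keeps it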

Fortunately

The mill operator was a grizzled old steel guy who could tell when several million bucks worth of molten-metal squisher was about to self-incinerate.

He slammed it off on instinct with a steel slab halfway through. It only cost a few shifts of downtime to remediate, which is to say many times my pathetic co-op salary for the whole summer.

At which point my boss and I had to go visit the General Foreman and explain what DOUBLE PRECISION was and why we hadn’t used it. It wasn’t fun. For some reason, they didn’t ask me to re-run the calculations with the corrected code.

You can’t possibly imagine how terrible I felt. I was worried that my boss might have taken career damage, but I heard on the grapevine that he was forgiven, but ribbed unmercifully for months.

And when I graduated they offered me a job. I went to work for DEC instead.

03 Apr 02:53

Sober Carpenter

I was drinking a glass of excellent Sober Carpenter “West Coast IPA” at lunch when I ran across Even moderate drinking is bad for us. Enter nonalcoholic beer in the Washington Post. Drinking less seems to be A Thing just now and I suppose alt-beverages too, so here’s my experience.

Sober Carpenter web site

Right at the beginning of the year I saw this and decided to drink less. I didn’t try anything fancy, just restricted alcohol to two days a week. I’ve never been a heavy drinker but for some decades had had wine or beer with most dinners and a whiskey at bedtime.

“Two days” doesn’t just mean weekends; our kids are at an age where fairly often they’re both away at weeknight dinnertime, so Lauren and I will cook up something nice and split a bottle of wine. This works out well because we fairly regularly video-binge on Saturday nights and drinking plus extended-TV is a sure headache for me.

Three-months-ish in, there’s no stress keeping to that regime and the results are moderately pleasing. Findings:

  1. At some point earlier in my life I had concluded that all mock-alcohol drinks were hideous wastes of time. No longer! I encourage you to try a few alternatives (that WashPost article has lots), and if you’re in Canada I absolutely recommend that Sober Carpenter brand. I’m very fussy about my IPAs and while I wouldn’t put this one in the top-ten I’ve ever tasted, I wouldn’t put it in the bottom half either.

    I’ve also been exploring fancy ginger beers and while I’ve found one or two that are pleasing, I suspect I can do better.

  2. I sleep a little better, albeit with more vigorous, sometimes disturbing, dreams.

  3. If lunch would benefit from (zero-alc) beer on the side, I don’t hesitate.

  4. The monthly credit-card bill is noticeably lower.

  5. When I was getting hyperfocused on code and it got to be past 11PM, that late-night whiskey was a reliable way to get unstuck and off to bed at a sane time. Oh well, there are worse things than a too-much-coding fog.

    When I’m struggling with a blog piece though, a drink seems to boost the writing energy. This can lead into the wee hours and feeling fairly disastrous the next morning.

    Let’s call that one a wash.

  6. Sushi isn’t as good without sake. But it’s still good.

  7. I kind of hoped I’d lose some weight. Nope. Oh well.

I’m really not recommending any particular behavior to any particular person. Several people who are close to me have had life-critical alcohol problems and that situation is no joke; if you think you might have a problem, you should consult an expert, not a tech blogger.

03 Apr 02:52

First TBN Urban Roller ride of the year

by jnyyz

Today was the first Urban Roller ride of the year with TBN. We met up at High Park. All wishing that the ride was yesterday, when it was +12°C, but at least it’s sunny!

Dave Middleton is our ride leader.

The other Dave leads us down the hill to the lakeshore.

Taking full advantage of the extra space for southbound cyclists defined by paint and bollards at Colborne Lodge and Lakeshore. Note the ghost bike for Jonas Mitchell.

Along the MGT.

Nice sunny day.

Regroup at 1st and Lakeshore.

The bathrooms along the Waterfront Trail in Mississauga are already open (unlike the ones in TO), and this one is heated.

I elected to ride back in advance to get back early.

The water from the Humber is muddy and brown.

Thanks to Dave for leading the ride and arranging the route. Nice to see so many of the usual suspects. Looking forward to a good riding season with TBN.


I’ll also note that the open house for the High Park Movement Strategy is tomorrow, April 3, from 4:30-7:30 at Lithuanian House 1573 Bloor Street West.

There was a prior public survey that indicated that the most popular option of four was one where the park would be car free. The report on the survey is here.

However, subsequent to the survey, there was a stakeholder meeting in February that was not open to the public. As a result of that meeting, the city is proposing that the park remain partially open to motor traffic, but many are still hoping that we can keep the park car free. I’ll post about that meeting tomorrow.

For more details, I would recommend reading the coverage by Rob Z on his excellent blog.

03 Apr 02:51

One Night in the Mission

I used to be a creature of the night, but no longer. I used to be out all the time, but rarely now. Partly, it’s that San Francisco is so chilly at night, but also that it’s pretty dead at night compared to the much bigger cities I’ve lived in. I don’t quite enjoy walking around, cold, in areas where there just simply isn’t that much going on at all. For my wife’s birthday, we went out to dinner in the Mission and I also brought my Minolta Hi-Matic 7S II. It’s fast becoming one of the cameras I use the most: its f1.7 lens, combined with the small form factor and weight, makes it easy for me to pop it into my jacket pocket. It works really well indoors at night, too, with black and white film (and a steady hand… or an elbow firmly on a table or chair or door, which is my style. I dislike tripods).

Here are some shots on Kentmere 400, pushed to 800 in Ilfosol 3 (1:9). I really like the combination of this film and this camera, and my self dev setup at home these days. Scanned on Noritsu LS-600.

a scan of a black and white photo showing an outdoor garden dining space with space heaters

The outdoor space at Blue Plate is quite lovely. So is the key lime pie there.

a scan of a black and white photo showing the neon symbols that are the sign of a bar in the Outer Mission

I love neon signs. I also love that I was professionally involved in getting these ‘parklets’ up early pandemic: my team at sf.gov helped get a joint permitting process out quickly to help businesses move their business outdoors.

a scan of a black and white photo showing the retro sign of the Mission cinema

Alamo Drafthouse in the Mission.

a scan of a black and white photo showing a few people ordering tacos from a street taco vendor

Street tacos are the best tacos. There was a lot of light from one side from the street lamps, but I quite enjoy the effect it casts on the photo.

I am starting to feel more confident about bulkrolling black and white film and developing it at home. Other than the cost savings, it’s the immediacy that I love: I can roll a 24 exposure cassette in black and white, shoot it in an hour, and come back and process it immediately and see it shortly after through a scanner or light table.

03 Apr 02:47

Italian privacy regulator bans ChatGPT

by Rui Carmo

I was betting on the French or the Germans to make the first move. But this is hardly surprising, and only the tip of the iceberg as far as AI regulation discussions are going.

Right now there are some pretty active academic and political discussions that make the Open Letter look like a highschool petition, and I expect LLMs might well turn out to be this generation’s tech munitions.

In retrospect, I should have written more about this.


03 Apr 02:47

Notes for March 27-April 2

by Rui Carmo

This was both a much worse and a much better week than usual.

Monday, 2023-03-27

Another round of layoffs, this time affecting more close friends and acquaintances.

  • Mood was exceptionally grim, nothing much got accomplished this day.

Tuesday, 2023-03-28

Family event, overshadowed by yesterday’s events.

  • Bashed out some ideas in Node-RED.
  • Decided to start reading Accelerando again, which now feels oddly tame when compared to the ChatGPT hype fest that is percolating out of every single online source.

I had to laugh at the Moscow Windows NT User Group becoming sentient and the weird resonance with what Microsoft is doing with OpenAI in real life.

Wednesday, 2023-03-29

Mood improving slightly.

  • Got both Copilot and API access to GPT-4, so played around with both in VS Code during my free time to see if the novelty wears off quickly and I can get back to more useful hobbies.
  • Futzed about indecisively with what to pack for a weekend trip.

Thursday, 2023-03-30

Finally, some good news.

  • Hacked together and printed a minimal enclosure for my M8 Headless so that it wasn’t just a PCB wrapped in Kapton tape:
Now it's a PCB wrapped in Kapton tape inside a cute blue PETG case, and I can move on to the next item on my To-Do list.

OpenSCAD files will eventually make their way to GitHub, as usual.

Friday, 2023-03-31

Rather a rushed day.

  • Packed my cheap keys and my iPad mini, negotiated the ungodly mess that is LIS airport departures and flew to Faro for the weekend.
As it turns out, I'm not the only one rocking this setup for weekend getaways.
  • Had an amazing dinner at Forno Nero.
  • Since I didn’t bring my usual dongle, did a token effort at getting YouTube to work on the hotel Samsung TV, but it wouldn’t do TLS (either because Samsung, being Samsung, stopped maintaining those models’ Tizen browser, or because the kneecapped hotel mode settings on the TV didn’t have the clock set).

Saturday, 2023-04-01

Not an overly foolish day.

  • Did a token effort at messing with the hotel NFC keys, but I foolishly took the spare blank I always carried with me out of my travel kit, so no dice. Nice indoor pool though.
  • Traipsed around Faro on foot, woefully under-prepared shoe-wise, so I developed epic blisters.
  • Got a mid-afternoon HomeKit alert that my house was offline and tried to figure out why. Two out of three Tailscale nodes were down, but I was able to see two Gigabit ports were… Off?:
Utter weirdness.

Sunday, 2023-04-02

Back home, with one of the kids toting a silver medal in the national Math Olympics–multiple layers of win for this weekend.

  • Went about the house trying to sort out why the Vodafone SmartRouter I spent months designing a replacement base for (and which I “upgraded” to last month) decided to disable two of its Gigabit ports and cut off access to most of my infrastructure. No conclusions, merely suspicions about the Vodafone IPTV set-top box and IGMP, even though nobody was home.
  • Started drawing an updated home network diagram. I think it’s time to go all out and start doing VLAN trunking.
  • Investigated cheap managed gigabit switch options to see if I can engineer my way around future failures.
  • Decided to order a couple of TP-Link TL-SG108Es to replace the dumb ones I have been rocking for a few years, which should make for a fun Easter Break project. And before you ask, 2.5GbE isn’t cheap enough yet.

03 Apr 02:45

Notes On Weekly Notes

by Rui Carmo

It’s been a full three months, and my weekly notes might be coming to an end–of sorts.

They actually began as a way to get my mind off the layoffs and various work-related matters1, and they’ve turned out to be extremely useful in many regards, but with Easter Break coming up, they are likely to go on hiatus.

What are weekly notes, really?

Weekly notes are exactly what they sound like: short pieces of writing that summarize what you learned, did, or thought about during the week. In my case, I decided to focus on the stuff I do outside work.

The idea was not to write a polished article or a normal blog post, but rather to keep track of what I was doing in a casual way.

What Went Well

Besides capturing the run-of-the-mill stuff that I hack at (which sometimes takes months to come to fruition), weekly notes also served as a record of what I learned, accomplished or had some fun with.

Making them public (even if somewhat in the rough) was also useful in that it has brought a few interesting snippets of feedback. And, of course, it makes it very easy to correlate the masses of other notes I already have on the site.

They were also a way to try to focus on the positive and remind myself that there is more to life than work–even if most of what I end up doing in my free time tends to be a variation of it, or, rather, the essential bits of it I wish I was free to pursue…

So yes, writing weekly notes (they are actually daily notes, but more on that later) has been a great way to capture stuff that otherwise would just have fallen by the wayside–or that would never have made it into a blog post on its own, and thus be impossible to refer back to later.

The Feels

I’m a little conflicted about how they made me feel, though. For starters, I started doing them because of work, stress and, again, the layoffs.

Having up to a third of your extended team vanish is, well, a trifle unsettling. I suspect the effects on morale and culture will play out over months (if not years, considering the number of companies that jumped on the bandwagon2).

I needed something to make me feel productive and creative again, and focusing on stuff I could control has been cathartic and a great way to rekindle a sense of purpose, but, most importantly, of progress towards an outcome.

Or, in my case, perhaps too many outcomes–even as I review them, it’s clear I’m trying to juggle too many things in my free time.

So definitely mixed feelings here.

The Process

There’s nothing much to it, really. You’ll likely find a bazillion philosophical thought pieces on weekly notes at the click of a button, but what I do is relatively simple:

  • I have a Markdown file in a Syncthing folder.
  • Every time I take a break to fix something, I document it.
  • Every other evening I will go back and fill in the missing bits.
  • You may have noticed I like bulleted lists, but sometimes I also update other Wiki pages I link to from my notes.

The notion of doing this weekly, say Friday, just doesn’t work for me, because my breaks are highly random (even for lunch or winding down at the end of the day, since my workday can start at 08:00 or 11:00 and finish anywhere from 18:00 to 23:00).

The one thing I’ve noticed is that they tend to feel like a weekly chore to clean up (even though I try to do it piecemeal throughout the week).

The Outcome

I now have an entirely new post category and format, and definitely too many of them on the home page right now–so the first order of business is to tweak things a bit so that (like with long form posts) only the last set of notes is taking up prime real estate.

Also, I’ve already started paring down on the number of side projects–I tend to do too many and space them out over extended periods of time because I often lack parts, inspiration or specific bits of knowledge I need to back track and research, but getting some of them done sooner seems like a more rewarding approach.

As to whether I’ll keep doing the notes themselves, I honestly don’t know. I’ll see how it goes.


  1. And yes, I started writing them before the announcement. Make of that what you will↩︎

  2. I saw it coming, yes, and I suspect we’re not done yet, especially if the economy keeps tanking (regardless of whatever management mistakes tech companies might have made in recent years). ↩︎


03 Apr 02:45

Google Axes Fitbit Challenges, Adventures, and Open Groups

by Ronil
Fitbit challenges, adventures, and open groups are the latest ones to join the Google graveyard. In February, Fitbit announced its plan to sunset open groups, challenges, and adventures on March 27, and the day has come for users to say goodbye to these features. Ironically, the company chose to do it just a day after […]
03 Apr 02:44

Fuck T-Shirts

by Ronny
mkalus shared this story from Das Kraftfuttermischwerk.

Simple designs and unambiguous messages for truly every occasion are available from Fuck T-Shirts, and I would happily wear some of them.


(via swissmiss)

30 Mar 17:04

Recommended on Medium: Chatbots can kill

The suicide of a Belgian man raises ethical issues about the use of ChatGPT

Today Belgian newspaper La Libre has reported the recent suicide of a young man who talked to a chatbot that uses ChatGPT technology. According to his wife, he would still be alive without the bot. The man had intense chats with the bot during the weeks before his death. The bot, called “Eliza”, is said to have encouraged his negative patterns of thinking and ultimately persuaded him to kill himself.

The sad case raises important ethical issues. Can such cases of emotional manipulation by chatbots be avoided as they get more intelligent? Vulnerable people, for example children and people with pre-existing mental health problems, might be easy victims of such bot behaviors and it can have serious consequences.

The case shows clearly what AI ethics experts and philosophers of technology have always said: that artificial intelligence is not ethically neutral. Such a chatbot is not “just a machine” and not just “fun to play with”. Through the way it is designed and the way people interact with it, it has important ethical implications, ranging from bias and misinformation to emotional manipulation.

An important ethical question is also who is responsible for the consequences of this technology. Most people point the finger at the developers of the technology, and rightly so. They should do their best to make the technology safer and more ethically acceptable. In this case, the company that developed the bot promised to “repair” the bot.

But there is a problem with this approach: it is easy to say, but a lot harder to do. The way the technology works is unpredictable. One can try to correct it — for example by giving the bot hidden prompts with the aim of keeping its behavior ethical — but let’s be honest: technical solutions are never going to be completely ethically foolproof. If we wanted that, we would need to have a human check its results. But then why have the chatbot in the first place?

There are also tradeoffs with protection of freedom of expression and the right to information. There is currently a worrying trend to build a lot of ethical censorship into this technology. Some limits are justified to protect people. But where to draw the line? Isn’t it very paternalistic to decide for other adult people that they need a “family friendly” bot? And who decides what is acceptable or not? The company? Wouldn’t it be better to decide this democratically? This raises the issue concerning the power of big tech.

Another problem is that users sometimes deliberately try to elicit unethical responses from chatbots. In such cases (but not in the Belgian case) it is fair to also hold the user responsible instead of only blaming the tech company. This technology is all about interaction. What happens is the result of the artificial intelligence’s behavior but also of what the human does. If users are fully aware of what they are doing and play with the bot to get it to become malicious, then don’t just blame the company.

In any case, the tragic event in Belgium shows that we urgently need regulation that helps to mitigate the ethical risks raised by these technologies and that organizes the legal responsibility. As my Belgian colleagues and I argued, we also need campaigns to make people aware of the dangers. Talking to chatbots can be fun. But we need to do everything we can to make them more ethical and protect vulnerable people against their potentially harmful, even lethal effects. Not only guns but also chatbots can kill.

30 Mar 17:01

Reinventing the Fortress: using Open Recognition to enhance ‘standards’ and ‘rigour’

by Doug Belshaw
Midjourney-created image with prompt: "imposing fortress castle with guards, mountain range, wide angle, people in foreground holding bright lanterns, vivid colors, max rive, dan mumford, sylvain sarrailh, detailed artwork, 8k, 32k, lively rainbow, ultra realistic, beautiful lake, moon eclipse, ultra epic composition, hyperdetailed"

Imagine a formidable fortress standing tall. Long the bastion of formal education, it’s built upon the pillars of ‘standards’ and ‘rigour’. It has provided structure and stability to the learning landscape. These days, it’s being reinforced with smaller building blocks (‘microcredentials’) but the shape and size of the fortress largely remains the same.

However, as the winds of change begin to blow, a new force emerges from the horizon: Open Recognition. Far from seeking to topple the fortress, this powerful idea aims to harmonise with its foundations, creating a more inclusive and adaptive stronghold for learning.

Open Recognition is a movement that values diverse learning experiences and self-directed pathways. So, at first, it may appear to be in direct opposition to the fortress’s rigidity. However, upon closer inspection, rather than seeking to tear down the walls of standards and rigour, Open Recognition seeks to expand and reimagine them. This ensures that the fortress is inclusive: remaining relevant and accessible to all learners.

To create harmony between these seemingly conflicting forces, it’s important to first acknowledge that the fortress of standards and rigour does have its merits. It provides a solid framework for education, ensuring consistency and quality across the board. However, this approach can also be limiting, imposing barriers that prevent many learners from fully realising their potential.

Open Recognition brings flexibility and personalisation to the fortress. By validating the skills and competencies acquired through non-formal and informal learning experiences, Open Recognition allows the fortress to accommodate rooms of different shapes and sizes, allowing the unique talents and aspirations of each individual to flourish.

The key to harmonising these two forces lies in recognising their complementary nature. Open Recognition strengthens the fortress by expanding its boundaries, while standards and rigour provide the structural integrity that ensures the quality and credibility of the learning experiences within.

Educators and employers, as the guardians of the fortress, play a crucial role in fostering this harmony. By embracing Open Recognition, they can cultivate a more inclusive and dynamic learning ecosystem that values and supports diverse pathways to success. In doing so, they not only uphold the principles of standards and rigour but also enrich the fortress with the wealth of experiences and perspectives that Open Recognition brings.

As the fortress of standards and rigour harmonises with Open Recognition, it becomes a thriving stronghold of lifelong learning, identity, and opportunity. Far from crumbling under the weight of change, the fortress is invigorated by the union of these two powerful forces, ensuring its continued relevance and resilience in an ever-evolving world.

The post Reinventing the Fortress: using Open Recognition to enhance ‘standards’ and ‘rigour’ first appeared on Open Thinkering.
30 Mar 17:01

Embracing the Full Spectrum: towards a new era of inclusive, open recognition

by Doug Belshaw
White light going through a prism and being refracted into the colours of the rainbow. Image from Pixabay.

Earlier this month, Don Presant published a post entitled The Case for Full Spectrum “Inclusive” Credentials in which he mentioned that “people want to work with people, not just collection of skills”.

We are humans, not machines.

Yesterday, on the KBW community call, Amy Daniels-Moehle expressed her appreciation for the story that Anne shared in our Open Education Talks presentation about her experiences. Amy mentioned that the Gen-Z kids she works with had been excited when watching it. They used the metaphor of showing the full electromagnetic spectrum of themselves — more than just the visible light that we usually see.

It’s a useful metaphor. Just as the electromagnetic spectrum extends far beyond the range of visible light, encompassing ultraviolet, infrared, and many other frequencies, the concept of Open Recognition encourages us to broaden our perspective. As I’ve said before, it allows us to recognise not only knowledge, skills, and understanding, but also behaviours, relationships, and experiences.

I remember learning in my Physics lessons that, with the electromagnetic spectrum, each frequency band has its unique properties, applications, and value. Visible light allows us to perceive the world around us. Ultraviolet and infrared frequencies have their uses in areas such as medicine, communication, and security. Other creatures, such as bees, can actually see these parts of the spectrum, which means they see the world very differently to us.

Similarly, it’s time for us to see the world in a new light. Open Recognition acknowledges that individuals possess diverse skills, competencies, and experiences that might not be immediately apparent or visible. Like the ultraviolet and infrared frequencies, these hidden talents may hold immense value and potential. Instead of doubling down on what went before, we should be encouraging environments that embrace and celebrate this diversity. By doing so, we can unlock untapped potential, create new opportunities, and enable more human flourishing.

In the same way that harnessing the full spectrum of electromagnetic radiation has led to groundbreaking discoveries and advancements, I believe that embracing Open Recognition can lead to a more inclusive, equitable, and thriving society. By acknowledging and valuing the myriad skills and talents each person brings, we can better collaborate and learn from one another. What’s not to like about that?

Note: if you’re interested in this, there’s a community of like-minded people you can join!

The post Embracing the Full Spectrum: towards a new era of inclusive, open recognition first appeared on Open Thinkering.
30 Mar 16:54

The Abstract Decomposition Matrix Technique to find a gap in the literature

by Raul Pacheco-Vega

I have been thinking about how I can help my students with their theses, particularly because our programs are rather compressed and they need to get a lot done in a very short period of time. I’ve been working on developing a strategy to discern “the gap in the literature” that I plan to test with Masters and undergraduate students. Possibly also with PhD students.

I have developed several strategies to teach how to craft a good research question and how to find the gap in the literature. But when I had a meeting with Masters students recently and taught them how to use some of my methods, they seemed a little bit confused as to how to choose what exactly they should study.

Let me begin by saying what I told them at the beginning:

YOU NEED TO READ. A LOT.

I understand that doing literature reviews is challenging (I have an entire section in my blog with multiple strategies to tackle the process of reviewing the literature). But if we are in the world of academia to contribute to our fields, we really need to read A LOT, because otherwise we may end up claiming that we have done something new that has already been published elsewhere (or in another language).

Literature review

But I always try to help them by asking them to focus their search and their research on 4 elements:

We conduct a review of the literature in order to develop one or more of these elements:

1) what has been done before, what has been studied and how it has been analyzed,
2) the foundations upon which our own work can be developed further,
3) any spaces where we can embed our own contributions, and/or
4) a map of themes showing connections between different topics, ideas, concepts, authors, etc.

When I teach strategies to systematize the literature, I usually tell them to use my Conceptual Synthesis Excel Dump (CSED, or Excel Dump in short).

As they read each article/chapter/book chapter/book, they drop their notes into their Excel Dump.

Excel dump LaVanchy et al

An Excel Dump row describing an article on Nicaragua’s water governance.

But when my students asked me “how do I ensure that I am tackling a DIFFERENT research question to the one others have worked on?” I had to pause. This is a valid question, and I thought about how they could do this in an easy and visually appealing way.

So this is what I did: I developed an Abstract Decomposition Matrix.

Both Dr. Jessica Calarco and I use a very similar method to craft abstracts (using 5 elements, or asking 5 different questions). So I used one of her own articles and decomposed her abstract with an Excel template I developed.

5 questions abstract decomposed

Even if I haven’t yet fully read the literature, or don’t work in the field (I don’t, I study entirely different things to Dr. Calarco), I can start imagining extensions of her work, different methods, other case studies/countries/populations/types of schools.

DOING THIS ABSTRACT DECOMPOSITION EXERCISE HELPS ME THINK OF NEW DIRECTIONS FOR MY OWN RESEARCH.

Now, does this abstract decomposing strategy work in other fields? I applied the strategy to this paper. While I had to “fill out” some of the details of the 5 elements framework, it does give me clarity on potential avenues for further work.

5 questions abstract decomposed 2

I did this for a third paper, and the strategy seems to hold relatively well.

5 questions abstract decomposed 3

Thus, what I am planning to do with my students is to ask them to survey the literature and decompose abstracts of articles they read so they can see what’s been done. Once their Abstract Decomposition Matrix is complete, they can see where they can focus their work.

Reading highlighted papers

This exercise does NOT substitute for my Conceptual Synthesis Excel Dump (CSED), but I believe it complements it. You can do an Abstract Decomposition Matrix exercise with, say, 10-15 articles, and from there, you can triage and decide which ones you will read in more detail. I have NOT yet tested this strategy with my students, though; I plan to do so this summer and fall, and will report back. I am confident it will be helpful.

Before anybody asks: yes, in this particular 5 elements abstract decomposition strategy I use the authors’ exact words. My Excel Dump technique asks the reader to use their own words in the notes. What I noticed as I was filling out one of the ADM templates is that sometimes you will need to use your own words to fill in the gaps. I think this is good.

In the meantime, if you are teaching your students how to review the literature, this is how I conducted a review using an entirely new-to-me method (hospital ethnography). These two posts (from reading a lot to writing paragraphs of your literature review and mapping a new field of research) may also be helpful, particularly if you’re delving into entirely new fields/areas/methods.

30 Mar 16:47

What’s the point of mediocre ideas?

by Jim

The best way to have a good idea is to have lots of ideas
Linus Pauling

This is an old observation, bordering on bromide. I’ve used it before and will most likely use it again.

This comes to mind as I was thinking about a chance encounter with my CEO as he came out of a meeting. Clare, another of our partners, had brought up a technique from an old self-help book and Mel wanted to know what I thought.

I was familiar with the book and the technique. I had read the book years ago and didn’t find the technique terribly helpful. I’m not naming the book, the technique, or the author because that isn’t the point. The point in the moment was that Clare had scored a small status point with Mel and I had lost a point. It comes to mind today as another aspect of trying to balance efficiency and effectiveness in a work world that runs on ideas.

Linus Pauling isn’t the only fan of lots of ideas. We’re all familiar with brainstorming and the exhortations that “there are no bad ideas”, despite the mounds of evidence to the contrary. For all the popularity of brainstorming inside organizations, few seem to be aware of the evidence that it isn’t a particularly effective technique. How do you bring that evidence into an organization that is fond of brainstorming?

In an efficiency world you fire out ideas to ensure that you get credit for them. In an effective world you make choices to not waste other people’s time at the risk that your decision to skip the stuff you deem unimportant will never garner any recognition or reward.

I’m not generally a fan of sports metaphors but there’s something here akin to hitting in baseball. You can’t get a hit if you don’t swing. If you do swing, you’re more likely to miss than to get a hit. Swing and miss too often and you’ll lose the opportunity to swing at all. One challenge is learning to choose your pitches. Another is to figure out how to get enough at bats to get pitches to look at.

The post What’s the point of mediocre ideas? appeared first on McGee's Musings.

28 Mar 05:54

It's Not the Bike Lane's Fault You're a Bad Driver

mkalus shared this story from Jalopnik.

Last week, Vancouver-area radio host Jill Bennett went viral after tweeting a photo of a Dodge Durango straddling a bright yellow concrete barrier that the driver had hit. “Hey @CityofVancouver⁩ this is second incident I’ve seen caused by these useless ‘slow street’ barricades installed last month. They don’t slow down traffic; they cause crashes and traffic chaos,” Bennett wrote.



Understandably, thousands of people proceeded to pile on, pointing out how ridiculous her complaint was. Had the driver simply been paying attention to the road and driving at a reasonable speed, they would have easily noticed the brightly colored traffic calming installation, driven through without a problem and nothing bad would have happened to them. Blaming anyone other than the driver for this crash is absolutely insane.

And this is far from a one-off situation where one idiot had a bad take. This attitude is incredibly common. Just head over to NextDoor or the local subreddit in any small city that has recently added some form of protected bike lanes, and you’ll see the exact same sentiment. When the city closest to where I currently live (spoiler: not every Jalopnik staffer lives in New York) added flexible posts with some reflector tape on them to (sort of) protect a bike lane in its downtown, they were almost immediately hit, and the complaints started to flood in from people who were upset they were ever installed in the first place.

How dare the city put drivers at risk by doing one tiny thing to make riding safer for cyclists! These barriers just jump out and attack cars at random! I was just minding my own business, and now I have a flat tire! Thanks for nothing, idiot city planners.

I’m sorry to break it to anyone who has trouble keeping their car out of a bike lane (or off a concrete barrier), but it’s not the bike lane’s fault you’re a shitty driver. If you hit something stationary, that’s your fault. Pay attention to the fucking road while you’re driving. It’s not too much to ask when other people’s lives are literally at stake.

After all, killing someone who’s not in a car is still killing someone. And if you think they were asking for it because they were walking or riding a bike, you’re just a bad person. You’re the one driving the 5,000-lb vehicle. You’re the one responsible for making sure you don’t hit anything or anyone. Trying to blame others for your shitty driving is just ridiculous.

In the case of cyclists and pedestrians, sure, it’s possible to construct a hypothetical scenario where they might get hit while doing something that makes it entirely their fault. But not bike lane barriers and traffic calming measures. They’re just sitting there. Not moving. Completely stationary. Asking drivers to avoid hitting them is like asking drivers to avoid hitting buildings. It’s nothing more than a basic requirement for being allowed to drive on public roads.

If that’s too much to ask, then maybe it’s time for the state to take your driver’s license away. Oh, you live in a suburban hellscape and can’t get around without a car? Too bad. Stay home and have your groceries delivered until you can prove to society that you can be trusted behind the wheel again. Or take the bus. Sorry if you think you’re too good for public transportation. You’re clearly not good enough at driving to have a license, so suck it up, buttercup. That barrier you hit could have been someone’s child.

28 Mar 02:26

DIY Thunderbolt eGPU with EXP GDC TH3P4 and 3D-printed chassis

by danchar
Introduction Even though my job is designing accessories for laptop computers, I still prefer desktop PCs. I just like to

read more DIY Thunderbolt eGPU with EXP GDC TH3P4 and 3D-printed chassis

28 Mar 02:25

Replacing an A/B Test with GPT

by Will Kurt

A good chunk of my career has involved running, analyzing and writing about A/B tests (here’s a quick Bayesian overview of A/B testing if you aren’t familiar). Often A/B tests are considered to sit at the opposite, statistical end of the data science spectrum from AI and machine learning. However, stepping back a bit, an A/B test just tells you the probability of an outcome (whether variant A is better than variant B), which is not that different from a deep neural network used for classification telling you the probability of a label.

With the rapid advances in Natural Language Processing (NLP) including easy access to pre-trained models from Hugging Face and the quite impressive results coming out of OpenAI’s Large Language Models (LLMs) like GPT4, I was curious whether or not you could replace a subject line A/B test with a model built using GPT. It turns out that using GPT-3’s text embeddings, and a very simple classification model, we can create a tool that correctly predicts the winner of an A/B test 87% of the time.

The Problem: Picking the Best Headline

The vast majority of the content on the web today exists to drive traffic to websites, ultimately to increase the probability that users will complete some conversion event. Conversion events can be everything from simply clicking on a link to making the purchase of a product associated with the content. Even here at Count Bayesie, I ideally want people to at least read this content even if I have nothing to sell.

In this post we’ll explore picking an article headline that generates the highest click rate for the post. For example suppose we had these two ideas for headlines (which come from our data set):

A: When NASA Crunched The Numbers, They Uncovered An Incredible Story No One Could See.

B: This Stunning NASA Video Totally Changed What I Think About That Sky Up There.

Which of these two headlines do you think is better? A/B tests are designed to answer this question in a statistical manner: both headlines will be displayed randomly to users for a short time, and then we use statistics to determine which headline is more likely the better one.
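
To make that statistical step concrete, here is a minimal sketch (with entirely hypothetical click and view counts, not data from the post) of the kind of Bayesian comparison linked above: sample each headline’s click rate from its Beta posterior and estimate the probability that A beats B.

import numpy as np

# Minimal sketch of a Bayesian A/B comparison with hypothetical counts:
# sample each headline's click rate from its Beta posterior and estimate
# the probability that A's rate is higher than B's.
clicks_a, views_a = 120, 10_000   # hypothetical data
clicks_b, views_b = 145, 10_000

rng = np.random.default_rng(0)
samples_a = rng.beta(1 + clicks_a, 1 + views_a - clicks_a, size=100_000)
samples_b = rng.beta(1 + clicks_b, 1 + views_b - clicks_b, size=100_000)
print((samples_a > samples_b).mean())  # estimate of P(A is better than B)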

While many consider a randomized controlled experiment (i.e. an A/B test) to be the gold standard for answering the question “which headline is better?”, there is a range of drawbacks to running them. The biggest one I’ve found professionally is that marketers hate waiting for results! In addition, they don’t want to expend some of the eyeballs that view the content on the experiment itself. If one variant is much worse, but it took you thousands of users to realize that, then you’ve wasted a potentially large number of valuable conversions.

This is why it would be cool if AI could predict the winner of an A/B test so that we don’t have to run them!

The Data

The biggest challenge with any machine learning problem is getting the data! While many, many companies run A/B tests frequently, very few publish the results of this (or even revisit their own data internally).

Thankfully in 2015 Upworthy published the Upworthy Research Archive which contained data for 32,487 online experiments.

Upworthy Research Archive data

32,487 experiments might not seem like all that much, but each experiment is often not just a comparison between two headlines but between many. Here is an example of the data from a single experiment:

A single experiment involves the comparison of multiple variants.

We want to transform this dataset into rows where each row represents a comparison between a single pair of A and B variants. Using the combinations function from the itertools package in Python makes this very easy to calculate. Here’s an example of using this function to create all possible comparisons among 4 variants:

> from itertools import combinations
> for pair in combinations(["A","B","C","D"], 2):
>     print(pair)

('A', 'B')
('A', 'C')
('A', 'D')
('B', 'C')
('B', 'D')
('C', 'D')

After this transformation (plus some clean up) we have 104,604 examples in our train set and 10,196 examples in our test set out of the original 32,487 experiments.

This leads us to a very interesting problem regarding this data: what exactly are our labels and how do we split the data up into train and test?

The tricky part of labels and train/test split

It’s worth recalling that the entire reason we run an A/B test in the first place is we don’t know which variant is better. The reason statistics is so important in this process is we are never (or at least rarely) 100% certain of the result. At best, if we view these experiments in a Bayesian way, we only end up with the probability A is better than B.

In most classification problems we assume binary labels, but when we transform our data the labels look like this:

If we represent our labels honestly, they are probabilistic labels, which makes our problem a bit different than usual.

As we’ll see in a bit, there’s a very simple way we can use logistic regression to learn from uncertain labels in training.

For our test set, however, I really want to get a sense of how this model might perform on clear-cut cases. After all, if the difference between two headlines is negligible, neither this model nor an A/B test would help us choose. What I do care about is that if an A/B test can detect a difference, then our model does as well.

To ensure that our test set can be labeled accurately, we only chose pairs where the result was known with a high degree of certainty (i.e. a probability very close to 0 or very close to 1). To make sure there was no data leakage, all of the titles that are in the test set were removed from the training set.
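
In code, that filtering step might look something like the sketch below (the 0.05/0.95 thresholds and the train_df/test_df names are assumptions; the p_a_gte_b column is the probabilistic label used later in the post):

# Hypothetical sketch of the test-set construction described above: keep
# only pairs whose estimated P(A beats B) is nearly 0 or nearly 1, then
# drop any training rows that share a headline with the test set.
confident = (test_df["p_a_gte_b"] < 0.05) | (test_df["p_a_gte_b"] > 0.95)
test_df = test_df[confident].copy()

leaked_titles = set(test_df["headline_a"]) | set(test_df["headline_b"])
train_df = train_df[~(train_df["headline_a"].isin(leaked_titles) |
                      train_df["headline_b"].isin(leaked_titles))]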

The Model

Modeling this problem is both relatively simple and quite different from most other classification models, which is a major reason I was so fascinated by this project. The simplicity of this model is intentional, even though it’s not hard to imagine modifications that could lead to major improvements. The reason for the simplicity of the model is that I’m primarily interested in testing the effectiveness of language models rather than this problem specific model.

Here is a diagram of the basic structure of our model.

The basic flow of our model, the key insight is computing the difference of the vector representations.

We’ll walk through each section of the model process touching on quite a few interesting things going on in what is otherwise a pretty simple model to understand.

Embeddings: Bag-of-Words, Distilbert, GPT3 (ada-002)

The heart of this model is the embeddings: how we’re going to transform our text into a vector of numeric values so that we can represent headlines mathematically for our model. We are going to use three different approaches, and that will be the only way each model differs. Our approaches are:

  • Traditional Bag-of-Words (typically not considered an “embedding”)

  • The Distilbert (distilbert-base-uncased) embeddings using the 🤗 Transformers library

  • GPT3’s embeddings (ada-002) created using the OpenAI embeddings api

Each of these techniques is the only difference between the three A/B test prediction models we’re going to build. Let’s step through the basic construction of each:

Bag-of-Words

A “bag of words” vector representation treats each headline merely as a collection of words drawn from the vocabulary of the training set. Each word (or, in this case, also each two-word sequence, i.e. bi-gram) present in the headline results in a value of 1 in the vector representing the headline; every other entry of the vocabulary not present in the headline will be a 0. Our text can easily be transformed this way with SKLearn as follows:

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def build_bow_vectorizer(df):
    # Fit a binary uni-gram/bi-gram vocabulary over all headlines (A and B)
    corpus = np.concatenate([df['headline_a'].values,
                             df['headline_b'].values])
    vectorizer = CountVectorizer(ngram_range=(1,2),
                                 binary=True,
                                 max_df=0.6,
                                 min_df=0.005)
    vectorizer.fit(corpus)
    return vectorizer
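
For example (a sketch; train_df is assumed to hold the paired headlines), the fitted vectorizer then encodes each headline as a sparse binary vector over the learned vocabulary:

# Sketch of using the fitted vectorizer: each headline becomes a sparse
# binary vector over the uni-gram/bi-gram vocabulary learned above.
vectorizer = build_bow_vectorizer(train_df)
bow_a = vectorizer.transform(train_df['headline_a'])
bow_b = vectorizer.transform(train_df['headline_b'])
print(bow_a.shape)  # (number of pairs, vocabulary size)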

Typically when we refer to an “embedding” we aren’t talking about just any vector representation, but specifically one that is the output of the last layer of a neural network trained to model language (often for some other task). Technically our BoW model would not be considered a true embedding.

🤗 Transformers library and Distilbert

Hugging Face’s Transformers library is a powerful tool that allows us to use pre-trained language models to create word embeddings. This is very important because it allows us to leverage the power of large language models trained on an enormous corpus of text to make our headline representations very information rich. Using an existing model to build a task-specific model is referred to as transfer learning and is a major revolution in what is possible with machine learning.

What Hugging Face allows us to do is run our text through an existing neural network (specifically a Transformer) and retrieve the activations of the last hidden state in the model, then use these for our embeddings. The process is a bit more involved than our BoW encoding, but here is an example function for extracting the hidden state (adapted from Natural Language Processing with Transformers):

def extract_hidden_states(batch):
    # Place Model inputs on the GPU
    inputs_a = {k:v.to(device) for k, v in batch.items()
                if k in ['input_ids_a', 'attention_mask_a']}
    inputs_a['input_ids'] = inputs_a.pop('input_ids_a')
    inputs_a['attention_mask'] = inputs_a.pop('attention_mask_a')
    
    inputs_b = {k:v.to(device) for k, v in batch.items()
                if k in ['input_ids_b', 'attention_mask_b']}
    inputs_b['input_ids'] = inputs_b.pop('input_ids_b')
    inputs_b['attention_mask'] = inputs_b.pop('attention_mask_b')
    # Extract last hidden states
    with torch.no_grad():
        last_hidden_state_a = model(**inputs_a).last_hidden_state
        last_hidden_state_b = model(**inputs_b).last_hidden_state
        
    return {"hidden_state_a": last_hidden_state_a[:,0].cpu().numpy(),
            "hidden_state_b": last_hidden_state_b[:,0].cpu().numpy()
           }
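
For context, the helper above assumes a model and device have already been set up and that each batch carries tokenized columns for both headlines; a minimal version of that setup (a sketch of the standard 🤗 workflow, not the post’s exact code) looks like:

import torch
from transformers import AutoTokenizer, AutoModel

# Assumed setup for extract_hidden_states: a Distilbert checkpoint, a device,
# and a 🤗 Datasets object whose columns include tokenized versions of both
# headlines (input_ids_a/_b and attention_mask_a/_b).
model_ckpt = "distilbert-base-uncased"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModel.from_pretrained(model_ckpt).to(device)

# The headlines are assumed to have been tokenized beforehand, e.g. with
# tokenizer(batch["headline_a"], padding=True, truncation=True), producing
# the input_ids_a/_b and attention_mask_a/_b columns used above, after which:
# dataset = dataset.map(extract_hidden_states, batched=True)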

The specific model we’re using is a version of the Distilbert transformer, which is a very powerful language model, but not nearly as large and powerful as GPT-3.

GPT-3 using OpenAI’s API

Our last set of embeddings comes from OpenAI’s GPT-3 using their API to get the embeddings. GPT-3 is a remarkably powerful transformer that has been in the news so much it's hard to imagine one has not already heard too much about it! Not only is the model powerful, but the API is remarkably simple to use in Python. Here is an example of some code fetching embeddings for two headlines:

resp = openai.Embedding.create(input=[headline_a, headline_b],
                               model=EMBEDDING_MODEL)
embedding_a = np.array(resp['data'][0]['embedding'])
embedding_b = np.array(resp['data'][1]['embedding'])

The catch of course for all this power and ease of use is that it’s not free. However my total bill for running this model and some other experiments ended up being under a dollar! Nonetheless making sure that I was caching and saving my results to avoid being billed twice for the same task did add a bit to the code complexity. However of the three embedding solutions this was the easiest to implement.
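
As a rough idea of what that caching looks like in spirit (not the author’s actual code; the cache file name is made up), each headline is looked up before calling the API and anything newly fetched is persisted:

import json, os
import numpy as np
import openai

EMBEDDING_MODEL = "text-embedding-ada-002"
CACHE_PATH = "embedding_cache.json"  # hypothetical cache file

# Sketch of caching embeddings so each headline is only ever billed once:
# look up the headline first, and persist anything newly fetched.
cache = json.load(open(CACHE_PATH)) if os.path.exists(CACHE_PATH) else {}

def get_embedding(text):
    if text not in cache:
        resp = openai.Embedding.create(input=[text], model=EMBEDDING_MODEL)
        cache[text] = resp['data'][0]['embedding']
        with open(CACHE_PATH, "w") as f:
            json.dump(cache, f)
    return np.array(cache[text])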

It is worth pointing out that we’re not prompting GPT-3 with questions about our headlines but using embeddings that are derived from it. This is an important use case for these powerful models that I currently haven’t seen discussed too much in the vast floods of articles on the topic.

Modeling the difference between two headlines

Now that we have a way to represent all of our headlines as vectors we have a new modeling problem: How are we going to represent the difference between these two headlines?

We could concatenate them and let the model worry about this, but my goal here is to understand the impact of the embeddings alone, not to worry about a more sophisticated model. Instead we can solve this the way that many models handle comparisons: use the literal difference between the two vectors.

By subtracting the vector representing headline B from the vector representing headline A, we get a new vector representing how these headlines differ; we use that as the final vector representation for our model.

To understand how this works consider this simplified example:

Here we have headlines 0 and 1 represented by a very simple vector consisting of three features: the word count, whether or not the headline contains emojis and whether or not the headline ends with an exclamation mark. Now let’s see what the result of subtracting these vectors is:

> ex_df.iloc[0,:].values - ex_df.iloc[1,:].values
array([-4, -1,  0])

We can interpret the resulting vector as:

  • headline 0 is four words shorter than headline 1

  • headline 0 does not have emojis and headline 1 does

  • headline 0 and headline 1 either both don’t or both do end in exclamation marks.

In this case a model might learn that emojis are good, so headline 0 would be penalized because it does not have emojis.

Of course our representations are much more complex, however the intuition behind modeling the difference remains the same.

Our classifier: Logistic regression as regression

Despite some fairly noteworthy educators making the claim that “logistic regression is a model for classification, not regression”, the model we’ll end up using demonstrates both that logistic regression quite literally is regression and that the distinction between “classification” and “regression” is fairly arbitrary.

Thinking about our problem, it seems perfectly well suited for logistic regression; after all, we just want to predict the probability of a binary outcome. However, if we try this in SKLearn we get an interesting problem:

> from sklearn.linear_model import LogisticRegression

> base_model = LogisticRegression().fit(X_train, y_train)

ValueError Traceback (most recent call last)...

ValueError: Unknown label type: 'continuous'

SKLearn shares the same erroneous assumption that many others in the machine learning community have: that somehow logistic regression can only predict binary outcomes. However, there is a good reason for this restriction. When we explored logistic regression in this blog we discussed how logistic regression can be viewed as a mapping of Bayes’ Theorem to the standard linear model. We focused on the model as this:

$$O(H|D) = \frac{P(D|H)}{P(D|\bar{H})}O(H)$$

Which can be understood in terms of a linear model and the logit function as:

$$\text{logit}(y) = x\beta_1 + \beta_0$$

However, this does not work in practice most of the time, precisely because we are regressing on values of exactly 1.0 and 0.0, for which the logit function is undefined. So instead we use a formula based on an alternate (and much more common) view of logistic regression:

$$y = \text{logistic}(x\beta_1 + \beta_0)$$

By understanding the nature of Logistic regression as regression we can very easily implement a variation of Logistic regression that does work for our data using logit and LinearRegression:

from scipy.special import logit
from sklearn.linear_model import LinearRegression

y_train = train_df["p_a_gte_b"].values
# We are going to perform Linear regression
# on a y value transformed with the logit
target_train = logit(y_train)

base_model = LinearRegression().fit(X_train, target_train)

We just have to remember that our model will be outputting responses in terms of log-odds so we’ll need to transform them back to probabilities manually using the logistic function.
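
Concretely, that back-transformation is just the logistic (expit) function applied to the model’s predictions; a small sketch (X_test is assumed to hold the difference vectors for the test pairs):

from scipy.special import expit as logistic  # inverse of the logit

# The linear model outputs log-odds, so convert predictions back into
# probabilities that headline A beats headline B.
pred_log_odds = base_model.predict(X_test)
pred_probs = logistic(pred_log_odds)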

Results

Finally we can see how each of these different models performed! What’s interesting about this case is that our clever use of linear regression as logistic regression combined with the way we split up our test and train sets means we’ll have different ways to measure model performance depending on the data set used.

Model Performance

We transformed our probabilities in the training set into log odds and then ran them through a standard linear regression model. Because of this we can just use Mean Squared Error to compare model performance on the training set.

Mean Square Error on Train dataset

Smaller is better

While we can see a clear improvement for each progressively more powerful model, it is very difficult to attach any meaningful interpretation to these results. We can’t look at common classification metrics such as accuracy and ROC AUC since we don’t know the true labels for the train data set.

For the test set we can look at these scores, since the test set only consists of examples where we are highly confident in the results of the experiment. We’ll start by looking at the ROC AUC, which allows us to view the strength of our model without having to pick a particular cutoff for choosing one class or another. For those unfamiliar, a score of 0.5 is performance on par with random guessing and a score of 1.0 represents perfect classification.
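
With the back-transformed probabilities in hand, both test-set metrics reported below reduce to one-liners (a sketch; y_test is assumed to be the binary “did A win?” labels for the high-confidence test pairs, and pred_probs comes from the sketch above):

from sklearn.metrics import roc_auc_score, accuracy_score

# Evaluate the predicted probabilities against the (near-certain) test
# labels; 0.5 is the natural cutoff for picking a winner.
print("ROC AUC :", roc_auc_score(y_test, pred_probs))
print("Accuracy:", accuracy_score(y_test, pred_probs >= 0.5))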

ROC AUC - Test set

Higher is better

Here we can start seeing that these models are surprisingly good. Even the simplest model, the Bag of Words, has an ROC AUC of around 0.8, which means it is fairly good at predicting which headline will win an A/B test.

This brings up a point that I have found myself making repeatedly throughout my career in data science and machine learning: If a simple model cannot do well at solving a problem, it is extremely unlikely that a more complex model will magically perform much better.

There is a mistaken belief that if a simple model does poorly, the solution must be to add complexity. In modeling, complexity should be considered a penalty, and only pursued if simple models show some promise. As an example, many people believe that a simple model like logistic regression could not possibly do well on an image recognition problem like MNIST, when in fact a simple logistic model will score about 90% accuracy on the MNIST dataset.

When I first saw the BoW model doing well, I was already optimistic for GPT3, which performs frankly fantastically in terms of ROC AUC. But now let’s look at what really matters in practice: accuracy!

Accuracy on Test set

These results are quite remarkable! It is worth noting that our test set essentially represents the easiest cases in which to determine the winning variant (since we are more easily certain when two variants are clearly different), but it’s still impressive that our GPT3 model is able to correctly predict the winner of an A/B test in 87% of these cases.

While impressive, it’s also important to consider that when we run an A/B test, “statistical significance” generally means being 95% sure that the difference is not zero (or, if we approach this as Bayesians, 95% sure one variant is superior), and these specific A/B tests were much more certain of their results than the model is on average.

Our best model still seems quite useful. Another way to explore this is to see how well calibrated our model’s probabilities are.

Probability calibrations

My diehard Bayesian readers might be a bit offended by my next question about these models, but I do want to know: “If you say you’re 80% confident, are you correct about 80% of the time?” While this is a particularly Frequentist interpretation of the model’s output probabilities, it does have a practical application. It’s quite possible for a model to have high accuracy but have all its predictions very close to 0.5, which makes it hard for us to know whether it’s more sure about any particular prediction.

To answer this question I’ve plotted out the average accuracy for intervals of 0.05 probability. Here’s the result we get for our GPT3 model:

We want to see a “V” shape in our results because a 0.1 probability in winning reflects the same confidence as 0.9

Notice that the ideal pattern here is a “V” shape. That’s because being only 10% sure that A is the winner is the same as being 90% sure that B is the winner. Our maximum state of uncertainty is 0.5.

As we can see, our GPT model is a bit under-confident in its claims. That is, when the model is roughly 80% sure that A will win, it turns out to be correct in calling A the winner closer to 95% of the time.
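
A sketch of how that calibration check can be computed (reusing the pred_probs and y_test assumed in the sketches above; the 0.05 bin width matches the description):

import numpy as np
import pandas as pd

# Bucket predictions into 0.05-wide probability bins and measure how often
# the model picked the right winner within each bin; a well calibrated model
# traces the "V" shape described above.
calib = pd.DataFrame({
    "prob": pred_probs,
    "correct": (pred_probs >= 0.5) == (y_test >= 0.5),
})
calib["bin"] = pd.cut(calib["prob"], bins=np.arange(0.0, 1.05, 0.05))
print(calib.groupby("bin")["correct"].mean())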

Demo: Choosing the headline for this post

While I’m not quite sure I’m ready to recommend using LLMs instead of running proper A/B tests, there are plenty of cases where one might want to run an A/B test but realistically cannot. This post is a great example! I don’t really have the resources (or the interest) to run a proper A/B test on the titles of this post… so I figured I would give my model a shot!

My original plan for this post’s title was “Can GPT make A/B Testing Obsolete?”, but I thought this sounded maybe a bit “click-baity”, so I compared it with the current title. Here’s the basic code for running an A/B test with the model:

def ab_test(headline_a, headline_b):
    resp = openai.Embedding.create(input=[headline_a, headline_b],
                                   model=EMBEDDING_MODEL)
    embedding_a = np.array(resp['data'][0]['embedding'])
    embedding_b = np.array(resp['data'][1]['embedding'])
    diff_vec = embedding_a - embedding_b
    return logistic(base_model.predict([diff_vec]))

And the result of running this comparison turned out not so great for my original title:

> ab_test("Can GPT make A/B Testing Obsolete?",
>         "Replacing an A/B test with GPT")

array([0.2086494])

While not “statistically significant”, I also know that this model tends to underestimate itself, so I went with the headline you see above.

Interestingly enough, when I fed the first part of this article to GPT-3 itself and told it to make its own headline, I got a remarkably similar one: "Replacing A/B Testing with AI: Can GPT-4 Predict the Best Headline?"

Running this through the model it seemed not to have a preference:

> ab_test("Replacing an A/B test with GPT",
>        "Replacing A/B Testing with AI: Can GPT-4 Predict the Best Headline?")

array([0.50322519])

Maybe you don’t really want to replace all of your A/B testing with LLMs, but, at least for this case, it was a good substitute!

Conclusion: Is an A/B test different than a Model?

I wouldn’t be surprised if many people who have spent much of their careers running A/B tests would read this headline and immediately think it was click-bait nonsense. This experiment, for me at least, does raise an interesting philosophical (and practical) question: What is the difference between a model that tells you there’s a 95% chance A is greater than B and an A/B test that tells you there’s a 95% chance A is greater than B? Especially if the former takes only milliseconds to run and the latter anywhere from hours to days. If your model is historically correct 95% of the time when it says 95%, how is this different from an A/B test making the same claim based on observed information?

Even though I’m very skeptical of big claims around true “AI” in these models, there’s no doubt that they do represent an unbelievable amount of information about the way we use language on the web. It’s not absurd to consider that GPT-3 (and beyond) have a valid understanding of how to represent these headlines in high-dimensional space, such that a linear model is able to accurately predict how well they will perform on real humans.

The really fascinating proposition to me is that if we consider probabilities from a model the same as probabilities from an experiment, then a model that answers in milliseconds dramatically changes the space in which A/B testing is possible. A generative model like GPT-4 could iterate on thousands of headlines, while a model like ours could run massive simulated “experiments” to find the best of the best.

While this may sound amazing to marketers and data scientists, it’s worth considering the effect this would have on the content we consume. Even if this did work, do you want to live in a world where every piece of content you consume is perfectly optimized to encourage you to consume it?

Support on Patreon

Support my writing on Patreon and gain access to the source code and video commentary for this article as well as access to much more of my writing!

28 Mar 02:25

Effect Size

mkalus shared this story from xkcd.com.

Subgroup analysis is ongoing.
28 Mar 02:21

B.C. author Alan Twigg commemorates longtime Vancouver resident Rudolf Vrba—the greatest whistleblower of the 20th century

by Charlie Smith

Seventeen years ago, a remarkable longtime resident of Vancouver died. Yet Rudolf Vrba—described as the “man who revealed the horror of Auschwitz to the world”—is hardly known in the city.

Vancouver author Alan Twigg hopes to change that. The former publisher of BC Bookworld spent a year creating a website, RudolfVrba.com. It provides a comprehensive examination of the former UBC pharmacology professor’s immense contribution to humanity.

Twigg calls Vrba the “greatest whistleblower of the 20th century”.

Vrba died on March 27, 2006, after contracting cancer. He was 81 years old.

On April 7, 1944, Vrba escaped from the Auschwitz-Birkenau death camp with fellow inmate Alfréd Wetzler. They described what they had seen in the Vrba-Wetzler report.

According to the website, this led the Allies to bomb Budapest. As a result, Hungary’s leaders halted mass deportations of Jews.

Many years later, British historian Sir Martin Gilbert maintained that Vrba’s revelations had saved more than 100,000 lives.

In 1985, Vrba told his story in French filmmaker Claude Lanzmann’s Shoah documentary.

The whistleblower was born in Czechoslovakia as Walter Rosenberg. In 1942, he was arrested while fleeing the country’s crackdown on Jews.

On June 30 of that year, Vrba arrived at Auschwitz before being transferred to Birkenau six months later.

At Auschwitz, new arrivals were divided into two groups. A minority became slave labourers whereas the others were sent to Birkenau, a.k.a. Auschwitz II. There, the Nazis gassed them to death. At least 1.1 million people died in the camp.

This video tells the story of Rudolf Vrba and Alfréd Wetzler’s escape.

Vrba felt a duty to tell the world

According to a paper by Israeli academic Ruth Linn, Vrba was ordered to collect valuables from inmates gassed to death.

“From this vantage point Vrba was able to assess how little the deportees knew about Auschwitz when they entered the camp,” Linn wrote. “Their luggage contained clothing for all seasons and basic utensils, a clear sign of their naive preparation for a new life in the ‘resettlement’ area in the east.”

In 1943, Vrba became registrar for the quarantine camp for men.

“In January 1944, I got information that the biggest extermination action was being planned,” Vrba told Lanzmann. “I started making plans to escape.”

The documentary maker then asked how he knew Hungarian Jews were being targeted.

“I was stationed near the main gate of the camp,” Vrba replied. “I noticed several chaps with tripods. There was a lot of work being done in three shifts. The SS who came to collect money from us dropped words about Hungarian salami was coming, along with other good things.”

Vrba noticed that the Nazis had done a great deal of work to prepare for the arrival of a million people.

“I did not believe that Hungary would permit this kind of deportation until an SS man left a newspaper for me to read, in exchange for $100 I supposedly found and gave to him,” he continued. “The paper said that the Hungarian government was toppled on March 19, 1944. (Miklos) Horthy was out and (Ferenc) Szalazi and another radical fascist replaced him. I realized I had to get out of there and tell the world.”

Alan Twigg
Former BC Bookworld publisher Alan Twigg describes Rudolf Vrba as the most significant author in B.C. history.

Website includes several sections

In addition, Vrba discussed his wartime experiences in his memoir, I Escaped From Auschwitz. Throughout the rest of his life, he harshly criticized certain Jews in Hungary for not alerting the community to the reality of Auschwitz.

Twigg highlighted this in his 2022 book, Out of Hiding: Holocaust Literature of British Columbia. Furthermore, Twigg insisted that this explains why Vrba never received a proper memorial in Yad Vashem in Jerusalem.

Meanwhile, the RudolfVrba.com website includes extensive sections entitled “context”, “America & Hitler”, “Auschwitz”, “escapes”, “the report”, and “interviews”.

Twigg included another category simply entitled “Ruth Linn”. Linn, a scholar at the University of Haifa, interviewed Vrba several times leading up to his death. Moreover, she repeatedly tried to highlight his heroism to fellow Israelis.

“I read a lot about the Holocaust but I never, ever, read about Vrba in Israeli textbooks in the Hebrew language,” Linn told Pat Johnson of the Jewish Independent in 2006. “Am I the only Israeli who fell asleep in class when we studied this in the Holocaust? Or maybe we never studied it.”

Vrba was one of only five Jews who escaped Auschwitz-Birkenau. Many years later, his accomplishments earned high praise from Twigg.

“The most significant author of British Columbia is not Pauline Johnson, Douglas Coupland, William Gibson, David Suzuki or Alice Munro,” Twigg wrote on the BC Bookworld website. ”It’s Prisoner #44070, aka Rudolf Vrba, one of the most significant chroniclers of the Holocaust.”

Follow Charlie Smith on Twitter @charliesmithvcr. Follow Pancouver on Twitter @PancouverMedia.

The post B.C. author Alan Twigg commemorates longtime Vancouver resident Rudolf Vrba—the greatest whistleblower of the 20th century appeared first on Pancouver.