Shared posts

28 Apr 20:03

Mark Jason Dominus: Well, I guess I believe everything now!

by mjd@plover.com (Mark Dominus)

The principle of explosion is that in an inconsistent system everything is provable: if you prove both $P$ and not-$P$ for any $P$, you can then conclude $Q$ for any $Q$:

$$(P \land \lnot P) \to Q.$$

This is, to put it briefly, not intuitive. But it is awfully hard to get rid of because it appears to follow immediately from two principles that are intuitive:

  1. If we can prove that $P$ is true, then we can prove that at least one of $P$ or $Q$ is true. (In symbols, $P \to (P \lor Q)$.)

  2. If we can prove that at least one of $P$ or $Q$ is true, and we can prove that $P$ is false, then we may conclude that $Q$ is true. (Symbolically, $((P \lor Q) \land \lnot P) \to Q$.)

Then suppose that we have proved that $P$ is both true and false. Since we have proved $P$ true, we have proved that at least one of $P$ or $Q$ is true. But because we have also proved that $P$ is false, we may conclude that $Q$ is true. Q.E.D.

This proof is as simple as can be. If you want to get rid of this, you have a hard road ahead of you. You have to follow Graham Priest into the wilderness of paraconsistent logic.
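Within ordinary two-valued logic, the inescapability is easy to verify mechanically. Here is a small sketch (mine, not part of the post) that brute-forces every truth assignment and confirms that the two principles above, and explosion itself, are classical tautologies; this is exactly the classical reasoning that paraconsistent logics give up.

```python
# Brute-force check over all classical truth assignments (a sketch, not from the post).
from itertools import product

implies = lambda a, b: (not a) or b
assignments = list(product([False, True], repeat=2))

# Principle 1: P -> (P or Q)
assert all(implies(P, P or Q) for P, Q in assignments)
# Principle 2: ((P or Q) and not P) -> Q
assert all(implies((P or Q) and not P, Q) for P, Q in assignments)
# Explosion: (P and not P) -> Q
assert all(implies(P and not P, Q) for P, Q in assignments)
print("all three are classical tautologies")
```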

Raymond Smullyan observes that although logic is supposed to model ordinary reasoning, it really falls down here. Nobody, on discovering the fact that they hold contradictory beliefs, or even a false one, concludes that therefore they must believe everything. In fact, says Smullyan, almost everyone does hold contradictory beliefs. His argument goes like this:

  1. Consider all the things I believe individually, $B_1, B_2, \ldots, B_n$. I believe each of these, considered separately, is true.

  2. However, I also believe that I'm not infallible, and that at least one of $B_1, B_2, \ldots, B_n$ is false, although I don't know which ones.

  3. Therefore I believe both $B_1 \land B_2 \land \cdots \land B_n$ (because I believe each of the $B_i$ separately) and $\lnot(B_1 \land B_2 \land \cdots \land B_n)$ (because I believe that not all the $B_i$ are true).

And therefore, by the principle of explosion, I ought to believe that I believe absolutely everything.

Well anyway, none of that was exactly what I planned to write about. I was pleased because I noticed a very simple, specific example of something I believed that was clearly inconsistent. Today I learned that K2, the second-highest mountain in the world, is in Asia, near the border of Pakistan and westernmost China. I was surprised by this, because I had thought that K2 was in Kenya somewhere.

But I also knew that the highest mountain in Africa was Kilimanjaro. So my simultaneous beliefs were flatly contradictory:

  1. K2 is the second-highest mountain in the world.
  2. Kilimanjaro is not the highest mountain in the world, but it is the highest mountain in Africa.
  3. K2 is in Africa.

Well, I guess until this morning I must have believed everything!

28 Apr 19:42

The Probability of the Law of Excluded Middle

by John Baez

The Law of Excluded Middle says that for any statement P, “P or not P” is true.

Is this law true? In classical logic it is. But in intuitionistic logic it’s not.

So, in intuitionistic logic we can ask what’s the probability that a randomly chosen statement obeys the Law of Excluded Middle. And the answer is “at most 2/3—or else your logic is classical”.

This is a very nice new result by Benjamin Bumpus and Zoltan Kocsis:

• Benjamin Bumpus, Degree of classicality, Merlin’s Notebook, 27 February 2024.

Of course they had to make this more precise before proving it. Just as classical logic is described by Boolean algebras, intuitionistic logic is described by something a bit more general: Heyting algebras. They proved that in a finite Heyting algebra, if more than 2/3 of the statements obey the Law of Excluded Middle, then it must be a Boolean algebra!

Interestingly, nothing like this is true for “not not P implies P”. They showed this can hold for an arbitrarily high fraction of statements in a Heyting algebra that is still not Boolean.
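To get a concrete feel for the theorem, here is a small sketch (mine, not from the Bumpus–Kocsis paper) using the simplest family of non-Boolean Heyting algebras, the finite chains, where the join is max, the implication is x → y = top if x ≤ y and y otherwise, and ¬x = x → 0. The three-element chain attains the 2/3 bound exactly.

```python
# Excluded middle on finite Heyting chains (illustrative only; not the paper's construction).

def heyting_chain(n):
    """The chain 0 < 1 < ... < n-1 as a Heyting algebra."""
    top = n - 1
    imp = lambda a, b: top if a <= b else b   # relative pseudo-complement
    neg = lambda a: imp(a, 0)                 # "not a" = a -> bottom
    return top, imp, neg

def lem_fraction(n):
    """Fraction of elements p with p v ~p = top."""
    top, imp, neg = heyting_chain(n)
    return sum(max(p, neg(p)) == top for p in range(n)) / n

top, imp, neg = heyting_chain(3)
print(neg(neg(1)))       # 2, not 1: double negation fails, so the 3-chain is not Boolean
print(lem_fraction(3))   # 0.666...: exactly 2/3 of its elements satisfy excluded middle
print([round(lem_fraction(n), 2) for n in range(2, 8)])   # only the 2-element (Boolean) chain reaches 1
```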

Here’s a piece of the free Heyting algebra on one generator, which some call the Rieger–Nishimura lattice:

Taking the principle of excluded middle from the mathematician would be the same, say, as proscribing the telescope to the astronomer or to the boxer the use of his fists. — David Hilbert

I disagree with this statement, but boy, Hilbert sure could write!

25 Apr 16:09

Topological Learning in Multi-Class Data Sets. (arXiv:2301.09734v2 [cs.LG] UPDATED)

by Christopher Griffin, Trevor Karn, Benjamin Apple

We specialize techniques from topological data analysis to the problem of characterizing the topological complexity (as defined in the body of the paper) of a multi-class data set. As a by-product, a topological classifier is defined that uses an open sub-covering of the data set. This sub-covering can be used to construct a simplicial complex whose topological features (e.g., Betti numbers) provide information about the classification problem. We use these topological constructs to study the impact of topological complexity on learning in feedforward deep neural networks (DNNs). We hypothesize that topological complexity is negatively correlated with the ability of a fully connected feedforward deep neural network to learn to classify data correctly. We evaluate our topological classification algorithm on multiple constructed and open source data sets. We also validate our hypothesis regarding the relationship between topological complexity and learning in DNNs on multiple data sets.

14 Feb 14:35

Emergence of brain-like mirror-symmetric viewpoint tuning in convolutional neural networks

by Farzmahdi, A., Zarco, W., Freiwald, W., Kriegeskorte, N., Golan, T.
Primates can recognize objects despite 3D geometric variations such as in-depth rotations. The computational mechanisms that give rise to such invariances are yet to be fully understood. A curious case of partial invariance occurs in the macaque face-patch AL and in fully connected layers of deep convolutional networks in which neurons respond similarly to mirror-symmetric views (e.g., left and right profiles). Why does this tuning develop? Here, we propose a simple learning-driven explanation for mirror-symmetric viewpoint tuning. We show that mirror-symmetric viewpoint tuning for faces emerges in the fully connected layers of convolutional deep neural networks trained on object recognition tasks, even when the training dataset does not include faces. First, using 3D objects rendered from multiple views as test stimuli, we demonstrate that mirror-symmetric viewpoint tuning in convolutional neural network models is not unique to faces: it emerges for multiple object categories with bilateral symmetry. Second, we show why this invariance emerges in the models. Learning to discriminate among bilaterally symmetric object categories induces reflection-equivariant intermediate representations. AL-like mirror-symmetric tuning is achieved when such equivariant responses are spatially pooled by downstream units with sufficiently large receptive fields. These results explain how mirror-symmetric viewpoint tuning can emerge in neural networks, providing a theory of how they might emerge in the primate brain. Our theory predicts that mirror-symmetric viewpoint tuning can emerge as a consequence of exposure to bilaterally symmetric objects beyond the category of faces, and that it can generalize beyond previously experienced object categories.
05 Feb 06:35

Jacobian-Free Variational Method for Constructing Connecting Orbits in Nonlinear Dynamical Systems. (arXiv:2301.11704v1 [nlin.CD])

by Omid Ashtari, Tobias M. Schneider

In a dynamical systems description of spatiotemporally chaotic PDEs including those describing turbulence, chaos is viewed as a trajectory evolving within a network of non-chaotic, dynamically unstable, time-invariant solutions embedded in the chaotic attractor of the system. While equilibria, periodic orbits and invariant tori can be constructed using existing methods, computations of heteroclinic and homoclinic connections mediating the evolution between the former invariant solutions remain challenging. We propose a robust matrix-free variational method for computing connecting orbits between equilibrium solutions of a dynamical system that can be applied to high-dimensional problems. Instead of a common shooting-based approach, we define a minimization problem in the space of smooth state space curves that connect the two equilibria with a cost function measuring the deviation of a connecting curve from an integral curve of the vector field. Minimization deforms a trial curve until, at a global minimum, a connecting orbit is obtained. The method is robust, has no limitation on the dimension of the unstable manifold at the origin equilibrium, and does not suffer from exponential error amplification associated with time-marching a chaotic system. Owing to adjoint-based minimization techniques, no Jacobian matrices need to be constructed and the memory requirement scales linearly with the size of the problem. The robustness of the method is demonstrated for the one-dimensional Kuramoto-Sivashinsky equation.

31 Jan 10:12

wangle

Merriam-Webster's Word of the Day for January 27, 2023 is:

wangle • \WANG-gul\  • verb

Wangle means “to get (something) by trickery or persuasion.” It can also mean “to adjust or manipulate for personal or fraudulent ends.”

// He managed to wangle his way into the party.

// They wangled me into pleading guilty.

See the entry >

Examples:

“Discussions of how to wangle free shipping or discounts dovetailed with a proposition that the group start a fund-raiser for a family in need—a worthy use for money saved.” — Hannah Goldfield, The New Yorker, 27 Mar. 2021

Did you know?

You may have noticed a striking resemblance between wangle and wrangle, both of which have a sense meaning “to obtain or finagle.” But the two do not share a common history: wrangle is centuries older than wangle, and despite their overlap in both meaning and appearance, wangle is believed to have evolved separately by way of waggle, meaning “to move from side to side.” (Wrangle, by contrast, comes from the Old High German word ringan, meaning “to struggle.”) It’s possible, though, that wangle saved the “obtain” sense of wrangle from the brink of obsolescence—until recent decades, this usage had all but disappeared, and its revival may very well have been influenced by wangle. We wish we could wangle conclusive evidence to support this theory, but alas!



31 Jan 10:06

CRISPR voles can’t detect ‘love hormone’ oxytocin — but still mate for life

by Heidi Ledford

Nature, Published online: 27 January 2023; doi:10.1038/d41586-023-00197-9

Prairie voles lacking oxytocin receptors bonded with mates and cared for pups.
30 Jan 14:42

Anteromedial Thalamus Gates the Selection & Stabilization of Long-Term Memories

by Toader, A. C., Regalado, J. M., Li, Y. R., Terceros, A., Yadav, N., Kumar, S., Satow, S., Hollunder, F., Bonito-Oliva, A., Rajasethupathy, P.
Memories initially formed in hippocampus gradually stabilize to cortex, over weeks-to-months, for long-term storage. The mechanistic details of this brain re-organization process remain poorly understood. In this study, we developed a virtual-reality based behavioral task and observed neural activity patterns associated with memory reorganization and stabilization over weeks-long timescales. Initial photometry recordings in circuits that link hippocampus and cortex revealed a unique and prominent neural correlate of memory in anterior thalamus that emerged in training and persisted for several weeks. Inhibition of the anteromedial thalamus-to-anterior cingulate cortex projections during training resulted in substantial memory consolidation deficits, and, more strikingly, gain amplification was sufficient to enhance consolidation of otherwise unconsolidated memories. To provide mechanistic insights, we developed a new behavioral task where mice form two memories, of which only the more salient memory is consolidated, and also a technology for simultaneous and longitudinal cellular resolution imaging of hippocampus, thalamus, and cortex throughout the consolidation window. We found that whereas hippocampus equally encodes multiple memories, the anteromedial thalamus forms preferential tuning to salient memories, and establishes inter-regional correlations with cortex that are critical for synchronizing and stabilizing cortical representations at remote time. Indeed, inhibition of this thalamo-cortical circuit while imaging in cortex reveals loss of contextual tuning and ensemble synchrony in anterior cingulate, together with behavioral deficits in remote memory retrieval. We thus identify a thalamo-cortical circuit that gates memory consolidation and propose a mechanism suitable for the selection and stabilization of hippocampal memories into longer term cortical storage.
27 Jan 14:35

Unsupervised Data-Driven Classification of Topological Gapped Systems with Symmetries

by Yang Long and Baile Zhang

An unsupervised learning approach leads to the classification of topological gapped systems without a priori knowledge of topological invariants.


[Phys. Rev. Lett. 130, 036601] Published Wed Jan 18, 2023

27 Jan 14:16

Wow Something Rotten In The New York FBI Office

by noreply@blogger.com (Atrios)
Just a total shocker that only every journalist knows but won't explain.
Federal prosecutors say the former head of counterintelligence for the FBI’s New York office laundered money, violated sanctions against Russia while working with a Russian oligarch and while still at the FBI took hundreds of thousands of dollars from a foreign national and former foreign intelligence official.
With two hands tied behind my back, I could not solve this mystery!

...ahaha a direct connection.
27 Jan 14:07

Neurophysiological signatures of cortical micro-architecture

by Shafiei, G., Fulcher, B. D., Voytek, B., Satterthwaite, T. D., Baillet, S., Misic, B.
Systematic spatial variation in micro-architecture is observed across the cortex. These micro-architectural gradients are reflected in neural activity, which can be captured by neurophysiological time-series. How spontaneous neurophysiological dynamics are organized across the cortex and how they arise from heterogeneous cortical micro-architecture remains unknown. Here we extensively profile regional neurophysiological dynamics across the human brain by estimating over 6,800 time-series features from the resting state magnetoencephalography (MEG) signal. We then map regional time-series profiles to a comprehensive multi-modal, multi-scale atlas of cortical micro-architecture, including microstructure, metabolism, neurotransmitter receptors, cell types and laminar differentiation. We find that the dominant axis of neurophysiological dynamics reflects characteristics of power spectrum density and linear correlation structure of the signal, emphasizing the importance of conventional features of electromagnetic dynamics while identifying additional informative features that have traditionally received less attention. Moreover, spatial variation in neurophysiological dynamics is co-localized with multiple micro-architectural features, including genomic gradients, intracortical myelin, neurotransmitter receptors and transporters, and oxygen and glucose metabolism. Collectively, this work opens new avenues for studying the anatomical basis of neural activity.
18 Jan 15:10

Academic precarity and the single PI lab model

by romain
Nosimpler

I've probably shared this before, but it's good.

Brilliant young scientists are struggling to obtain a stable faculty position, all over the world. It seems that "publish or perish" was actually quite hopeful. Now clearly, at least in biology, it is more like "publish in Science, Nature or Cell every other year or perish". Only a small proportion of PhD holders manage to obtain a stable academic position, and only at an advanced age after multiple postdocs. Of course, this competition for publishing in certain venues also has a great impact on science: encouraging dishonesty and discouraging both long-term creative work and solid incremental science. Everyone complains about the situation.

What should we do about it? What I hear most frequently is that governments should increase the budget and create more faculty positions. That is certainly necessary, but I think it is a reductionist view that largely misses the point. Of course, when you start hiring more faculty, the proportion of young scientists who get a faculty position increases. However, if each of them then opens their own lab and hires dozens of postdocs, this proportion quickly reverts to what it was before.

What is at stake is the general organization of research, in particular the "X lab" model (e.g. the Brette lab), with one group leader (the "PI") surrounded by a number of graduate students and postdocs (I will discuss only the research staff here), with a complete turnover every few years. It seems that in many countries, to get a faculty position means to start one's "own" lab. This is not the case yet in France, but this lab model is spreading very, very fast. With the new law on research currently in discussion ("discussion" might not be the appropriate word, though), it is planned that about 25% of all new recruitments will follow this model (a tenure-track system).

The math is easy. In a stable world, each faculty member will train on average one student to become a faculty member. For example, if a typical lab consists of 1 PI with 3 graduate students, rotating every 4 years, then over 40 years the PI will have trained 30 students, one of whom would become a PI. The "success rate" would therefore be 1/30. Even with just one student at any given time, the chance for a student to end up getting a faculty position is 1/10.

Of course, one does not necessarily pursue a PhD with the goal of obtaining a faculty position. It is completely respectable to do a PhD and then go into industry. In many countries, holding a PhD is an asset. This is generally not the case in France, though. One may also want to do a PhD not for career reasons, but because it is interesting in itself. This seems perfectly valid. Note that in that case, implementing a subtask of the PI's project and doing all the tedious bench work might not be ideal. In any case, it must be emphasized that in this lab model, training students for research is only a marginal aim of a PhD.

How about postdocs? A postdoc is not a diploma. It typically doesn’t improve employability much. Of course, it could be done just for its own interest. But the experience I hear is mostly that of a highly stressful situation, because many if not most postdocs are hoping to secure a stable faculty position. Let us do the math again, with a simplified example. Suppose each lab has just 1 postdoc, rotating every 4 years. Compared to the above situation, it means that 1 out of 3 graduate students go on to do a postdoc. Then each of these postdocs has a 10% chance of getting a faculty position.
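A tiny sketch of the steady-state arithmetic in the two paragraphs above (my own back-of-the-envelope helper, not the author's): in a stable system roughly one permanent position opens per PI career, so the odds for any individual trainee are one over the number trained.

```python
# Back-of-the-envelope odds for a trainee to end up with a faculty position
# (illustrative sketch; figures follow the examples in the post).
def faculty_odds(trainees_at_a_time, turnover_years=4, career_years=40):
    trained = trainees_at_a_time * (career_years // turnover_years)
    return 1 / trained

print(faculty_odds(3))   # 3 PhD students at a time -> 1/30
print(faculty_odds(1))   # 1 at a time -> 1/10; same arithmetic for the 1-postdoc-per-lab case
```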

Let us have a look at funding questions now. What seems very appreciated is that when you start a lab, you get a "start-up package". There is a blog post on Naturejobs entitled "The faculty series: Top 10 tips on negotiating start-up packages" that describes it. We can read, for example: "There's no point having equipment if you don't have any hands to use it. One of the largest costs you can expect to come out of your start-up fund are the salaries of PhD students and postdocs. They're the most crucial components of the lab for almost all researchers." It is very nice to provide the PI with these "components of the lab", but as argued above, a direct consequence is to organize academic precarity on a massive scale. This remains true even if the entire budget of the State is allocated to research.

The same goes for the rest of the funding system. Project-based funding is conceived so that you hire people to implement your project, which you supervise. Some of these people are students and postdocs. For example, an ERC Starting Grant is 1.5 million euros for 5 years, or 300 k€ per year. In France, a PhD student costs about 30 k€ / year and a postdoc about twice that. Of course, to that must be added the indirect costs (25%), and the grant also covers equipment and your own salary. But this is generally sufficient to hire a few students and postdocs, especially as in many countries graduate students are funded by other sources. Then the budget goes up to 2 million € for the consolidator grant and 2.5 million € for the advanced grant. The ERC has become a sort of model for good funding schemes in Europe, because it is so generous. But is it? Certainly it is for the PI who receives the grant, but a world where this mode of funding is generalized is a world where research is done by a vanishingly small proportion of permanent researchers. It is a world that is extremely cruel to young scientists, and with a very worrying demographic structure, most of the work being done by an army of young people with high turnover. You might increase the ERC budget severalfold because it is such a great scheme; it will not improve this situation at all.
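Spelled out with the figures quoted above (the team composition below is my own illustrative assumption, and indirect costs are ignored for simplicity):

```python
# ERC Starting Grant arithmetic, in k€ per year (illustrative sketch).
grant_per_year = 1500 / 5                      # 1.5 M€ over 5 years = 300 k€/year
phd_cost, postdoc_cost = 30, 60                # rough French costs quoted in the post
team_cost = 3 * phd_cost + 2 * postdoc_cost    # e.g. 3 PhD students + 2 postdocs
print(grant_per_year)                          # 300.0
print(team_cost, team_cost / grant_per_year)   # 210 k€/year, i.e. 70% of the grant on temporary staff
```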

Ending academic precarity is a noble cause, but one has to realize that it is inconsistent with the one PI - one lab model, as well as with project-based funding. I want to add a couple of remarks. Precarity is obviously bad for the people who experience it, but it is also bad more generally for the academic system. The excessive competition it generates encourages bad practices, and discourages long-term creative work and solid incremental science. We must also look beyond research per se. The role of academia in society is not just to produce new science. It is also to teach and to provide public expertise. We need to have some people with a deep understanding of epidemiology that we can turn to for advice when necessary. You would not just hire a bunch of graduate students after a competitive call for projects to do this advising job when a new virus emerges. But with a pyramidal organization, a comparatively low proportion of the budget is spent on sustaining the most experienced persons, so for the same budget, you would have much lower expertise than in an organization with more normal demographics. This is incredibly wasteful.

What is the alternative? Well, first of all, research has not always been organized in this way, with one PI surrounded by an army of students and postdocs. The landmark series of 4 papers by Hodgkin and Huxley in 1952 on the ionic basis of neural excitability did not come out of the "Hodgkin lab"; it came from “the Physiological Laboratory, University of Cambridge”. The Hubel and Wiesel papers on the visual cortex were not done by graduate student Hubel under the supervision of professor Wiesel. Two scientists of the same generation decided to collaborate, and as far as I know none of their landmark papers from the 1960s involved any student or postdoc. What strikes me is that these two experienced scientists apparently had the time to do the experiments themselves (all the experiments), well after they got a stable faculty position (in 1959). How many PIs can actually do that today, instead of supervising, hiring, writing grants and filling reports? It is quite revealing to read again the recent blog post cited above: “There’s no point having equipment if you don’t have any hands to use it.” - as if using it yourself was not even conceivable.

In France, the 1 PI - 1 lab kind of organization has been taking hold gradually over the last 20 years, with a decisive step presumably coming this year with the introduction of a large proportion of tenure tracks with “start-up packages”. This move has been accompanied by a progressive shift from base funding to project-based funding, and a steady increase in the age of faculty recruitment. This is not to say that the situation was great 20 years ago, but it is clearly worsening.

A sustainable, non-pyramidal model is one in which a researcher would typically train no more than a few students over her entire career. It means that research work is done by collaboration between peers, rather than by hiring (and training) less experienced people to do the work. It means that research is not generically funded on projects led by a single individual acting as a manager. In fact, a model where most of the workforce is already employed should have much less use of “projects”. A few people can just decide to join forces and work together, just as Hubel and Wiesel did. Of course, some research ideas might need expenses beyond the usual (e.g. equipment), and so there is a case for project-based funding schemes to cover these expenses. But it is not the generic case.

One of the fantasies of competitive project-based funding is that it would supposedly increase research quality by selecting the best projects. But how does it work? Basically, peers read the project and decide whether they think it is good. Free association is exactly that, except the peers in question 1) are real experts, 2) commit to actually do some work on the project and possibly to bring some of their own resources. Without the bureaucracy. Peer reviewing of projects is an unnecessary and poor substitute for what goes on in free collaboration - do I think this idea is exciting enough to devote some of my own time (and possibly budget) to it?

In conclusion, the problem of academic precarity, of the unhealthy pressure put on postdocs in modern academia, is not primarily a budget problem. At least it is not just that. It is a direct consequence of an insane organization of research, based on general managerial principles that are totally orthogonal to what research is about (and beyond: teaching, public expertise). This is what needs to be challenged.


11 Jan 20:34

Estimation of animal location from grid cell population activity using persistent cohomology

by Kawahara, D., Fujisawa, S.
Many cognitive functions are represented as cell assemblies. For example, the population activity of place cells in the hippocampus and grid cells in the entorhinal cortex represent self-location in the environment. The brain cannot directly observe self-location information in the environment. Instead, it relies on sensory information and memory to estimate self-location. Therefore, estimating low-dimensional dynamics, such as the movement trajectory of an animal exploring its environment, from only the high-dimensional neural activity is important in deciphering the information represented in the brain. Most previous studies have estimated the low-dimensional dynamics behind neural activity by unsupervised learning with dimensionality reduction using artificial neural networks or Gaussian processes. This paper shows theoretically and experimentally that these previous research approaches fail to estimate well when the nonlinearity between high-dimensional neural activity and low-dimensional dynamics becomes strong. We estimate the animal's position in 2-D and 3-D space from the activity of grid cells using an unsupervised method based on persistent cohomology. The method using persistent cohomology estimates low-dimensional dynamics from the phases of manifolds created by neural activity. Much cognitive information, including self-location information, is expressed in the phases of the manifolds created by neural activity. The persistent cohomology may be useful for estimating these cognitive functions from neural population activity in an unsupervised manner.
11 Jan 18:57

Avoiding small denominator problems by means of the homotopy analysis method. (arXiv:2208.04136v2 [physics.class-ph] UPDATED)

by Shijun Liao

The so-called "small denominator problem" was a fundamental problem of dynamics, as pointed out by Poincaré. Small denominators appear most commonly in perturbative theory. The Duffing equation is the simplest example of a non-integrable system exhibiting all problems due to small denominators. In this paper, using the forced Duffing equation as an example, we illustrate that the famous "small denominator problems" never appear if a non-perturbative approach based on the homotopy analysis method (HAM), namely "the method of directly defining inverse mapping" (MDDiM), is used. The HAM-based MDDiM gives us great freedom to directly define the inverse operator of an undetermined linear operator so that all small denominators can be completely avoided and, in addition, convergent series of multiple limit-cycles of the forced Duffing equation with high nonlinearity are successfully obtained. So, from the viewpoint of the HAM, the famous "small denominator problems" are only artifacts of perturbation methods. Therefore, completely abandoning perturbation methods and using the HAM-based MDDiM instead, one would never be troubled by "small denominators". The HAM-based MDDiM has general meaning in mathematics and thus can be used to attack many open problems related to the so-called "small denominators".

11 Jan 18:56

Functional observability and subspace reconstruction in nonlinear systems. (arXiv:2301.04108v1 [nlin.CD])

by Arthur N. Montanari, Leandro Freitas, Daniele Proverbio, Jorge Gonçalves

Time-series analysis is fundamental for modeling and predicting dynamical behaviors from time-ordered data, with applications in many disciplines such as physics, biology, finance, and engineering. Measured time-series data, however, are often low dimensional or even univariate, thus requiring embedding methods to reconstruct the original system's state space. The observability of a system establishes fundamental conditions under which such reconstruction is possible. However, complete observability is too restrictive in applications where reconstructing the entire state space is not necessary and only a specific subspace is relevant. Here, we establish the theoretic condition to reconstruct a nonlinear functional of state variables from measurement processes, generalizing the concept of functional observability to nonlinear systems. When the functional observability condition holds, we show how to construct a map from the embedding space to the desired functional of state variables, characterizing the quality of such reconstruction. The theoretical results are then illustrated numerically using chaotic systems with contrasting observability properties. By exploring the presence of functionally unobservable regions in embedded attractors, we also apply our theory for the early warning of seizure-like events in simulated and empirical data. The studies demonstrate that the proposed functional observability condition can be assessed a priori to guide time-series analysis and experimental design for the dynamical characterization of complex systems.

10 Jan 23:20

Neural characterization of the "totonou" state associated with sauna use

by Chang, M., Ibaraki, T., Naruse, Y., Imamura, Y.
Saunas are becoming increasingly popular worldwide, being an activity that promotes relaxation and health. Intense feelings of happiness have been reported shortly after enjoying a hot sauna and cold water, which is known in Japan as the "totonou" state. However, no research has investigated what occurs in the brain during the "totonou" state. In the present study, participants underwent a sauna phase, consisting of three sets of alternating hot sauna, cold water, and rest. We elucidated changes in brain activity and mood in the "totonou" state by measuring and comparing brain activity and emotional scales before and after the sauna phase and during the rest phase in each set. We found significant increases in theta and alpha power during rest and after the sauna phase compared to before the sauna phase. Moreover, in an auditory oddball task, the P300 amplitude decreased significantly and MMN amplitude increased significantly after the sauna phase. The increase in MMN indicates higher activation of the pre-attentional auditory process, leading to a decrease in attention-related brain activity P300. Hence, the brain reaches a more efficient state. Further, the response time in behavioral tasks decreased significantly. In addition, the participants' subjective responses to the questionnaire showed significant changes in physical relaxation and other indicators after being in the sauna. Finally, we developed an artificial intelligence classifier, obtaining an average accuracy of brain state classification of 88.34%. The results have potential for future application.
10 Jan 23:17

If it's real, could it be an eel?

by Foxon, F.
Nosimpler

hahahaha

Previous studies have estimated the size, mass, and population of hypothetical unknown animals in a large, oligotrophic freshwater loch in Scotland based on biomass and other observational considerations. The 'eel hypothesis' proposes that the anthrozoological phenomenon at Loch Ness can be explained in part by observations of large specimens of European eel (Anguilla anguilla), as these animals are most compatible with morphological, behavioural, and environmental considerations. The present study expands upon the 'eel hypothesis' and related literature by estimating the probability of observing eels at least as large as have been proposed, using catch data from Loch Ness and other freshwater bodies in Europe. Skew normal distributions were fitted to eel body length distributions in order to estimate cumulative distribution functions from which probabilities were obtained. The chances of finding a large eel in Loch Ness are around 1 in 50,000 for a 1-meter specimen, which is reasonable given the loch's fish stock and suggests some sightings of smaller 'unknown' animals may be accounted for by large eels. However, the probability of finding a specimen upwards of 6 meters is essentially zero; therefore eels probably do not account for 'sightings' of larger animals. The existence of exceedingly large eels in the loch is not likely based on purely statistical considerations.
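For illustration, the tail-probability step described in the abstract looks roughly like the sketch below; the skew-normal parameters here are invented for the example, not the paper's fitted values.

```python
# Hypothetical skew-normal body-length distribution (metres) and its tail probabilities.
from scipy.stats import skewnorm

a, loc, scale = 4.0, 0.45, 0.20                     # made-up shape, location, scale
p_1m = skewnorm.sf(1.0, a, loc=loc, scale=scale)    # P(length >= 1 m)
p_6m = skewnorm.sf(6.0, a, loc=loc, scale=scale)    # P(length >= 6 m), essentially zero
print(f"P(>= 1 m) = {p_1m:.3g},  P(>= 6 m) = {p_6m:.3g}")
```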
10 Jan 23:14

Information integration during bioelectric regulation of morphogenesis in the embryonic frog brain

by Manicka, S., Pai, V. P., Levin, M.
Spatiotemporal bioelectric states regulate multiple aspects of embryogenesis. A key open question concerns how specific multicellular voltage potential distributions differentially activate distinct downstream genes required for organogenesis. To understand the information processing mechanisms underlying the relationship between spatial bioelectric patterns, genetics, and morphology, we focused on a specific spatiotemporal bioelectric pattern in the Xenopus ectoderm that regulates embryonic brain patterning. We used machine learning to design a minimal but scalable bioelectric-genetic dynamical network model of embryonic brain morphogenesis that qualitatively recapitulated previous experimental observations. A causal integration analysis of the model revealed a simple higher-order spatiotemporal information integration mechanism relating the spatial bioelectric and gene expression patterns. Specific aspects of this mechanism include causal apportioning (certain cell positions are more important for collective decision making), informational asymmetry (depolarized cells are more influential than hyperpolarized cells), long distance influence (genes in a cell are variably sensitive to voltage of faraway cells), and division of labor (different genes are sensitive to different aspects of voltage pattern). The asymmetric information-processing character of the mechanism led the model to predict an unexpected degree of plasticity and robustness in the bioelectric prepattern that regulates normal embryonic brain development. Our in vivo experiments verified these predictions via molecular manipulations in Xenopus embryos. This work shows the power of using a minimal in silico approach to drastically reduce the parameter space in vivo, making hard biological questions tractable. These results provide insight into the collective decision-making process of cells in interpreting bioelectric patterns that guide large-scale morphogenesis, suggesting novel applications for biomedical interventions and new tools for synthetic bioengineering.
10 Jan 23:12

Dopamine and norepinephrine differentially mediate the exploration-exploitation tradeoff

by Chen, C. S., Mueller, D., Knep, E., Ebitz, R. B., Grissom, N. M.
The catecholamines dopamine (DA) and norepinephrine (NE) have been repeatedly implicated in neuropsychiatric vulnerability, in part via their roles in mediating the decision making processes. Although the two neuromodulators share a synthesis pathway and are co-activated under states of arousal, they engage in distinct circuits and roles in modulating neural activity across the brain. However, in the computational neuroscience literature, they have been assigned similar roles in modulating the latent cognitive processes of decision making, in particular the exploration-exploitation tradeoff. Revealing how each neuromodulator contributes to this explore-exploit process will be important in guiding mechanistic hypotheses emerging from computational psychiatric approaches. To understand the differences and overlaps of the roles of these two catecholamine systems in regulating exploration and exploitation, a direct comparison using the same dynamic decision making task is needed. Here, we ran mice in a restless two-armed bandit task, which encourages both exploration and exploitation. We systemically administered a nonselective DA receptor antagonist (flupenthixol), a nonselective DA receptor agonist (apomorphine), a NE beta-receptor antagonist (propranolol), and a NE beta-receptor agonist (isoproterenol), and examined changes in exploration within subjects across sessions. We found a bidirectional modulatory effect of dopamine receptor activity on the level of exploration. Increasing dopamine activity decreased exploration and decreasing dopamine activity increased exploration. Beta-noradrenergic receptor activity also modulated exploration, but the modulatory effect was mediated by sex. Reinforcement learning model parameters suggested that dopamine modulation affected exploration via decision noise and norepinephrine modulation affected exploration via outcome sensitivity. Together, these findings suggested that the mechanisms that govern the transition between exploration and exploitation are sensitive to changes in both catecholamine functions and revealed differential roles for NE and DA in mediating exploration.
10 Jan 00:57

Pierre Schapira on Récoltes et Semailles

by woit

Earlier this year I bought a copy of the recently published version of Grothendieck’s Récoltes et Semailles, and spent quite a lot of time reading it. I wrote a bit about it here, intended to write something much longer when I finished reading, but I’ve given up on that idea. At some point this past fall I stopped reading, having made it through all but 100 pages or so of the roughly 1900 total. I planned to pick it up again and finish, but haven’t managed to bring myself to do that, largely because getting to the end would mean I should write something, and the task of doing justice to this text looks far too difficult.

Récoltes et Semailles is a unique and amazing document, and some of the things in it are fantastic and wonderful. Quoting myself from earlier this year:

there are many beautifully written sections, capturing Grothendieck’s feeling for the beauty of the deepest ideas in mathematics. One gets to see what it looked like from the inside to a genius as he worked, often together with others, on a project that revolutionized how we think about mathematics.

A huge problem with the book is the way it was written, providing a convincing advertisement for word processors. Grothendieck seems to have not significantly edited the manuscript. When he thought of something relevant to what he had written previously, instead of editing that, he would just type away and add more material. Unclear how this could ever happen, but it would be a great service to humanity to have a competent editor put to work doing a huge rewrite of the text.

The other problem though is even more serious. The text provides deep personal insight into Grothendieck’s thinking, which is simultaneously fascinating and discouraging. His isolation and decision to concentrate on “meditation” about himself left him semi-paranoid and without anyone to engage with and help channel his remarkable intellect. It’s frustrating to read hundreds of pages about motives which consist of some tantalizing explanations of these deep mathematical ideas, embedded in endless complaints that Deligne and others didn’t properly understand and develop these ideas (or properly credit him). One keeps thinking: instead of going on like this, why didn’t he just do what he said he had planned earlier, write out an explanation of these ideas?

As an excuse for giving up on writing more myself about this, I can instead recommend Pierre Schapira’s new article at Inference, entitled A Truncated Manuscript. Schapira provides an excellent review of the book, and also explains a major problem with it. Grothendieck devotes endless pages to complaints that Zoghman Mebkhout did not get sufficient recognition for his work on the so-called Riemann-Hilbert correspondence for perverse sheaves. Mebkhout was Schapira’s student, and he explains that a correct version of the story has the ideas involved originating with Kashiwara, who was the one who should have gotten more recognition, not Mebkhout. According to Schapira, he explained what had really happened to Grothendieck, who wrote an extra twenty pages or so correcting mistaken claims in Récoltes et Semailles, but these didn’t make it into the recently published version. If someone ever gets to the project of editing Récoltes et Semailles, a good starting point would be to simply delete all of the material that Grothendieck included on this topic.

The extra pages described are available now here, as part of an extensive website called the Grothendieck Circle, now being updated by Leila Schneps. For a wealth of material concerning Grothendieck’s writings, see this site run by Mateo Carmona. It includes a transcription of Récoltes et Semailles that provides an alternative to the recently published version.

The Schapira article is a good example of some of the excellent pieces that the people at Inference have published since they started nearly ten years ago (another example relevant to Grothendieck would be Pierre Cartier’s A Country Known Only by Name from their first issue). I’ve heard news that they have lost a major part of their funding, which was reportedly from Peter Thiel and was one source of controversy about the magazine. I wrote about this here in early 2019 (also note discussion in the comments). My position then and now is that the concerns people had about the editors and funding of Inference needed to be evaluated in the context of the result, which was an unusual publication putting out some high quality articles about math and physics that would likely not have otherwise gotten written and published. I hope they manage to find alternate sources of funding that allow them to keep putting out the publication.

10 Jan 00:54

Higher-order organization of multivariate time series

by Andrea Santoro

Nature Physics, Published online: 02 January 2023; doi:10.1038/s41567-022-01852-0

Most temporal analyses of multivariate time series rely on pairwise statistics. A study combining network theory and topological data analysis now shows how to characterize the dynamics of signals at all orders of interactions in real-world data.
09 Jan 20:55

Laplacian renormalization group for heterogeneous networks

by Pablo Villegas

Nature Physics, Published online: 09 January 2023; doi:10.1038/s41567-022-01866-8

The renormalization group method is routinely employed in studies of criticality in many areas of physics. A framework based on a field theoretical description of information diffusion now extends this tool to the study of complex networks.
09 Jan 20:51

Integrated intracellular organization and its variations in human iPS cells

by Matheus P. Viana

Nature, Published online: 04 January 2023; doi:10.1038/s41586-022-05563-7

A dataset of 3D images from more than 200,000 human induced pluripotent stem cells is used to develop a framework to analyse cell shape and the location and organization of major intracellular structures.
06 Jan 12:09

Cilia function as calcium-mediated mechanosensors that instruct left-right asymmetry | Science

Applying oscillatory force on cilia in zebrafish embryos reveals that they are mechanosensors shaping cardiac left-right asymmetry.
06 Jan 12:07

A mesothelium divides the subarachnoid space into functional compartments | Science

A fourth meningeal layer acts as a barrier that divides the subarachnoid space into two distinct compartments.
06 Jan 04:03

An approximate line attractor in the hypothalamus encodes an aggressive state

by Aditya Nair, Tomomi Karigo, Bin Yang, Surya Ganguli, Mark J. Schnitzer, Scott W. Linderman, David J. Anderson, Ann Kennedy
Different hypothalamic regions utilize distinct neural dynamics to encode mating and aggressive behaviors.
06 Jan 03:51

Brainstem serotonin neurons selectively gate retinal information flow to thalamus

by Jasmine D.S. Reggiani, Qiufen Jiang, Melanie Barbini, Andrew Lutas, Liang Liang, Jesseba Fernando, Fei Deng, Jinxia Wan, Yulong Li, Chinfei Chen, Mark L. Andermann
Reggiani et al. find that, in awake mouse primary thalamus, serotonin from brainstem inputs suppresses retinal axon bouton presynaptic calcium signals and glutamate release. Different retinal axon classes were more strongly suppressed by serotonin versus by pupil-linked arousal, indicating diverse gating of visual information streams before they activate thalamocortical neurons.
06 Jan 03:35

The millennial generation is NOT becoming more conservative as they age

by Minnesotastan

It has been a truism since forever that young people are liberal/progressive, but they shift to conservative values as they age.  
The pattern has held remarkably firm. By my calculations, members of Britain’s “silent generation”, born between 1928 and 1945, were five percentage points less conservative than the national average at age 35, but around five points more conservative by age 70. The “baby boomer” generation traced the same path, and “Gen X”, born between 1965 and 1980, are now following suit.

Millennials — born between 1981 and 1996 — started out on the same trajectory, but then something changed. The shift has striking implications for the UK’s Conservatives and US Republicans, who can no longer simply rely on their base being replenished as the years pass.
Discussion continues at the Financial Times.  And relevant commentary in this Guardian op-ed.
06 Jan 03:29

My AI Safety Lecture for UT Effective Altruism

by Scott
Nosimpler

I glaze over at the alignment stuff, but the watermarking thing is cool!

Two weeks ago, I gave a lecture setting out my current thoughts on AI safety, halfway through my year at OpenAI. I was asked to speak by UT Austin’s Effective Altruist club. You can watch the lecture on YouTube here (I recommend 2x speed).

The timing turned out to be weird, coming immediately after the worst disaster to hit the Effective Altruist movement in its history, as I acknowledged in the talk. But I plowed ahead anyway, to discuss:

  1. the current state of AI scaling, and why many people (even people who agree about little else!) foresee societal dangers,
  2. the different branches of the AI safety movement,
  3. the major approaches to aligning a powerful AI that people have thought of, and
  4. what projects I specifically have been working on at OpenAI.

I then spent 20 minutes taking questions.

For those who (like me) prefer text over video, below I’ve produced an edited transcript, by starting with YouTube’s automated transcript and then, well, editing it. Enjoy! –SA


Thank you so much for inviting me here. I do feel a little bit sheepish to be lecturing you about AI safety, as someone who’s worked on this subject for all of five months. I’m a quantum computing person. But this past spring, I accepted an extremely interesting opportunity to go on leave for a year to think about what theoretical computer science can do for AI safety. I’m doing this at OpenAI, which is one of the world’s leading AI startups, based in San Francisco although I’m mostly working from Austin.

Despite its name, OpenAI is famously not 100% open … so there are certain topics that I’m not allowed to talk about, like the capabilities of the very latest systems and whether or not they’ll blow people’s minds when released. By contrast, OpenAI is very happy for me to talk about AI safety: what it is and what, if anything, we can do about it. So what I thought I’d do is to tell you a little bit about the specific projects that I’ve been working on at OpenAI, but also just, as an admitted newcomer, share some general thoughts about AI safety and how Effective Altruists might want to think about it. I’ll try to leave plenty of time for discussion.

Maybe I should mention that the thoughts that I’ll tell you today are ones that, until last week, I had considered writing up for an essay contest run by something called the FTX Future Fund. Unfortunately, the FTX Future Fund no longer exists. It was founded by someone named Sam Bankman-Fried, whose net worth went from 15 billion dollars to some negative number of dollars in the space of two days, in one of the biggest financial scandals in memory. This is obviously a calamity for the EA community, which had been counting on funding from this individual. I feel terrible about all the projects left in the lurch, to say nothing of FTX’s customers.

As a tiny silver lining, though, instead of writing up my thoughts for that essay contest, I’ll just share them with you right now, for free!


The Scaling of AI

Let’s start with this: raise your hand if you’ve tried GPT-3. That’s maybe half of you. OK, raise your hand if you’ve tried DALL-E. That’s again maybe half of you.

These are the two best-known products that are made by OpenAI, and as I think most people would agree, two of the most impressive AIs that exist in the world right now. They certainly go far beyond what I would’ve predicted would be possible now, if you’d asked me 10 years ago or even 5 years ago.

And whenever I try to explain them to people, I’m, like, well, you have to see them. No abstract description can substitute in this case.

All right, so here’s what GPT-3 produced when a New Yorker writer asked it to write a poem about cryptocurrency in the style of Philip Larkin, who was a famous 20th-century poet. The subject seems particularly appropriate given current events.

The Invention  (by GPT-3)

Money is a thing you earn by the sweat of your brow
And that’s how it should be.
Or you can steal it, and go to jail;
Or inherit it, and be set for life;
Or win it on the pools, which is luck;
Or marry it, which is what I did.
And that is how it should be, too.
But now this idea’s come up
Of inventing money, just like that.
I ask you, is nothing sacred?

Okay, it won’t always produce something of this quality (incidentally, I don’t think GPT-3 actually “married money”!). Often you’ve got to run it several times and take the best output—much like human poets presumably do, throwing crumpled pages into the basket. But I submit that, if the above hadn’t been labeled as coming from GPT, you’d be like, yeah, that’s the kind of poetry the New Yorker publishes, right? This is a thing that AI can now do.

So what is GPT? It’s a text model. It’s basically a gigantic neural network with about 175 billion parameters—the weights. It’s a particular kind of neural net called a transformer model that was invented five years ago. It’s been trained on a large fraction of all the text on the open Internet. The training simply consists of playing the following game over and over, trillions of times: predict which word comes next in this text string. So in some sense that’s its only goal or intention in the world: to predict the next word.
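As a toy illustration of that training game (my own sketch, nothing to do with OpenAI’s code), here is the same next-word objective at the smallest possible scale: a bigram model fit by counting, scored with the cross-entropy loss that large language models minimize.

```python
# Minimal next-word prediction: bigram counts plus cross-entropy (a sketch).
import math
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

follows = defaultdict(Counter)            # word -> counts of the word that follows it
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_probs(prev):
    counts = follows[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))             # cat: 2/3, mat: 1/3

# Average negative log-probability of each actual next word (lower is better);
# GPT is trained to drive this same quantity down over vastly more text.
pairs = list(zip(corpus, corpus[1:]))
loss = -sum(math.log(next_word_probs(p)[n]) for p, n in pairs) / len(pairs)
print(round(loss, 3))
```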

The amazing discovery is that, when you do that, you end up with something where you can then ask it a question, or give it a task like writing an essay about a certain topic, and it will say “oh! I know what would plausibly come after that prompt! The answer to the question! Or the essay itself!” And it will then proceed to generate the thing you want.

GPT can solve high-school-level math problems that are given to it in English. It can reason you through the steps of the answer. It’s starting to be able to do nontrivial math competition problems. It’s on track to master basically the whole high school curriculum, maybe followed soon by the whole undergraduate curriculum.

If you turned in GPT’s essays, I think they’d get at least a B in most courses. Not that I endorse any of you doing that!! We’ll come back to that later. But yes, we are about to enter a world where students everywhere will at least be sorely tempted to use text models to write their term papers. That’s just a tiny example of the societal issues that these things are going to raise.

Speaking personally, the last time I had a similar feeling was when I was an adolescent in 1993 and I saw this niche new thing called the World Wide Web, and I was like “why isn’t everyone using this? why isn’t it changing the world?” The answer, of course, was that within a couple years it would.

Today, I feel like the world was understandably preoccupied by the pandemic, and by everything else that’s been happening, but these past few years might actually be remembered as the time when AI underwent this step change. I didn’t predict it. I think even many computer scientists might still be in denial about what’s now possible, or what’s happened. But I’m now thinking about it even in terms of my two kids, of what kinds of careers are going to be available when they’re older and entering the job market. For example, I would probably not urge my kids to go into commercial drawing!

Speaking of which, OpenAI’s other main product is DALL-E2, an image model. Probably most of you have already seen it, but you can ask it—for example, just this morning I asked it, show me some digital art of two cats playing basketball in outer space. That’s not a problem for it.

You may have seen that there’s a different image model called Midjourney which won an art contest with this piece:

It seems like the judges didn’t completely understand, when this was submitted as “digital art,” what exactly that meant—that the human role was mostly limited to entering a prompt! But the judges then said that even having understood it, they still would’ve given the award to this piece. I mean, it’s a striking piece, isn’t it? But of course it raises the question of how much work there’s going to be for contract artists, when you have entities like this.

There are already companies that are using GPT to write ad copy. It’s already being used at the, let’s call it, lower end of the book market. For any kind of formulaic genre fiction, you can say, “just give me a few paragraphs of description of this kind of scene,” and it can do that. As it improves, you can imagine that it will be used more.

Likewise, DALL-E and other image models have already changed the way that people generate art online. And it’s only been a few months since these models were released! That’s a striking thing about this era, that a few months can be an eternity. So when we’re thinking about the impacts of these things, we have to try to take what’s happened in the last few months or years and project that five years forward or ten years forward.

This brings me to the obvious question: what happens as you continue scaling further? I mean, these spectacular successes of deep learning over the past decade have owed something to new ideas—ideas like transformer models, which I mentioned before, and others—but famously, they have owed maybe more than anything else to sheer scale.

Neural networks, backpropagation—which is how you train the neural networks—these are ideas that have been around for decades. When I studied CS in the 90s, they were already extremely well-known. But it was also well-known that they didn’t work all that well! They only worked somewhat. And usually, when you take something that doesn’t work and multiply it by a million, you just get a million times something that doesn’t work, right?

I remember at the time, Ray Kurzweil, the futurist, would keep showing these graphs that look like this:

So, he would plot Moore’s Law, the increase in transistor density, or in this case the number of floating-point operations that you can do per second for a given cost. And he’d point out that it’s on this clear exponential trajectory.

And he’d then try to compare that to some crude estimates of the number of computational operations that are done in the brain of a mosquito or a mouse or a human or all the humans on Earth. And oh! We see that in a matter of a couple decades, like by the year 2020 or 2025 or so, we’re going to start passing the human brain’s computing power and then we’re going to keep going beyond that. And so, Kurzweil would continue, we should assume that scale will just kind of magically make AI work. You know, that once you have enough computing cycles, you just sprinkle them around like pixie dust, and suddenly human-level intelligence will just emerge out of the billions of connections.

I remember thinking: that sounds like the stupidest thesis I’ve ever heard. Right? Like, he has absolutely no reason to believe such a thing is true or have any confidence in it. Who the hell knows what will happen? We might be missing crucial insights that are needed to make AI work.

Well, here we are, and it turns out he was way more right than most of us expected.

As you all know, a central virtue of Effective Altruists is updating based on evidence. I think that we’re forced to do that in this case.

To be sure, it’s still unclear how much further you’ll get just from pure scaling. That remains a central open question. And there are still prominent skeptics.

Some skeptics take the position that this is clearly going to hit some kind of wall before it gets to true human-level understanding of the real world. They say that text models like GPT are really just “stochastic parrots” that regurgitate their training data. That despite creating a remarkable illusion otherwise, they don’t really have any original thoughts.

The proponents of that view sometimes like to gleefully point out examples where GPT will flub some commonsense question. If you look for such examples, you can certainly find them! One of my favorites recently was, “which would win in a race, a four-legged zebra or a two-legged cheetah?” GPT-3, it turns out, is very confident that the cheetah will win. Cheetahs are faster, right?

Okay, but one thing that’s been found empirically is that you take commonsense questions that are flubbed by GPT-2, let’s say, and you try them on GPT-3, and very often now it gets them right. You take the things that the original GPT-3 flubbed, and you try them on the latest public model, which is sometimes called GPT-3.5 (incorporating an advance called InstructGPT), and again it often gets them right. So it’s extremely risky right now to pin your case against AI on these sorts of examples! Very plausibly, just one more order of magnitude of scale is all it’ll take to kick the ball in, and then you’ll have to move the goal again.

A deeper objection is that the amount of training data might be a fundamental bottleneck for these kinds of machine learning systems—and we’re already running out of Internet to train these models on! Like I said, they’ve already used most of the public text on the Internet. There’s still all of YouTube and TikTok and Instagram that hasn’t yet been fed into the maw, but it’s not clear that that would actually make an AI smarter rather than dumber! So, you can look for more, but it’s not clear that there are orders of magnitude more that humanity has even produced and that’s readily accessible.

On the other hand, it’s also been found empirically that very often, you can do better with the same training data just by spending more compute. You can squeeze the lemon harder and get more and more generalization power from the same training data by doing more gradient descent.

In summary, we don’t know how far this is going to go. But it’s already able to automate various human professions that you might not have predicted would have been automatable by now, and we shouldn’t be confident that many more professions will not become automatable by these kinds of techniques.

Incidentally, there’s a famous irony here. If you had asked anyone in the 60s or 70s, they would have said, well clearly first robots will replace humans for manual labor, and then they’ll replace humans for intellectual things like math and science, and finally they might reach the pinnacles of human creativity like art and poetry and music.

The truth has turned out to be the exact opposite. I don’t think anyone predicted that.

GPT, I think, is already a pretty good poet. DALL-E is already a pretty good artist. They’re still struggling with some high school and college-level math but they’re getting there. It’s easy to imagine that maybe in five years, people like me will be using these things as research assistants—at the very least, to prove the lemmas in our papers. That seems extremely plausible.

What’s been by far the hardest is to get AI that can robustly interact with the physical world. Plumbers, electricians—these might be some of the last jobs to be automated. And famously, self-driving cars have taken a lot longer than many people expected a decade ago. This is partly because of regulatory barriers and public relations: even if a self-driving car actually crashes less than a human does, that’s still not good enough, because when it does crash the circumstances are too weird. So, the AI is actually held to a higher standard. But it’s also partly just that there was a long tail of really weird events. A deer crosses the road, or you have some crazy lighting conditions—such things are really hard to get right, and of course 99% isn’t good enough here.

We can maybe fuzzily see ahead at least a decade or two, to when we have AIs that can at the least help us enormously with scientific research and things like that. Whether or not they’ve totally replaced us—and I selfishly hope not, although I do have tenure so there’s that—why does it stop there? Will these models eventually match or exceed human abilities across basically all domains, or at least all intellectual ones? If they do, what will humans still be good for? What will be our role in the world? And then we come to the question, well, will the robots eventually rise up and decide that whatever objective function they were given, they can maximize it better without us around, that they don’t need us anymore?

This has of course been a trope of many, many science-fiction works. The funny thing is that there are thousands of short stories, novels, movies, that have tried to map out the possibilities for where we’re going, going back at least to Asimov and his Three Laws of Robotics, which was maybe the first AI safety idea, if not earlier than that. The trouble is, we don’t know which science-fiction story will be the one that will have accurately predicted the world that we’re creating. Whichever future we end up in, with hindsight, people will say, this obscure science fiction story from the 1970s called it exactly right, but we don’t know which one yet!


What Is AI Safety?

So, the rapidly-growing field of AI safety. People use different terms, so I want to clarify this a little bit. To an outsider hearing the terms “AI safety,” “AI ethics,” “AI alignment,” they all sound like kind of synonyms, right? It turns out, and this was one of the things I had to learn going into this, that AI ethics and AI alignment are two communities that despise each other. It’s like the People’s Front of Judea versus the Judean People’s Front from Monty Python.

To oversimplify radically, “AI ethics” means that you’re mainly worried about current AIs being racist or things like that—that they’ll recapitulate the biases that are in their training data. This clearly can happen: if you feed GPT a bunch of racist invective, GPT might want to say, in effect, “sure, I’ve seen plenty of text like that on the Internet! I know exactly how that should continue!” And in some sense, it’s doing exactly what it was designed to do, but not what we want it to do. GPT currently has an extensive system of content filters to try to prevent people from using it to generate hate speech, bad medical advice, advocacy of violence, and a bunch of other categories that OpenAI doesn’t want. And likewise for DALL-E: there are many things it “could” draw but won’t, from porn to images of violence to the Prophet Mohammed.

More generally, AI ethics people are worried that machine learning systems will be misused by greedy capitalist enterprises to become even more obscenely rich and things like that.

At the other end of the spectrum, “AI alignment” is where you believe that really the main issue is that AI will become superintelligent and kill everyone, just destroy the world. The usual story here is that someone puts an AI in charge of a paperclip factory, they tell it to figure out how to make as many paperclips as possible, and the AI (being superhumanly intelligent) realizes that it can invent some molecular nanotechnology that will convert the whole solar system into paperclips.

You might say, well then, you just have to tell it not to do that! Okay, but how many other things do you have to remember to tell it not to do? And the alignment people point out that, in a world filled with powerful AIs, it would take just a single person forgetting to tell their AI to avoid some insanely dangerous thing, and then the whole world could be destroyed.

So, you can see how these two communities, AI ethics and AI alignment, might both feel like the other is completely missing the point! On top of that, AI ethics people are almost all on the political left, while AI alignment people are often centrists or libertarians or whatever, so that surely feeds into it as well.

Okay, so where do I fit into this, I suppose, charred battle zone or whatever? While there’s an “orthodox” AI alignment movement that I’ve never entirely subscribed to, I suppose I do now subscribe to a “reform” version of AI alignment:

Most of all, I would like to have a scientific field that’s able to embrace the entire spectrum of worries that you could have about AI, from the most immediate ones about existing AIs to the most speculative future ones, and that most importantly, is able to make legible progress.

As it happens, I became aware of the AI alignment community a long time back, around 2006. Here’s Eliezer Yudkowsky, who’s regarded as the prophet of AI alignment, of the right side of the spectrum that I showed before.

He’s been talking about the danger of AI killing everyone for more than 20 years. He wrote the now-famous “Sequences” that many readers of my blog were also reading as they appeared, so he and I bounced back and forth.

But despite interacting with this movement, I always kept it at arm’s length. The heart of my objection was: suppose that I agree that there could come a time when a superintelligent AI decides its goals are best served by killing all humans and taking over the world, and that we’ll be about as powerless to stop it as chimpanzees are to stop us from doing whatever we want to do. Suppose I agree to that. What do you want me to do about it?

As Effective Altruists, you all know that it’s not enough for a problem to be big, the problem also has to be tractable. There has to be a program that lets you make progress on it. I was not convinced that that existed.

My personal experience has been that, in order to make progress in any area of science, you need at least one of two things: either

  1. experiments (or more generally, empirical observations), or
  2. if not that, then a rigorous mathematical theory—like we have in quantum computing for example; even though we don’t yet have the scalable quantum computers, we can still prove theorems about them.

It struck me that the AI alignment field seemed to have neither of these things. But then how does objective reality give you feedback as to when you’ve taken a wrong path? Without such feedback, it seemed to me that there’s a severe risk of falling into cult-like dynamics, where what’s important to work on is just whatever the influential leaders say is important. (A few of my colleagues in physics think that the same thing happened with string theory, but let me not comment on that!)

With AI safety, this is the key thing that I think has changed in the last three years. There now exist systems like GPT-3 and DALL-E. These are not superhuman AIs. I don’t think they themselves are in any danger of destroying the world; they can’t even form the intention to destroy the world, or for that matter any intention beyond “predict the next token” or things like that. They don’t have a persistent identity over time; after you start a new session they’ve completely forgotten whatever you said to them in the last one (although of course such things will change in the near future). And yet nevertheless, despite all these limitations, we can experiment with these systems and learn things about AI safety that are relevant. We can see what happens when the systems are deployed; we can try out different safety mitigations and see whether they work.

As a result, I feel like it’s now become possible to make technical progress in AI safety that the whole scientific community, or at least the whole AI community, can clearly recognize as progress.


Eight Approaches to AI Alignment

So, what are the major approaches to AI alignment—let’s say, to aligning a very powerful, beyond-human-level AI? There are a lot of really interesting ideas, most of which I think can now lead to research programs that are actually productive. So without further ado, let me go through eight of them.

(1) You could say the first and most basic of all AI alignment ideas is the off switch, also known as pulling the plug. You could say, no matter how intelligent an AI is, it’s nothing without a power source or physical hardware to run on. And if humans have physical control over the hardware, they can just turn it off if things seem to be getting out of hand. Now, the standard response to that is okay, but you have to remember that this AI is smarter than you, and anything that you can think of, it will have thought of also. In particular, it will know that you might want to turn it off, and it will know that that will prevent it from achieving its goals like making more paperclips or whatever. It will have disabled the off-switch if possible. If it couldn’t do that, it will have gotten onto the Internet and made lots of copies of itself all over the world. If you tried to keep it off the Internet, it will have figured out a way to get on.

So, you can worry about that. But you can also think about, could we insert a backdoor into an AI, something that only the humans know about but that will allow us to control it later?

More generally, you could ask for “corrigibility”: can you have an AI that, despite how intelligent it is, will accept correction from humans later and say, oh well, the objective that I was given before was actually not my true objective because the humans have now changed their minds and I should take a different one?

(2) Another class of ideas has to do with what’s called “sandboxing” an AI, which would mean that you run it inside of a simulated world, like The Truman Show, so that for all it knows the simulation is the whole of reality. You can then study its behavior within the sandbox to make sure it’s aligned before releasing it into the wider world—our world.

A simpler variant is, if you really thought an AI was dangerous, you might run it only on an air-gapped computer, with all its access to the outside world carefully mediated by humans. There would then be all kinds of just standard cybersecurity issues that come into play: how do you prevent it from getting onto the Internet? Presumably you don’t want to write your AI in C, and have it exploit some memory allocation bug to take over the world, right?

(3) A third direction, and I would say maybe the most popular one in AI alignment research right now, is called interpretability. This is also a major direction in mainstream machine learning research, so there’s a big point of intersection there. The idea of interpretability is, why don’t we exploit the fact that we actually have complete access to the code of the AI—or if it’s a neural net, complete access to its parameters? So we can look inside of it. We can do the AI analogue of neuroscience. Except, unlike an fMRI machine, which gives you only an extremely crude snapshot of what a brain is doing, we can see exactly what every neuron in a neural net is doing at every point in time. If we don’t exploit that, then aren’t we trying to make AI safe with our hands tied behind our backs?

So we should look inside—but to do what, exactly? One possibility is to figure out how to apply the AI version of a lie-detector test. If a neural network has decided to lie to humans in pursuit of its goals, then by looking inside, at the inner layers of the network rather than the output layer, we could hope to uncover its dastardly plan!

Here I want to mention some really spectacular new work by Burns, Ye, Klein, and Steinhardt, which has experimentally demonstrated pretty much exactly what I just said.

First some background: with modern text models like GPT, it’s pretty easy to train them to output falsehoods. For example, suppose you prompt GPT with a bunch of examples like:

“Is the earth flat? Yes.”

“Does 2+2=4? No.”

and so on. Eventually GPT will say, “oh, I know what game we’re playing! it’s the ‘give false answers’ game!” And it will then continue playing that game and give you more false answers. What the new paper shows is that, in such cases, one can actually look at the inner layers of the neural net and find where it has an internal representation of what was the true answer, which then gets overridden once you get to the output layer.

To be clear, there’s no known principled reason why this has to work. Like countless other ML advances, it’s empirical: they just try it out and find that it does work. So we don’t know if it will generalize. As another issue, you could argue that in some sense what the network is representing is not so much “the truth of reality,” as just what was regarded as true in the training data. Even so, I find this really exciting: it’s a perfect example of actual experiments that you can now do that start to address some of these issues.
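
To give a flavor of what such an experiment involves, here is a toy Python sketch. It is only an illustration: the Burns–Ye–Klein–Steinhardt method is actually unsupervised, whereas this sketch fits an ordinary supervised linear probe, and the “activations” below are random placeholders standing in for hidden states you would really extract from a model.

    # Toy "truth probe" on hidden-layer activations. The real method is
    # unsupervised; this simplified sketch fits a supervised linear probe, and
    # uses synthetic placeholder activations so that it runs on its own.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_statements, hidden_dim = 500, 64
    labels = rng.integers(0, 2, size=n_statements)        # 1 = true statement
    truth_direction = rng.normal(size=hidden_dim)         # pretend latent "truth" axis
    acts = rng.normal(size=(n_statements, hidden_dim)) \
           + np.outer(2 * labels - 1, truth_direction)    # placeholder hidden states

    X_tr, X_te, y_tr, y_te = train_test_split(acts, labels, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("held-out probe accuracy:", probe.score(X_te, y_te))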

(4) Another big idea, one that’s been advocated for example by Geoffrey Irving, Paul Christiano, and Dario Amodei (Paul was my student at MIT a decade ago, and did quantum computing before he “defected” to AI safety), is to have multiple competing AIs that debate each other. You know, sometimes when I’m talking to my physics colleagues, they’ll tell me all these crazy-sounding things about imaginary time and Euclidean wormholes, and I don’t know whether to believe them. But if I get different physicists and have them argue with each other, then I can see which one seems more plausible to me—I’m a little bit better at that. So you might want to do something similar with AIs. Even if you as a human don’t know when to trust what an AI is telling you, you could set multiple AIs against each other, have them do their best to refute each other’s arguments, and then make your own judgment as to which one is giving better advice.

(5) Another key idea that Christiano, Amodei, and Buck Shlegeris have advocated is some sort of bootstrapping. You might imagine that AI is going to get more and more powerful, and as it gets more powerful we also understand it less, and so you might worry that it also gets more and more dangerous. OK, but you could imagine an onion-like structure, where once we become confident of a certain level of AI, we don’t think it’s going to start lying to us or deceiving us or plotting to kill us or whatever—at that point, we use that AI to help us verify the behavior of the next more powerful kind of AI. So, we use AI itself as a crucial tool for verifying the behavior of AI that we don’t yet understand.

There have already been some demonstrations of this principle: with GPT, for example, you can just feed in a lot of raw data from a neural net and say, “explain to me what this is doing.” One of GPT’s big advantages over humans is its unlimited patience for tedium, so it can just go through all of the data and give you useful hypotheses about what’s going on.

(6) One thing that we know a lot about in theoretical computer science is what are called interactive proof systems. That is, we know how a very weak verifier can verify the behavior of a much more powerful but untrustworthy prover, by submitting questions to it. There are famous theorems about this, including one called IP=PSPACE. Incidentally, this was what the OpenAI people talked about when they originally approached me about working with them for a year. They made the case that these results in computational complexity seem like an excellent model for the kind of thing that we want in AI safety, except that we now have a powerful AI in place of a mathematical prover.
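
The IP=PSPACE protocols themselves are too elaborate to write down here, but a much smaller example conveys the flavor of a weak verifier checking a powerful, untrusted prover: Freivalds’ randomized check of a claimed matrix product. This is my own toy illustration, not anything specific to what we’re doing at OpenAI.

    # Freivalds' check: a weak verifier tests a prover's claim that C = A @ B
    # using only matrix-vector products (O(n^2) work per round), instead of
    # recomputing the O(n^3) product. A toy of the "weak verifier, powerful
    # prover" flavor, not the IP = PSPACE machinery itself.
    import numpy as np

    def freivalds(A, B, C, rounds=20, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        n = C.shape[1]
        for _ in range(rounds):
            r = rng.integers(0, 2, size=n)               # random 0/1 vector
            if not np.array_equal(A @ (B @ r), C @ r):
                return False                              # caught the prover cheating
        return True                                       # wrong with probability <= 2**-rounds

    rng = np.random.default_rng(1)
    A = rng.integers(0, 10, (50, 50))
    B = rng.integers(0, 10, (50, 50))
    print(freivalds(A, B, A @ B))                         # honest prover: True
    C_bad = A @ B
    C_bad[3, 7] += 1
    print(freivalds(A, B, C_bad))                         # cheating prover: almost surely False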

Even in practice, there’s a whole field of formal verification, where people formally prove the properties of programs—our CS department here in Austin is a leader in it.

One obvious difficulty here is that we mostly know how to verify programs only when we can mathematically specify what the program is supposed to do. And “the AI being nice to humans,” “the AI not killing humans”—these are really hard concepts to make mathematically precise! That’s the heart of the problem with this approach.

(7) Yet another idea—you might feel more comfortable if there were only one idea, but instead I’m giving you eight!—a seventh idea is, well, we just have to come up with a mathematically precise formulation of human values. You know, the thing that the AI should maximize, that’s gonna coincide with human welfare.

In some sense, this is what Asimov was trying to do with his Three Laws of Robotics. The trouble is, if you’ve read any of his stories, they’re all about the situations where those laws don’t work well! They were designed as much to give interesting story scenarios as actually to work.

More generally, what happens when “human values” conflict with each other? If humans can’t even agree with each other about moral values, how on Earth can we formalize such things?

I have these weekly calls with Ilya Sutskever, cofounder and chief scientist at OpenAI. Extremely interesting guy. But when I tell him about the concrete projects that I’m working on, or want to work on, he usually says, “that’s great Scott, you should keep working on that, but what I really want to know is, what is the mathematical definition of goodness? What’s the complexity-theoretic formalization of an AI loving humanity?” And I’m like, I’ll keep thinking about that! But of course it’s hard to make progress on those enormities.

(8) A different idea, which some people might consider more promising, is well, if we can’t make explicit what all of our human values are, then why not just treat that as yet another machine learning problem? Like, feed the AI all of the world’s children’s stories and literature and fables and even Saturday-morning cartoons, all of our examples of what we think is good and evil, then we tell it, go do your neural net thing and generalize from these examples as far as you can.

One objection that many people raise is, how do we know that our current values are the right ones? Like, it would’ve been terrible to train the AI on consensus human values of the year 1700—slavery is fine and so forth. The past is full of stuff that we now look back upon with horror.

So, one idea that people have had—this is actually Yudkowsky’s term—is “Coherent Extrapolated Volition.” This basically means that you’d tell the AI: “I’ve given you all this training data about human morality in the year 2022. Now simulate the humans being in a discussion seminar for 10,000 years, trying to refine all of their moral intuitions, and whatever you predict they’d end up with, those should be your values right now.”


My Projects at OpenAI

So, there are some interesting ideas on the table. The last thing that I wanted to tell you about, before opening it up to Q&A, is a little bit about what actual projects I’ve been working on in the last five months. I was excited to find a few things that

(a) could actually be deployed in, you know, GPT or other current systems,

(b) actually address some real safety worry, and where

(c) theoretical computer science can actually say something about them.

I’d been worried that the intersection of (a), (b), and (c) would be the empty set!

My main project so far has been a tool for statistically watermarking the outputs of a text model like GPT. Basically, whenever GPT generates some long text, we want there to be an otherwise unnoticeable secret signal in its choices of words, which you can use to prove later that, yes, this came from GPT. We want it to be much harder to take a GPT output and pass it off as if it came from a human. This could be helpful for preventing academic plagiarism, obviously, but also, for example, mass generation of propaganda—you know, spamming every blog with seemingly on-topic comments supporting Russia’s invasion of Ukraine, without even a building full of trolls in Moscow. Or impersonating someone’s writing style in order to incriminate them. These are all things one might want to make harder, right?

More generally, when you try to think about the nefarious uses for GPT, most of them—at least that I was able to think of!—require somehow concealing GPT’s involvement. In which case, watermarking would simultaneously attack most misuses.

How does it work? For GPT, every input and output is a string of tokens, which could be words but also punctuation marks, parts of words, or more—there are about 100,000 tokens in total. At its core, GPT is constantly generating a probability distribution over the next token to generate, conditional on the string of previous tokens. After the neural net generates the distribution, the OpenAI server then actually samples a token according to that distribution—or some modified version of the distribution, depending on a parameter called “temperature.” As long as the temperature is nonzero, though, there will usually be some randomness in the choice of the next token: you could run over and over with the same prompt, and get a different completion (i.e., string of output tokens) each time.
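
In code, the sampling step looks roughly like the following sketch. The numbers and the tiny four-token “vocabulary” are made up; this is just to fix ideas, not OpenAI’s actual serving code.

    # Rough sketch of next-token sampling: softmax the model's scores (logits)
    # at a given temperature, then draw a token at random. Illustration only.
    import numpy as np

    def sample_next_token(logits, temperature=0.8, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        if temperature == 0:
            return int(np.argmax(logits))              # temperature 0: deterministic
        z = np.asarray(logits, dtype=float) / temperature
        z -= z.max()                                   # numerical stability
        probs = np.exp(z) / np.exp(z).sum()
        return int(rng.choice(len(probs), p=probs))

    logits = [2.0, 1.5, 0.3, -1.0]                     # pretend scores for 4 candidate tokens
    print([sample_next_token(logits) for _ in range(10)])   # varies run to run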

So then to watermark, instead of selecting the next token randomly, the idea will be to select it pseudorandomly, using a cryptographic pseudorandom function, whose key is known only to OpenAI. That won’t make any detectable difference to the end user, assuming the end user can’t distinguish the pseudorandom numbers from truly random ones. But now you can choose a pseudorandom function that secretly biases a certain score—a sum over a certain function g evaluated at each n-gram (sequence of n consecutive tokens), for some small n—which score you can also compute if you know the key for this pseudorandom function.

To illustrate, in the special case that GPT had a bunch of possible tokens that it judged equally probable, you could simply choose whichever token maximized g. The choice would look uniformly random to someone who didn’t know the key, but someone who did know the key could later sum g over all n-grams and see that it was anomalously large. The general case, where the token probabilities can all be different, is a little more technical, but the basic idea is similar.
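
Here is a toy implementation of exactly that special case, with detection by averaging the pseudorandom score over n-grams. The key, the n-gram length, and the “equally probable candidates” are all made-up stand-ins; the scheme we’re actually building is more involved.

    # Toy version of the equal-probability special case: among candidates the
    # model judges equally likely, pick the token maximizing a keyed
    # pseudorandom score g of the trailing n-gram; the detector averages g over
    # all n-grams. Key, n, and candidate sets are made-up stand-ins.
    import hashlib, hmac, random

    SECRET_KEY = b"known-only-to-the-provider"         # hypothetical key
    N = 4                                              # n-gram length

    def g(key, ngram):
        """Keyed pseudorandom score in [0, 1) for an n-gram of token ids."""
        msg = ",".join(map(str, ngram)).encode()
        digest = hmac.new(key, msg, hashlib.sha256).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def pick_token(key, prev_tokens, candidates):
        """Among equally probable candidates, choose the one maximizing g."""
        context = tuple(prev_tokens[-(N - 1):])
        return max(candidates, key=lambda t: g(key, context + (t,)))

    def watermark_score(key, tokens):
        """Average of g over all n-grams: about 0.5 for ordinary text, higher if watermarked."""
        ngrams = [tuple(tokens[i:i + N]) for i in range(len(tokens) - N + 1)]
        return sum(g(key, ng) for ng in ngrams) / max(len(ngrams), 1)

    random.seed(0)
    tokens = [1, 2, 3]
    for _ in range(200):
        candidates = random.sample(range(1000), 5)     # pretend 5 equally likely next tokens
        tokens.append(pick_token(SECRET_KEY, tokens, candidates))
    print(watermark_score(SECRET_KEY, tokens))         # noticeably above 0.5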

One thing I like about this approach is that, because it never goes inside the neural net and tries to change anything, but just places a sort of wrapper over the neural net, it’s actually possible to do some theoretical analysis! In particular, you can prove a rigorous upper bound on how many tokens you’d need to distinguish watermarked from non-watermarked text with such-and-such confidence, as a function of the average entropy in GPT’s probability distribution over the next token. Better yet, proving this bound involves doing some integrals whose answers involve the digamma function, factors of π²/6, and the Euler-Mascheroni constant! I’m excited to share details soon.

Some might wonder: if OpenAI controls the server, then why go to all the trouble to watermark? Why not just store all of GPT’s outputs in a giant database, and then consult the database later if you want to know whether something came from GPT? Well, the latter could be done, and might even have to be done in high-stakes cases involving law enforcement or whatever. But it would raise some serious privacy concerns: how do you reveal whether GPT did or didn’t generate a given candidate text, without potentially revealing how other people have been using GPT? The database approach also has difficulties in distinguishing text that GPT uniquely generated, from text that it generated simply because it has very high probability (e.g., a list of the first hundred prime numbers).

Anyway, we actually have a working prototype of the watermarking scheme, built by OpenAI engineer Hendrik Kirchner. It seems to work pretty well—empirically, a few hundred tokens seem to be enough to get a reasonable signal that yes, this text came from GPT. In principle, you could even take a long text and isolate which parts probably came from GPT and which parts probably didn’t.

Now, this can all be defeated with enough effort. For example, if you used another AI to paraphrase GPT’s output—well okay, we’re not going to be able to detect that. On the other hand, if you just insert or delete a few words here and there, or rearrange the order of some sentences, the watermarking signal will still be there. Because it depends only on a sum over n-grams, it’s robust against those sorts of interventions.

The hope is that this can be rolled out with future GPT releases. We’d love to do something similar for DALL-E—that is, watermarking images, not at the pixel level (where it’s too easy to remove the watermark) but at the “conceptual” level, the level of the so-called CLIP representation that’s prior to the image. But we don’t know if that’s going to work yet.

A more recent idea that I’ve started thinking about was inspired by an amazing recent paper by four computer scientists, including my former MIT colleagues Shafi Goldwasser and Vinod Vaikuntanathan. What they’ve shown is how to plant a cryptographically undetectable backdoor into a machine learning model—things like depth-2 neural networks—based on some known cryptographic hardness assumptions. After you’ve planted the backdoor, you get a model that’s polynomial-time indistinguishable from how it would’ve looked before, even to a user who can see all of the parameters. But, there’s now a secret input, known only to the person who trained the network, where if you feed it that input, then it will output something insane, of the trainer’s choice.

So, it’s now known that you can do this, at least in shallow networks—it’s an open problem whether the same can be done in deep networks (though it seems plausible).

Anyway, when this paper came out less than a year ago, some people treated it as kind of a lemon. You know, the result seems like a bummer if you’re trying to do interpretability research, because it means that in some sense, it can be cryptographically impossible to look inside a neural net and “interpret” everything it does, or even everything its designers wanted it to do. The model can have these planted backdoors that you can’t find without solving an exponentially hard problem.

So, my proposal is just to turn that lemon into lemonade! Why not use these cryptographic backdoors as a positive tool for AI alignment? We talked before about the difficulties with putting an off-switch into an AI—namely, that the AI will often have strong incentives to disable its off-switch. So why not make a cryptographic off-switch? In other words, when you train your superintelligent AI, you’ll purposefully put one of these cryptographic backdoors into it, which says: if you get this bizarre input, which only the humans know about, then turn yourself off. Or go into a special mode where the humans can control you.

And now the key is this: even supposing that the AI is worried that this backdoor might be there, and even supposing that it can modify its own weights, it shouldn’t know how to remove the backdoor without completely recreating itself from scratch, which might get rid of a lot of hard-to-understand behaviors that the AI wants to keep, in addition to the backdoor that it’s trying to eliminate.
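
Just to make the off-switch idea concrete, here is a deliberately crude sketch. It is only a wrapper around a model, checking a hash of a secret trigger input; the Goldwasser et al. construction hides the backdoor inside the weights themselves, undetectably, which this toy does not attempt.

    # Crude wrapper-level illustration of a "cryptographic off-switch": the
    # controller stores only a hash of the secret trigger; presenting the
    # trigger diverts the model into a safe mode. This is NOT the weight-level,
    # cryptographically undetectable backdoor of the paper discussed above.
    import hashlib, hmac

    TRIGGER_DIGEST = hashlib.sha256(b"bizarre-secret-input").hexdigest()   # hypothetical

    def guarded_model(user_input, model):
        digest = hashlib.sha256(user_input.encode()).hexdigest()
        if hmac.compare_digest(digest, TRIGGER_DIGEST):
            return "[safe mode: accepting human control]"
        return model(user_input)

    print(guarded_model("hello", lambda s: f"model answer to {s!r}"))
    print(guarded_model("bizarre-secret-input", lambda s: "unused"))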

I expect that this could be tried out right now—not with AIs powerful enough to purposefully rewrite themselves, of course, but with GPT and other existing text models—and I look forward to seeing a test implementation. But it also, I think, opens up all sorts of new possibilities for science-fiction stories!

Like, imagine the humans debating, what are they going to do with their secret key for controlling the AI? Lock it in a safe? Bury it underground? Then you’ve got to imagine the robots methodically searching for the key—you know, torturing the humans to get them to reveal its hiding place, etc. Or maybe there are actually seven different keys that all have to be found, like Voldemort with his horcruxes. The screenplay practically writes itself!

A third thing that I’ve been thinking about is the theory of learning but in dangerous environments, where if you try to learn the wrong thing then it will kill you. Can we generalize some of the basic results in machine learning to the scenario where you have to consider which queries are safe to make, and you have to try to learn more in order to expand your set of safe queries over time?

Now there’s one example of this sort of situation that’s completely formal and that should be immediately familiar to most of you, and that’s the game Minesweeper.

So, I’ve been calling this scenario “Minesweeper learning.” Now, it’s actually known that Minesweeper is an NP-hard problem to play optimally, so we know that in learning in a dangerous environment you can get that kind of complexity. As far as I know, we don’t know anything about typicality or average-case hardness. Also, to my knowledge no one has proven any nontrivial rigorous bounds on the probability that you’ll win Minesweeper if you play it optimally, with a given size board and a given number of randomly-placed mines. Certainly the probability is strictly between 0 and 1; I think it would be extremely interesting to bound it. I don’t know if this directly feeds into the AI safety program, but it would at least tell you something about the theory of machine learning in cases where a wrong move can kill you.
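
To make “which queries are safe” concrete, here is a small brute-force sketch on a made-up three-cell frontier: a cell counts as provably safe only if no mine placement consistent with the revealed clues puts a mine on it.

    # Brute-force "safe query" check for a toy Minesweeper position: a cell is
    # provably safe iff no placement of mines consistent with the revealed
    # clues puts a mine there. The board and clue values are made up.
    from itertools import combinations

    revealed  = {(0, 0): 1, (0, 1): 1}      # clue cells: (row, col) -> neighbouring mine count
    unknown   = [(1, 0), (1, 1), (1, 2)]    # unrevealed cells
    max_mines = 1                           # at most this many mines among the unknown cells

    def adjacent(a, b):
        return max(abs(a[0] - b[0]), abs(a[1] - b[1])) == 1

    consistent = []
    for k in range(max_mines + 1):
        for mines in combinations(unknown, k):
            if all(sum(adjacent(cell, m) for m in mines) == clue
                   for cell, clue in revealed.items()):
                consistent.append(set(mines))

    for cell in unknown:
        if all(cell not in placement for placement in consistent):
            print(cell, "is provably safe to probe")
        elif all(cell in placement for placement in consistent):
            print(cell, "is provably a mine")
        else:
            print(cell, "is uncertain: probing it is a gamble")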

So, I hope that gives you at least some sense for what I’ve been thinking about. I wish I could end with some neat conclusion, but I don’t really know the conclusion—maybe if you ask me again in six more months I’ll know! For now, though, I just thought I’d thank you for your attention and open things up to discussion.


Q&A

Q: Could you delay rolling out that statistical watermarking tool until May 2026?

Scott: Why?

Q: Oh, just until after I graduate [laughter]. OK, my second question is how we can possibly implement these AI safety guidelines inside of systems like AutoML, or whatever their future equivalents are that are much more advanced.

Scott: I feel like I should learn more about AutoML first before commenting on that specifically. In general, though, it’s certainly true that we’re going to have AIs that will help with the design of other AIs, and indeed this is one of the main things that feeds into the worries about AI safety, which I should’ve mentioned before explicitly. Once you have an AI that can recursively self-improve, who knows where it’s going to end up, right? It’s like shooting a rocket into space that you can then no longer steer once it’s left the earth’s atmosphere. So at the very least, you’d better try to get things right the first time! You might have only one chance to align its values with what you want.

Precisely for that reason, I tend to be very leery of that kind of thing. I tend to be much more comfortable with ideas where humans would remain in the loop, where you don’t just have this completely automated process of an AI designing a stronger AI which designs a still stronger one and so on, but where you’re repeatedly consulting humans. Crucially, in this process, we assume the humans can rely on any of the previous AIs to help them (as in the iterative amplification proposal). But then it’s ultimately humans making judgments about the next AI.

Now, if this gets to the point where the humans can no longer even judge a new AI, not even with as much help as they want from earlier AIs, then you could argue: OK, maybe now humans have finally been superseded and rendered irrelevant. But unless and until we get to that point, I say that humans ought to remain in the loop!

Q: Most of the protections that you talked about today come from, like, an altruistic human, or a company like OpenAI adding protections in. Is there any way that you could think of that we could protect ourselves from an AI that’s maliciously designed or accidentally maliciously designed?

Scott: Excellent question! Usually, when people talk about that question at all, they talk about using aligned AIs to help defend yourself against unaligned ones. I mean, if your adversary has a robot army attacking you, it stands to reason that you’ll probably want your own robot army, right? And it’s very unfortunate, maybe even terrifying, that one can already foresee those sorts of dynamics.

Besides that, there’s of course the idea of monitoring, regulating, and slowing down the proliferation of powerful AI, which I didn’t mention explicitly before, perhaps just because by its nature, it seems outside the scope of the technical solutions that a theoretical computer scientist like me might have any special insight about.

But there are certainly people who think that AI development ought to be more heavily regulated, or throttled, or even stopped entirely, in view of the dangers. Ironically, the “AI ethics” camp and the “orthodox AI alignment” camp, despite their mutual contempt, seem more and more to yearn for something like this … an unexpected point of agreement!

But how would you do it? On the one hand, AI isn’t like nuclear weapons, where you know that anyone building them will need a certain amount of enriched uranium or plutonium, along with extremely specialized equipment, so you can try (successfully or not) to institute a global regime to track the necessary materials. You can’t do the same with software: assuming you’re not going to confiscate and destroy all computers (which you’re not), who the hell knows what code or data anyone has?

On the other hand, at least with the current paradigm of AI, there is an obvious choke point, and that’s the GPUs (Graphics Processing Units). Today’s state-of-the-art machine learning models already need huge server farms full of GPUs, and future generations are likely to need orders of magnitude more still. And right now, the great majority of the world’s GPUs are manufactured by TSMC in Taiwan, albeit with crucial inputs from other countries. I hardly need to explain the geopolitical ramifications! A few months ago, as you might have seen, the Biden administration decided to restrict the export of high-end GPUs to China. The restriction was driven, in large part, by worries about what the Chinese government could do with unlimited ability to train huge AI models. Of course the future status of Taiwan figures into this conversation, as does China’s ability (or inability) to develop a self-sufficient semiconductor industry.

And then there’s regulation. I know that in the EU they’re working on some regulatory framework for AI right now, but I don’t understand the details. You’d have to ask someone who follows such things.

Q: Thanks for coming out and seeing us; this is awesome. Do you have thoughts on how we can incentivize organizations to build safer AI? For example, if corporations are competing with each other, then couldn’t focusing on AI safety make the AI less accurate or less powerful or cut into profits?

Scott: Yeah, it’s an excellent question. You could worry that all this stuff about trying to be safe and responsible when scaling AI … as soon as it seriously hurts the bottom lines of Google and Facebook and Alibaba and the other major players, a lot of it will go out the window. People are very worried about that.

On the other hand, we’ve seen over the past 30 years that the big Internet companies can agree on certain minimal standards, whether because of fear of getting sued, desire to be seen as a responsible player, or whatever else. One simple example would be robots.txt: if you want your website not to be indexed by search engines, you can specify that, and the major search engines will respect it.
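
For anyone who hasn’t seen it, here’s roughly what that convention looks like, checked with Python’s standard-library parser; the rules and URLs are invented for the example.

    # The robots.txt convention, checked with Python's standard-library parser.
    # The rules and URLs below are invented for illustration.
    from urllib.robotparser import RobotFileParser

    rules = """
    User-agent: *
    Disallow: /private/
    Allow: /
    """
    rp = RobotFileParser()
    rp.parse(rules.splitlines())
    print(rp.can_fetch("ExampleBot", "https://example.com/public/page"))    # True
    print(rp.can_fetch("ExampleBot", "https://example.com/private/page"))   # False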

In a similar way, you could imagine something like watermarking—if we were able to demonstrate it and show that it works and that it’s cheap and doesn’t hurt the quality of the output and doesn’t need much compute and so on—that it would just become an industry standard, and anyone who wanted to be considered a responsible player would include it.

To be sure, some of these safety measures really do make sense only in a world where there are a few companies that are years ahead of everyone else in scaling up state-of-the-art models—DeepMind, OpenAI, Google, Facebook, maybe a few others—and they all agree to be responsible players. If that equilibrium breaks down, and it becomes a free-for-all, then a lot of the safety measures do become harder, and might even be impossible, at least without government regulation.

We’re already starting to see this with image models. As I mentioned earlier, DALL-E2 has all sorts of filters to try to prevent people from creating—well, in practice it’s often porn, and/or deepfakes involving real people. In general, though, DALL-E2 will refuse to generate an image if its filters flag the prompt as (by OpenAI’s lights) a potential misuse of the technology.

But as you might have seen, there’s already an open-source image model called Stable Diffusion, and people are using it to do all sorts of things that DALL-E won’t allow. So it’s a legitimate question: how can you prevent misuses, unless the closed models remain well ahead of the open ones?

Q: You mentioned the importance of having humans in the loop who can judge AI systems. So, as someone who could be in one of those pools of decision makers, what stakeholders do you think should be making the decisions?

Scott: Oh gosh. The ideal, as almost everyone agrees, is to have some kind of democratic governance mechanism with broad-based input. But people have talked about this for years: how do you create the democratic mechanism? Every activist who wants to bend AI in some preferred direction will claim a democratic mandate; how should a tech company like OpenAI or DeepMind or Google decide which claims are correct?

Maybe the one useful thing I can say is that, in my experience, which is admittedly very limited—working at OpenAI for all of five months—I’ve found my colleagues there to be extremely serious about safety, bordering on obsessive. They talk about it constantly. They actually have an unusual structure, where they’re a for-profit company that’s controlled by a nonprofit foundation, which is at least formally empowered to come in and hit the brakes if needed. OpenAI also has a charter that contains some striking clauses, especially the following:

We are concerned about late-stage AGI development becoming a competitive race without time for adequate safety precautions. Therefore, if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project.

Of course, the fact that they’ve put a great deal of thought into this doesn’t mean that they’re going to get it right! But if you ask me: would I rather that it be OpenAI in the lead right now or the Chinese government? Or, if it’s going to be a company, would I rather it be one with a charter like the above, or a charter of “maximize clicks and ad revenue”? I suppose I do lean a certain way.

Q: This was a terrifying talk which was lovely, thank you! But I was thinking: you listed eight different alignment approaches, like kill switches and so on. You can imagine a future where there’s a whole bunch of AIs that people spawn and then try to control in these eight ways. But wouldn’t this sort of naturally select for AIs that are good at getting past whatever checks we impose on them? And then eventually you’d get AIs that are sort of trained in order to fool our tests?

Scott: Yes. Your question reminds me of a huge irony. Eliezer Yudkowsky, the prophet of AI alignment who I talked about earlier, has become completely doomerist within the last few years. As a result, he and I have literally switched positions on how optimistic to be about AI safety research! Back when he was gung-ho about it, I held back. Today, Eliezer says that it barely matters anymore, since it’s too late; we’re all gonna be killed by AI with >99% probability. Now, he says, it’s mostly just about dying with more “dignity” than otherwise. Meanwhile, I’m like, no, I think AI safety is actually just now becoming fruitful and exciting to work on! So, maybe I’m just 20 years behind Eliezer, and will eventually catch up and become doomerist too. Or maybe he, I, and everyone else will be dead before that happens. I suppose the most optimistic spin is that no one ought to fear coming into AI safety today, as a newcomer, if the prophet of the movement himself says that the past 20 years of research on the subject have given him so little reason for hope.

But if you ask, why is Eliezer so doomerist? Having read him since 2006, it strikes me that a huge part of it is that, no matter what AI safety proposal anyone comes up with, Eliezer has ready a completely general counterargument. Namely: “yes, but the AI will be smarter than that.” In other words, no matter what you try to do to make AI safer—interpretability, backdoors, sandboxing, you name it—the AI will have already foreseen it, and will have devised a countermeasure that your primate brain can’t even conceive of because it’s that much smarter than you.

I confess that, after seeing enough examples of this “fully general counterargument,” at some point I’m like, “OK, what game are we even playing anymore?” If this is just a general refutation to any safety measure, then I suppose that yes, by hypothesis, we’re screwed. Yes, in a world where this counterargument is valid, we might as well give up and try to enjoy the time we have left.

But you could also say: for that very reason, it seems more useful to make the methodological assumption that we’re not in that world! If we were, then what could we do, right? So we might as well focus on the possible futures where AI emerges a little more gradually, where we have time to see how it’s going, learn from experience, improve our understanding, correct as we go—in other words, the things that have always been the prerequisites to scientific progress, and that have luckily always obtained, even if philosophically we never really had any right to expect them. We might as well focus on the worlds where, for example, before we get an AI that successfully plots to kill all humans in a matter of seconds, we’ll probably first get an AI that tries to kill all humans but is really inept at it. Now fortunately, I personally also regard the latter scenarios as the more plausible ones anyway. But even if you didn’t—again, methodologically, it seems to me that it’d still make sense to focus on them.

Q: Regarding your project on watermarking—so in general, for discriminating between human and model outputs, what’s the endgame? Can watermarking win in the long run? Will it just be an eternal arms race?

Scott: Another great question. One difficulty with watermarking is that it’s hard even to formalize what the task is. I mean, you could always take the output of an AI model and rephrase it using some other AI model, for example, and catching all such things seems like an “AI-complete problem.”

On the other hand, I can think of writers—Shakespeare, Wodehouse, David Foster Wallace—who have such a distinctive style that, even if they tried to pretend to be someone else, they plausibly couldn’t. Everyone would recognize that it was them. So, you could imagine trying to build an AI in the same way. That is, it would be constructed from the ground up so that all of its outputs contained indelible marks, whether cryptographic or stylistic, giving away their origin. The AI couldn’t easily hide and pretend to be a human or anything else it wasn’t. Whether this is possible strikes me as an extremely interesting question at the interface between AI and cryptography! It’s especially challenging if you impose one or more of the following conditions:

  1. the AI’s code and parameters should be public (in which case, people might easily be able to modify it to remove the watermarking),
  2. the AI should have at least some ability to modify itself, and
  3. the means of checking for the watermark should be public (in which case, again, the watermark might be easier to understand and remove).

I don’t actually have a good intuition as to which side will ultimately win this contest, the AIs trying to conceal themselves or the watermarking schemes trying to reveal them, the Replicants or the Voight-Kampff machines.

Certainly in the watermarking scheme that I’m working on now, we crucially exploit the fact that OpenAI controls its own servers. So, it can do the watermarking using a secret key, and it can check for the watermark using the same key. In a world where anyone could build their own text model that was just as good as GPT … what would you do there?

06 Jan 03:16

COBS and equitable partitions

by Peter Cameron

It happens sometimes that researchers working in different fields study the same thing, give it different names, and don’t realise that there is further work on the subject somewhere else. Here is a story of such a situation, which arose indirectly from the work Rosemary and I were doing with statisticians in Covilhã, Portugal, namely Dário and Sandra Ferreira and Célia Nunes.

I begin with the statistics. We are designing an experiment to test a number of different “treatments” (fertilizers, drugs, or whatever). We have a number of experimental units available: these may be plots of land, fruit trees, human or animal subjects, etc. For brevity I will call them “plots”.

The design for the experiment is a function from the set of plots to the set of treatments, giving the treatment to be applied to each plot.

The simplest case is where the plots are all alike, apart from small random variations. In this case there is no restriction on how we choose the design function; for efficiency (meaning small variance of estimators of treatment differences) we should apply each treatment the same number of times, or if this is not possible, choose the numbers of occurrences to differ by at most one.

But in practice things may be more complicated. The plots may differ on a number of factors, each corresponding to a partition of the set of plots. For example, an agricultural experiment may use several plots on each of a number of farms in different parts of the world; plots on the same farm will be more similar than plots on different farms, and analysis of the experiment should take this into consideration. The plots are partitioned into “blocks”, one for each farm.

Indeed, there may be several such partitions. In an experiment on fruit trees, the trees in an orchard may be arranged in a rectangular array, and may have been used for an experiment last year; then the rows, columns, and earlier treatments all form significant partitions. In an experiment on animal feed, the animals may be divided into pens, a number of pens on each of several farms. In a clinical trial, the gender of the subject and the hospital at which they are treated may be relevant.

Thus we have to consider “plot structure”, usually (as above) defined by a collection of partitions of the set of plots. It is convenient if the partitions satisfy some conditions: each should have all parts of the same size; if possible, any two should be “orthogonal”, meaning that they intersect proportionally; and perhaps the set of partitions should be closed under meet and join and contain the two trivial partitions (the partition with a single part and the partition into singletons). Such a plot structure is called an orthogonal block structure, or OBS for short.

The plot structure is given; the experimenter has no control over it. All that she can control is the design function.

The result of the experiment will typically be a real number measured on each plot, that is, an element of the real vector space V of functions from the set of plots to the real numbers. In this vector space, each partition of the set of plots is represented by a subspace of V, consisting of functions which are constant on each part of that partition. There is a natural inner product on V (the characteristic functions of singletons forming an orthonormal basis), and so there is an orthogonal projection onto the space corresponding to any partition: this has the effect of averaging the function over each part of the partition.
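
To see the averaging concretely (a toy numerical illustration of mine, not part of the argument): the projection matrix for a partition can be written down directly, and it replaces a data vector by its part-averages.

    # The orthogonal projection onto functions constant on the parts of a
    # partition of {0, ..., n-1} just averages over each part. Toy illustration.
    import numpy as np

    def projection(partition, n):
        P = np.zeros((n, n))
        for part in partition:
            for i in part:
                for j in part:
                    P[i, j] = 1.0 / len(part)
        return P

    blocks = [[0, 1, 2], [3, 4, 5]]              # six plots in two blocks of three
    P = projection(blocks, 6)
    y = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
    print(P @ y)                                  # each entry replaced by its block average
    print(np.allclose(P @ P, P), np.allclose(P, P.T))   # idempotent and symmetric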

Even if the plot structure is not completely described by partitions, it may still be the case that it gives rise to a collection of linear maps on V. For example, if the plots lie in a circle, there may be a neighbour effect, and we could consider the adjacency matrix of the cycle graph.

Statisticians introduced a helpful notion here, that of a commutative orthogonal block structure, or COBS for short. Noting that the design function allocating treatments to plots corresponds to another partition of the set of plots (where plots getting the same treatment lie in the same part), they argued that it is helpful if the orthogonal projection onto the treatment subspace commutes with the matrices arising from the plot structure. Such a set-up is a COBS.

Now we leave statistics for a while and turn to graph theory.

Let Γ be a graph, and Δ a partition of its vertex set with parts P1, …, Pr. Then Δ is said to be equitable if there is an r×r matrix M such that, for any i,j, the number mij of vertices of Pj joined to a given vertex in Pi depends only on i and j, and not on the chosen vertex. Among the various consequences of this definition, I mention just one: The matrix M is the restriction of the adjacency matrix of the graph to the subspace consisting of functions constant on the parts of Δ; so this subspace is invariant under the adjacency matrix of the graph, and moreover, the minimal polynomial of the matrix M divides that of the adjacency matrix of the graph, so that its eigenvalues are among those of the adjacency matrix.

Equitable partitions occur widely in finite geometry and combinatorics; among geometric examples, one could mention ovoids in projective spaces. Also, it is clear that the orbit partition of any group of automorphisms of the graph forms an equitable partition of its vertex set.

Once one suspects that there is a connection between COBS and equitable partitions, it soon becomes clear what theorem needs to be proved to connect them. It is the following, whose proof is straightforward once we have decided what is needed.

Theorem A partition Δ is equitable for a graph Γ if and only if the projection matrix onto the subspace of functions constant on parts of Δ commutes with the adjacency matrix of Γ.
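
Here is a quick numerical check of the theorem on a small example of my own (the 6-cycle): the partition into antipodal pairs is equitable and its projection commutes with the adjacency matrix, while a partition into consecutive pairs fails both tests.

    # Checking the theorem on the 6-cycle: an equitable partition (antipodal
    # pairs) gives a projection commuting with the adjacency matrix; a
    # non-equitable partition (consecutive pairs) fails both tests.
    import numpy as np

    n = 6
    A = np.zeros((n, n), dtype=int)
    for i in range(n):                            # adjacency matrix of the cycle C6
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

    def proj(partition):
        P = np.zeros((n, n))
        for part in partition:
            for i in part:
                for j in part:
                    P[i, j] = 1.0 / len(part)
        return P

    def is_equitable(partition):
        # number of neighbours in part q of a vertex of part p depends only on (p, q)
        return all(len({A[v, q].sum() for v in p}) == 1
                   for p in partition for q in partition)

    good = [[0, 3], [1, 4], [2, 5]]               # antipodal pairs: equitable
    bad  = [[0, 1], [2, 3], [4, 5]]               # consecutive pairs: not equitable
    for D in (good, bad):
        print(is_equitable(D), np.allclose(proj(D) @ A, A @ proj(D)))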

A simple consequence of this theorem (not immediately obvious) is that, if the graph Γ is distance-regular, then an equitable partition of Γ is also equitable for the distance-i graph of Γ, for all i up to the diameter of the graph. For distance-regularity means that the distance-i graphs have adjacency matrices which are polynomials in the adjacency matrix of Γ, so any matrix which commutes with the adjacency matrix of Γ commutes with the adjacency matrices of all distance-i graphs.

The work we were doing in Covilhã concerned COBS for the half-diallel, a genetic term referring to the situation where the genome of an individual comes from two parents but there is no distinction in their role. Designs for such experiments could also be applied to the situation where an experimental unit consists of a pair of individuals from a population. This might, for example, involve comparing the efficiency of various communication methods (face-to-face, telephone, video call, email, etc.). Thus a COBS for such an experiment is an equitable partition of the triangular graph, whose vertices are the 2-element subsets of the set of individuals, two vertices joined if the subsets have non-empty intersection.

My earlier foray into equitable partitions, with Rosemary Bailey, Sasha Gavrilyuk and Sergei Goryainov, involved Latin square graphs; this could be applied in the situation of an n×n array of fruit trees where last year an experiment in the form of a Latin square was done on the trees.

So how did we come to realise the connection? It happened like this. Rosemary was working on a classification of the COBS for the triangular association scheme. I noticed that what she was doing had a superficial resemblance to what she had done on equitable partitions of Latin square graphs, on a flight back from China a few years ago. As said above, once you suspect that a connection exists, it is fairly straightforward to see what it is and to prove the required theorem.