The Old Reader

19 Aug 17:34

Thermal constraints on in vivo optogenetic manipulations

by Scott F. Owen

Nature Neuroscience, Published online: 17 June 2019; doi:10.1038/s41593-019-0422-3

Optogenetics has revolutionized neuroscience, but intracranial illumination can cause off-target effects. Owen et al. identify a temperature-sensitive potassium current that modulates neuronal activity and behavior independent of opsin expression.

29 Jul 01:51

Rapid Plasticity of Higher-Order Thalamocortical Inputs during Sensory Learning

by Nicholas J. Audette, Sarah M. Bernhard, Ajit Ray, Luke T. Stewart, Alison L. Barth

Audette et al. use automated training and in vitro electrophysiology to define cortical circuit changes during sensory-association learning. Pathway-specific analysis identifies higher-order thalamic inputs to sensory cortex as a site of synaptic potentiation during the earliest stages of learning.

05 Jul 16:20

Light-wave dynamic control of magnetism

by Florian Siegrist

Nature, Published online: 26 June 2019; doi:10.1038/s41586-019-1333-x

The magnetic properties of a ferromagnetic layer stack are controlled on attosecond timescales through optically induced spin and orbital momentum transfer, demonstrating a coherent regime of ultrafast magnetism.

27 Jun 16:41

High-dimensional geometry of population responses in visual cortex

by Carsen Stringer

Nature, Published online: 26 June 2019; doi:10.1038/s41586-019-1346-5

Analysis of the encoding of natural images by very large populations of neurons in the visual cortex of awake mice characterizes the high dimensional geometry of the neural responses.

27 Jun 14:59

Decoding of the other's focus of attention by a temporal cortex module

by Ramezanpour, H., Thier, P.

Faces attract the observer's attention towards objects and locations of interest for the other, thereby allowing the two agents to establish joint attention. Previous work has delineated a network of cortical "patches" in the macaque cortex, processing faces, eventually also extracting information on the other's gaze direction. Yet, the neural mechanism that links information on gaze direction, guiding the observer's attention to the relevant object has remained elusive. Here we present electrophysiological evidence for the existence of a distinct "gaze-following patch (GFP)" with neurons that establish this linkage in a highly flexible manner. The other's gaze and the object, singled out by the gaze, are linked only if this linkage is pertinent within the prevailing social context. The properties of these neurons establish the GFP as a key switch in controlling social interactions based on the other's gaze.

27 Jun 02:59

Meeting the Dialogue Challenge

by john

guest post by Dan Shiebler and Alexis Toumi

This is the fourth post in a series from the Adjoint School of Applied Category Theory 2019. We discuss Grammars as Parsers: Meeting the Dialogue Challenge (2006) by Matthew Purver, Ronnie Cann and Ruth Kempson as part of a group project on categorical methods for natural language dialogue.

Preliminary Warning: Misalignment in Dialogue

Traditional accounts of the mechanics of natural language have largely been focused on monologue and utterances in isolation: using formal languages and automata theory to characterise the set of grammatical or “well-formed” sentences. When faced with dialogue and informal conversation, this notion of grammaticality breaks down: people hesitate, correct themselves, interrupt each other, etc. In Toward a mechanistic psychology of dialogue (2004), Pickering and Garrod argue that dialogue represents a major challenge for psycholinguistics and give empirical evidence for interactive alignment as the psychological process underlying dialogue. We aim to cast a categorical light on this dialogue challenge and on the computational model proposed by Purver et al.

This computational linguistics article may look like the odd one out in the ACT reading list: it does not even mention category theory. Worse: the word “category” is actually used twice, but it doesn’t mean anything like a collection of arrows with composition; “adjunction” appears twice as well, but again nothing to do with pairs of functors and natural transformations. This is a typical example of a phenomenon psycholinguistics would call misalignment. If we imagine a dialogue between a linguist and a mathematician, they would use the same words to refer to different concepts, resulting in a failure of communication.

Context-Freeness and Monoidal Categories

It may now prove useful to give some historical context. In his Three models for the description of language (1956) and his monograph on Syntactic Structures (1957), Chomsky lays out a formal theory of natural language syntax with the context-free grammar (CFG) as its cornerstone. Formally, a CFG is given by a tuple $G = (V, X, R, s)$ where:

$V$ and $X$ are two finite sets, called the vocabulary and the nonterminal symbols respectively.
$R \subseteq X \times (V + X)^\star$ is a finite set of production rules and $s \in X$ is the start symbol, where the Kleene star $(-)^\star$ denotes the free monoid and $(+)$ denotes disjoint union.

A CFG generates a language $L(G) = \{\ u \in V^\star \ \vert \ s \to_R u \ \}$ where the rewriting relation $(\to_R) \subseteq (V + X)^\star \times (V + X)^\star$ is traditionally defined as the transitive closure of the following directed graph:

$$\{ \ (u x w, \ u v w) \quad \vert \quad u, w \in (V + X)^\star, \ (x, v) \in R \ \}$$

One may recast this into the language of monoidal categories by redefining $L(G) = \{ \ u \in V^\star \ \vert \ \mathcal{C}_R(s, u) \neq \emptyset \ \}$ where the transition relation $R$ is seen as a signature for the free monoidal category $\mathcal{C}_R$. That is, a string $u \in V^\star$ is grammatical whenever there exists an arrow from the start symbol $s$ to $u$ in $\mathcal{C}_R$. A typical arrow $r : s \to u$ where $u = \text{"Colourless green ideas sleep furiously"}$ may be encoded as a syntax tree:

Syntax tree for "Colorless green ideas sleep furiously"
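
The rewriting relation can be explored mechanically. The following is a minimal sketch, assuming a toy grammar of our own: the nonterminals S, NP, VP, Adj, N, V, Adv and the rules in RULES are illustrative and do not come from the post. It searches leftmost derivations breadth-first and prunes any form longer than the target, which is only sound under the stated assumption that no rule has an empty right-hand side.

```python
from collections import deque

def generates(rules, start, target):
    """Return True if `target` (a tuple of words) is derivable from `start`.

    `rules` is a set of pairs (x, v): x is a nonterminal, v a tuple over V + X.
    Assumptions: every nonterminal has at least one rule, and no rule has an
    empty right-hand side, so forms never shrink during a derivation.
    """
    nonterminals = {x for x, _ in rules}
    seen = {(start,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        # Locate the leftmost nonterminal; a form without one is a dead end.
        idx = next((i for i, sym in enumerate(form) if sym in nonterminals), None)
        if idx is None:
            continue
        # Everything before it is already terminal and must match the target.
        if form[:idx] != target[:idx]:
            continue
        for x, v in rules:
            if x == form[idx]:
                new = form[:idx] + v + form[idx + 1:]
                if len(new) <= len(target) and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return False

# Hypothetical toy grammar, chosen only to derive the example sentence.
RULES = {
    ("S", ("NP", "VP")),
    ("NP", ("Adj", "NP")),
    ("NP", ("N",)),
    ("VP", ("V", "Adv")),
    ("Adj", ("colourless",)),
    ("Adj", ("green",)),
    ("N", ("ideas",)),
    ("V", ("sleep",)),
    ("Adv", ("furiously",)),
}

SENTENCE = ("colourless", "green", "ideas", "sleep", "furiously")
print(generates(RULES, "S", SENTENCE))
print(generates(RULES, "S", ("ideas", "green")))
```

Restricting the search to leftmost derivations loses nothing for context-free grammars, since every derivable string has a leftmost derivation.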

A year later in The mathematics of sentence structure (1958), Lambek characterised the same context-free languages with his Lambek calculus, i.e. the internal language of biclosed monoidal categories. Half a century later in From Word To Sentence (2008), he simplified his formalism from biclosed to rigid monoidal categories and pregroup grammars. First introduced in Lambek (1997), pregroups are partially ordered monoids $(P, \leq, \cdot, 1)$ where every type $t \in P$ has left and right adjoints, i.e. a pair of objects ${}^\star t, t^\star \in P$ with two pairs of inequations:

$$t \cdot {}^\star t \leq 1 \leq {}^\star t \cdot t \qquad \qquad t^\star \cdot t \leq 1 \leq t \cdot t^\star$$

Formally, a pregroup grammar is given by a tuple $G = (V, B, D, s)$ where:

$V$ is a finite vocabulary, $B$ is a finite partially ordered set of basic types,
$D \subseteq V \times P_B$ is a finite relation called the dictionary or lexicon, where $P_B$ is the free pregroup generated by $B$ and $s \in P_B$ is called the sentence type.

Again, we may define the language of $G$ as $L(G) = \{ \ u \in V^\star \ \vert \ \mathcal{C}_D(u, s) \neq \emptyset \ \}$ where now the dictionary $D$ is taken as a signature for the free rigid monoidal category $\mathcal{C}_D$. If we note that the generation and parsing problems form a duality where the output of the generation problem is the input to the parsing problem, we see that CFGs and pregroup grammars stand on opposite sides of this duality. That is, the start symbol is the domain of syntax trees while the sentence type is the codomain of pregroup reductions. The arrow $r : u \to s$ encoding grammaticality is no longer a tree but a planar diagram, e.g.:

Pregroup reduction for "Colorless green ideas sleep furiously"

Context-free grammars and pregroup grammars share the same expressive power, i.e. they generate the same class of context-free languages. However, they are only weakly equivalent, i.e. the translations between them preserve only grammaticality and forget the structure of the grammatical reductions, syntax trees or string diagrams. There is still open debate about the syntactic structure that underlies human language.
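
A pregroup reduction can be checked by repeatedly cancelling adjacent adjoint pairs. The sketch below is a simplification under our own assumptions: a simple type is encoded as a pair (basic, z), with z = 0 for the basic type, z = +1 for the adjoint written with the star on the left and z = -1 for the star on the right, so that both inequations $t \cdot {}^\star t \leq 1$ and $t^\star \cdot t \leq 1$ become the single contraction (b, z)(b, z + 1) ≤ 1. The partial order on basic types is ignored, only contractions are tried, and the word types in DICTIONARY are hypothetical.

```python
from functools import lru_cache

def reduces_to(word_types, target):
    """Return True if the concatenated word types contract to `target`.

    `word_types` is a list of types, each a list of (basic, z) pairs;
    `target` is a tuple of (basic, z) pairs, e.g. (("s", 0),).
    """
    @lru_cache(maxsize=None)
    def search(seq):
        if seq == target:
            return True
        # Try every adjacent pair of the form (b, z)(b, z + 1) and cancel it.
        return any(
            seq[i][0] == seq[i + 1][0]
            and seq[i + 1][1] == seq[i][1] + 1
            and search(seq[:i] + seq[i + 2:])
            for i in range(len(seq) - 1)
        )

    return search(tuple(t for word_type in word_types for t in word_type))

# Hypothetical dictionary: adjectives n . n*, noun n, verb *n . s, adverb *s . s.
DICTIONARY = {
    "colourless": [("n", 0), ("n", -1)],
    "green": [("n", 0), ("n", -1)],
    "ideas": [("n", 0)],
    "sleep": [("n", 1), ("s", 0)],
    "furiously": [("s", 1), ("s", 0)],
}

def is_grammatical(words):
    return reduces_to([DICTIONARY[w] for w in words], (("s", 0),))

print(is_grammatical(["colourless", "green", "ideas", "sleep", "furiously"]))
print(is_grammatical(["sleep", "ideas"]))
```

The exhaustive search is exponential in the worst case, which is acceptable for sentences of this size; it is meant to show the contraction rule, not to be an efficient parser.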

Thus, we may realign the terminology of the linguist and of the category theorist as follows: the “syntactic categories” of categorial grammars are really objects in some form of closed category and the “adjunctions” are really adjunctions after all! Indeed, while in CFGs an adjunct may be defined as a subtree which may be removed without affecting grammaticality, in pregroup grammars the same phenomenon is encoded as an adjunction, e.g. between the types of “furiously” and “sleep”.

Natural Language and Functorial Semantics

We argue that it is of interest to realign “syntactic categories” and “adjunctions” the other way round, i.e. take their category theoretic meaning and apply them to linguistics. Again we give some historical context by looking at two parallel uses of the word “universal”: universal algebra and Lawvere theories on one side, universal grammar and Montague semantics on the other.

In Adjointness in Foundations (1969), Lawvere lays out a program for the foundations of mathematics by characterising existential and universal quantification as left and right adjoints to substitution. In his doctoral dissertation Functorial Semantics of Algebraic Theories (1963), he defines what are now known as Lawvere theories and their models as monoidal functors $F : \mathcal{T} \to Set$ where $\mathcal{T}$ is the syntactic category encoding the theory: a monoidal category where the tensor is the categorical product and each object is isomorphic to a cartesian power $x^n$ for $n \in \mathbb{N}$ and some fixed $x$.

The following year in English as a formal language (1970) and Universal Grammar (1970), Montague sets out to apply the same principle of compositionality to natural language. Recast using category theory, Montague semantics is a monoidal functor $F : (\mathcal{C}_R)^{op} \to Set$ from the free monoidal category generated by the transition relation of a context-free grammar $G = (V, X, R, s)$ such that words $w \in V$ are mapped to the singleton and the start symbol $s \in X$ to a set of closed logical formulae. Explicitly, $F$ is given by a lambda term for each transition rule, mapping grammatical sentences to logical sentences in a compositional way, e.g.:

Montague semantics for "Thetis loves a mortal"
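
As a rough illustration of "a lambda term for each transition rule", here is a minimal sketch in which logical formulae are plain strings. The rule names (s_rule, vp_rule, np_rule), the lexical entries and the treatment of the proper noun as a bare constant are our own simplifying assumptions, not Montague's actual fragment.

```python
# Lexical entries (hypothetical): each word denotes a lambda term.
thetis = "Thetis"
mortal = lambda x: f"mortal({x})"
a = lambda noun: lambda pred: f"exists x. ({noun('x')} and {pred('x')})"
loves = lambda obj: lambda subj: obj(lambda o: f"loves({subj}, {o})")

# One lambda term per production rule of an assumed toy grammar.
np_rule = lambda det, noun: det(noun)    # NP -> Det N
vp_rule = lambda verb, obj: verb(obj)    # VP -> V NP
s_rule = lambda subj, vp: vp(subj)       # S  -> NP VP

# The syntax tree of "Thetis loves a mortal" dictates the order of application.
formula = s_rule(thetis, vp_rule(loves, np_rule(a, mortal)))
print(formula)
```

The meaning of the sentence is obtained purely from the meanings of the words and the shape of the syntax tree, which is the compositionality principle at work.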

Even though the use of higher-order logic in Montague semantics suggests a connection with Lawvere theories through the internal language of topos theory, to the best of our knowledge Montague is as little known to category theorists as Lawvere is to linguists.

In the next section, we discuss a second principle at play in natural language semantics: distributionality, as summarised in Firth’s oft-cited maxim: “You shall know a word by the company it keeps”.

Categorical Compositional Distributional Models

How do we define a word’s company categorically? As a first approximation, we may take some large collection of sentences $C \subseteq V^\star$ and look at the matrix $E : |V| \times |V| \to \mathbb{N}$ encoding how many times each pair of words appeared together, i.e. $E(i, j) = \sum_{u \in C} \mathbf{1}_u(w_i) \times \mathbf{1}_u(w_j)$ for $\mathbf{1}_u : V \to \{0, 1\}$ the indicator function of the utterance $u \in V^\star$. Better approximations would be given by some renormalisation of this matrix such as TF-IDF or by information-theoretic measures like pointwise mutual information. The size of this matrix may be prohibitive and you may want to apply some dimensionality reduction to get some compressed encoding matrix $E : |V| \times n \to S$ for some hyper-parameter $n \in \mathbb{N}$ and some suitable semiring $S$.
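
The co-occurrence count is a direct transcription of the sum over indicator functions. The sketch below assumes a tiny hand-made corpus and vocabulary (both hypothetical) and uses plain nested lists for the matrix.

```python
def cooccurrence(corpus, vocabulary):
    """E[i][j] = number of utterances containing both vocabulary[i] and vocabulary[j]."""
    index = {word: i for i, word in enumerate(vocabulary)}
    E = [[0] * len(vocabulary) for _ in vocabulary]
    for utterance in corpus:
        # The indicator function of the utterance: which vocabulary words occur in it.
        present = {index[word] for word in utterance if word in index}
        for i in present:
            for j in present:
                E[i][j] += 1
    return E

# Hypothetical toy data.
VOCABULARY = ["Thetis", "loves", "mortal", "Zeus"]
CORPUS = [
    ["Thetis", "loves", "a", "mortal"],
    ["Zeus", "loves", "Thetis"],
    ["a", "mortal", "sleeps"],
]

E = cooccurrence(CORPUS, VOCABULARY)
for word, row in zip(VOCABULARY, E):
    print(word, row)
```

Because the formula uses indicator functions, a word repeated within one utterance still counts once, and the diagonal entry E[i][i] is the number of utterances containing the word.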

We give a very brief presentation of the categorical compositional distributional (DisCoCat) models of Clark et al. (2010), which were the topic of a previous post from the ACT 2018 seminar. DisCoCat models may be defined as monoidal functors $F : \mathcal{C}_D \to \text{Mat}(S)$ where $\mathcal{C}_D$ is the free rigid monoidal category generated by a pregroup dictionary $D \subseteq V \times P_B$ and $\text{Mat}(S)$ is the category of matrices over $S$ with Kronecker product as tensor. Note that the encoding matrix $E : |V| \times n \to S$ is precisely the data defining a monoidal functor $F : \mathcal{C}_D \to \text{Mat}(S)$ such that $F(t) = n$ for all $(w, t) \in D$, i.e. every word is of the same semantic type. In order to construct arbitrary DisCoCat models we need tensors of higher rank, e.g. the encoding of the verb $(loves, {}^\star n \cdot s \cdot n^\star) \in D$ will have dimension $n \times F(s) \times n$ where we overload the notation for the noun type $n \in P_B$ and the dimension of its image $F(n) = n \in \mathbb{N}$.
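
For a transitive sentence, the image of the pregroup reduction contracts the verb tensor with the subject and object vectors. The following is a minimal sketch with ordinary numbers as the semiring, noun dimension 2 and sentence dimension 1; the basis (index 0 for Thetis, index 1 for Peleus) and the entries of LOVES are invented for illustration.

```python
def transitive_sentence(subject, verb, obj):
    """Contract a verb tensor of shape n x F(s) x n with two noun vectors of length n."""
    n, s = len(subject), len(verb[0])
    return [
        sum(subject[i] * verb[i][j][k] * obj[k] for i in range(n) for k in range(n))
        for j in range(s)
    ]

# Hypothetical one-hot noun vectors.
THETIS = [1, 0]
PELEUS = [0, 1]

# LOVES[i][0][k] = 1 when noun i loves noun k (made-up facts).
LOVES = [
    [[0, 1]],  # Thetis loves Peleus
    [[0, 0]],  # Peleus loves nobody
]

print(transitive_sentence(THETIS, LOVES, PELEUS))
print(transitive_sentence(PELEUS, LOVES, THETIS))
```

Swapping multiplication and addition for "and" and "or" would give the same contraction over the Boolean semiring.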

DisCoCat models valued in the category $\text{Mat}(\mathbb{B}) \simeq \text{FinRel}$ of finite sets and relations yield a truth-theoretic semantics for pregroup grammars in terms of conjunctive queries, the regular logic fragment of relational databases. Although much weaker than the higher-order logic of Montague semantics, conjunctive queries have nice complexity-theoretic as well as category-theoretic properties. Indeed, a celebrated theorem from Chandra, Merlin (1977), that conjunctive query evaluation is NP-complete, has recently been recast in terms of free cartesian bicategories in Bonchi et al. (2018): queries are arrows and evaluation reduces to the existence of 2-arrows. The significance of this result for natural language processing is discussed in De Felice et al. (2019), to appear in the ACT 2019 proceedings.

Both the relational models and the vector space models valued in $\text{Mat}(\mathbb{R}) \simeq \text{FinVect}_{\mathbb{R}}$ yield some insight into the anatomy of word meanings through a common structure: the categories of matrices $\text{Mat}(S)$ are all hypergraph categories, i.e. symmetric monoidal categories where each object is equipped with a special commutative Frobenius algebra in a coherent way. This allows us to model information flow and give a semantics to those functional words which appear in almost every context, hence for which distributionality gives practically no information content: auxiliary verbs such as “does”, coordinators e.g. “and” and “or”, relative and personal pronouns e.g. “that” and “them” (see the line of work by Sadrzadeh, Coecke and others [1, 2, 3, 4, 5]).

As we discuss in the next section, Frobenius algebras can even be used to give a meaning to words that are missing, see Wijnolds, Sadrzadeh (2019).

Contextual Grammaticality with Dynamic Syntax

Dynamic Syntax, which emerged from work by Kempson et al., is based on a third linguistic principle, incrementality: in dialogue people speak and listen to only one word at a time. Modelling language generation and parsing as linear-time processes has two related justifications: on the computational side it allows us to implement real-time applications like personal assistants; on the cognitive side there is empirical evidence that human parsing is largely incremental, see e.g. Phillips, Linear Order and Constituency (2003).

Dynamic syntax (DS) takes a semantics-driven approach to grammaticality: a sequence of words is defined as grammatical precisely if it can be given a semantics in the model. DS replaces the lambda terms of Montague semantics by a set $T$ of semantic trees based on Hilbert’s epsilon calculus, with $\varnothing \in T$ the initial empty tree and $S \subseteq T$ a subset of completed trees, i.e. closed logical formulae. In our following discussion of DS we use $Ty$ and $Fo$ to represent the tree operators on logical types and logical formulae respectively.

DS also defines a set $A$ of actions, specified in an imperative programming language based on the logic of finite trees of Blackburn et al. (1994). A DS lexicon is then given by a function $D : V \to A$ assigning a lexical action to each word in the vocabulary $w \in V$, e.g.

Dynamic syntax for "dislike"

DS semantic trees are annotated with some further structure: pointers, metavariables and requirements, as illustrated in the following example.

Dynamic syntax for "John likes Mary"

Pointers: The semantic trees are equipped with a pointer $\diamond$ and actions may be specified locally to that pointer in terms of descendants $\downarrow$ and ancestors $\uparrow$ in the tree.
Metavariables: Metavariables are underspecified terms. They include personal pronouns such as “he” or “she” as well as subjects with indefinite scope like the word “someone” in the sentence “Everyone likes someone.”
Requirements: After parsing the subsentence “John likes”, the semantic tree will include a requirement for an argument type (represented by $?Ty(e)$ in the figure above). This requirement will be filled when the parser reaches the next word, “Mary”, which will yield the complete tree for the sentence “John likes Mary.”

In Grammars as Parsers (2006), Purver et al. argue that incrementality is key if we are to model the language of informal conversations. They apply dynamic syntax to the following phenomena, which are all problematic in traditional approaches to language processing:

Ellipses: in informal conversations, context allows participants to omit words from sentences without affecting their meaning, e.g. A: “Mary studies categories.” B: “John does too.”
Routinization: the repetition of the same ambiguous phrase (“that guy”) resolves the ambiguity and makes participants more likely to answer using the same phrase again.
Shared utterances: participants may complete each other’s sentences, e.g. A: “John likes…” B: “category theory? I know.”

For example, consider the dialogue A: “Mary upset Sue.” B: “John did too.” When parsing “John did too,” the DS parser will represent “did too” as a requirement asking for what John did. If the semantic tree encoding context contains a node of the appropriate type (e.g. $\lambda x.Upset(x, Sue)$), the parser may use this to fill the requirement.

Dynamic syntax for dialogue
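
The lookup that fills an elliptical requirement can be mimicked in a few lines. This is a toy stand-in rather than the tree logic of DS: the context is flattened to a list of (type, content) nodes, the type label "e -> t" and the function resolve_ellipsis are our own names, and the data comes from the earlier exchange “Mary studies categories.” / “John does too.”

```python
def resolve_ellipsis(subject, context):
    """Fill the open requirement for a predicate with the most recent one in context."""
    for node_type, content in reversed(context):
        if node_type == "e -> t":
            return content(subject)
    return None  # no suitable node: the requirement stays open

# Hypothetical flattened context after parsing "Mary studies categories."
CONTEXT = [
    ("e", "Mary"),
    ("e -> t", lambda x: f"Studies({x}, categories)"),
    ("t", "Studies(Mary, categories)"),
]

print(resolve_ellipsis("John", CONTEXT))
print(resolve_ellipsis("John", [("e", "Mary")]))
```

When no node of the required type is available, the function returns None, which corresponds to the fragment not being parsable in that context.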

Furthermore, Purver et al. model the interactive alignment of Pickering and Garrod on three levels: lexical i.e. participants re-using the same words, syntactic i.e. participants re-using the same phrase structure, and semantic i.e. participants converging to the same representation of the situation. This requires a notion of state that goes beyond that of isolated sentences: we need to encode the participants’ memory of the dialogue’s history. The state of a DS parser, also called the context, is given by a sequence of triples $C \in (T \times V^\star \times A^\star)^\star$, such that for all $(t, u, a) \in C$ with $a = a_0 \dots a_{n-1}$ we have $a_{n-1} \circ \cdots \circ a_{0}(\varnothing) = t$, i.e. we record the list of actions which are constructed from the utterance $u \in V^\star$. Next to lexical actions, DS defines a set of computational actions which may depend on context, e.g.

Substitution axiom in dynamic syntax
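
The bookkeeping behind the parser state, a sequence of (tree, words, actions) triples in which replaying the actions from the empty tree must reproduce the recorded tree, can be sketched as follows. Trees are crudely modelled as tuples and lexical actions as functions that append to them; these representations, and the names run, parse and is_consistent, are assumptions made for illustration only.

```python
from functools import reduce

EMPTY = ()  # stand-in for the initial empty tree

def lexical_action(word):
    """Hypothetical lexical action: extend the tree with the word's contribution."""
    return lambda tree: tree + (word,)

LEXICON = {word: lexical_action(word) for word in ["John", "likes", "Mary"]}

def run(actions, tree=EMPTY):
    """Apply a_0 first and a_{n-1} last, i.e. compute a_{n-1} ∘ ... ∘ a_0 (tree)."""
    return reduce(lambda t, action: action(t), actions, tree)

def parse(words):
    """Return the triple (tree, utterance, actions) recorded for one utterance."""
    actions = tuple(LEXICON[w] for w in words)
    return (run(actions), tuple(words), actions)

def is_consistent(context):
    """Check that every recorded tree is the result of replaying its actions."""
    return all(run(actions) == tree for tree, _words, actions in context)

context = [parse(["John", "likes", "Mary"])]
print(is_consistent(context))
```

Keeping the actions alongside the tree is what lets later utterances re-run parts of an earlier parse, as the computational actions depending on context require.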

We say a context $C$ is valid if for all $(t, u, a) \in C$ we have $t \in S$, i.e. $t$ is a completed tree. This model defines a notion of contextual grammaticality: whether an utterance is well-formed may depend on what has been said before. When the system has only a single speaker, we can represent this context with the history string $c \in L(D)$. Below we illustrate several subcases of this single speaker system. A string $u \in V^\star$ is:

fully grammatical if it is parsable from any valid context e.g. “John went to the market”,
fully ungrammatical if there is no context in which it is parsable e.g. “John market”,
well-formed if there is some valid context in which it is parsable e.g. “He went to the market”,
potentially well-formed if there is some context (not necessarily valid) in which it is parsable e.g. “the market”.

Bringing it All Together: DisCoCat Inc.

The aim of our summer project is to incorporate the incremental insights from dynamic syntax into categorical compositional distributional models, DisCoCat Inc. Sadrzadeh et al. (2018) explore the first steps in this direction, mapping semantic trees to tensor networks and actions to tensor contraction.

In conclusion, we may attempt to reformulate the dialogue challenge in categorical terms: given DisCoCat models $F_A : \mathcal{A} \to \text{Mat}(S)$ and $F_B : \mathcal{B} \to \text{Mat}(S)$, encoding the language of Alice and Bob respectively, can we build a new model $F_C : \mathcal{C} \to \text{Mat}(S)$ giving a semantics to the dialogues between them? Furthermore, can we make the model for this common language $F_C : \mathcal{C} \to \text{Mat}(S)$ incremental, and account for the real-time dynamics of dialogues? These questions inspired a number of links with the other group projects of the ACT adjoint school:

Partial evaluation and the bar construction: Can we make use of monads, partial evaluations and rewriting theory (as discussed in a previous guest post) to model the dynamics of syntactic structures and unify universal grammar with universal algebra?
Toward a mathematical foundation for autopoiesis: Can we model natural language entailment and question answering using the graphical regular logic of Fong, Spivak (2018)? Can we investigate language as an autopoietic system through the behavioral mereology of Fong et al. (2018)?
Simplifying quantum circuits using the ZX-calculus: Can we use techniques inspired from quantum circuit minimisation (discussed in two previous posts here and there) to perform summarisation, i.e. find the shortest text which encodes the semantics of a given dialogue?
Traversal optics and profunctors: The theory of lenses has recently been used to model both learning algorithms in Fong, Johnson (2019) and Wittgenstein’s language games in Hedges, Lewis (2018). Can we use these insights and the categorical view of optics developed in Milewski (2007) to model natural language learning through dialogue?

26 Jun 21:28

Dietary fatty acids promote sleep through a taste-independent mechanism

by Sah Pamboro, E. L., Brown, E. B., Keene, A. C.

Consumption of foods that are high in fat contributes to obesity and metabolism-related disorders that are increasing in prevalence and present an enormous health burden throughout the world. Dietary lipids are comprised of triglycerides and fatty acids, and the highly palatable taste of dietary fatty acids promotes food consumption, activates reward centers in mammals, and underlies hedonic feeding. Despite a central role of dietary fats in the regulation of food intake and the etiology of metabolic diseases, little is known about how fat consumption regulates sleep. The fruit fly, Drosophila melanogaster, provides a powerful model system for the study of sleep and metabolic traits, and flies potently regulate sleep in accordance with food availability. To investigate the effects of dietary fats on sleep regulation, we have supplemented fatty acids into the diet of Drosophila and measured their effects on sleep and activity. We found that feeding flies a diet of hexanoic acid, a medium-chain fatty acid that is a by-product of yeast fermentation, promotes sleep by increasing the number of sleep episodes. This increase in sleep is dose-dependent and independent of the light-dark cues. Diets consisting of other fatty acids, including medium- and long-chain fatty acids, also increase sleep, suggesting many fatty acid types promote sleep. To assess whether dietary fatty acids regulate sleep through the taste system, we assessed sleep in flies with a mutation in the hexanoic acid receptor Ionotropic receptor 56d, which is required for fatty acid taste perception. We found that these flies also increase their sleep when fed a hexanoic acid diet, suggesting the sleep promoting effect of hexanoic acid is not dependent on sensory perception. Overall, these results define a role for fatty acids in sleep regulation, providing a foundation to investigate the molecular and neural basis for fatty acid-dependent modulation of sleep duration.

24 Jun 03:27

Making a Simple $A+B→C$ Reaction Oscillate by Coupling to Hydrodynamic Effect

by M. A. Budroni, V. Upadhyay, and L. Rongy

Any chemical reaction of the type A + B -> C, between two reactant species (A and B), can be made to oscillate in time and space if the interface region where C forms obeys certain fluid convection properties.

[Phys. Rev. Lett. 122, 244502] Published Thu Jun 20, 2019

24 Jun 02:23

Glia Accumulate Evidence that Actions Are Futile and Suppress Unsuccessful Behavior

by Yu Mu, Davis V. Bennett, Mikail Rubinov, Sujatha Narayan, Chao-Tsung Yang, Masashi Tanimoto, Brett D. Mensh, Loren L. Looger, Misha B. Ahrens

Whole-brain imaging in virtual-reality-immersed zebrafish reveals that failed swim attempts are detected by noradrenergic neurons, which drive glial cells that accumulate calcium until they trigger the suppression of further futile attempts.

24 Jun 02:23

Correlated Neural Activity and Encoding of Behavior across Brains of Socially Interacting Animals

by Lyle Kingsbury, Shan Huang, Jun Wang, Ken Gu, Peyman Golshani, Ye Emily Wu, Weizhe Hong

When two animals interact, neural activity across their brains synchronizes in a way that predicts how they will behave and how they form social dominance relationships.

15 Jun 16:03

Nerve cells from the brain invade prostate tumours

by Simon T. Schafer

Nature, Published online: 15 May 2019; doi:10.1038/d41586-019-01461-7

Prostate cancer contains nerve cells that are linked to disease progression, but their source was unknown. A mouse study reveals that cells from the brain invade prostate tumours and give rise to this nerve-cell population.

15 Jun 16:01

Lessons from cold fusion, 30 years on

by Philip Ball

Nature, Published online: 27 May 2019; doi:10.1038/d41586-019-01673-x

Why revisit long-discredited claims for a source of abundant energy, asks Philip Ball? Because we are still learning how to treat pathological science.

15 Jun 15:58

Revisiting the cold case of cold fusion

by Curtis P. Berlinguette

Nature, Published online: 27 May 2019; doi:10.1038/s41586-019-1256-6

Three years of investigation by a multi-disciplinary team into claims of ‘cold fusion’ found no evidence that the phenomenon exists, but identified a parameter space potentially worthy of further exploration.

14 May 02:26

Hierarchical recurrent state space models reveal discrete and continuous dynamics of neural activity in C. elegans

by Linderman, S. W., Nichols, A. L. A., Blei, D. M., Zimmer, M., Paninski, L.

Modern recording techniques enable large-scale measurements of neural activity in a variety of model organisms. The dynamics of neural activity shed light on how organisms process sensory information and generate motor behavior. Here, we study these dynamics using optical recordings of neural activity in the nematode C. elegans. To understand these data, we develop state space models that decompose neural time-series into segments with simple, linear dynamics. We incorporate these models into a hierarchical framework that combines partial recordings from many worms to learn shared structure, while still allowing for individual variability. This framework reveals latent states of population neural activity, along with the discrete behavioral states that govern dynamics in this state space. We find stochastic transition patterns between discrete states and see that transition probabilities are determined by both current brain activity and sensory cues. Our methods automatically recover transition times that closely match manual labels of different behaviors, such as forward crawling, reversals, and turns. Finally, the resulting model can simulate neural data, faithfully capturing salient patterns of whole brain dynamics seen in real data.

14 May 02:02

The Dynamics of Attention Shifts Among Concurrent Speech in a Naturalistic Multi-Speaker Virtual Environment

by Zion Golumbic, E., Shavit-Cohen, K.

Focusing attention on one speaker on the background of other irrelevant speech can be a challenging feat. A longstanding question in attention research is whether and how frequently individuals shift their attention towards task-irrelevant speech, arguably leading to occasional detection of words in a so-called unattended message. However, this has been difficult to gauge empirically, particularly when participants attend to continuous natural speech, due to the lack of appropriate metrics for detecting shifts in internal attention. Here we introduce a new experimental platform for studying the dynamic deployment of attention among concurrent speakers, utilizing a unique combination of Virtual Reality and Eye-Tracking technology. We created a Virtual Cafe in which participants sit across from and attend to the narrative of a target speaker. We manipulate the number and location of distractor speakers, manifest as additional patrons throughout the Virtual Cafe. By monitoring participants' eye-gaze dynamics, we studied the patterns of overt shifts of attention among the concurrent speakers as well as the consequences of these shifts on speech comprehension. Our results reveal important individual differences in the gaze-pattern displayed during selective attention to speech. While some participants stayed fixated on a target speaker throughout the entire experiment, approximately 30% of participants frequently shifted their gaze toward distractor speakers or other locations in the environment, regardless of the severity of audiovisual distraction. Critically, the tendency for frequent gaze-shifts negatively impacted comprehension of the target speaker. We also found that gaze-shifts occurred primarily during gaps in the acoustic input, suggesting they are prompted by momentary unmasking of the competing audio, in line with glimpsing theories of processing speech in noise.
These results open a new window into understanding the dynamics of attention as they wax and wane over time, and the different listening patterns employed for dealing with the influx of sensory input in multisensory environments. Moreover, the novel approach developed here for tracking the locus of momentary attention in a naturalistic virtual-reality environment holds high promise for extending the study of human behavior and cognition and bridging the gap between the laboratory and real-life.

13 Apr 00:14

Extreme free-range chicken farming

by Minnesotastan

From the always-interesting Atlas Obscura:

Massimo Rapella, a 48-year-old chicken farmer from northern Italy, is helping chickens rediscover their wild side. Since 2009, Rapella and his wife Elisabetta have been keeping an estimated 2,100 hens in a patch of pristine Alpine forest near Sondrio, in the heart of the Valtellina valley...

Shortly after relocating, Rapella and his wife started keeping a few chickens to provide eggs for their own consumption. But soon enough they noticed some unexpected behavior from their flock. “Our chickens liked roaming around the nearby woods,” Rapella explains. “So I encouraged them to venture out and lay eggs in the wild.”

A few months later, Rapella saw that the birds looked healthier—with shiny feathers and bright-colored wattles—and that their eggs had a fuller taste. “I started wondering if I could take on more chickens and create an ‘Alpine egg’ to sell in local markets,” he says. Today, he sells his uovo di selva, or egg of the woods, to about 400 direct consumers and 40 restaurants...

Most domestic chickens today would not find themselves at home in a forest: at least, not immediately. “The first large batch of chickens I took in looked very lost,” Rapella says. “They had never seen a tree nor a bug in their life, and they were scared of snow.”...

“White birds really stand out to predators,” Clauer says. Rapella keeps two different breeds of chicken: Hy-Line brown hens and the easy-to-spot white Leghorns. While he once lost the occasional chicken, now he relies on a double fence and two trained Maremma sheepdogs to keep badgers, martens (a weasel-like carnivore), foxes, and buzzards at bay.

Rapella’s chickens lay eggs almost every day, like any domesticated chicken, but they do so in the woods. “They like natural nests offered by tree roots or branches,” he says. “Usually when you spot a cranny with some leaves, you know there could be eggs.” Once a hen finds her favorite nesting spot, she goes back to it for each subsequent laying, making Rapella’s egg-hunting easier. Together with two employees, he gathers an estimated 1,000 eggs every morning.

His uovo di selva tastes like egg, but concentrated. There’s more flavor to it, and also more protein, due to the bug-filled diet of the chickens. As a result, when chefs whip the whites from Rapella’s protein-rich eggs, they get three times the volume. The egg yolk can even change with the seasons...

Adults living in the state where they were born

by Minnesotastan

Via Digg

13 Apr 00:00

Not The Onion

by Minnesotastan

An unarmed mentally ill man was shot at by police. Some of the bullets missed and struck bystanders in the street. The DA decided to charge the mentally ill man with first-degree assault because of the injuries to the bystanders.

Police later said that after a cat-and-mouse game, Broadnax had reached into his pants pocket and removed an object, briefly “pointing” it at the responding officers. They thought it was a gun, and fired three rounds. After the shots, and a tussle with cops, Broadnax was hit with a Taser and arrested.

When the dust settled, a few things became clear. First, Broadnax was unarmed; the object police had thought was a gun was, in fact, a wallet. Second, Broadnax seemed to be in the midst of a mental health crisis. He told investigators immediately after his arrest that he was having auditory hallucinations, hearing the “voices of his dead relatives.”

The third fact that emerged was that the officers who opened fire had missed their target. Broadnax was unscathed. The bullets meant for him had instead struck Khoshakhlagh and a second victim, 59-year-old Theodora Ray, who was getting dinner at a food cart on 42nd Street, as she did almost every night...

Then, near the end of October, Khoshakhlagh’s attorney got a phone call from the office of Manhattan District Attorney Cyrus Vance Jr. Prosecutors had changed their minds about the case: They now planned to prosecute Broadnax for the bullet wounds sustained by Khoshakhlagh and Ray. Under an unusual theory of law, the prosecution claimed Broadnax’s actions that night in Times Square were so irresponsible, it was as if he himself had pulled the trigger.

The top charge alone — assault in the first degree — was enough to land him in prison for the next twenty-five years...

As Binder says, when police are involved in incidents like this — high-profile and potentially embarrassing mishaps — there can be pressure to ensure that someone, anyone, catches an indictment. “It’s not surprising that Mr. Broadnax was charged,” Binder says. “This is something prosecutors do when police behave irresponsibly.”

More details in a longread in The Village Voice.

Actually in The Onion:

Scientists Announce Discovery Of Dry Ice On Mars Means Planet May One Day Be Suitable For Halloween Party

Tourist In White House Gift Shop Browses Rack Of Security Clearances

Purdue Pharma Reports Opioid Deaths Falling Short Of Quarterly Goals

12 Apr 23:25

Primate Amygdala Neurons Simulate Decision Processes of Social Partners

by Fabian Grabenhorst, Raymundo Báez-Mendoza, Wilfried Genest, Gustavo Deco, Wolfram Schultz

When monkeys observe and learn from each other’s choices, neurons in the amygdala spontaneously encode decision computations to simulate the social partner’s choices.

12 Apr 23:23

Barber paradox at the time of “excellence”

by tomate

“A measure of the flexibility of excellence is that it allows the inclusion of reputation as one category among others in a ranking which is in fact definitive of reputation. The metalepsis that allows reputation to be 20 percent of itself is permitted by the intense flexibility of excellence; it allows a category mistake to masquerade as scientific objectivity.”

Bill Readings, The University in Ruins.


12 Apr 23:21

The Pi Calculus: Towards Global Computing

by John Baez

Check out the video of Christian Williams’s talk in the Applied Category Theory Seminar here at U. C. Riverside. It was nicely edited by Paola Fernandez and uploaded by Joe Moeller.

Abstract. Historically, code represents a sequence of instructions for a single machine. Each computer is its own world, and only interacts with others by sending and receiving data through external ports. As society becomes more interconnected, this paradigm becomes more inadequate – these virtually isolated nodes tend to form networks of great bottleneck and opacity. Communication is a fundamental and integral part of computing, and needs to be incorporated in the theory of computation.

To describe systems of interacting agents with dynamic interconnection, in 1980 Robin Milner invented the pi calculus: a formal language in which a term represents an open, evolving system of processes (or agents) which communicate over names (or channels). Because a computer is itself such a system, the pi calculus can be seen as a generalization of traditional computing languages; there is an embedding of lambda into pi – but there is an important change in focus: programming is less like controlling a machine and more like designing an ecosystem of autonomous organisms.

We review the basics of the pi calculus, and explore a variety of examples which demonstrate this new approach to programming. We will discuss some of the history of these ideas, called “process algebra”, and see exciting modern applications in blockchain and biology.

“… as we seriously address the problem of modelling mobile communicating systems we get a sense of completing a model which was previously incomplete; for we can now begin to describe what goes on outside a computer in the same terms as what goes on inside – i.e. in terms of interaction. Turning this observation inside-out, we may say that we inhabit a global computer, an informatic world which demands to be understood just as fundamentally as physicists understand the material world.” — Robin Milner

The talk slides are here.

Reading material:

• Robin Milner, The polyadic pi calculus: a tutorial.

• Robin Milner, Communicating and Mobile Systems.

• Joachim Parrow, An introduction to the pi calculus.
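
To make the abstract's picture of processes communicating over names a little more concrete, here is a minimal Python sketch. It is only an analogy, not an implementation of the pi calculus: channels are modelled as `queue.Queue` objects (so sends are buffered rather than synchronous), processes are threads, and the names `new_channel`, `server` and `client` are made up for illustration. What it does show is the feature that distinguishes pi from earlier process calculi: a channel can itself be sent over a channel.

```python
import queue
import threading


def new_channel():
    """A channel ("name"), modelled here as an unbounded FIFO queue."""
    return queue.Queue()


def server(request):
    """Receive a (value, reply_channel) pair, then answer on the received channel."""
    value, reply = request.get()
    reply.put(value * 2)


def client(request, results):
    """Create a private channel, send it with a value, and wait for the answer."""
    reply = new_channel()          # roughly: (nu reply)
    request.put((21, reply))       # roughly: send <21, reply> on request
    results.append(reply.get())    # roughly: receive x on reply


request = new_channel()
results = []
threads = [
    threading.Thread(target=server, args=(request,)),
    threading.Thread(target=client, args=(request, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

The server never knows about the reply channel in advance; it learns the name by receiving it, which is the kind of dynamic interconnection the abstract describes.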


12 Apr 22:59

Prismatic cohomology

by Terence Tao

Last week, we had Peter Scholze give an interesting distinguished lecture series here at UCLA on “Prismatic Cohomology”, which is a new type of cohomology theory worked out by Scholze and Bhargav Bhatt. (Video of the talks will be available shortly; for now we have some notes taken by two note-takers in the audience on that web page.) My understanding of this (speaking as someone who is rather far removed from this area) is that it is progress towards the “motivic” dream of being able to define cohomology ${H^i(X/\overline{A}, A)}$ for varieties ${X}$ (or similar objects) defined over arbitrary commutative rings ${\overline{A}}$ , and with coefficients in another arbitrary commutative ring ${A}$ . Currently, we have various flavours of cohomology that only work for certain types of domain rings ${\overline{A}}$ and coefficient rings ${A}$ :

• Singular cohomology, which roughly speaking works when the domain ring ${\overline{A}}$ is a characteristic zero field such as ${{\bf R}}$ or ${{\bf C}}$ , but can allow for arbitrary coefficients ${A}$ ;

• de Rham cohomology, which roughly speaking works as long as the coefficient ring ${A}$ is the same as the domain ring ${\overline{A}}$ (or a homomorphic image thereof), as one can only talk about ${A}$ -valued differential forms if the underlying space is also defined over ${A}$ ;

• ${\ell}$ -adic cohomology, which is a remarkably powerful application of étale cohomology, but only works well when the coefficient ring ${A = {\bf Z}_\ell}$ is localised around a prime ${\ell}$ that is different from the characteristic ${p}$ of the domain ring ${\overline{A}}$ ; and

• Crystalline cohomology, in which the domain ring is a field ${k}$ of some finite characteristic ${p}$ , but the coefficient ring ${A}$ can be a slight deformation of ${k}$ , such as the ring of Witt vectors of ${k}$ .

There are various relationships between the cohomology theories, for instance de Rham cohomology coincides with singular cohomology for smooth varieties in the limiting case ${A=\overline{A} = {\bf R}}$ . The following picture Scholze drew in his first lecture captures these sorts of relationships nicely:

[Image 20190312_145136: Scholze's diagram relating the cohomology theories]

The new prismatic cohomology of Bhatt and Scholze unifies many of these cohomologies in the “neighbourhood” of the point ${(p,p)}$ in the above diagram, in which the domain ring ${\overline{A}}$ and the coefficient ring ${A}$ are both thought of as being “close to characteristic ${p}$ ” in some sense, so that the dilates ${pA, p\overline{A}}$ of these rings are either zero, or “small”. For instance, the ${p}$ -adic ring ${{\bf Z}_p}$ is technically of characteristic ${0}$ , but ${p {\bf Z}_p}$ is a “small” ideal of ${{\bf Z}_p}$ (it consists of those elements of ${{\bf Z}_p}$ of ${p}$ -adic absolute value at most ${1/p}$ ), so one can think of ${{\bf Z}_p}$ as being “close to characteristic ${p}$ ” in some sense. Scholze drew a “zoomed in” version of the previous diagram to informally describe the types of rings ${A,\overline{A}}$ for which prismatic cohomology is effective:

[Image 20190312_145157: Scholze's “zoomed in” version of the diagram]

To define prismatic cohomology rings ${H^i_\Delta(X/\overline{A}, A)}$ one needs a “prism”: a ring homomorphism from ${A}$ to ${\overline{A}}$ equipped with a “Frobenius-like” endomorphism ${\phi: A \to A}$ on ${A}$ obeying some axioms. By tuning these homomorphisms one can recover existing cohomology theories like crystalline or de Rham cohomology as special cases of prismatic cohomology. These specialisations are analogous to how a prism splits white light into various individual colours, giving rise to the terminology “prismatic”, and depicted by this further diagram of Scholze:

[Image 20190313_152011: Scholze's diagram of prismatic cohomology specialising to other cohomology theories]

(And yes, Peter confirmed that he and Bhargav were inspired by the Dark Side of the Moon album cover in selecting the terminology.)

There was an abstract definition of prismatic cohomology (as being the essentially unique cohomology arising from prisms that obeyed certain natural axioms), but there was also a more concrete way to view them in terms of coordinates, as a “ ${q}$ -deformation” of de Rham cohomology. Whereas in de Rham cohomology one worked with derivative operators ${d}$ that for instance applied to monomials ${t^n}$ by the usual formula

$\displaystyle d(t^n) = n t^{n-1} dt,$

prismatic cohomology in coordinates can be computed using a “ ${q}$ -derivative” operator ${d_q}$ that for instance applies to monomials ${t^n}$ by the formula

$\displaystyle d_q (t^n) = [n]_q t^{n-1} d_q t$

where

$\displaystyle [n]_q = \frac{q^n-1}{q-1} = 1 + q + \dots + q^{n-1}$

is the “ ${q}$ -analogue” of ${n}$ (a polynomial in ${q}$ that equals ${n}$ in the limit ${q=1}$ ). (The ${q}$ -analogues become more complicated for more general forms than these.) In this more concrete setting, the fact that prismatic cohomology is independent of the choice of coordinates apparently becomes quite a non-trivial theorem.
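
As a quick sanity check on the monomial formula, here is a small Python sketch using exact rational arithmetic. One assumption goes beyond the post: the ${q}$ -derivative is modelled by the standard difference quotient ${(f(qt) - f(t))/(qt - t)}$ , which is the usual way the operator is written in coordinates but is not spelled out above. The function names are hypothetical.

```python
from fractions import Fraction


def q_analogue(n, q):
    """[n]_q = 1 + q + ... + q^(n-1), computed as a finite sum."""
    return sum(q ** k for k in range(n))


def q_difference_quotient(n, q, t):
    """((qt)^n - t^n) / (qt - t) for the monomial t^n; requires q != 1 and t != 0."""
    return ((q * t) ** n - t ** n) / (q * t - t)


q, t, n = Fraction(3, 2), Fraction(5), 4

lhs = q_difference_quotient(n, q, t)      # coefficient of d_q t, computed directly
rhs = q_analogue(n, q) * t ** (n - 1)     # [n]_q t^(n-1) from the formula

print(lhs == rhs)
print(q_analogue(n, Fraction(1)))         # the q = 1 specialisation
```

With these values both sides agree exactly, and setting ${q=1}$ in the finite sum recovers the ordinary coefficient ${n}$ .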


12 Apr 22:51

Quantum resource theories

by Eric Chitambar and Gilad Gour

Author(s): Eric Chitambar and Gilad Gour

This review introduces a new development in theoretical quantum physics, the “resource-theoretic” point of view. The approach aims to be closely linked to experiment, and to state exactly what result you can hope to achieve for what expenditure of effort in the laboratory. This development is an extension of the principles of thermodynamics to quantum problems; but there are resources that would never have been considered previously in thermodynamics, such as shared knowledge of a frame of reference. Many additional examples and new quantifications of resources are provided.

[Rev. Mod. Phys. 91, 025001] Published Thu Apr 04, 2019


12 Apr 22:48

The Shape of a Life

by woit

I just finished reading The Shape of a Life, which is the great geometer Shing-Tung Yau’s autobiography, co-authored with Steve Nadis. It’s quite fascinating, and an essential read for anyone interested in the history of modern mathematics. Yau has been for a long time a central figure in the field of geometric analysis, so this is in some ways as much an autobiography of the subject as of the man.

Back in 2010 I wrote here about an earlier volume by Yau and Nadis, The Shape of Inner Space. What I really liked about that book (and discussed in some detail there) was the autobiographical material about Yau. Much of the book though was devoted to topics like string theory attempts to get physics out of Calabi-Yaus, with a discussion that was detailed and accurate, but to my mind often not of great interest (since these attempts don’t work…).

The new book seems to have been written specifically to appeal to me, greatly expanding the autobiographical material of the earlier book, while limiting the discussion of dubious speculative physics. There is still a fair amount about physics, but this time more focused on another of Yau’s interests, the mathematical theory of general relativity.

The book begins with the story of Yau’s early years in Hong Kong, how he managed to survive an impoverished childhood, avoid becoming a duck farmer, and ultimately find a way to get to the US and graduate study in mathematics at Berkeley. It’s a compelling story of that period and those places. It’s also about the best example I can think of to show how bringing someone with undeveloped talent into the environment of a first-rate research university can change their life, liberating them to accomplish great things, with dramatic impact on their intellectual development as well as that of a whole field.

Yau has always had a deep interest in the history of mathematics, and the story he tells of his intellectual development explains in detail how his own work and ideas grew out of earlier strands of thought. Even as a graduate student, he had started to develop the point of view that has been so fruitful in geometric analysis, using the study of non-linear partial differential equations to prove theorems about geometry and topology. Besides his proof of the Calabi conjecture, this ultimately led to the proof of the Poincaré conjecture, a story Yau explains in detail.

Over the years Yau has been involved in various controversies over priority for mathematical results. In this book he doesn’t shy away from discussing these, but generally gives a measured explanation of his point of view on what happened. There’s also a fair number of often amusing stories about mathematicians and the math community that liven up the history. For one sort of example, there are Yau’s descriptions of his culture clash with the long-haired, pot-smoking Berkeley of 1969. For another, here’s a story about Richard Hamilton (of whom Yau has a very high opinion) and his 1982 lectures at the IAS:

Hamilton, who had come from Cornell, stayed for a week in an IAS apartment. At the end of his stay, the chief math secretary was livid because Hamilton had made a huge mess of the apartment, and it took a long time to clean up the place. On the other hand, he had given some wonderful talks, and collaborations between Hamilton, my students, and me picked up from that time forward. So, on balance, his visit would have to be called a great success. Hamilton may have posed some challenges to the cleaning and janitorial staff, but he had posed even more consequential challenges to the mathematics community, some of which were taken up by members of my group.

Yau is generally considered a major figure not just for his research, but also as a politician of the mathematics community, deeply involved for many years in efforts to build or expand research centers, here and in China. A recent example is the creation of the CMSA at Harvard. He has a lot to say about the stories of these efforts, and he definitely does not do so with the style of the politician careful to offend no one. In this book you get Yau’s honest, unvarnished version of what happened, as well as his analysis of some general problems, and I won’t be surprised if some people take offense at this material.

One thing there’s perhaps a bit too much of in the book is the references to his conflicts with his advisor Shiing-Shen Chern (which I’d somehow never heard about before). A major touching theme though throughout the book is that of fathers, sons and traditions of filial piety. There’s a lot about Yau’s father (whom Yau very much looked up to) and quite a bit about his sons. On the mathematical side, there’s a lot about his numerous students, many of whom have gone on to important academic careers. As his academic father, Chern also fits into this theme, although not so felicitously. At the end of the book, Yau looks forward to his own future as, like Chern before him, the grand old man of the field. He’s planning more teaching and less research, and taking pleasure in his mathematical legacy and progeny.

12 Apr 22:44

Why Trust a Theory?

by woit

I noticed today that Cambridge University Press has recently published Why Trust a Theory?, a volume of articles based on a December 2015 conference held in Munich. The book is available online here (if your university is paying for it…), and preprint versions of many of the contributions are on the arXiv.

The conference had its origins in a piece published a year earlier in Nature by George Ellis and Joe Silk, entitled Scientific method: Defend the integrity of physics. Ellis and Silk made a forceful case that widely advertised but inherently untestable string theory and multiverse research does damage to the public understanding of science and is a threat to the credibility of science at a time it is under attack. The piece suggested:

A conference should be convened next year to take the first steps. People from both sides of the testability debate must be involved.

Looking through the proceedings volume, there’s lots of abstract discussion of philosophy of science and some diversity of points of view on the multiverse. When it comes to string theory though, the organizers interpreted “people on both sides” to mean bringing in one person willing to point out that there is a problem with string theory, and an army of string theorists to defend the theory. On the issue of the problems of string theory, the volume contains nearly 100 pages of pro-string theory hype, from Polchinski (two contributions), Silverstein, Kane and Quevedo. As usual with Kane, there’s a string theory “prediction” of the gluino mass (1.5 TeV +/- 10-15%) which has already been falsified. All I could find on the side of substantive criticism of string theory was in Carlo Rovelli’s contribution (preprint version here), and mainly in a single paragraph:

String theory is a living proof of the dangers of excessive reliance on non-empirical arguments. It raised great expectations thirty years ago, promising to compute all the parameters of the Standard Model from first principles, to derive from first principles its symmetry group SU(3)×SU(2)×U(1) and the existence of its three families of elementary particles, to predict the sign and the value of the cosmological constant, to predict novel observable physics, to understand the ultimate fate of black holes, and to offer a unique, well-founded unified theory of everything. Nothing of this has come true. String theorists, instead, have predicted a negative cosmological constant, deviations from Newton’s 1/r^2 law at sub-millimeters scale, black holes at the European Organization for Nuclear Research (CERN), low-energy super-symmetric particles, and more. All this was false. Still, Joe Polchinski, a prominent string theorist, writes [7] that he evaluates the Bayesian probability of string to be correct at 98.5% (!). This is clearly nonsense.

I won’t spend more time here discussing the conference and the articles in this volume, mainly because I’ve already written a lot about this in previous posts. For a contemporaneous discussion of the conference and Polchinski’s String Theory to the Rescue paper, see here and here. There are also interesting blog posts about the conference from Massimo Pigliucci, see here, here and here, and a Quanta piece by Natalie Wolchover here. For a discussion of Sean Carroll’s Beyond Falsifiability contribution, see here (and discussion here and here). For a discussion of Eva Silverstein’s contribution, see here.

Update: A few more links to material about the Munich conference: Jim Baggott here and here, Andrew Gelman here, Davide Castelvecchi here, and the conference website (with videos) here.

Update: Looking at the Preface, I notice that the editors claim:

Additional contributions were solicited by the editors with the aim of ensuring as full and balanced presentation as possible of the various positions in the debate.

With regards to string theory, the one additional contribution in the volume is from string theorist Eva Silverstein, so evidently the editors felt that balance required yet more on the pro-string theory side….

Update: I mischaracterized Polchinski’s calculation of the probability that string theory is correct as 98.5%. More accurately, he claims that the probability is “over 3 sigma” (i.e. over 99.73%).

Update: I finally got around to watching the videos of the panel discussions at the workshop (all videos available here). What most struck me about these discussions was the heavily dominant role of David Gross, who was on two of three panels, participating from the audience in the third. On the panels he was on, Gross was speaking far more than anyone else, and rarely if at all would anyone disagree with him. Gross’s point of view is that there is a testability problem with the multiverse, but all is well with string theory (although probably not at Polchinski’s “over 99.73% sure to be true” level). He’s a powerful intellect and a forceful speaker, so it’s not surprising that no one would take him on. But on the topic of string theory I think there are very serious problems with many of the claims he makes (for his arguments of 15 years ago, see the first substantive post of this blog), and the organizers should have found someone willing to challenge him on those.

12 Apr 22:43

The Topology of Neural Networks, Part 2: Compositions and Dimensions

by Jesse Johnson

In Part 1 of this series, I gave an abstract description of one of the main problems in Machine Learning, the Generalization Problem, in which one uses the values of a function at a finite number of points to infer the entire function. The typical approach to this problem is to choose a finite-dimensional subset of the space of all possible functions, then choose the function from this family that minimizes something called a cost function, defined by how accurate each function is on the sampled points. In this post, I will describe how the regression example from the last post generalizes to a family of models called Neural Networks, then describe how I recently used some fairly basic topology to demonstrate restrictions on the types of functions certain neural networks can produce.

First let’s recall the setup: We have a finite set of points $D_T \subset X \times Y$ and we want to find a function $f : X \to Y$ such that for each $(x, y) \in D_T$ , the distance from $f(x)$ to $y$ in $Y$ is “as small as possible”. We do this by choosing a map $\mu : \mathbf{R}^k \to C^0(X, Y)$ , then picking the point in $\mathbf{R}^k$ whose image minimizes a previously chosen cost function. In the last post, we also had a second set $D_E$ that we used to evaluate how well we did, but we won’t need that in this post.

The basic example of this is linear regression where $f(x) = mx + b$ , so $X$ and $Y$ are both one-dimensional, $k = 2$ and $\mathbf{R}^k$ is spanned by the variables $m, b$ . This can be generalized to higher-dimensional $X$ and $Y$ by replacing the scalars $m, b$ by a matrix and a vector, respectively. This is higher-dimensional linear regression.
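
For the one-dimensional case, picking the point $(m, b)$ that minimizes the cost can be done in closed form. The sketch below assumes the cost function is the sum of squared errors (the post leaves the cost unspecified at this point), and the data set `D_T` is made up for illustration.

```python
def fit_line(points):
    """Return (m, b) minimizing the sum of squared errors over (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def cost(points, m, b):
    """Sum of squared distances from f(x) = m*x + b to the observed y values."""
    return sum((m * x + b - y) ** 2 for x, y in points)


# Hypothetical training set D_T, roughly following y = 2x + 1.
D_T = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.0)]
m, b = fit_line(D_T)
print(m, b, cost(D_T, m, b))
```

Any other choice of $(m, b)$ gives a cost at least as large, which is what it means for this point of $\mathbf{R}^2$ to be the minimizer.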

Neural networks define a very general framework for defining other families of functions. One goal of this post is to describe a large (but not complete) chunk of this framework. But before we do that, we need to go from linear regression to logistic regression.

Logistic regression is used for prediction problems where rather than predicting a value associated with a data point, we want to predict the probability that a data point is in a given class or meets some condition. So we want a function whose range is the interval $[0, 1]$ rather than all real numbers.

Logistic regression accomplishes this by composing the family of functions from linear regression with the logistic function, a scaled and translated version of the hyperbolic tangent that maps the line into the interval $[0, 1]$ . So if $X$ is one-dimensional, then we will still have $k = 2$ but a point $(m, b) \in \mathbf{R}^k$ will map to the function $f(x) = \mathrm{logistic}(mx + b)$ .

In the training set for a logistic regression problem, the $y$ -value of each data point will be either 0 or 1. The cost function for each data point is the negative log of one minus the absolute difference between the predicted and actual values, that is, the negative log of the probability the model assigns to the actual label. This logarithm is not arbitrary – it’s the result of calculating the likelihood (in the statistical sense) of the function (interpreted as a probability distribution) and applying a logarithm to turn the multiplication in the definition of likelihood into the addition required for the cost function. Details left to the reader.
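
Here is a short Python sketch of these pieces for one-dimensional $X$ . It checks the relationship between the logistic function and hyperbolic tangent, and writes the per-point cost as the negative log-likelihood of the observed label. The function names and the four-point data set are hypothetical.

```python
import math


def logistic(z):
    """The logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_via_tanh(z):
    """The same function written as a scaled and translated hyperbolic tangent."""
    return 0.5 * (1.0 + math.tanh(z / 2.0))


def predict(m, b, x):
    """The function f(x) = logistic(m*x + b) associated with the point (m, b)."""
    return logistic(m * x + b)


def point_cost(p, y):
    """Negative log of the probability assigned to the actual label y (0 or 1)."""
    return -math.log(p if y == 1 else 1.0 - p)


def total_cost(m, b, data):
    """Sum of per-point costs, i.e. the negative log-likelihood of the data."""
    return sum(point_cost(predict(m, b, x), y) for x, y in data)


# Hypothetical training set: negative x labelled 0, positive x labelled 1.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
print(total_cost(1.0, 0.0, data), total_cost(-1.0, 0.0, data))
```

The parameters $(m, b) = (1, 0)$ point the right way and get the lower cost; flipping the sign of $m$ assigns low probability to every actual label and is penalized accordingly.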

As with linear regression, we can increase the dimension of $X$ or $Y$ in logistic regression. For $X$ , it’s simply a matter of changing $m$ to a vector, since we can still apply the logistic function to the one-dimensional output. To increase the dimension of $Y$ , we further promote $m$ to a matrix and $b$ to a vector, then apply the logistic function independently to each output dimension. We can interpret this as each dimension of $Y$ predicting a different class/condition. So logistic regression is not basis-independent in the way that linear regression is.

In the realm of machine learning, the logistic function in logistic regression is called an activation function. The idea is that each dimension represents a neuron and the output value represents whether or not the neuron is firing. So the activation function determines how the linear combination of values feeding into the neuron determines whether or not it should fire. (In theory, the output of the activation function should be a boolean, but then we couldn’t do gradient descent.) Another popular activation function is the Rectified Linear Unit (ReLU) $f(x) = \max(x, 0)$ .

Now that we know what logistic regression is and what activation functions are, we can define a large family of neural networks by simply composing a chain of (higher-dimensional) logistic regressions. Each regression is called a layer. The output dimension of each layer has to match the input dimension of the next, and the overall parameter space of the composition is the direct product of the parameter spaces of the layers. We can use the same activation functions between layers, or different ones.
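
As a rough illustration of this composition, the following Python sketch builds a two-layer network mapping $\mathbf{R}^2 \to \mathbf{R}^3 \to \mathbf{R}^1$ . The weights, biases, and helper names (`layer`, `network`) are made up for the example; nothing here is tuned or trained.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(z, 0.0)


def layer(weights, bias, activation):
    """Build one layer: x -> activation applied to each entry of (W x + b)."""
    def apply(x):
        return [activation(sum(w * xi for w, xi in zip(row, x)) + b_i)
                for row, b_i in zip(weights, bias)]
    return apply


def network(layers):
    """Compose layers in order; each output must match the next input dimension."""
    def apply(x):
        for f in layers:
            x = f(x)
        return x
    return apply


net = network([
    # Hidden layer, R^2 -> R^3, with ReLU activation.
    layer([[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]], [0.0, -1.0, 0.5], relu),
    # Output layer, R^3 -> R^1, with logistic activation.
    layer([[1.0, 1.0, -2.0]], [0.0], logistic),
])

out = net([1.0, 2.0])
print(out)
```

The parameter space of this network is the product of the two layers' parameter spaces: $6 + 3$ numbers for the first layer and $3 + 1$ for the second.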

That’s it.

The reason this is called an (artificial) neural network is because we can think of the output dimensions of each layer as being neurons that are connected to neurons in the next layer. The linear part of each logistic regression defines the input to each neuron as a weighted sum of the previous neurons’ outputs. For this reason, the matrix is often called a weight matrix and the values are called weights. Every layer except the last one is called a hidden layer.

Note that the weights, which make up the parameters of the function family, only modify the linear steps in the neural network, while the activation functions remain fixed. However, without the non-linear activation functions the neural network would just be a composition of linear functions, which would itself be a single linear function.
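
The collapse of stacked linear layers is easy to verify directly. In this sketch the two layers have no activation function, and the matrices and vectors are arbitrary integers chosen for illustration; composing the layers gives the same result as a single layer with weight matrix $W_2 W_1$ and bias $W_2 b_1 + b_2$ .

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def matmul(A, B):
    """Matrix-matrix product for matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def affine(W, b, x):
    """One layer with the activation function removed: x -> W x + b."""
    return [wx + bi for wx, bi in zip(matvec(W, x), b)]


W1, b1 = [[1, 2], [3, 4]], [1, -1]
W2, b2 = [[0, 1], [2, -1]], [5, 0]

# The single layer equivalent to applying layer 1 and then layer 2.
W = matmul(W2, W1)
b = [wb + bi for wb, bi in zip(matvec(W2, b1), b2)]

x = [7, -3]
two_layers = affine(W2, b2, affine(W1, b1, x))
one_layer = affine(W, b, x)
print(two_layers, one_layer)
```

So adding layers buys nothing unless a non-linear activation sits between them.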

As I noted, this is a large family of neural networks, but the concept is much more general. There are neural networks where there are connections between non-consecutive layers, or where the neurons aren’t in layers at all. There are networks where certain weights are constrained to be the same as each other (such as convolutional neural networks) and networks that take a sequence of inputs (Recurrent Neural Networks and Long Short-Term Memory Machines).

But the family I defined above is where one typically starts, and that’s what I’m going to focus on for the last few paragraphs of this blog post.

As with linear or logistic regression, a neural network defines a finite-dimensional family of functions. But unlike with regression, there isn’t an easy way to characterize these functions. It was proved in 1989, by three independent groups, that if you have one hidden layer, but allow it to have arbitrarily high dimension, you can approximate any continuous function to an arbitrarily small epsilon on any compact subset.

However, there’s an open question of whether something similar is true if you restrict the dimension but allow arbitrarily many hidden layers. The preprint I mentioned in the last post proves that for neural networks with a one-dimensional output, if the hidden layers are only allowed to have dimension less than or equal to the input dimension, there are certain functions that can’t be approximated, regardless of the number of layers. In particular, every component of every level set in such a function must be unbounded.

The main observation in the proof is that for a neural network with a one-dimensional output, if the hidden layers are all the same dimension, the weight matrices are all non-singular and the activation function is one-to-one, then the composition of all the functions up to right before the last linear function will be one-to-one too. That last linear function is just a projection onto the line, so the preimage of any point into the last hidden layer is a hyperplane. Since the activation functions may not be onto, the preimage of this hyperplane in the one-to-one map may not be a topological hyperplane, but it will be an unbounded subset of the domain.

But the Theorem also applies to networks with lower-dimensional hidden layers, and doesn’t make assumptions about the weight matrix. That’s because these functions are all in the closure of the previous set so there’s a limit argument to show that their level sets also have to be unbounded. In fact you can also apply the limit argument if the activation function is a uniform limit of one-to-one functions, like the ReLU. The proof is the type of argument you might see in an undergraduate topology class.
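
The simplest case of the main observation can be checked by hand. The sketch below is not the proof, only an instance of it under assumptions I have chosen: a single hidden layer of dimension 2, a non-singular weight matrix, and a leaky ReLU activation (which is one-to-one and, unlike the logistic function, also onto). All of the numbers are hypothetical. It walks along a line in the hidden layer on which the final linear function is constant, pulls that line back to the input space, and confirms that the network output stays fixed while the input points move off to infinity.

```python
from fractions import Fraction as F

SLOPE = F(1, 10)  # leaky-ReLU slope for negative inputs (hypothetical choice)


def act(z):
    """Leaky ReLU: a one-to-one activation function."""
    return z if z >= 0 else SLOPE * z


def act_inv(h):
    """Inverse of the leaky ReLU."""
    return h if h >= 0 else h / SLOPE


W = [[F(2), F(1)], [F(1), F(3)]]   # non-singular weight matrix (det = 5)
b = [F(1, 2), F(-1)]
v = [F(1), F(-2)]                  # final linear function onto the line
c = F(3, 10)


def f(x):
    """The network: R^2 -> hidden R^2 -> R."""
    hidden = [act(W[i][0] * x[0] + W[i][1] * x[1] + b[i]) for i in range(2)]
    return v[0] * hidden[0] + v[1] * hidden[1] + c


def hidden_to_input(h):
    """Invert the hidden layer: x = W^{-1} (act_inv(h) - b)."""
    z = [act_inv(h[i]) - b[i] for i in range(2)]
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    return [(W[1][1] * z[0] - W[0][1] * z[1]) / det,
            (-W[1][0] * z[0] + W[0][0] * z[1]) / det]


h0 = [F(1), F(1)]
d = [F(2), F(1)]   # direction with v . d = 0, so v . h is constant along h0 + s*d

points = [hidden_to_input([h0[0] + s * d[0], h0[1] + s * d[1]])
          for s in (0, 10, 100, 1000)]
values = [f(x) for x in points]
sq_norms = [x[0] ** 2 + x[1] ** 2 for x in points]
print(values)
print([float(n) for n in sq_norms])
```

Every sampled point lies on the same level set, and their distance from the origin grows without bound, so this network could not approximate a function such as $x^2 + y^2$ whose level sets are bounded.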

So as with many examples of applied math, the mathematics of this result isn’t particularly complex. What makes it interesting is the connection to ideas that are studied elsewhere. It also suggests a new set of problems that could lead to more interesting math, namely the question of how to characterize finite-dimensional families of functions such as those defined by neural networks.


09 Mar 04:21

Applied Category Theory Course – Videos

by John Baez

Yay! David Spivak and Brendan Fong are teaching a course on applied category theory based on their book, and the lectures are on YouTube! Here are the first two videos:

Their book is free here:

• Brendan Fong and David Spivak, Seven Sketches in Compositionality: An Invitation to Applied Category Theory.

If you’re in Boston you can actually go to the course. It’s at MIT January 14 – Feb 1, Monday-Friday, 14:00-15:00 in room 4-237.

They taught it last year too, and last year’s YouTube videos are on the same YouTube channel.

Also, I taught a course based on the first 4 chapters of their book, and you can read my “lectures”, see discussions and do problems here:

• Applied category theory course.

So, there’s no excuse not to start applying category theory in your everyday life!



09 Mar 02:06

The antiquity of "Snakes and Ladders"

by Minnesotastan

According to Veda, the game was created by the 13th century poet saint Gyandev.

In the original game square 12 was Faith, 51 was Reliability, 57 was Generosity, 76 was Knowledge, and 78 was Asceticism. These were the squares where the ladder was found. Square 41 was for Disobedience, 44 for Arrogance, 49 for Vulgarity, 52 for Theft, 58 for Lying, 62 for Drunkenness, 69 for Debt, 84 for Anger, 92 for Greed, 95 for Pride, 73 for Murder and 99 for Lust. These were the squares where the snake was found. Square 100 represented Nirvana or Moksha.

More info:

Snakes and Ladders originated in India as part of a family of dice board games that included Gyan chauper and pachisi (present-day Ludo and Parcheesi). The game made its way to England and was sold as "Snakes and Ladders", then the basic concept was introduced in the United States as Chutes and Ladders by game pioneer Milton Bradley in 1943.

The game was popular in ancient India by the name Moksha Patam. It was also associated with traditional Hindu philosophy contrasting karma and kama, or destiny and desire. It emphasized destiny, as opposed to games such as pachisi, which focused on life as a mixture of skill (free will) and luck. The underlying ideals of the game inspired a version introduced in Victorian England in 1892. The game has also been interpreted and used as a tool for teaching the effects of good deeds versus bad. The board was covered with symbolic images, the top featuring gods, angels, and majestic beings, while the rest of the board was covered with pictures of animals, flowers and people.

The ladders represented virtues such as generosity, faith, and humility, while the snakes represented vices such as lust, anger, murder, and theft. The morality lesson of the game was that a person can attain salvation (Moksha) through doing good, whereas by doing evil one will inherit rebirth to lower forms of life. The number of ladders was less than the number of snakes as a reminder that a path of good is much more difficult to tread than a path of sins. Presumably, reaching the last square (number 100) represented the attainment of Moksha (spiritual liberation).

When the game was brought to England, the Indian virtues and vices were replaced by English ones in hopes of better reflecting Victorian doctrines of morality. Squares of Fulfillment, Grace and Success were accessible by ladders of Thrift, Penitence and Industry and snakes of Indulgence, Disobedience and Indolence caused one to end up in Illness, Disgrace and Poverty. While the Indian version of the game had snakes outnumbering ladders, the English counterpart was more forgiving as it contained each in the same amount. This concept of equality signifies the cultural ideal that for every sin one commits, there exists another chance at redemption.

Interesting that success in the game as originally designed depended entirely on luck (roll of dice) with no apparent skills or strategy involved; perhaps that's part of the karma lesson. AFAIK, the American version didn't incorporate any virtues or sins - it was more like random good and bad luck. I may be misremembering. But I certainly didn't know it was an ancient game.


31 Dec 19:33

An Electrifying Idea

by monbiot

What if we abandoned photosynthesis as the means of producing food, and released most of the world’s surface from agriculture?

By George Monbiot, published in the Guardian 31st October 2018

It’s not about “them”, it’s about us. The horrific rate of biological annihilation reported this week – 60% of the Earth’s vertebrate wildlife gone since 1970 – is driven primarily by the food industry. Farming and fishing are the major causes of the collapse of both marine and terrestrial ecosystems. Meat – consumed in greater quantities by the rich than by the poor – is the strongest cause of all. We might shake our heads in horror at the clearance of forests, the drainage of wetlands, the slaughter of predators and the massacre of sharks and turtles by fishing fleets, but it is done at our behest.

As the Guardian’s recent report from Argentina reveals, the huge forests of the Gran Chaco are heading towards extermination, as they are replaced by deserts of soya beans, almost all of which are used to produce animal feed, particularly for Europe. With Jair Bolsonaro in power in Brazil, deforestation in the Amazon is likely to accelerate, much of it driven by the beef lobby that helped bring him to power. The great forests of Indonesia and West Papua are being felled and burnt for oil palm at devastating speed.

The most important environmental action we can take is to reduce the area of land and sea used by farming and fishing. This means, above all, switching to a plant-based diet: research published in the journal Science shows that cutting out animal products would reduce the global requirement for farmland by 76%. It would also give us a fair chance of feeding the world. Grazing is no answer to the ecocide caused by grain-fed livestock: it is an astonishingly wasteful use of vast tracts of land that would otherwise support wildlife and wild ecosystems.

The same action is essential to prevent climate breakdown. Because governments, bowing to the demands of capital, have left it so late, it is almost impossible to see how we can stop more than 1.5° of global warming without drawing carbon dioxide out of the atmosphere. The only way of doing it that has been demonstrated at scale is to allow trees to return to deforested land.

But could we go beyond even a plant-based diet? Could we go beyond agriculture itself? What if, instead of producing food from soil, we were to produce it from air? What if, instead of basing our nutrition on photosynthesis, we were to use electricity, to fuel a process whose conversion of sunlight into food is ten times more efficient?

This sounds like science fiction, but it is already approaching commercialisation. For the past year, a group of Finnish researchers has been producing food without either animals or plants. Their only ingredients are hydrogen-oxidising bacteria, electricity from solar panels, a small amount of water, carbon dioxide drawn from the air, nitrogen and trace quantities of minerals such as calcium, sodium, potassium and zinc. The food they have produced is 50 to 60% protein, the rest is carbohydrate and fat. They have started a company (Solar Foods), which seeks to open its first factory in 2021. This week it was selected as an incubation project by the European Space Agency.

They use electricity from solar panels to electrolyse water, producing hydrogen, which feeds bacteria (which turn it back into water). Unlike other forms of microbial protein (such as Quorn), it requires no carbohydrate feedstock – in other words, no plants.

Perhaps you are horrified by this prospect. Certainly, there’s nothing beautiful about it. It would be hard to write a pastoral poem about bacteria grazing on hydrogen. But this is part of the problem. We have allowed a mythical aesthetic to blind us to the ugly realities of industrial agriculture. Instilled with an image of farming that begins in infancy, as about half the books for very small children involve a rosy-cheeked farmer with one cow, one horse, one pig and one chicken, living in bucolic harmony, we fail to see the amazing cruelty of large-scale animal farming, the blood and gore, filth and pollution. We fail to apprehend the mass clearance of land required to feed us, the Insectageddon caused by pesticides, the drying up of rivers, the loss of soil, the reduction of the magnificent diversity of life on Earth to a homogenous grey waste.

The compound the Finnish researchers have produced from air, water and electricity is most likely to be used as a bulk ingredient in processed food. But (though this goes well beyond the company’s current plans) is there any reason why, with modifications of the process, it could not start to deliver the proteins required to make cultured meat, or the oils that could render palm plantations redundant? Is there any reason why it should not eventually replace much of what we eat?

According to the researchers’ estimates, 20,000 times less land is required for their factories than to produce the same amount of food by growing soya. Cultivating all the protein the world now eats with their technique would require an area smaller than Ohio. The best places to do it are deserts, where solar energy is most abundant. When electricity can be generated at €15 per megawatt hour (a few years hence), their process becomes cost-competitive with the cheapest source of soya.

Could a similar technique also be used to produce cellulose and lignin, eventually replacing the need for commercial forestry? Is there any inherent reason why the hydrogen pathway could not create as many products as photosynthesis does today? Could it help to change our entire relationship with the natural world, reducing our footprint to a fraction of its current size?

There are plenty of questions to be answered, plenty of possible hurdles and constraints. But think of the possibilities. Agricultural commodities, currently using almost all the Earth’s fertile land area, could be shrunk into a few small pockets of infertile land. The potential for ecological restoration is astonishing. The potential for feeding the world, a question that has literally been keeping me awake at night, is just as electrifying.

None of this means we can afford to relax and wait for an infant technology to save us. In the meantime, as urgent intermediate steps, we should switch to a plant-based diet and mobilise against the destruction of the living planet. You could start by joining the Extinction Rebellion that launches today [Wednesday].

But if this works, it could help, alongside political mobilisation, to change almost everything. Places which have become agricultural deserts, trashed by giant corporations, could be reforested, drawing carbon dioxide from the air on a vast scale. The ecosystems of land and sea could recover, not just in pockets but across great tracts of the planet. A new age of global hunger becomes less likely.

Crude and destructive technologies got us into this mess. Refined technologies can help get us out of it. The struggle to save every possible species and ecosystem from the current wave of destruction is worthwhile. One day, perhaps within our lifetimes, they could repopulate a thriving world.

www.monbiot.com

26 Dec 21:08

A Darwinian Uncertainty Principle

by Gascuel, O.

Reconstructing ancestral characters and traits along a phylogenetic tree is central to evolutionary biology. It is the key to understanding morphological changes among species, inferring ancestral biochemical properties of life, and recovering migration routes in phylogeography. The goal is twofold: to reconstruct the character state at the tree root (e.g. the region of origin of some species), and to understand the process of state changes along the tree (e.g. species flow between countries). Although each goal can be achieved with high accuracy individually, we use mathematics and simulations to demonstrate that it is generally impossible to accurately estimate both the root state and the rates of state changes along the tree branches from the observed data at the tips of the tree. This inherent Darwinian uncertainty principle concerning the simultaneous estimation of pattern and process governs ancestral reconstructions in biology. Increasing the number of tips improves the joint estimation accuracy for certain tree shapes that arise in evolutionary models; for other tree shapes, however, it does not.

Nosimpler

Shared posts

Preliminary Warning: Misalignment in Dialogue

Context-Freeness and Monoidal Categories

Natural Language and Functorial Semantics

Categorical Compositional Distributional Models

Contextual Grammaticality with Dynamic Syntax

Bringing it All Together: DisCoCat Inc.