What is the value of the whole in terms of the values of the parts?
More specifically, given a finite set whose elements have assigned
“values” $v_1, \ldots, v_n$ and assigned “sizes” $p_1, \ldots, p_n$
(normalized to sum to $1$), how can we assign a value
$\sigma(\mathbf{p}, \mathbf{v})$ to the set in a coherent way?
This seems like a very general question. But in fact, just a few sensible
requirements on the function $\sigma$ are enough to pin it down almost
uniquely. And the answer turns out to be closely connected to
existing mathematical concepts that you probably already know.
Let’s write
$$
\Delta_n = \Bigl\{ (p_1, \ldots, p_n) \in \mathbb{R}^n :
p_i \geq 0, \ \sum p_i = 1 \Bigr\}
$$
for the set of probability distributions on $\{1, \ldots, n\}$. Assuming that our
“values” are positive real numbers, we’re interested in sequences of
functions
$$
\Bigl(
\sigma \colon \Delta_n \times (0, \infty)^n \to (0, \infty)
\Bigr)_{n \geq 1}
$$
that aggregate the values of the elements to give a value to the whole
set. So, if the elements of the set have relative sizes
$\mathbf{p} = (p_1, \ldots, p_n)$ and values
$\mathbf{v} = (v_1, \ldots, v_n)$, then the
value assigned to the whole set is $\sigma(\mathbf{p}, \mathbf{v})$.
Here are some properties that it would be reasonable for $\sigma$ to
satisfy.
Homogeneity The idea is that whatever “value” means, the value of
the set and the value of the elements should be measured in the same
units. For instance, if the elements are valued in kilograms then the set
should be valued in kilograms too. A switch from kilograms to grams would then
multiply both values by 1000. So, in general, we ask that
$$
\sigma(\mathbf{p}, c\mathbf{v})
=
c \, \sigma(\mathbf{p}, \mathbf{v})
$$
for all $\mathbf{p} \in \Delta_n$, $\mathbf{v} \in (0, \infty)^n$ and
$c \in (0, \infty)$.
Monotonicity The values of the elements are supposed to make a
positive contribution to the value of the whole, so we ask that if
$v_i \leq v'_i$ for all $i$ then
$$
\sigma(\mathbf{p}, \mathbf{v}) \leq \sigma(\mathbf{p}, \mathbf{v}')
$$
for all $\mathbf{p} \in \Delta_n$.
Replication Suppose that our $n$ elements have the same size and
the same value, $v$. Then the value of the whole set should be $n v$.
This property says, among other things, that $\sigma$ isn’t an average: putting in more
elements of value $v$ increases the value of the whole set!
If $\sigma$ is homogeneous, we might as well assume that $v = 1$,
in which case the requirement is that
$$
\sigma\bigl( (1/n, \ldots, 1/n), (1, \ldots, 1) \bigr) = n.
$$
Modularity This one’s a basic logical axiom, best illustrated by
an example.
Imagine that we’re very ambitious and wish to evaluate
the entire planet — or at least, the part that’s land. And suppose
we already know the values and relative sizes of every country.
We could, of course, simply put this data into $\sigma$ and get an answer immediately.
But we could instead begin by evaluating each continent, and then
compute the value of the planet using the values and sizes of the
continents. If $\sigma$ is sensible, this should give the same answer.
The notation needed to express this formally is a bit heavy. Let
$\mathbf{w} \in \Delta_n$; in our example, $n = 7$ (or however many
continents there are) and $\mathbf{w} = (w_1, \ldots, w_7)$ encodes their
relative sizes. For each $i = 1, \ldots, n$, let
$\mathbf{p}^i \in \Delta_{k_i}$; in our example, $\mathbf{p}^i$ encodes
the relative sizes of the countries on the $i$th continent. Then we get a
probability distribution
$$
\mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n)
=
(w_1 p^1_1, \ldots, w_1 p^1_{k_1},
\,\,\ldots, \,\,
w_n p^n_1, \ldots, w_n p^n_{k_n})
\in
\Delta_{k_1 + \cdots + k_n},
$$
which in our example encodes the relative sizes of all the countries on the
planet. (Incidentally, this composition makes $(\Delta_n)$ into an operad,
a fact that we’ve discussed many times before on this blog.) Also let
$$
\mathbf{v}^1 = (v^1_1, \ldots, v^1_{k_1}) \in (0, \infty)^{k_1},
\,\,\ldots,\,\,
\mathbf{v}^n = (v^n_1, \ldots, v^n_{k_n}) \in (0, \infty)^{k_n}.
$$
In the example, $v^i_j$ is the value of the $j$th country on the $i$th
continent. Then the value of the $i$th continent is
$\sigma(\mathbf{p}^i, \mathbf{v}^i)$, so the axiom is that
$$
\sigma
\bigl(
\mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n),
(v^1_1, \ldots, v^1_{k_1}, \ldots, v^n_1, \ldots, v^n_{k_n})
\bigr)
=
\sigma \Bigl( \mathbf{w},
\bigl( \sigma(\mathbf{p}^1, \mathbf{v}^1), \ldots, \sigma(\mathbf{p}^n,
\mathbf{v}^n) \bigr)
\Bigr).
$$
The left-hand side is the value of the planet calculated in a single step,
and the right-hand side is its value when calculated in two steps,
with continents as the intermediate stage.
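In code, the composition $\mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n)$ is only a couple of lines. Here is a minimal Python sketch (the function name `compose` is mine, purely for illustration):

```python
# Sketch of the operadic composition w ∘ (p^1, ..., p^n) from the text:
# each block size w_i is shared out among the entries of p^i.

def compose(w, ps):
    """Compose a distribution w on n blocks with a distribution ps[i] on each block."""
    assert len(w) == len(ps)
    return [wi * pij for wi, pi in zip(w, ps) for pij in pi]

w = [0.6, 0.4]                        # two "continents" by relative size
ps = [[0.5, 0.5], [0.25, 0.75]]       # "countries" within each continent
flat = compose(w, ps)                 # relative sizes of all four countries
assert abs(sum(flat) - 1.0) < 1e-12   # still a probability distribution
```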
Symmetry It shouldn’t matter what order we list the elements
in. So it’s natural to ask that
$$
\sigma(\mathbf{p}, \mathbf{v})
=
\sigma(\mathbf{p} \tau, \mathbf{v} \tau)
$$
for any $\tau$ in the symmetric group $S_n$, where the right-hand side
refers to the obvious $S_n$-actions.
Absent elements should count for nothing! In other words, if $p_1 = 0$
then we should have
$$
\sigma\bigl( (p_1, \ldots, p_n), (v_1, \ldots, v_n)\bigr)
=
\sigma\bigl( (p_2, \ldots, p_n), (v_2, \ldots, v_n)\bigr).
$$
This isn’t quite trivial. I haven’t yet given you any examples of the kind of function that $\sigma$
might be, but perhaps you already have in mind a simple one like this:
$$
\sigma(\mathbf{p}, \mathbf{v}) = v_1 + \cdots + v_n.
$$
In words, the value of the whole is simply the sum of the values of the
parts, regardless of their sizes. But if $\sigma$ is to have the “absent
elements” property, this won’t do. (Intuitively, if $p_i = 0$ then we
shouldn’t count $v_i$ in the sum, because the $i$th element isn’t actually
there.) So we’d better modify this example slightly, instead taking
$$
\sigma(\mathbf{p}, \mathbf{v}) = \sum_{i \,:\, p_i \gt 0} v_i.
$$
This function (or rather, sequence of functions) does have the “absent elements” property.
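For concreteness, here is the modified example in Python, together with a check of the “absent elements” property (a small illustrative sketch; the name `sigma0` is mine):

```python
# The modified example: sum the values of only those elements that are
# actually present, i.e. those with p_i > 0.

def sigma0(p, v):
    return sum(vi for pi, vi in zip(p, v) if pi > 0)

# Absent elements count for nothing: an element of size 0 contributes
# nothing, whatever its value.
assert sigma0([0.0, 0.5, 0.5], [10, 2, 3]) == sigma0([0.5, 0.5], [2, 3]) == 5
```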
Continuity in positive probabilities Finally, we ask that for
each $\mathbf{v} \in (0, \infty)^n$, the function $\sigma(-, \mathbf{v})$
is continuous on the interior of the simplex $\Delta_n$, that is,
continuous over those probability distributions
$\mathbf{p}$ such that $p_1, \ldots, p_n \gt 0$.
Why only over the interior of the simplex? Basically because of
natural examples of $\sigma$ like the one just given, which is continuous
on the interior of the simplex but not on the boundary. Generally, it’s
sometimes useful to make a sharp, discontinuous distinction between the
cases $p_i \gt 0$ (presence) and $p_i = 0$ (absence).
Arrow’s famous
theorem
states that a few apparently mild conditions on a voting system are, in
fact, mutually contradictory. The mild conditions above are not mutually
contradictory. In fact, there’s a one-parameter family $\sigma_q$ of
functions each of which satisfies these conditions. For real $q \neq 1$,
the definition is
$$
\sigma_q(\mathbf{p}, \mathbf{v})
=
\Bigl( \sum_{i \,:\, p_i \gt 0} p_i^q v_i^{1 - q} \Bigr)^{1/(1 - q)}.
$$
For instance, $\sigma_0$ is the example of $\sigma$ given above.
The formula for $\sigma_q$ is obviously invalid at $q = 1$, but it
converges to a limit as $q \to 1$, and we define
$\sigma_1(\mathbf{p}, \mathbf{v})$ to be that limit.
Explicitly, this gives
$$
\sigma_1(\mathbf{p}, \mathbf{v})
=
\prod_{i \,:\, p_i \gt 0} (v_i/p_i)^{p_i}.
$$
In the same way, we can define $\sigma_{-\infty}$ and $\sigma_\infty$ as
the appropriate limits:
$$
\sigma_{-\infty}(\mathbf{p}, \mathbf{v})
=
\max_{i \,:\, p_i \gt 0} v_i/p_i,
\qquad
\sigma_{\infty}(\mathbf{p}, \mathbf{v})
=
\min_{i \,:\, p_i \gt 0} v_i/p_i.
$$
And it’s easy to check that for each $q \in [-\infty, \infty]$, the
function $\sigma_q$ satisfies all the natural conditions listed above.
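To make the family concrete, here is a short Python sketch of $\sigma_q$, written directly from the formulas above, with numerical checks of a couple of the axioms (the code and names are mine, not part of the course notes):

```python
import math

def sigma(q, p, v):
    """The aggregation sigma_q(p, v), including the limit cases
    q = 1 and q = ±infinity. Indices with p_i = 0 are omitted,
    as in the text's "absent elements" property."""
    pv = [(pi, vi) for pi, vi in zip(p, v) if pi > 0]
    if q == 1:
        return math.prod((vi / pi) ** pi for pi, vi in pv)
    if q == -math.inf:
        return max(vi / pi for pi, vi in pv)
    if q == math.inf:
        return min(vi / pi for pi, vi in pv)
    return sum(pi**q * vi**(1 - q) for pi, vi in pv) ** (1 / (1 - q))

# Replication: n equal elements of value v aggregate to n*v.
assert abs(sigma(2, [0.25] * 4, [3.0] * 4) - 12.0) < 1e-9

# Homogeneity: scaling all values by c scales the result by c.
p, v = [0.2, 0.3, 0.5], [1.0, 4.0, 2.0]
assert abs(sigma(0.5, p, [10 * x for x in v]) - 10 * sigma(0.5, p, v)) < 1e-9

# The q -> 1 limit agrees with the product formula for sigma_1.
assert abs(sigma(1.0001, p, v) - sigma(1, p, v)) < 1e-3
```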
These functions $\sigma_q$ might be unfamiliar to you, but they have some
special cases that are quite well explored. In particular:
Suppose you’re in a situation where the elements don’t have “sizes”.
Then it would be natural to take $\mathbf{p}$ to be the uniform
distribution $\mathbf{u}_n = (1/n, \ldots, 1/n)$. In that case,
$$
\sigma_q(\mathbf{u}_n, \mathbf{v})
= \mathrm{const} \cdot \bigl( \sum v_i^{1 - q} \bigr)^{1/(1 - q)},
$$
where the constant is a certain power of $n$. When $q \leq 0$, this is
exactly a constant times $\|\mathbf{v}\|_{1 - q}$, the
$(1 - q)$-norm of the vector $\mathbf{v}$.
Suppose you’re in a situation where the elements don’t have “values”.
Then it would be natural to take $\mathbf{v}$ to be
$\mathbf{1} = (1, \ldots, 1)$. In that case,
$$
\sigma_q(\mathbf{p}, \mathbf{1})
=
\bigl( \sum p_i^q \bigr)^{1/(1 - q)}.
$$
This is the quantity that ecologists know as the Hill number of order $q$
and use as a measure of biological diversity. Information theorists know
it as the exponential of the Rényi entropy of order $q$,
the special case $q = 1$ being Shannon entropy. And actually, the general
formula for $\sigma_q$ is very closely related to Rényi relative entropy
(which Wikipedia calls Rényi divergence).
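For instance, here is a quick Python sketch of this special case (my own illustrative code; `hill` is my name for it):

```python
import math

def hill(q, p):
    """Hill number of order q: sigma_q(p, (1, ..., 1)) in the notation above."""
    p = [pi for pi in p if pi > 0]
    if q == 1:
        # q -> 1 limit: the exponential of Shannon entropy
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi**q for pi in p) ** (1 / (1 - q))

community = [0.7, 0.2, 0.1]       # relative abundances of three species
assert hill(0, community) == 3.0  # q = 0 simply counts the species present
# Hill numbers decrease in q: rare species count for less as q grows.
assert hill(0, community) > hill(1, community) > hill(2, community)
```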
Anyway, the big — and as far as I know, new — result is:
Theorem The functions $\sigma_q$ are the only functions
$\sigma$ with the seven properties above.
So although the properties above don’t seem that demanding, they actually
force our notion of “aggregate value” to be given by one of the functions
in the family $(\sigma_q)_{q \in [-\infty, \infty]}$. And although I
didn’t even mention the notions of diversity or entropy in my justification
of the axioms, they come out anyway as special cases.
I covered all this yesterday in the tenth and penultimate installment of the
functional equations course that I’m giving. It’s written up on
pages 38–42 of the notes so
far. There you can also read
how this relates to more realistic measures of
biodiversity
than the Hill numbers. Plus, you can see an outline of the (quite
substantial) proof of the theorem above.