From the subreddit: Humans Are Hardwired To Dismiss Facts That Don’t Fit Their Worldview. Once you get through the preliminary Trump supporter and anti-vaxxer denunciations, it turns out to be an attempt at an evo psych explanation of confirmation bias:

Our ancestors evolved in small groups, where cooperation and persuasion had at least as much to do with reproductive success as holding accurate factual beliefs about the world. Assimilation into one’s tribe required assimilation into the group’s ideological belief system. An instinctive bias in favor of one’s “in-group” and its worldview is deeply ingrained in human psychology.

I think the article as a whole makes good points, but I’m increasingly uncertain that confirmation bias can be separated from normal reasoning.

Suppose that one of my friends says she saw a coyote walk by her house in Berkeley. I know there are coyotes in the hills outside Berkeley, so I am not too surprised; I believe her.

Now suppose that same friend says she saw a polar bear walk by her house. I assume she is mistaken, lying, or hallucinating.

Is this confirmation bias? It sure sounds like it. When someone says something that confirms my preexisting beliefs (eg ‘coyotes live in this area, but not polar bears’), I believe it. If that same person provides the same evidence for something that challenges my preexisting beliefs, I reject it. What am I doing differently from an anti-vaxxer who rejects any information that challenges her preexisting beliefs (eg that vaccines cause autism)?

When new evidence challenges our established priors (eg a friend reports a polar bear, but I have a strong prior that there are no polar bears around), we ought to heavily discount the evidence *and* slightly shift our prior. So I should end up believing that my friend is probably wrong, but I should also be slightly less confident in my assertion that there are no polar bears loose in Berkeley today. This seems sufficient to explain confirmation bias, ie a tendency to stick to what we already believe and reject evidence against it.
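To make "heavily discount the evidence and slightly shift the prior" concrete, here is a rough Bayes' rule sketch. Every number in it is an illustrative assumption rather than anything from the article: a 10% prior that a coyote walked by, a one-in-a-million prior for a polar bear, and a friend who reports an animal 99% of the time when it was there and 1% of the time when it was not.

```python
def posterior(prior, p_report_if_true, p_report_if_false):
    """Bayes' rule: P(animal was there | friend reports it)."""
    numerator = prior * p_report_if_true
    denominator = numerator + (1 - prior) * p_report_if_false
    return numerator / denominator

# Assumed reliability of the friend (hypothetical numbers).
HIT_RATE = 0.99      # P(she reports it | it was there)
FALSE_ALARM = 0.01   # P(she reports it | it was not there)

# Assumed priors (hypothetical numbers).
coyote = posterior(0.10, HIT_RATE, FALSE_ALARM)
polar_bear = posterior(1e-6, HIT_RATE, FALSE_ALARM)

print(f"coyote: {coyote:.3f}")
print(f"polar bear: {polar_bear:.6f}")
```

Under these assumptions the same testimony pushes the coyote to roughly 0.92, while the polar bear ends up around 0.0001: about a hundred times more likely than before, but still something I should disbelieve. The reasoning procedure is identical in both cases; only the prior differs.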

The anti-vaxxer is still doing something wrong; she somehow managed to get a very strong prior on a false statement, and isn’t weighing the new evidence heavily enough. But I think it’s important to note that she’s attempting to carry out normal reasoning, and failing, rather than carrying out some special kind of reasoning called “confirmation bias”.

There are some important refinements to make to this model – maybe there’s a special “emotional reasoning” that locks down priors more tightly, and maybe people naturally overweight priors because that was adaptive in the ancestral environment. Maybe after you add these refinements, you end up at exactly the traditional model of confirmation bias (and the one the Fast Company article is using) and my objection becomes kind of pointless.

But not completely pointless. I still think it’s helpful to approach confirmation bias by thinking of it as a normal form of reasoning, and then asking under what conditions it fails.


*If you are an algebraic abstractologist, this post is probably not for you. Further meta-commentary can be found in the “meta” section, at the bottom of the post.*

So you’ve heard of this thing called “category theory”. Maybe you’ve met some smart people who say that it’s really useful and powerful for… something. Maybe you’ve even cracked open a book or watched some lectures, only to find that the entire subject seems to have been generated by training __GPT-2__ on a mix of algebraic optometry and output from __theproofistrivial.com__.

What is this subject? What could one do with it, other than write opaque math papers?

This introduction is for you.

This post will cover just the bare-bones foundational pieces: categories, functors, and natural transformations. I will mostly eschew the typical presentation; my goal is just to convey intuition for what these things mean. Depending on interest, I may post a few more pieces in this vein, covering e.g. limits, adjunction, Yoneda lemma, symmetric monoidal categories, types and programming, etc - leave a comment if you want to see more.

Outline:

- Category theory is the study of paths in graphs, so I’ll briefly talk about that and highlight some relevant aspects.
- What’s a category? A category is just a graph with some notion of equivalence of paths; we’ll see a few examples.
- Pattern matching: find a sub-category with a particular shape. Matches are called “functors”.
- One sub-category modelling another: commutative squares and natural transformations.

Here’s a graph:

Here are some paths in that graph:

- A -> B
- B -> C
- A -> B -> C
- A -> A
- A -> A -> A (twice around the loop)
- A -> A -> A -> B (twice around the loop, then to B)
- (trivial path - start at D and don’t go anywhere)
- (trivial path - start at A and don’t go anywhere)

In category theory, we usually care more about the edges and paths than the vertices themselves, so let’s give our edges their own names:

We can then write paths like this:

- A -> B is written y
- B -> C is written z
- A -> B -> C is written yz
- A -> A is written x
- A -> A -> A is written xx
- A -> A -> A -> B is written xxy
- The trivial path at D is written id_D (this is roughly a standard notation)
- The trivial path at A is written id_A

We can build longer paths by “composing” shorter paths. For instance, we can compose y (aka A -> B) with z (aka B -> C) to form yz (aka A -> B -> C), or we can compose x with itself to form xx, or we can compose xx with yz to form xxyz. We can compose two paths if-and-only-if the second path starts where the first one ends - we can’t compose x with z because we’d have to magically jump from A to B in the middle.

Composition is asymmetric - composing y with z is fine, but we can’t compose z with y.

Notice that composing id_A with x is just the same as x by itself: if we start at A, don’t go anywhere, and then follow x, then that’s the same as just following x. Similarly, composing x with id_A is just the same as x. Symbolically: id_A x = x id_A = x. Mathematically, id_A is an “identity” - an operation which does nothing; thus the “id” notation.
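The composition rules so far can be sketched in a few lines of Python. This is only an illustration: the `Path` class and the `compose` and `identity` helpers are hypothetical names made up for this sketch, and a path is stored simply as its endpoints plus the sequence of edge names it follows.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    start: str
    end: str
    edges: tuple = ()   # names of the edges followed, in order

def identity(vertex):
    """The trivial path: start at `vertex` and don't go anywhere."""
    return Path(vertex, vertex)

def compose(first, second):
    """Follow `first`, then `second`; only defined if they meet."""
    if first.end != second.start:
        raise ValueError(f"cannot compose: {first.end} != {second.start}")
    return Path(first.start, second.end, first.edges + second.edges)

x = Path("A", "A", ("x",))
y = Path("A", "B", ("y",))
z = Path("B", "C", ("z",))

yz = compose(y, z)                   # A -> B -> C
xxyz = compose(compose(x, x), yz)    # twice around the loop, then on to C

# id_A x = x id_A = x
assert compose(identity("A"), x) == x == compose(x, identity("A"))
```

Calling `compose(z, y)` or `compose(x, z)` raises a `ValueError`, which is the code version of "we'd have to magically jump" between vertices.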

In applications, graphs almost always have data on them - attached to the vertices, the edges, or both. In category theory in particular, data is usually on the edges. When composing those edges to make paths, we also compose the data.

A simple example: imagine a graph of roads between cities. Each road has a distance. When composing multiple roads into paths, we add together the distances to find the total distance.

Finally, in our original graph, let’s throw in an extra edge from A to itself:

Our graph has become a “multigraph” - a graph with (potentially) more than one distinct edge between any two vertices. Now we can’t just write a path as A -> A -> A anymore - that could refer to xx, xx’, x’x, or x’x’. In category theory, we’ll usually be dealing with multigraphs, so we need to write paths as a sequence of edges rather than the vertices-with-arrows notation. For instance, in our roads-and-cities example, there may be multiple roads between any two cities, so a path needs to specify which roads are taken.

Category theorists call paths and their associated data “morphisms”. This is a terrible name, and we mostly won’t use it. Vertices are called “objects”, which is a less terrible name I might occasionally slip into.

A category is:

- a directed multigraph
- with some notion of equivalence between paths.

For instance, we could imagine a directed multigraph of flights between airports, with a cost for each flight. A path is then a sequence of flights from one airport to another. As a notion of equivalence, we could declare that two paths are equivalent if they have the same start and end points, and the same total cost.

There is one important rule: our notion of path-equivalence must respect composition. If path p is equivalent to q (which I’ll write p≅q), and x≅y, then we must have px≅qy. In our airports example, this would say: if two flight-paths p and q have the same cost (call it c1), and two flight-paths x and y have the same cost (call it c2), then the cost of px (i.e. c1+c2) must equal the cost of qy (also c1+c2).

Besides that, there’s a handful of boilerplate rules:

- Any path is equivalent to itself (reflexivity), if x≅y then y≅x (symmetry), and if x≅y and y≅z then x≅z (transitivity); these are the usual rules which define equivalence relations.
- Any two paths with different start or end points must not be equivalent; otherwise expressions like “px≅qy” might not even be defined.
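Here is a small sketch of the airport category with its "same endpoints, same total cost" equivalence, checking that the equivalence respects composition. The flight names and prices are invented for illustration, and `FlightPath`, `compose`, and `equivalent` are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlightPath:
    start: str
    end: str
    cost: int          # total cost, in whole dollars
    flights: tuple     # flight names, in order

def compose(p, q):
    """Fly p, then q; the data on the edges (cost) is composed by adding."""
    if p.end != q.start:
        raise ValueError("second path must start where the first ends")
    return FlightPath(p.start, q.end, p.cost + q.cost, p.flights + q.flights)

def equivalent(p, q):
    """Same start, same end, same total cost."""
    return (p.start, p.end, p.cost) == (q.start, q.end, q.cost)

# Hypothetical flights and prices.
p = FlightPath("JFK", "ATL", 120, ("DL1",))
q = FlightPath("JFK", "ATL", 120, ("AA7",))
x = FlightPath("ATL", "LAX", 200, ("DL2",))
y = FlightPath("ATL", "LAX", 200, ("WN3",))

assert equivalent(p, q) and equivalent(x, y)
assert equivalent(compose(p, x), compose(q, y))   # px ≅ qy
assert not equivalent(p, x)                       # different endpoints
```

Because costs add under composition, this particular equivalence respects composition automatically; a notion like "same number of layovers, unless the path passes through Atlanta" would need to be checked more carefully.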

Let’s look at a few more examples. I’ll try to show some qualitatively different categories, to give some idea of the range available.

__Airports & Flights__

Our airport example is already a fairly general category, but we could easily add more bells and whistles to it. Rather than having a vertex for each airport, we could have a vertex for each airport at each time. Flights then connect an airport at one time to another airport at another time, and we need some zero-cost “wait” edges to move from an airport at one time to the same airport at a later time. A path would be some combination of flights and waiting. We might expect that the category has some symmetries - e.g. “same flights on different days” - and later we’ll see some tools to formalize those.

__Divisibility__

As a completely different example, consider the category of divisibility of positive integers:

This category has a path from n to m if-and-only-if n is divisible by m (written m | n, pronounced “m divides n”, e.g. 2 | 12 is read “two divides twelve”). The “data” on the edges is just the divisibility relations - e.g. 6 | 12 or 5 | 15:

We can compose these: 2|6 and 6|12 implies 2|12. A path 12 -> 6 -> 2 in this category is, in some sense, a proof that 12 is divisible by 2 (given all the divisibility relations on the edges). Note that *any* two paths from 12 to 2 produce the same result - e.g. 12 -> 4 -> 2 also gives 2|12. More generally: in this category, any two paths between the same start and end points are equivalent.
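A quick sketch of the divisibility category, assuming we represent an edge n -> m as the pair `(n, m)` and refuse to build it unless m really divides n. The `edge` and `compose` helpers are made up for this example.

```python
def edge(n, m):
    """An edge n -> m exists only if m divides n; its data is the fact m | n."""
    if n % m != 0:
        raise ValueError(f"{m} does not divide {n}")
    return (n, m)

def compose(first, second):
    """From m | n and k | m, conclude k | n."""
    n, m = first
    m2, k = second
    if m != m2:
        raise ValueError("paths do not meet")
    return edge(n, k)

via_6 = compose(edge(12, 6), edge(6, 2))   # 12 -> 6 -> 2
via_4 = compose(edge(12, 4), edge(4, 2))   # 12 -> 4 -> 2

# Both routes carry the same data: the single fact 2 | 12.
assert via_6 == via_4 == (12, 2)
```

Since a composed path remembers only its endpoints, any two paths between the same start and end points are automatically equal here, which matches the claim above.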

__Types & Functions__

Yet another totally different direction: consider the category of types in some programming language, with functions between those types as edges:

This category has a LOT of stuff in it. There’s a function for addition of two integers, which goes from (int, int) to int. There’s another function for multiplication of two integers, also from (int, int) to int. There are functions operating on lists, strings, and hash tables. There are functions which haven’t been written in the entire history of programming, with input and output types which also haven’t been written.

We know how to compose functions - just call one on the result of the other. We also know when two functions are “equivalent” - they always give the same output when given the same input. So we have a category, using our usual notions of composition and equivalence of functions. This category is the main focus of many CS applications of category theory (e.g. types in Haskell). Mathematicians instead focus on the closely-related category of functions between sets; this is exactly the same except that functions go from one set to another instead of one type to another.
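In code, composition and equivalence of functions look roughly like this. The function names are invented for the sketch, and note the caveat in the comments: a program can only spot-check equivalence on finitely many inputs, whereas the mathematical notion quantifies over all of them.

```python
def compose(f, g):
    """The path 'f then g': call g on the result of f."""
    return lambda value: g(f(value))

def double(n: int) -> int:
    return 2 * n

def describe(n: int) -> str:
    return f"value={n}"

# An edge int -> int followed by an edge int -> str gives a path int -> str.
double_then_describe = compose(double, describe)

# A differently written function that is "equivalent" to double as an edge.
def double_by_adding(n: int) -> int:
    return n + n

# Spot-check only: true equivalence means agreeing on *every* input.
assert all(double(n) == double_by_adding(n) for n in range(-100, 100))
```

As with paths in a graph, `compose(describe, double)` is not a sensible path: `describe` ends at `str`, while `double` starts at `int`.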

__Commutative Diagrams__

A lot of mathy fields use diagrams like this:

For instance, we can scale an image down (f1) then rotate it (g1) or rotate the image (g2) then scale it (f2), and get the same result either way. The idea that we get the same result either way is summarized by the phrase “the diagram commutes”; thus the name “commutative diagram”. In terms of paths: we have path-equivalence f1g1=g2f2.
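As a toy check that this diagram commutes, the sketch below stands in for an "image" with a single corner point, scales by one half, and rotates by a quarter turn. Those specific choices (and the exact `Fraction` arithmetic, used to avoid floating-point noise) are assumptions made to keep the example small; in the diagram, f1 and f2 are both the scaling step and g1 and g2 are both the rotation, just applied at different corners of the square.

```python
from fractions import Fraction

def scale_down(point):
    """Scale by 1/2 about the origin."""
    x, y = point
    return (x * Fraction(1, 2), y * Fraction(1, 2))

def rotate_quarter_turn(point):
    """Rotate 90 degrees counterclockwise about the origin."""
    x, y = point
    return (-y, x)

corner = (Fraction(4), Fraction(2))   # hypothetical corner of the image

scale_then_rotate = rotate_quarter_turn(scale_down(corner))
rotate_then_scale = scale_down(rotate_quarter_turn(corner))

# The two paths around the square end in the same place.
assert scale_then_rotate == rotate_then_scale
```

Not every pair of operations behaves this way: cropping to the top half and then rotating generally differs from rotating and then cropping, so that square would not commute.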

Another way this often shows up: we have some problem which we could solve directly. But it’s easier to transform it into some other form (e.g. change coordinates or change variables), solve in that form, then transform back:

Again, we say “the diagram commutes”. Now our path-equivalence says f = T f′ T⁻¹.

Talking about commutative diagrams is arguably the central purpose of category theory; our main tool for that will be “natural transformations”, which we’ll introduce shortly.

Think about how we use regexes. We write some pattern then try to match it against some string - e.g. “colou*r” matches “color” or “colour” but not “pink”. We can use that to pick out parts of a target string which match the pattern - e.g. we could find the query “color” in the target “every color of the rainbow”.
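For reference, the regex example looks like this with Python's standard `re` module:

```python
import re

pattern = re.compile(r"colou*r")

# Whole-string matches: the query either fits the target or it doesn't.
assert pattern.fullmatch("color")
assert pattern.fullmatch("colour")
assert pattern.fullmatch("pink") is None

# Searching picks out the part of a longer target that matches the query.
match = pattern.search("every color of the rainbow")
print(match.group(), match.span())
```

Strictly speaking, `u*` allows any number of u's, so this pattern would also accept "colouur"; `colou?r` is the tighter way to say "color or colour". Either way, the point carries over: a match is not just a yes/no answer, it also records *where* in the target the pattern landed.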

We’d like to do something similar for categories. Main idea: we want to match objects (a.k.a. vertices) in the query category to objects in the target category, and paths in the query category to paths in the target category, in a way that keeps the structure intact.

For example, consider a commutative square:

We’d like to use that as a query on some other category, e.g. our airport category. When we query for a commutative square in our airport category, we’re looking for two paths with the same start and end airports, (potentially) different intermediate airports, but the same overall cost. For instance, maybe Delta has flights from New York to Los Angeles via their hub in Atlanta, and Southwest has flights from New York to Los Angeles via their hub in Vegas, and market competition makes the prices of the two flight-paths equal.

We’ll come back to the commutative square query in the next section. For now, let’s look at some simpler queries, to get a feel for the building blocks of our pattern-matcher. Remember: objects to objects, paths to paths, keep the structure intact.

First, we could use a single-object category with no edges as a query:

This can match against any one object (a.k.a. vertex) in the target category. Note that there is a path hiding in the query - the identity path, where we start at the object and just stay there. In general, our pattern-matcher will always match identity paths in the query with identity paths on the corresponding objects in the target category - that’s one part of “keeping the structure intact”.

Next-most complicated is the query with two objects:

This one is slightly subtle - it might match two different objects, or both query objects might match against the *same* target object. This is just the way pattern-matching works in category theory; there’s no rule to prevent multiple vertices/edges in the query from collapsing into a single vertex/edge in the target category. This is actually useful quite often - for instance, if we have some function which takes in two objects from the target category, then it’s perfectly reasonable to pass in the same object twice. Maybe we have a path-finding algorithm which takes in two airports; it’s perfectly reasonable to expect that algorithm to work even if we pass the same airport twice - that’s a very easy path-finding problem, after all!

Next up, we add in an edge:

Now that we have a nontrivial path, it’s time to highlight a key point: we map paths to paths, *not* edges to edges. So if our target category contains something like A -> B -> C, then our one-edge query might match against the A -> B edge, or it might match against the B -> C edge, or it might match the whole *path* A -> C (via B) - even if there’s no direct edge from A to C. Again, this is useful quite often - if we’re searching for flights from New York to Los Angeles, it’s perfectly fine to show results with a stop or two in the middle. So our one-edge query doesn’t just match each edge; it matches each path between any two objects (including the identity path from an object to itself).
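To see what the one-edge query finds, the sketch below enumerates every path in the target A -> B -> C. The edge names `f` and `g` are made up, no two distinct paths are treated as equivalent, and the search assumes the target graph has no cycles (with a loop there would be infinitely many paths and this naive loop would never finish).

```python
# Target graph: A -> B -> C, with hypothetical edge names f and g.
edges = {"f": ("A", "B"), "g": ("B", "C")}
vertices = ["A", "B", "C"]

def all_paths(vertices, edges):
    """Every path in a finite acyclic graph, as (start, end, edge names)."""
    paths = [(v, v, ()) for v in vertices]      # identity paths
    frontier = list(paths)
    while frontier:
        start, end, names = frontier.pop()
        # Extend the path by every edge that starts where it currently ends.
        for name, (src, dst) in edges.items():
            if src == end:
                longer = (start, dst, names + (name,))
                paths.append(longer)
                frontier.append(longer)
    return paths

# Each path is one possible match for the one-edge query.
matches = all_paths(vertices, edges)
for start, end, names in sorted(matches):
    print(start, "->", end, "via", names or "identity")
```

The target has only two edges, but the query gets six matches: the three identity paths, the two single edges, and the composite path from A to C via B.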

Adding more objects and edges generalizes in the obvious way:

This finds any two paths which start at the same object. As usual, one or both paths could be the identity path, and both paths could be the same.

The other main building block is equivalence between paths. Let’s consider a query with two edges between two objects, with the two edges declared to be equivalent:

This finds not just any two paths with the same start and end, but two *equivalent* paths. As usual, the two paths could be the same path, but they don’t have to be.

We could add a rest stop in the middle of one path, while still considering both paths equivalent:

All the paths matched by the previous query would also be matched by this one, but now we get some extra information in the matching - in addition to the two equivalent paths, we pick out some object along one of the paths.

This highlights one last key point: even if two queries match the same paths, it does matter which things we’re picking out along those paths. For each pair of equivalent paths, our rest-stop query generates one match for every intermediate object along one path - whereas the original equivalent-paths query just generates one single match per pair of equivalent paths.

Category theorists call each individual match a “functor”. Each different functor - i.e. each match - maps the query category into the target category in a different way.

Note that the target category is itself a category - which means we could use it as a query on some third category. In this case, we can compose matches/functors: if one match tells me how to map category 1 into category 2, and another match tells me how to map category 2 into category 3, then I can combine those to find a map from category 1 into category 3.

Because category theorists love to go meta, we can even define a graph in which the objects are categories and the edges are functors. A path then composes functors, and we say that two paths are equivalent if they result in the same map from the original query category into the final target category. This is called “Cat”, the category of categories and functors. Yay meta.

Meanwhile, back on Earth (or at least low Earth orbit), commutative diagrams.

Exercise: Hopefully you now have an intuitive idea of how our pattern-matcher works, and what information each match (i.e. each functor) contains. Use your intuition to come up with a formal definition of a functor. Then, compare your definition to __Wikipedia’s definition__ (jargon note: "morphism" = set of equivalent paths); is your definition equivalent? If not, what’s missing/extraneous in yours, and when would it matter?

Let’s start with a microscopic model of a pot of water. We have some “state”, representing the positions and momenta of every molecule in the water (or quantum field state, if you want to go even lower-level). There are things we can do to the water - boil it, cool it back down, add salt, stir it, wait a few seconds, etc - and each of these things will transform the water from one state to another. We can represent this as a category: the objects are states, the edges are operations moving the water from one state to another (including just letting time pass), and paths represent sequences of operations.

In physics, we usually don’t care how a physical system arrived in a particular state - the state tells us everything we need to know. That would mean that any two paths between the same start and end states are equivalent in this category (just like in the divisibility category). To make the example a bit more general, let’s assume that we do care about different ways of getting from one state to another - e.g. heating the water, then cooling it, then heating it again will definitely rack up a larger electric/gas bill than just heating it.

Microscopic models accounting for the position and momentum of every molecule are rather difficult to work with, computationally. We might instead prefer a higher-level macroscopic model, e.g. a fluid model where we just track average velocity, temperature, and chemical composition of the fluid in little cells of space and time. We can still model all of our operations - boiling, stirring, etc - but they’ll take a different form. Rather than forces on molecules, now we’re thinking about macroscopic heat flow and total force on each little cell of space at each time.

We can connect these two categories: given a microscopic state we can compute the corresponding macroscopic state. By explicitly including these microscopic -> macroscopic transformations as edges, we can incorporate both systems into one category:

Note that multiple micro-states will map to the same macro-state, although I haven’t drawn any.

The key property in this two-part category is path equivalence (a.k.a. commutation). If we start at the leftmost microscopic state, stir (in micro), then transform to the macro representation, then that should be exactly the same as starting at the leftmost microscopic state, transforming to the macro representation, and *then* stirring (in macro). It should not matter whether we perform some operations in the macro or micro model; the two should “give the same answer”. We represent that idea by saying that two paths are equivalent: one path which transforms micro to macro and then stirs (in macro), and another path which stirs (in micro) and then transforms micro to macro. We have a commutative square.

In fact, we have a *bunch* of commutative squares. We can pick any path in the micro-model, find the corresponding path in the macro-model, add in the micro->macro transformations, and end up with a commutative square.
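Here is a deliberately tiny version of one such commutative square. It is a toy model, not real physics: a micro-state is a tuple of per-molecule energies, the macro-state is just their average, and the operation is uniform heating rather than stirring because heating is easier to write down. All names and numbers are assumptions for the sketch.

```python
from fractions import Fraction

def to_macro(micro_state):
    """Micro -> macro: keep only the average molecular energy."""
    return Fraction(sum(micro_state), len(micro_state))

def heat_micro(micro_state, amount):
    """Micro-level operation: add `amount` of energy to every molecule."""
    return tuple(energy + amount for energy in micro_state)

def heat_macro(macro_state, amount):
    """Macro-level version of the same operation: raise the average."""
    return macro_state + amount

micro = (3, 5, 10)   # hypothetical per-molecule energies

# Two paths around the square: operate then summarize, or summarize then operate.
assert to_macro(heat_micro(micro, 2)) == heat_macro(to_macro(micro), 2)

# Several micro-states collapse to one macro-state.
assert to_macro((3, 5, 10)) == to_macro((6, 6, 6))
```

If `heat_macro` were written incorrectly (say, multiplying instead of adding), the first assertion would fail: the square would not commute, and the macro model would no longer be a faithful summary of the micro model.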

Main take-away: __prism-shaped__ categories with commutative squares on their side-faces capture the idea of representing the same system and operations in two different ways, possibly with one representation less granular than the other. We’ll call these kinds of structures “natural transformations”.

Next step: we’d like to use our pattern-matcher to look for natural transformations.

We’ll start with some arbitrary category:

Then we’ll make a copy of it, and add edges from objects in the original to corresponding objects in the copy:

I’ll call the original category “source”, and the copy “target”.

To finish our pattern, we’ll declare path equivalences: if we follow an edge from source to target, then take any path within the target, that’s equivalent to taking the corresponding path within the source, and then following an edge from source to target. We declare those paths equivalent (as well as any equivalences in the original category, and any other equivalences implied, e.g. paths in which our equivalent paths appear as sub-paths).

Now we just take our pattern and plug it into our pattern-matcher, as usual. Each match is called a natural transformation; we say that the natural transformation maps the source part to the match of the target part. Since we call matches “functors”, a category theorist would say that a natural transformation maps one functor to another of the same shape.

Now for an important point: remember that, in our pot-of-water example, multiple microscopic states could map to the same macroscopic state. Multiple objects in the source are collapsed into a single object in the target. But our procedure for creating a natural transformation pattern just copies the whole source category directly, without any collapsing. Is our pot-of-water example not a true natural transformation?

It is. Last section I said that it’s sometimes useful for our pattern-matcher to collapse multiple objects into one; the pot-of-water is an example where that matters. Our pattern-matcher may be *looking* for a copy of the micro model, but it will still *match* against the macro model, *because* it’s allowed to collapse multiple objects together into one.

More generally: because our pattern-matcher is allowed to collapse objects together, it’s able to find natural transformations in which the target is less granular than the source.

That concludes the actual content; now I'll just talk a bit about why I'm writing this.

I've bounced off of category theory a couple times before. But smart people kept saying that it's really powerful, in ways that sound related to my research, so I've been taking another pass at the subject over the last few weeks.

Even the best book I've found on the material seems burdened mainly by poor formulations of the core concepts and very limited examples. My current impression is that broader adoption of category theory is limited in large part by bad definitions, even when more intuitive equivalent definitions are available - "morphisms" vs "paths" is a particularly blatant example, leading to an entirely unnecessary profusion of identities in definitions. Also, of course, category theorists are constantly trying to go more abstract in ways that make the presentation more confusing without really adding anything in terms of explanation. So I've needed to come up with my own concrete examples and look for more intuitive definitions. This write-up is a natural by-product of that process.

I'd especially appreciate feedback on:

- whether I'm missing key concepts or made crucial mistakes.
- whether this was useful; I may drop some more posts along these lines if many people like it.
- whether there's some wonderful category theory resource which has already done something like this, so I can just read that instead. I would really, really prefer to do this the easy way.


*Automatically crossposted*

Suppose that a kingdom contains a million peasants and a thousand nobles, and:

- Each noble makes as much as 10,000 peasants put together, such that collectively the nobles get 90% of the income.
- Each noble cares about as much about themselves as they do about all peasants put together.
- Each person’s welfare is logarithmic in their income.

Then it’s simultaneously the case that:

- Nobles prefer to keep money for themselves rather than donate it to peasants—money is worth 10,000x as much to a peasant, but a noble cares 1,000,000 times less about the peasant’s welfare.
- Nobles prefer a 90% income tax that is redistributed equally—a tax that costs a particular noble $1 generates $1000 of value for peasants, since all other nobles will also pay the higher taxes. That makes it a much better deal for the nobles (until the total income of nobles is roughly equal to the total income of peasants).
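The arithmetic behind both bullets can be checked with a back-of-the-envelope sketch. It works in units where a peasant's income is 1, looks only at the marginal dollar (so with logarithmic welfare, an extra dollar is worth 1/income), and ignores the sliver of redistributed tax that flows back to the nobles themselves. The variable names are mine, not the post's.

```python
N_PEASANTS = 1_000_000
N_NOBLES = 1_000
PEASANT_INCOME = 1.0
NOBLE_INCOME = 10_000.0

# Welfare is log(income), so one extra dollar is worth 1 / income.
value_to_noble = 1 / NOBLE_INCOME
value_to_peasant = 1 / PEASANT_INCOME

# A noble weighs all peasants put together as much as themselves.
weight_per_peasant = 1 / N_PEASANTS

# Option 1: one noble gives $1 to a peasant.
donation_cost = value_to_noble
donation_benefit = weight_per_peasant * value_to_peasant

# Option 2: a tax that costs every noble $1, all of it going to peasants.
tax_cost = value_to_noble
tax_benefit = N_NOBLES * weight_per_peasant * value_to_peasant

print(f"donation: cost {donation_cost:.0e}, benefit {donation_benefit:.0e}")
print(f"tax:      cost {tax_cost:.0e}, benefit {tax_benefit:.0e}")
```

From the noble's own point of view, the solo donation costs about 100 times more than it is worth, while the tax delivers about 10 times its cost, because the noble's dollar is matched by a dollar from each of the other nobles.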

In this situation, let’s call redistribution a “moral public good.” The nobles are altruistic enough that they prefer it if everyone gives to the peasants, but it’s still not worth it for any given noble to contribute anything to the collective project.

The rest of the post is about some implications of taking moral public goods seriously.

1. Justifying redistribution

This gives a very strong economic argument for state redistribution: it can easily be the case that *every individual* prefers a world with high redistribution to the world with low redistribution, rich and poor alike. I think “everyone prefers this policy” is basically the strongest argument you can make on its behalf.

(In fact some people just don’t care about others and so not *everyone* will benefit. I’d personally be on board with the purely selfish people just not funding redistribution, but unfortunately you can’t just ask people if they want to pay more taxes and I’m not going to sweat it that much if the most selfish people lose out a little bit.)

I think this argument supports levels of redistribution like 50% (or 30% or 70% or whatever), rather than levels of redistribution like 99% that could nearly level the playing field or ensure that no billionaires exist. I think this is enough to capture the vast majority of the possible benefits from redistribution, e.g. they could get most households to >50% of the average consumption.

This argument supports both foreign aid and domestic redistribution, but the foreign aid component may require international coordination. For example, if everyone in developed countries cared equally about themselves, their country, and the world, then you might end up with optimal domestic policies allocating 10% of their redistribution abroad (much less in smaller countries who have minimal influence on global poverty, a little bit more in the US), whereas everyone would prefer a multilateral commitment to spend 50% of their redistribution abroad.

2. There are lots of public goods

I think it makes sense for states to directly fund moral public goods like existential risk mitigation, exploration, ecological preservation, arts and sciences, animal welfare improvements, *etc*. In the past I’ve thought it usually made more sense to just give people money and let them decide how to spend it. (I still think states and philanthropists should more often give people cash, I just now think the presumption is less strong.)

In fact, I think that at large scales (like a nation rather than a town) moral public goods are probably the majority of public goods. Caring slightly more about public goods slightly changed my perspective on the state’s role. It also makes me significantly more excited about mechanisms like quadratic funding for public goods.

I enjoyed David Friedman’s *The Machinery of Freedom*, but it repeats the common libertarian line that donations can help the poor just as well as taxes:

> If almost everyone is in favor of feeding the hungry, the politician may find it in his interest to do so. But, under those circumstances, the politician is unnecessary: some kind soul will give the hungry man a meal anyway. If the great majority is against the hungry man, some kind soul among the minority still may feed him—the politician will not.

This seems totally wrong. The use of coercive force is an active ingredient in the state feeding the hungry, as it is with other public good provision. Anarchists either need to make some speculative proposal to fund public goods (the current menu isn’t good!) or else need to accept the Pareto inefficiency of underfunding moral public goods like redistribution.

3. Altruism is not about consequentialism

Consequentialism is a really bad model for most people’s altruistic behavior, and especially their compromises between altruistic and selfish ends. To model someone as a thoroughgoing consequentialist, you have two bad options:

- They care about themselves >10 million times as much as other people. Donating to almost anything is insane, no way the recipient values the money 10 million times more than I do.
- They care about themselves <1% as much as everyone else in the whole world put together. When choosing between possible worlds, they would gladly give up their whole future in order to make everyone else’s life a little better. Their personal preferences are nearly irrelevant when picking policies. If they found themselves in a very powerful position they would become radically more altruistic.

I think neither of these is a great model. In fact it seems like people care a lot about themselves and those around them, but at the same time, they are willing to donate small amounts of their income.

You could try to frame this as “no one is altruistic, it’s just a sham” or “people are terrible at morality.” But I think you’ll understand most people’s altruism better if you think about it as part of a collective action or public goods provision problem. People want to e.g. see a world free from extreme poverty, and they are (sometimes) willing to chip in a small part of that vision for the same reason that they are willing to chip in to the local public park—even though the actual consequence of their donation is too small for them to care much about it.

On this perspective, donating to local charities is on much more even footing with donating to distant strangers. Both are contributions to public goods, just at different scales and of different types, and that’s the thing that most unifies the way people approach and think about them. The consequentialist analysis is still relevant—helping the poor is only a moral public good because of the consequences—but it’s not that the local charity is just a consequentialist error.

In addition to misunderstanding normal humans, I think consequentialists sometimes make related errors in their own judgments. If a bunch of utilitarians want to enjoy a nice communal space, it’s worthwhile for each of them to help fund it even though it makes sense neither on utilitarian grounds nor for their own self-interest. That’s a good norm that can leave every utilitarian better off than if they’d spent the same money selfishly. I think that a lot of moral intuition and discourse is about this kind of coordination, and if you forget about that then you will both be confused by normal moral discourse and also fail to solve some real problems that everyday morality is designed to solve.
