Berlin, Machiavelli and AI Safety - the case for moderation

This piece was originally published on 11/04/20

The most frustrating results are often proofs of impossibility: demonstrations that we cannot reach a limit, reconcile competing desiderata, or prove a given statement. Their existence stands as a rebuke to the positivist empiricism of the Enlightenment, the belief that reason and experiment can reveal everything there is to know about the world. The collapse of that view, from the 19th century onwards, was painful: Gödel's incompleteness theorems destroyed Hilbert's project to build secure foundations for mathematics; relativity destroyed the concept of objective measurement; the Heisenberg uncertainty principle limits our ability to predict physical systems.

Such frustrations, however, were hardly limited to the mathematical sciences: theorists since Saussure have attacked assumptions about how speech and text relate to the physical world, and destroyed the notion of a past recoverable through any historical medium. Nor was this frustration, however ubiquitous it may now seem, unknown to our ancestors. The Pythagorean sect famously struggled with irrational numbers, and the medieval scholastic project of reconciling biblical revelation with human reason foundered in the 15th century, giving way to the Renaissance.

The collapse of objectivity has not, however, led to the collapse of these disciplines. Instead, scholars everywhere have reoriented themselves away from Enlightenment naïveté by accepting the impossibility of perfectly representing or understanding reality; George Box's dictum that "all models are wrong, but some are useful" has become mainstream.

Isaiah Berlin's 1953 essay The Originality of Machiavelli anticipated the postmodernist challenge to notions of absolute truth. According to Berlin, Machiavelli's fundamental point was that a group of people living a Christian life and practising Christian morality cannot construct a state like the Roman Republic or the Athens of Pericles, despite the evident merits of both systems. The two belief systems, which together form the foundations of Western culture and thought, each espouse self-evidently good values, but the systems, and hence the values they hold, cannot be reconciled; Machiavelli exploded the myth of the "ultimate compatibility of all values". The Classical ideal could not be constructed or maintained without the ruler, whether tyrant or assembly, dispensing with Christian morality - not in exceptional circumstances, but repeatedly, as part of the normal and expected functioning of the city. The impossibility of an ideal city is crucial if one believes that man is a political, and a fortiori social, animal; if that is true, then it is only possible to live freely in a free society, or to live virtuously in a virtuous society, and so we need a structure of institutions and public morality that makes it possible for individuals to achieve their own ends.

It is a fundamental assumption of Western thought that there exists what Berlin calls a "unifying monistic pattern" - a Neoplatonist Great Chain of Being or Yggdrasil, an objective truth, a meaning of life, a supreme being, or a supreme virtue. Such a pattern would be analogous to a Grand Unified Theory in physics, something to bring together the good parts of Christianity - the theological virtues of faith, hope, and charity, perhaps - and the grandezza, or greatness, of the classical world which the Renaissance humanists so admired: public-spiritedness, artistic flourishing, civic and military discipline, wealth and power. It would form an answer to Aristotle's great question: how can men best live together? 

Machiavelli's demonstration of the incompatibility of the Christian and classical systems, not just in practice but in principle, invalidates this premise. It's not just that over the two millennia of European history which he surveyed nobody had built an ideal state; it's that even an ideal state, such as that envisioned by Plato in the Republic, was impossible given the realities of human nature, which Machiavelli saw as more or less unchanging. In Berlin's words, given that there is no answer to the question of how men can best live together, even seeking to find one would be "conceptually incoherent".

Machiavelli is of course making strong assumptions about human nature; his realism, or alternatively his pessimism, is a clear avenue of attack for those who dislike his conclusions. Yet while it's easy to hold a strong view on the matter, it's much harder to disprove the opinions of those who disagree with you; it's not clear, for example, that the work of Steven Pinker et al. actually invalidates Machiavelli's assessment. Students of race in the United States might fiercely contest that there has been any great moral development from the age of slavery to the age of mass incarceration. For the purposes of this essay, Machiavelli's assumption, though flagged here, will have to stand.

In another essay, Two Concepts of Liberty, Berlin discusses positive and negative liberty; but for me, the most evocative part is his treatment of change. Berlin thinks it essential that our ideals not be eternal. We shouldn't condemn our descendants to the consequences of our moral decisions; how could we be so arrogant? Furthermore, morality is contingent: what is right in one set of circumstances might not be right in another. Berlin treats this as an idealistic position: we shouldn't have to claim eternal validity for our chosen ends; "principles are not less sacred because their duration cannot be guaranteed." The humility of Berlin's immense intellect is, I think, staggering.

To summarise: Berlin's work, standing on the shoulders of the giants of European thought, presents us with two important considerations which limit any concept of utopia. First, it's impossible to envision, let alone construct, an ideal society - and hence for men to perfectly achieve their chosen ends. Second, no institution or moral system should be eternal, although that shouldn't prevent us from choosing and adhering to one in the first place. We currently deal with these problems through moderation. Our institutional moderation rejects fundamentalism and totalitarianism, accepting instead not just a system of checks and balances but the inevitable contradictions attending a pluralist system of institutions and actors. We also aim for temporal moderation, taking a middle course between the kind of intergenerational continuity advocated by Burke and Jefferson's position that "the earth belongs in usufruct to the living": we should not be able to make choices that bind our descendants in perpetuity.

AI Safety

These points are crucial to AI safety research, and yet it seems to me that many AI safety researchers either disagree with them or have failed to internalise them: in particular, a significant segment of AI researchers is concerned only with building an AGI, not with its dual-use potential or social impact. Although the control problem and AI alignment are widely recognised in principle, asking just what AI should be aligned with tends to get you responses like "the social good", "human preferences", or "the future of mankind". The points above, I would argue, make any such response unsatisfactory, and hence outright dangerous.

The fundamental premise of AGI is that near-infinite intelligence leads, sooner or later, to near-infinite means. And it is an equally fundamental assumption that these near-infinite means will allow us to achieve near-infinite ends - that the problems humanity faces are at bottom technological. Homelessness is a problem of insufficient means; so is hunger, or war, or disease (even a pandemic!); so are solar flares and asteroid impacts. By satisfying the lower levels of Maslow's hierarchy of needs, humanity will be free to focus on self-actualisation, and politics can be replaced with technocracy as we solve each of these problems in turn through our newfound intellectual, industrial and organisational capacities.

But the prospect of near-infinite means threatens to destroy the compromises of moderation which currently save us from confronting the two issues outlined above. Today we don't have to worry about building an ideal society, or about making truly irreversible decisions, because for the most part (nuclear war being an obvious exception) it is impossible to do either. The development of an AGI would certainly solve some current problems, but it would also ignite these currently dormant social ones.

In general, I would advocate an approach to AI alignment emphasising moderation, moral agnosticism, and corrigibility. The crucial assumption is that there exists no "unifying monistic pattern", and as such any approach that seeks to derive one - whether by using inverse reinforcement learning to learn from human preferences, by building an AI moral philosopher to construct a perfect ethical system, or otherwise - will fail. No system may be irreversible or incorrigible.

MacAskill's Assumption

One of the best approaches to AGI ethics and morality has been proposed by William MacAskill, a co-founder of the Effective Altruism movement. Two of his proposals are particularly relevant: his 2016 paper 'Normative Uncertainty as a Voting Problem' and his concept of the 'long reflection'. The paper, deriving from his DPhil thesis and a significant philosophical tradition, proposes a democratic approach to reconciling contradictory moral positions: just as we use expected value to deal with empirical uncertainty, we can use 'expected choice-worthiness' to deal with normative uncertainty between moral theories. The great merit of this approach is that it doesn't seek to enact any particular ethical system, and it preserves popular sovereignty through an analogy with voting.
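To make the analogy concrete, here is a minimal sketch of expected choice-worthiness. Every theory name, credence and score below is invented for illustration; MacAskill's paper is concerned precisely with the harder cases (merely ordinal or incomparable theories) where this simple weighted sum breaks down and a voting rule is needed instead.

```python
# A minimal, hypothetical sketch of expected choice-worthiness.
# All credences and scores are invented for illustration.

def expected_choiceworthiness(credences, scores, option):
    """Weight each theory's score for an option by our credence in that
    theory, just as expected value weights outcomes by their probability."""
    return sum(credences[theory] * scores[theory][option] for theory in credences)

# Hypothetical credences in two moral theories (summing to 1).
credences = {"utilitarian": 0.6, "deontological": 0.4}

# Hypothetical choice-worthiness each theory assigns to each option.
scores = {
    "utilitarian":   {"act": 10.0, "refrain": 2.0},
    "deontological": {"act": -5.0, "refrain": 8.0},
}

options = ["act", "refrain"]
best = max(options, key=lambda o: expected_choiceworthiness(credences, scores, o))
print(best)  # "refrain": scores 4.4 against 4.0 for "act"
```

Note that under these made-up numbers the minority theory swings the decision: a confident deontological objection outweighs a milder utilitarian preference. That is the voting analogy at work.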

His second concept, which he explained on the 80,000 Hours Podcast, takes a similar approach:

"Different people have different sets of values. They might have very different views for what an optimal future looks like. What we really want ideally is a convergent goal between different sorts of values… Kind of like this is the purpose of civilization… I think there is an answer. I call it the long reflection, which is you get to a state where existential risks or extinction risks have been reduced to basically zero. It’s also a position of far greater technological power than we have now, such that we have basically vast intelligence compared to what we have now, amazing empirical understanding of the world, and secondly tens of thousands of years to not really do anything with respect to moving to the stars or really trying to actually build civilization in one particular way, but instead just to engage in this research project of what actually is a value. What actually is the meaning of life? And have, maybe it’s 10 billion people, debating and working on these issues for 10,000 years because the importance is just so great. Humanity, or post-humanity, may be around for billions of years. In which case spending a mere 10,000 is actually absolutely nothing."

This passage makes it clear that MacAskill's position is fundamentally at odds with the one Berlin derives from Machiavelli: MacAskill, like most of the Western tradition, firmly believes in a "unifying monistic pattern", and the purpose of AGI is to find it. There are certainly practical problems with the 'long reflection': is it actually possible to reduce existential risks to zero before solving our social problems? Could we really persuade billions of people to spend 10,000 years searching for the 'purpose of civilisation' without at least some of them seeing an opportunity to exploit everyone else's distraction? But more importantly, it seems to me that MacAskill is proposing to spend 10,000 years searching for something that doesn't - cannot - exist.

What remains to us if we really take seriously that a convergent goal is impossible to conceive, never mind achieve? The answer, again, can be found in Machiavelli: we need to construct a system which is designed to go wrong - one which anticipates deterioration and incorporates ways to restore itself to health. We aren't seeking to construct a stable utopia, but rather a society in which, when things do go wrong, they don't go wrong in catastrophic or irretrievable ways. Machiavelli praised the institution of dictatorship in the Roman republic: whenever Rome was threatened, the normal system of two consuls, accountable to the Senate and the people, could be suspended by the consuls choosing a dictator to assume full powers for a limited period. While the notion of a single person invested with a plenitude of power was antithetical to the mixed constitution of the republic, the institution of dictatorship was nonetheless essential to it. A general election performs a different function, but there is a comparable expectation that an incumbent government will decline in effectiveness as its legislative programme either approaches completion or fails. The Roman dictatorship, however, differs from the UK or US constitutions in that, rather than seeking an internally consistent set of moral values and checks and balances, it used an external balance with a different morality to restore the health of the internal one.

The application to AGI ethics is significant: of course we want an AGI to learn human ethics, with some degree of human oversight. But we shouldn't expect that any period of reflection will be sufficient to reach a point where human oversight and correction are no longer required. Rather, while making every effort to set up the AGI as well as possible initially, we have to expect that it will require continual correction. We should therefore make corrigibility our primary focus: preventing the AGI from taking irreversible social actions (which include reward hacking), and ensuring that humans retain the power to correct AGI decisions through some non-exhaustive combination of democratic voting, legal review and presidential decree.
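As a purely illustrative sketch - every name here is hypothetical, and the irreversibility judgment, stubbed out below as a simple flag, is of course the genuinely hard problem - the structure I have in mind looks something like this:

```python
# An illustrative sketch only, not a real safety mechanism. All names are
# hypothetical. The wrapper refuses any action flagged as irreversible and
# defers everything else to a human check the agent cannot bypass.

from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Action:
    description: str
    irreversible: bool  # in reality this label is itself a hard judgment

class CorrigibleWrapper:
    def __init__(self,
                 propose: Callable[[], List[Action]],
                 human_approves: Callable[[Action], bool]):
        self.propose = propose                # the underlying agent's ranked proposals
        self.human_approves = human_approves  # external veto, outside the agent

    def step(self) -> Optional[Action]:
        for action in self.propose():
            if action.irreversible:
                continue  # never take an action that forecloses future correction
            if self.human_approves(action):
                return action
        return None  # when in doubt, defer rather than act
```

The point is structural rather than algorithmic: the veto sits outside the agent, just as the Roman dictatorship sat outside the ordinary constitution, so that correction never depends on the system being corrected.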

Sixteenth-century political philosophy may seem a bizarre place to begin a discourse on the ethics of the distant future; yet the alien context of the distant past retains a wonderful ability to surprise. If we seek to understand a world beyond the Singularity, we have to question our assumptions, and the more basic an assumption, the more important it is to ask whether it will still hold. Today, it doesn't much matter whether civilisation has a purpose; but if the Singularity would make it possible to decide one way or the other, then we must not stake the future of humanity on the belief that it does. The near-infinite means promised by AGI constitute an existential threat; our response must retain the ethic of moderation which has sustained us this far. In the words of Oliver Cromwell: "I beseech you, in the bowels of Christ, think it possible that you may be mistaken."
