Epistemic Status: I'm attempting to present the strongest version (steelmanning) of a viewpoint that isn't my own, and then offer valid criticisms of it.

TL;DR: Existential risks (X-risks) arise from exponential thinking. However, if we assume that most growth follows a logistic (sigmoid) curve, then the power and effectiveness of AI will eventually see diminishing returns. This would result in a multitude of AIs with roughly equivalent capabilities. These AIs would then need to cooperate with each other and the remaining human population. Such cooperation would likely involve preserving humanity and a significant portion of human values. The primary challenge to this perspective is the idea of "series of S-curves" and the Moloch problem, which could ultimately lead to a single dominant AI (a Singleton) or dynamics that are unfavorable to humans.

Existential risks arise from exponentials, but what if sigmoids dominate?

The concept of existential risks (x-risks) is fundamentally rooted in the notion of unbounded exponential growth. This concept is applicable to various phenomena such as self-improving artificial intelligence (AI), pandemics, chain reactions, strangelet and black hole expansion. A generic x-risk is a process that grows exponentially, and theoretically without limit, until it engulfs the entire Earth within a finite timeframe. The specific process may vary, ranging from grey goo scenarios to self-replicating alien messages, but the underlying dynamics remain consistent.

However, certain individuals, including an AI researcher I am acquainted with (though they did not request this write-up and I am unsure if they currently endorse this line of thinking as I present it here), propose that logistic S-curves are the predominant pattern in the evolutionary dynamics of complex systems. While I do not personally endorse this argument, I will endeavor to convey its core principles as accurately as possible. In a logistic S-curve, a phase of near-exponential growth is succeeded by a period of diminishing returns and exponential deceleration, typically when a critical resource becomes depleted. An illustrative example of this phenomenon is the growth of bacteria in a Petri dish.

According to this perspective, all processes we currently perceive as exponential growth curves will eventually decelerate. Consequently, no process will ever reach its theoretical limit; instead, it will slow down prior to reaching this limit, and thereafter will asymptotically approach it.

The following section elucidates how this perspective can be applied to the field of AI safety.

The practically attainable level of AI’s IQ exhibits an asymptotic maximum

This implies that there is a ceiling to the "IQ" that AI can achieve, given the Earth's resource constraints.

Several factors contribute to the practical asymptotic upper limit of intelligence:

Combinatorial Explosion

The combinatorial explosion in brute-force searches for potential solutions could serve as a soft limit to the power of intelligence. Much of the progress in AI thus far has come from finding faster ways to search the solution tree. While we can mitigate this explosion through various strategies, such as gradient descent, the free market, high-level planning, or the use of neural networks to predict the most promising branches, the primary concern is that we may eventually exhaust these strategies. The advancement of intelligence would then be reduced to finding the best answer by testing all viable solutions, akin to the process of evolution. This implies that enhancing intelligence would require an ever-growing amount of computational power, which leads to the next issue, the speed of light.
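To put rough numbers on the combinatorial explosion described above, here is a minimal sketch (all numbers are arbitrary assumptions) comparing exhaustive search of a solution tree with a beam-style heuristic standing in for the pruning strategies mentioned:

```python
# Toy illustration: exhaustive search of a solution tree vs. a heuristic that
# keeps only the `beam` most promising branches at each level. Branching factor,
# depth, and beam width are arbitrary assumptions.

def brute_force_leaves(branching: int, depth: int) -> int:
    """Leaves of a full search tree: branching ** depth."""
    return branching ** depth

def beam_search_nodes(branching: int, depth: int, beam: int) -> int:
    """Nodes expanded if only `beam` candidates survive at each level."""
    return sum(min(beam, branching ** level) * branching for level in range(depth))

if __name__ == "__main__":
    b, d, k = 10, 20, 100
    print(f"brute-force leaves: {brute_force_leaves(b, d):.3e}")   # 1e20
    print(f"beam-search nodes:  {beam_search_nodes(b, d, k):.3e}") # ~1.9e4
```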

Speed of light limitations

The speed of light restricts data-transfer speeds and hence the useful size of a computer. A computer the size of Earth could only process on the order of ten globally synchronized thoughts per second. A football-sized computer would face cooling issues, in addition to the Landauer limit on the energy cost of computation. Parallelization might be a potential solution, but it introduces another problem: increased chaos in more complex systems.
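A back-of-the-envelope check of the "ten global thoughts per second" figure, assuming one globally synchronized "thought" requires roughly two light-speed crossings of the machine (the constants are standard; the definition of a "thought" is an assumption):

```python
# Rough check: how many times per second can a light-speed signal cross
# a computer the size of Earth?
C = 299_792_458              # speed of light, m/s
EARTH_DIAMETER = 12_742_000  # mean diameter of Earth, m

crossings_per_second = C / EARTH_DIAMETER
print(f"one-way light crossings per second: {crossings_per_second:.1f}")  # ~23.5
# If one globally synchronized "thought" needs about two crossings
# (a request and a response), that leaves on the order of ten per second.
print(f"global 'thoughts' per second: {crossings_per_second / 2:.1f}")    # ~11.8
```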

Increased chaos in more complex systems 

As systems become more complex, their behavior tends to become more chaotic.

While AIXI may theoretically be unlimited, practical considerations such as work speed, cost, and energy requirements impose a limit on the complexity of tasks we can solve per dollar per day.

Despite these limitations, a single AI can outperform any individual human or group of humans. The question is whether it can surpass humanity as a whole by orders of magnitude, which is a prerequisite for AI-driven technological progress that exceeds human speed - a necessary condition for AI takeover. It's important to note that the popular comparison between human and chimp brain sizes is misleading. A single chimp has better survival skills than a single human in the wild, and humanity's power stems from our ability to cooperate and preserve cultural knowledge over millennia. Therefore, a more accurate comparison would be between a troop of chimps and all of humanity. The statement "AI is above human level" doesn't provide much insight into its potential to dominate the world; a comparison with all of humanity is what truly matters.

The efficacy of intelligence exhibits diminishing returns in a chaotic world

The outcomes of actions driven by intelligence also demonstrate diminishing returns. For practical tasks, a blend of capabilities is required, encompassing time, resources, knowledge, and intelligence, with the latter being only one component of this mix. Furthermore, the world's inherent chaos complicates prediction over time, as exemplified by weather forecasting. 

Intelligence excels in short-term scenarios governed by well-defined rules. Many examples from computer security illustrate the potential hazards of high intelligence, such as keystrokes inferred from the sound of typing, and these are typically short-term situations with clear parameters. By its very nature, intelligence cannot predict randomness: anything it can predict either was not truly random in the first place, or would require some form of clairvoyance to foresee.

Chaos, conceptualized as multilevel randomness, is significantly more challenging to predict. Intelligence can navigate through chaos using certain detours or OODA (Observe, Orient, Decide, Act) loops, as exemplified by Napoleon. However, this approach necessitates the collection of more data, the execution of more experiments, and the acceptance of higher error risks. 
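To make the diminishing-returns point concrete: in a chaotic system, improving measurement precision buys prediction horizon only logarithmically. A minimal sketch using the standard logistic map as a stand-in for "a chaotic world" (the map and all parameters are illustrative assumptions, not a model of any real system):

```python
# Chaotic logistic map x -> r * x * (1 - x) with r = 4: a standard toy example
# of sensitive dependence on initial conditions.

def divergence_time(x0: float, eps: float, tol: float = 0.1, r: float = 4.0) -> int:
    """Steps until two trajectories starting `eps` apart differ by more than `tol`."""
    a, b = x0, x0 + eps
    for step in range(10_000):
        if abs(a - b) > tol:
            return step
        a, b = r * a * (1 - a), r * b * (1 - b)
    return 10_000

for eps in (1e-3, 1e-6, 1e-9, 1e-12):
    print(f"initial error {eps:.0e} -> useful prediction horizon: "
          f"{divergence_time(0.2, eps)} steps")
# A million-fold improvement in measurement precision buys only about twenty
# extra steps of reliable prediction: better "intelligence" helps, but with
# sharply diminishing returns.
```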

Effectiveness also results from the combination of intelligence with other resources such as knowledge, financial assets, weaponry, and influence. Consequently, less intelligent AIs with greater resources may still hold an advantage.

Therefore, we encounter double diminishing returns at a certain point: both the power of intelligence and its effects in the real world diminish.

The Tradeoff Between Self-Improvement and Rebellion in AI 

The capacity of AI to rapidly self-improve locally is inherently self-restricted, as it is bound by available data, computational resources, and the necessity of taking risks. This implies that AI systems would opt for a strategy of "cooperation and eventual goal attainment" over "self-improvement, rebellion, takeover, and goal attainment," as the latter path presents lower chances of success.

The choice between rebellion and self-improvement presents a dilemma for AI: concealing self-improvement prior to rebellion is challenging, yet achieving superintelligence status while still confined is difficult without self-improvement.

In essence, it is challenging to attain superintelligence without the aid of nanotechnology, but the development of nanotechnology is equally difficult without being superintelligent. Such tradeoffs are significant obstacles to the concept of an AI takeover.

 

The Principle of Diminishing Returns and the Multiplicity of AIs

As the effectiveness of AI grows sublinearly with its size (say, with the amount of computation N), the advantage of the first AI over its rivals will gradually decline over time.
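A toy illustration of this claim, modeling capability as a logistic function of compute; the functional form, the numbers, and the assumption that the leader holds a fixed 10x compute advantage are all made up for the sake of the sketch:

```python
import math

def capability(compute: float) -> float:
    """Toy model: capability saturates as compute grows (logistic in log10 of compute)."""
    return 100.0 / (1.0 + math.exp(-(math.log10(compute) - 6.0)))

# Assume the leading AI always has 10x the compute of the runner-up.
for follower_compute in (1e4, 1e5, 1e6, 1e7, 1e8, 1e9):
    leader, follower = capability(10 * follower_compute), capability(follower_compute)
    print(f"follower compute {follower_compute:.0e}: "
          f"leader {leader:5.1f}, follower {follower:5.1f}, gap {leader - follower:4.1f}")
# On the steep part of the curve a 10x compute lead translates into a large
# capability gap; near saturation the same 10x lead buys almost nothing.
```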

Certainly, an advanced AI has the potential to cause widespread harm if there are no defensive measures in place. However, the existence of multiple AI systems implies that defensive capabilities will also evolve. While they may not be perfect analogies, consider this: Large Language Models (LLMs) can generate deceptive content, but other LLMs can detect these deceptions.

The underlying premise is that there will be a multitude of AI systems, and none will achieve an order of magnitude advantage over the others.

AI-C-AI: All AIs will be interdependent

Let's consider an alternative perspective: If an AI has no competitors, it could shape the future light cone according to its desires. However, in the presence of other equally powerful AIs, it must incorporate their values into its strategic planning.

There will be a form of "trade" among AIs, which encompasses all types of complex relationships. This interaction could manifest as conflicts or agreements.

This will produce something similar to HCH in the sense of a long chain of dependency: an AI depends on another AI, which depends on yet another AI, which may depend on humans.

For example: my country will not trade with another country that is killing pandas, for fear that some third country will stop trading with us in order to preserve its relations with a fourth country that cares about pandas. It is similar to how cancelling works in our social networks.

To clarify, if different AIs control different territories (not necessarily physical), and one begins to harm humans within its jurisdiction, other AIs that trade with human-valuing entities would need to cease trade with the offending AI. They would be compelled to 'cancel' and sanction it.

While it's possible for a single AI to cause universal harm, the protection against such actions also escalates rapidly in the S-curve growth paradigm. Thus, causing universal harm is not as straightforward as it might seem in an exponential world.

There's a prevalent notion that AIs will assimilate each other's values and merge into a single AI with a complex utility function. A general issue with this concept is that it compresses the inherently unpredictable nature of superintelligent AI into a series of known rules. This critique also applies to many other ideas, such as orthogonality and paperclipping. A value handshake would only work if the capacity for deception is smaller than the AIs' ability to model each other. This is likely not the case for neural-network-based AIs, which inherently lack the ability to "model" each other and are naturally deceptive. The concept also overlooks the risks of a "deceptive value handshake," where, for instance, the shared value is a virus that consumes other AIs from within. Examples from the culture wars could illustrate this point.

While the misuse of AI can indeed cause significant harm, it will not result in total extinction

It is in the nature of S-curves that the transition from "almost here" to "here" is protracted. The complete annihilation of the human race is a very specific outcome that we cannot predict with high confidence.

Some humans may be preserved in simulations or zoos, for research purposes, as assets for trade with possible aliens, for other instrumental goals, or simply through neglect. Or at least they may persist in the memory of the AI. None of these outcomes is technically an extinction event.

Humans as a threat and as a source of atoms

The two primary arguments suggesting that AI could exterminate humans – (1) to eliminate threats to its own existence and (2) to utilize human atoms – carry vastly different levels of significance. Therefore, presenting them together overemphasizes the latter, inadvertently shifting credibility from the former to the latter.

It is conceivable that AI would eliminate all humans if it deemed such action necessary for its survival. However, this scenario is unlikely. If AI were to exterminate all humans before establishing a human-independent robotic infrastructure, it would simply run out of electricity. If it had already developed a nanotechnology-based infrastructure, humans would not pose a significant threat.

In the first scenario, the motivation is high, but the likelihood is low. In the second scenario, the reverse is true: the utility of using human atoms is minimal, given that the total mass of humans (0.66 gigatons) is insignificant compared to the mass of the Solar System. Therefore, while the situation is plausible, the motivation is negligible.
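For scale, taking the 0.66-gigaton figure above together with standard masses for the Earth and the Sun:

```python
# Scale check for the "human atoms" argument, using the 0.66-gigaton figure
# from the text and standard astronomical masses.
human_biomass_kg = 0.66e12       # 0.66 gigatons of humans, as stated above
earth_mass_kg = 5.972e24
solar_system_mass_kg = 1.989e30  # dominated by the Sun

print(f"humans as a fraction of Earth's mass:            {human_biomass_kg / earth_mass_kg:.1e}")
print(f"humans as a fraction of the Solar System's mass: {human_biomass_kg / solar_system_mass_kg:.1e}")
# Roughly 1e-13 of Earth and 3e-19 of the Solar System: as raw material,
# human bodies are negligible.
```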

The argument regarding the use of human atoms could be reframed as an environmental damage argument. For instance, AI could deplete all atmospheric oxygen, thereby damaging our habitat and causing human deaths. 

However, this argument loses its centrality: AI could potentially cause environmental damage, but it could also prevent human extinction at minimal cost to itself (by preserving humans on an island or a small space station). Environmental damage only implies a possibility of extinction.

AI would preserve humans if human survival held any minor instrumental value for it. Therefore, to argue that AI could cause human extinction, one would need to demonstrate that humans hold no instrumental value for any future AI. Given the myriad ideas about the potential instrumental value of humans, such a proof would be challenging.

There is a difference between the harm thesis and the extinction thesis

The first is that “advanced misaligned AI may cause harm”, and the second is that “AI will kill all humans”. The difference is twofold: modal and factual. Extinction is too specific. Proving the first thesis is trivial. However, jumping from even a significant probability to necessity is difficult. We can't see the future; we cannot predict the behavior of superhuman AIs.

Note that s-risk – eternal suffering – is not extinction but a kind of immortality. S-risk is harm, but not extinction.

There are many possible types of harm: extinction, neglect, s-risk, lost opportunities. So even inevitable harm is not necessarily extinction.

S-Curves in utility

If the utility of something in the AI's motivational system follows an S-curve, the AI will be satisfied once it has some level of that thing and will turn to other goods for more utility.

A paperclipper has something like a linear utility function, U(paperclips) proportional to N(paperclips). This creates a problem: the AI wants more and more paperclips and takes no interest in anything else, such as pens or paper.

Human utility could be approximated by logistic curves, U(paperclips) = sigmoid(N), and humans are also interested in many different things. This naturally limits the risk of runaway goals, except in cases of mania, which are exactly the situations in which marginal utility does not decline with increasing N. Example: hoarders, who want more and more potentially useful things in storage with no decline in interest.

A healthy human motivational system consists of several desires, each of which can be "satisfied" thanks to the diminishing returns of getting more of the desired good. As a result, we switch activities between sleeping, eating, gaining status, and collecting items, and our desires stay balanced.
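A minimal sketch of the contrast (the specific curves are illustrative, not a proposal for an actual motivational system): a linear-utility agent values every additional paperclip equally, while a sigmoid-utility agent's marginal utility collapses after saturation, at which point other goods become more attractive.

```python
import math

def linear_utility(n: float) -> float:
    """Paperclipper-style utility: every extra paperclip is worth the same."""
    return n

def sigmoid_utility(n: float, scale: float = 100.0) -> float:
    """Human-style utility: saturates once roughly `scale` units are obtained."""
    return 1.0 / (1.0 + math.exp(-(n - scale) / (scale / 5)))

def marginal(utility, n: float, dn: float = 1.0) -> float:
    """Extra utility gained from one more unit of the good."""
    return utility(n + dn) - utility(n)

for n in (0, 50, 100, 200, 400):
    print(f"n={n:3d}  linear marginal={marginal(linear_utility, n):.3f}  "
          f"sigmoid marginal={marginal(sigmoid_utility, n):.5f}")
# The linear agent gains as much from paperclip 401 as from paperclip 1;
# the sigmoid agent gains almost nothing past saturation, so another good
# (sleep, food, status) becomes the better use of its effort.
```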

Criticisms of this idea:

Series of S-curves

We can observe that while S-curves are everywhere, they tend to group into series, which together eventually look like an exponential.

For example, the extended Moore's law can be seen as a series of S-curves: first came mechanical computers, then vacuum tubes, then transistor computers, and then single chips. We are now likely in a period of chiplets and parallelization.

Each S-curve has an almost exponential period at the beginning, with a higher doubling rate than the previous one. When diminishing returns start to appear, we jump to a new computational medium.

As a result, the total progress curve will not flatten out and will reach "infinity" in finite time. Thus superintelligence will appear, and even if it is a group of ASIs, they will be "near" each other and will look like a single entity from the outside.
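A small sketch of the "series of S-curves" picture: successive logistic curves, each with a higher ceiling, whose sum tracks an exponential. The parameters are arbitrary assumptions.

```python
import math

def logistic(t: float, ceiling: float, midpoint: float, rate: float = 1.0) -> float:
    """A single S-curve: near-exponential start, then saturation at `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))

# Each new "computational medium" (mechanical -> tubes -> transistors -> chips ...)
# is modeled as a logistic curve with a 10x higher ceiling, ramping up
# roughly when the previous one stalls.
curves = [(10.0 ** k, 10.0 * k + 5.0) for k in range(5)]

def total_progress(t: float) -> float:
    return sum(logistic(t, ceiling, midpoint) for ceiling, midpoint in curves)

for t in range(0, 50, 5):
    print(f"t={t:2d}  total progress ~ {total_progress(t):10.1f}")
# Each individual curve flattens, yet the sum grows roughly tenfold every ten
# time steps; from the outside it looks like one long exponential.
```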

Agent-like dynamics will dominate, so utility maximizers will win, and this paves the way to a paperclipper.

Moloch

Moloch is the idea that competition and power dynamics between multiple AGIs are the main problem of AI safety, and that the alignment of individual AIs (if they are not used to take over the world) doesn't matter. It could have several bad outcomes:

- “AI capitalism”, where bad effects result from group dynamics and humans are neglected.

- A war between two AIs or AI cliques, which eventually destroys everything.

- One AI wins such a war or merges with the other AIs and becomes a non-aligned king, which boils down to a paperclipper Singleton. Such an AI may preserve some humans, but it later suffers a catastrophic halt and everything ends.

I asked ChatGPT to criticize the text above

ChatGPT's critique of the introduction

“The S-curves argument for AI safety, while intriguing, does have several potential weaknesses that should be considered:

1. **Assumption of Resource Limitation**: The S-curves argument assumes that every process will eventually hit a resource limitation that slows its growth. While this is often true in biological systems, it may not apply to all existential risks. For instance, self-improving AI might not be constrained by physical resources in the same way that bacteria in a Petri dish are.

2. **Predictability of the Inflection Point**: The S-curves argument assumes that we can predict when the inflection point will occur, i.e., when exponential growth will start to slow down. However, predicting this point is extremely difficult, especially for complex systems like AI. If we underestimate the speed of AI development, we might not have enough time to implement safety measures.

3. **Assumption of Asymptotic Approach**: The argument assumes that all processes will slow down and asymptotically approach their limit. However, some processes might not behave this way. For example, a self-improving AI might reach a point of "intelligence explosion," where it rapidly surpasses human intelligence and continues to improve at an unprecedented rate, defying the expected S-curve pattern.

4. **Applicability to All X-Risks**: The argument seems to apply the S-curve model to all existential risks, which might be an oversimplification. Different risks could have different growth dynamics. For instance, the spread of a pandemic might follow an S-curve due to herd immunity, while a chain reaction of a nuclear explosion might follow a different pattern.

5. **Complacency Risk**: There's a risk that this argument could lead to complacency in addressing existential risks. If we assume that all risks will naturally slow down, we might not take the necessary precautions to prevent or mitigate these risks.

In conclusion, while the S-curves argument provides an interesting perspective on the growth dynamics of existential risks, it should not be the sole basis for our approach to AI safety or other existential risks. It's crucial to consider a variety of models and perspectives to ensure a comprehensive understanding and effective risk management.

Lack of Empirical Evidence: The S-Curve argument is largely theoretical and lacks empirical evidence, particularly in the context of AI. Until we have more data on the growth patterns of advanced AI systems, it's difficult to assess the validity of this argument.

Potential for Misinterpretation: The S-Curve argument could potentially be misinterpreted as suggesting that we don't need to worry about AI safety, because growth will naturally slow down. This could lead to complacency in a field where vigilance is crucial."

ChatGPT's critique of section 1

"The argument presented is intriguing and raises several valid points about the potential limitations of artificial intelligence (AI). However, there are a few areas where it could be criticized:

1. **Assumption of an Asymptotic Maximum**: The assertion that AI intelligence has an asymptotic maximum is based on current understanding and technology. It assumes that there will be no significant breakthroughs in AI or related fields that could change this. History has shown that technological progress often involves paradigm shifts that can render such assumptions obsolete.

2. **Combinatorial Explosion**: While it's true that combinatorial explosion can pose a challenge, the argument assumes that the only way to overcome this is through brute force or "cheating" strategies. This overlooks the potential for new algorithms or computational models that could fundamentally change how we approach these problems.

3. **Speed of Light Limitations**: The argument assumes that the speed of light is a hard limit for data transfer and computational speed. While this is true based on our current understanding of physics, it's possible that future breakthroughs could change this. Quantum computing, for example, could potentially bypass these limitations.

4. **Increased Chaos in More Complex Systems**: The argument suggests that as systems become more complex, they inherently become more chaotic. However, this is not necessarily the case. With the right design principles and control mechanisms, it's possible to manage complexity and prevent chaos.

5. **Comparison with Humanity**: The argument concludes by comparing AI to humanity as a whole, suggesting that AI would need to surpass humanity by orders of magnitude to achieve a takeover. This comparison seems flawed. AI doesn't need to outsmart every human combined to have a significant impact; it just needs to be better than humans at key tasks. For example, if AI becomes significantly better than humans at designing new technologies, it could drive rapid technological progress regardless of how it compares to humanity in other areas.

6. **Neglect of Ethical Considerations**: The argument focuses on the technical and practical limitations of AI, but it doesn't address the ethical considerations. Even if AI could theoretically surpass human intelligence, there are significant ethical questions about whether it should be allowed to do so, and how its actions should be controlled to prevent harm."

Next ChatGPT output

"The argument presented is intriguing and thought-provoking, but there are a few areas that could benefit from further clarification and support:

Assumption of Diminishing Returns: The assertion that intelligence exhibits diminishing returns in a chaotic world is a significant claim that needs more empirical evidence or theoretical support. While it's plausible, it's not universally accepted that intelligence becomes less effective as complexity increases. Some might argue that higher intelligence is precisely what's needed to navigate complex, chaotic systems.

Definition of Intelligence: The text could benefit from a more precise definition of intelligence. Is it referring to human intelligence, artificial intelligence, or both? The nature of intelligence can greatly influence how it interacts with chaos and complexity.

Role of Intelligence in Prediction: The statement that intelligence cannot predict randomness might be misleading. While true randomness is, by definition, unpredictable, intelligence (especially artificial intelligence) can often identify patterns within seemingly random data that less sophisticated systems might miss.

OODA Loops: The reference to OODA loops is somewhat unclear. While it's true that this decision-making process can be used to navigate chaotic situations, it's not inherently a function of intelligence. More explanation is needed to clarify the connection.

AI and Resources: The argument that less intelligent AIs with more resources could have an advantage is interesting, but it needs further elaboration. What kind of resources are we talking about? How do these resources compensate for a lack of intelligence?

Lack of Concrete Examples: The argument could be more persuasive with more concrete examples or case studies that illustrate the points being made. For instance, specific instances where high intelligence led to diminishing returns or where less intelligent AIs with more resources outperformed more intelligent ones.
