Takeoff as a Measurement Problem: The Singularity as Divergence Between Win-Rate and Latent Power

What if the “singularity” is not a discontinuity in capability growth, but in our ability to measure it?

[Figure 1: Optimization Power Curve]

Figure 1: A measurement envelope, not a capability curve. A binary success rate \(s\) (a win-rate / accept-reject metric) becomes an ill-conditioned ruler for the underlying optimization-power gap \(\Delta P\): near \(s \approx 0\) or \(s \approx 1\), many different \(\Delta P\) values produce nearly the same telemetry.

Introduction

“Hard vs soft takeoff” is usually framed as a debate about the shape of capability growth. This post is narrower: it is about the shape of what we can measure.

The claim: the “singularity” can be treated as a psychometric failure mode, where an objective proxy (delegation win-rate) diverges from a latent construct (optimization power). We observe the proxy because it is cheap (accept vs retry); we care about the construct because it governs how much an agent can optimize across objectives.

We:

  • Define an optimality probability \(p_\Sigma(B \ge A)\) (a win-rate over tasks/objectives) and a latent \(\Sigma\)-power \(\mathcal{C}_\Sigma(\cdot)\).
  • Use a concentration assumption on reward gaps to relate upset probability to the power gap \(\Delta P\).
  • Show why binary success rates saturate: at the tails, the same accept/reject telemetry is consistent with many different latent power gaps.

The upshot is an observability constraint: near the extremes, success-rate curves alone cannot distinguish sharp discontinuities from smooth growth. The ruler breaks before the world necessarily does.

Background on takeoff scenarios

Takeoff scenarios are usually sorted into “slow” vs “fast,” but as Raemon argues on LessWrong, those terms conflate the shape of capability growth with its calendar speed. “Smooth/sharp” or “soft/hard” is clearer: a smooth takeoff means continuous capability progression, a sharp one means a sudden jump, and either can happen quickly or slowly in wall-clock time.

Instrumental convergence and power

Instrumental convergence says some states or strategies are broadly useful across many reward functions. Turner et al. formalize this: the power of a state is, up to normalization, its expected optimal value over a distribution of reward functions:

\[\text{POWER}(s) \propto \mathbb{E}_{r}\left[\sup_{\rho}\,\langle \rho, r \rangle\right],\]

where the supremum ranges over the valid discounted state-occupancy measures \(\rho\) achievable from initial state \(s\).

This measures the average optimization advantage a procedure has across a distribution of objectives, a standard move in learning theory. We use a generic \(\Sigma\)-complexity functional:

\[\mathcal{C}_{\Sigma}(X) := \mathbb{E}_{\sigma \sim \Sigma}\left[\sup_{x \in X}\langle x,\sigma\rangle\right],\]

i.e., the expected support function of the feasible set \(X\) under a random linear objective \(\sigma\). A standard special case is Rademacher complexity: if \(\Sigma\) is the i.i.d. product measure over Rademacher signs (each coordinate of \(\sigma\) is \(\pm 1\) with probability \(1/2\)), then \(\mathcal{C}_{\Sigma}(\cdot)\) reduces (up to the usual normalization conventions) to an empirical Rademacher complexity of the induced function/value class.
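To make this concrete, here is a small Monte Carlo sketch of \(\mathcal{C}_\Sigma\) for the Rademacher case. The feasible sets below are made up for illustration; the point is only that a larger feasible set has (weakly) larger \(\Sigma\)-complexity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma_complexity(X, n_samples=20000, rng=rng):
    """Monte Carlo estimate of C_Sigma(X) = E_sigma[ sup_{x in X} <x, sigma> ]
    with Sigma = i.i.d. Rademacher signs (each coordinate +/-1 w.p. 1/2)."""
    X = np.asarray(X, dtype=float)                          # shape (num_points, dim)
    dim = X.shape[1]
    sigma = rng.choice([-1.0, 1.0], size=(n_samples, dim))  # random objectives
    sups = (sigma @ X.T).max(axis=1)                        # sup over the feasible set
    return sups.mean()

# Enlarging the feasible set can only increase the expected supremum.
small = [[1.0, 0.0], [0.0, 1.0]]
large = small + [[1.0, 1.0], [-1.0, 1.0]]
c_small = sigma_complexity(small)   # exact value: E[max(s1, s2)] = 0.5
c_large = sigma_complexity(large)   # exact value: 1.25
assert c_large >= c_small
```

The monotonicity here is a general property of \(\mathcal{C}_\Sigma\): since the supremum is taken over a superset, it can only grow pointwise in \(\sigma\).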

Applying complexity measures to takeoff scenarios

We can adapt definitions similar to those used in power-seeking research and apply them to takeoff scenarios, working throughout with the \(\Sigma\)-complexity functional \(\mathcal{C}_{\Sigma}\) defined above (the Rademacher special case remains available by an appropriate choice of \(\Sigma\)).

Some definitions:

Definition 1: The optimality probability of \(A\) relative to \(B\) under distribution \(\Sigma\) is

\[p_{\Sigma}(A \ge B) := P_{\sigma \sim \Sigma} \left( \sup_{a \in A} \ \langle a, \sigma \rangle \ge \sup_{b \in B} \ \langle b , \sigma \rangle \right)\]

Definition 2 (\(\Sigma\)-dominance / more-often-optimal): We’ll say \(B\) \(\Sigma\)-dominates \(A\) whenever \(p_\Sigma(A \geq B) \leq 1/2\) (equivalently, \(p_\Sigma(B \ge A)\ge 1/2\)). Terminology note: this is a comparative “more-often-optimal” relation under \(\Sigma\), and is not intended as a re-definition of the standard “instrumental convergence” thesis in alignment discourse.
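Definitions 1 and 2 are easy to probe numerically. A minimal sketch, again with Rademacher \(\Sigma\) and hypothetical feasible sets (here \(A\) is a scaled-down copy of \(B\), so \(B\) should be more-often-optimal):

```python
import numpy as np

rng = np.random.default_rng(1)

def optimality_probability(A, B, n_samples=50000, rng=rng):
    """Monte Carlo estimate of p_Sigma(A >= B): the probability, over random
    Rademacher objectives sigma, that the best point of A scores at least
    as well as the best point of B (Definition 1)."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    dim = A.shape[1]
    sigma = rng.choice([-1.0, 1.0], size=(n_samples, dim))
    best_A = (sigma @ A.T).max(axis=1)
    best_B = (sigma @ B.T).max(axis=1)
    return (best_A >= best_B).mean()

A = [[0.5, 0.0], [0.0, 0.5]]   # weaker: every point is a shrunk copy of B's
B = [[1.0, 0.0], [0.0, 1.0]]
p = optimality_probability(A, B)   # exact value: 1/4 (A wins only when both signs are -1)
assert p <= 0.5                    # so B Sigma-dominates A (Definition 2)
```

The exact value \(p_\Sigma(A \ge B) = 1/4\) here is easy to verify by hand: \(A\)'s best value is half of \(B\)'s, so \(A\) only wins when the common maximum is negative, i.e., when both signs are \(-1\).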

Definition 3: Assuming zero-mean rewards, the \(\Sigma\)-power of a state is given as:

\[\text{POWER}_{\Sigma}(s) = \mathcal{C}_{\Sigma}(I_s)\]

where \(I_s\) is the set of feasible discounted state-occupancy measures achievable from initial state \(s\).

Using these definitions, we can state the following theorem:

Theorem 1 (informal): Assume \(I_a\) and \(I_b\) have positive \(\Sigma\)-power and that the reward-induced gap random variable satisfies the sub-Gaussian condition in the lemma below. If we have,

\[\text{POWER}_{\Sigma}(b) > \text{POWER}_{\Sigma}(a),\]

then \(I_b\) is more likely to be optimal under \(\Sigma\), with the “upset” probability \(p_{\Sigma}(I_a \ge I_b)\) decaying at least exponentially in the power gap. Conversely, observing a non-extreme optimality probability (bounded away from 0 and 1) implies the implied power gap

\[|\text{POWER}_{\Sigma}(b)-\text{POWER}_{\Sigma}(a)|\]

cannot be arbitrarily large under the same assumptions.

ChatGPT success rates and power estimation

We can make the “respond vs Try again” signal analytically meaningful by treating it as a delegation decision between two procedures:

  • \(A\): the user’s outside option (manual completion or switching tools).
  • \(B\): delegating to ChatGPT (requesting a completion).

To connect this UI telemetry to the formal quantity \(p_\Sigma(B\ge A)\), we assume the following minimal model.

Minimal assumptions (delegation / sequential stopping-rule):

  1. Stationary task & preference distribution. Tasks and preferences are drawn from a fixed distribution \(\Sigma\) (or approximately stationary over the measurement window).
  2. Binary accept/reject decision at the margin. After seeing a completion, the user’s choice “proceed/respond” vs “Try again” is driven primarily by whether the completion clears a task-dependent acceptability threshold.
  3. Threshold matches the outside option. That acceptability threshold corresponds to the value of the best alternative procedure \(A\) for the task (up to a stable nuisance offset). Informally: the user rejects exactly when they judge “this draw is worse than doing something else.”
  4. Stable retry model. The user’s retry costs/patience are stable enough that we can model behavior with a consistent sequential rule (e.g., accept if above threshold; otherwise resample up to a budget).
  5. Best-of-k is explicit. Clicking Try again is treated as resampling \(B\), not switching to \(A\). When telemetry is aggregated without modeling retries, it reflects the performance of a composite procedure like “\(B\) with up to \(k\) draws” rather than a single-shot \(B\).

With retries made explicit, there are two closely related quantities:

  • Single-shot win rate: \(s_1 := p_\Sigma(B \ge A)\), the probability one draw from \(B\) beats the outside option.
  • Best-of-k win rate: \(s_k := p_\Sigma(B^{(k)} \ge A)\), where \(B^{(k)}\) denotes “up to \(k\) independent draws from \(B\), take the best.”

UI telemetry like “proceed/respond vs Try again” more directly identifies acceptance/rejection dynamics, which typically correspond to \(s_k\) for an implicit, user-dependent \(k\) (or a distribution over \(k\)), rather than to \(s_1\). In settings where retry budgets and task mix are stable (or are explicitly modeled), this can still be used to estimate a well-defined delegation success probability \(s\), but it should be clear which composite procedure \(B\) is being measured.
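The gap between \(s_1\) and \(s_k\) can be sizable even for evenly matched procedures. A small simulation, under the purely illustrative assumption that task values for \(A\) and single draws from \(B\) are i.i.d. standard normal:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical value model (an assumption for illustration, not part of the
# post's formal setup): outside-option value a and each draw from B are both
# standard normal, independently per task.
n_tasks = 200000
a = rng.normal(size=n_tasks)

def win_rate(k, rng=rng):
    """Best-of-k win rate s_k: take up to k draws from B, keep the best,
    and count the task as a win if it matches or beats the outside option."""
    b = rng.normal(size=(n_tasks, k)).max(axis=1)
    return (b >= a).mean()

s1 = win_rate(1)   # single-shot win rate: ~1/2 for identically distributed A, B
s4 = win_rate(4)   # best-of-4: ~4/5, from retries alone
assert s1 < s4
```

With identically distributed values, \(s_1 \approx 1/2\) while \(s_4 \approx 4/5\) (the max of \(k\) i.i.d. draws beats one more i.i.d. draw with probability \(k/(k+1)\)). Unmodeled retries alone can thus move a measured "success rate" from 50% to 80% with zero change in single-shot capability.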

What breaks clean estimation: if retry costs, patience, UI friction, or the task mix shift over time, or if best-of-\(k\) behavior varies with task difficulty and stakes, the telemetry starts to reflect stopping and search strategy as much as capability.

Deriving the optimization power curve

The curve shown in Figure 1 is derived from a lemma in the proof of Theorem 1.

Lemma: Let \(\Delta(\sigma) = \sup_{a \in A} \langle a, \sigma \rangle - \sup_{b \in B} \langle b, \sigma \rangle\). Assume the centered random variable \(\Delta(\sigma) - \mathbb{E}[\Delta(\sigma)]\) is sub-Gaussian with variance proxy \(\nu_R^2\) (equivalently, its MGF is bounded as in Hoeffding’s lemma), and assume \(\mathcal{C}_{\Sigma}(A) < \mathcal{C}_{\Sigma}(B)\). Then:

\[p_{\Sigma}(A \ge B) \le e^{\cfrac{- (\mathcal{C}_{\Sigma}(B) - \mathcal{C}_{\Sigma}(A))^2}{2\nu_R^2}}\]

To derive our curve, we interpret the lemma in terms of a delegation-success probability \(s\) and a relative power difference \(\Delta P\). Under the minimal delegation assumptions in the previous section, the success rate

\[s := P(\text{user proceeds rather than clicking "Try again"})\]

can be treated as

\[s = p_{\Sigma}(B \ge A) = 1 - p_\Sigma(A \ge B),\]

where \(A\) is the user’s outside option procedure and \(B\) is delegating to the model. The relative power difference is

\[\Delta P := \mathcal{C}_{\Sigma}(B) - \mathcal{C}_{\Sigma}(A).\]

Using \(1-s = p_\Sigma(A \ge B)\) and the lemma, we can invert the exponential tail bound into a piecewise envelope for the relative power gap \(\Delta P\).

  • If \(s \ge 1/2\) (i.e., \(B\) is more-often-optimal under \(\Sigma\)), the lemma gives
\[1-s \le e^{\cfrac{-(\Delta P)^2}{2\nu_R^2}} \quad\Rightarrow\quad 0 \le \Delta P \le \nu_R\sqrt{2\log\left(\cfrac{1}{1-s}\right)}.\]

  • If \(s \le 1/2\) (i.e., \(A\) is more-often-optimal), applying the same reasoning with roles swapped yields
\[s \le e^{\cfrac{-(\Delta P)^2}{2\nu_R^2}} \quad\Rightarrow\quad -\nu_R\sqrt{2\log\left(\cfrac{1}{s}\right)} \le \Delta P \le 0.\]

These one-sided bounds form the shaded region in Figure 1. The parameter \(\nu_R\) controls how quickly the envelope opens up; it is the sub-Gaussian variance proxy for the reward-gap variable \(\Delta(\sigma)\) under \(\Sigma\).
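The inversion can be packaged as a small helper. This is a sketch with \(\nu_R = 1\); the absolute numbers are only meaningful relative to the assumed variance proxy.

```python
import numpy as np

def delta_p_envelope(s, nu_r=1.0):
    """Invert the tail bound into the one-sided envelope on Delta P.

    Returns (lo, hi): the interval of power gaps consistent with a measured
    success rate s under the sub-Gaussian bound, for an assumed nu_r."""
    if s >= 0.5:
        # B more-often-optimal: 0 <= Delta P <= nu_r * sqrt(2 log(1/(1-s)))
        return 0.0, nu_r * np.sqrt(2.0 * np.log(1.0 / (1.0 - s)))
    # A more-often-optimal: symmetric negative branch
    return -nu_r * np.sqrt(2.0 * np.log(1.0 / s)), 0.0

# The envelope widens slowly at first, then blows up near the extremes:
for s in (0.5, 0.9, 0.99, 0.999):
    lo, hi = delta_p_envelope(s)
    print(f"s={s:.3f}  Delta P in [{lo:.2f}, {hi:.2f}]")
```

Moving \(s\) from 0.5 to 0.9 roughly doubles the permitted gap, and each extra nine (0.99, 0.999, ...) keeps widening it: the envelope's edge grows like \(\sqrt{\log\frac{1}{1-s}}\), which diverges as \(s \to 1\).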

The envelope diverges as \(s \to 0\) or \(s \to 1\) because any saturating accept/reject metric loses resolution at the extremes: very small changes in a near-certain success rate are uninformative about the actual power gap. This is an identifiability problem; it can make progress look sharp or smooth depending on where you are on the curve, without by itself implying an underlying discontinuity in capability growth.

Implications for takeoff scenarios

The implication is about detectability limits: our main relationship is a one-sided tail bound (an upper bound on the upset probability as a function of the power gap under sub-Gaussian noise), and it constrains what kinds of takeoff we can even observe from accept/reject telemetry.

The curve should be read as: when a binary success-rate metric saturates, it stops being a reliable ruler for distinguishing “smooth” vs “sharp” changes in underlying capability. At the extremes, many different underlying trajectories are observationally consistent with the same telemetry.

  1. Low success rates (\(s \approx 0\))
    The bound permits very large negative \(\Delta P\). “The model usually loses to the outside option” does not tightly identify how much worse it is, especially once retry policies, patience, and best-of-\(k\) behavior vary across tasks.

  2. Intermediate success rates (\(s\) away from 0 and 1)
    The mapping from \(s\) to the bound on \(|\Delta P|\) is better conditioned, so changes in telemetry are more interpretable as changes in an underlying gap.

  3. High success rates (\(s \approx 1\))
    The envelope also permits very large positive \(\Delta P\), but this does not mean “we’ve demonstrated a huge gap.” The metric has saturated: once success is near-certain, additional capability gains produce little movement in \(s\). Conversely, small movements in \(s\) near 1 are uninformative about how much the power gap has grown. Depending on how you rescale the metric, the same underlying smooth growth can look either “sharp” or “flat.”

Summary: upper-bound–based reasoning plus saturated accept/reject telemetry imposes a hard observational constraint. Near the extremes, the same data admits very different “takeoff shapes,” so success-rate curves alone cannot settle the smooth-vs-sharp question.
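The saturation mechanism can be seen in isolation by assuming (purely for illustration, and more than the one-sided bound actually gives us) that the reward gap is exactly Gaussian, so \(s = \Phi(\Delta P / \nu)\):

```python
import math

def success_rate(delta_p, nu=1.0):
    """Illustrative Gaussian-gap model (an assumption, stronger than the
    post's one-sided tail bound): if Delta(sigma) ~ N(-delta_p, nu^2),
    then s = P(B beats A) = Phi(delta_p / nu)."""
    return 0.5 * (1.0 + math.erf(delta_p / (nu * math.sqrt(2.0))))

# Equal-sized steps in latent power produce vanishing movement in s:
gains = [success_rate(dp + 1.0) - success_rate(dp) for dp in range(5)]
assert all(g1 > g2 for g1, g2 in zip(gains, gains[1:]))  # strictly shrinking
```

Each unit step in \(\Delta P\) moves \(s\) by less than the previous one; past \(s \approx 0.99\) the telemetry is essentially flat while the latent gap keeps growing, which is exactly the saturation described above.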

Conclusion

The point of this post is narrow: a binary delegation success rate \(s\) is an ill-conditioned instrument near 0 and 1. That limits what we can infer about capability trajectories from accept/reject telemetry alone. It is not evidence for any particular takeoff regime.

In the mid-range, \(s\) still carries information. But once the metric saturates, distinguishing takeoff shapes requires either (i) richer measurements (graded outcomes, calibrated scoring, task-level difficulty controls) or (ii) explicit modeling of retries, thresholds, and task-mix drift.

Appendix: proofs

Proof of Theorem 1

We’ll use the following lemma to help us prove Theorem 1:

Lemma: Let \(\Delta(\sigma) = \sup_{a \in A} \langle a, \sigma \rangle - \sup_{b \in B} \langle b, \sigma \rangle\). Assume the centered random variable \(\Delta(\sigma) - \mathbb{E}[\Delta(\sigma)]\) is sub-Gaussian with variance proxy \(\nu_R^2\) (equivalently, its MGF is bounded as in Hoeffding’s lemma), and assume \(\mathcal{C}_{\Sigma}(A) < \mathcal{C}_{\Sigma}(B)\). Then:

\[p_{\Sigma}(A \ge B) \le e^{\cfrac{- (\mathcal{C}_{\Sigma}(B) - \mathcal{C}_{\Sigma}(A))^2}{2\nu_R^2}}\]

Proof of Theorem 1: The previous lemma yields:

\[p_{\Sigma}(I_a \ge I_b) \le e^{\cfrac{- (\text{POWER}_{\Sigma}(b) - \text{POWER}_{\Sigma}(a))^2}{2\nu_R^2}}\]

\[\Rightarrow p_{\Sigma}(I_b > I_a) \ge 1 - e^{\cfrac{- (\text{POWER}_{\Sigma}(b) - \text{POWER}_{\Sigma}(a))^2}{2\nu_R^2}}\]

\[\Rightarrow \log p_{\Sigma}(I_a \ge I_b) \le \cfrac{- (\text{POWER}_{\Sigma}(b) - \text{POWER}_{\Sigma}(a))^2}{2\nu_R^2}\]

\[\Rightarrow \left|\text{POWER}_{\Sigma}(b) - \text{POWER}_{\Sigma}(a)\right| \le \nu_R \sqrt{2 \log \left( \frac{1}{p_{\Sigma}(I_a \ge I_b)} \right)} \le \nu_R \sqrt{2 \log \left( \frac{1}{1-p_{\Sigma}(I_b > I_a)} \right)}\]

Thus, relatively optimal policies have bounded power difference. ∎

Proof of Lemma: For any \(\lambda > 0\) we have:

\[p_{\Sigma}(A \ge B) = \mathbb{E}_{\sigma \sim \Sigma} \left[ \mathbb{I}\left(\sup_{a \in A} \langle a, \sigma \rangle \ge \sup_{b \in B} \langle b , \sigma \rangle \right) \right]\]

\[\le \mathbb{E}_{\sigma \sim \Sigma} \left[ e^{\lambda \left(\sup_{a \in A} \langle a, \sigma \rangle - \sup_{b \in B} \langle b , \sigma \rangle \right) } \right] \quad \text{(bound the indicator by an exponential)}\]

\[= \mathbb{E}_{\sigma \sim \Sigma} \left[ e^{\lambda \Delta(\sigma) } \right] = \mathbb{E}_{\delta \sim \Delta_{\#}\Sigma} \left[ e^{\lambda \delta } \right] \quad \text{(change of variables to the pushforward law of } \Delta \text{)}\]

\[\le e^{\lambda \mathbb{E}[\delta] + \frac{\lambda^2 \nu_R^2}{2} } = e^{\lambda (\mathcal{C}_{\Sigma}(A) - \mathcal{C}_{\Sigma}(B)) + \frac{\lambda^2 \nu_R^2}{2} } \quad \text{(Hoeffding's lemma)}\]

Minimizing the bound over \(\lambda\) gives \(\lambda = (\mathcal{C}_{\Sigma}(B) - \mathcal{C}_{\Sigma}(A)) / \nu_R^2\), which is positive by the assumption \(\mathcal{C}_{\Sigma}(A) < \mathcal{C}_{\Sigma}(B)\). Substituting this value yields:

\[p_{\Sigma}(A \ge B) \le e^{\cfrac{- (\mathcal{C}_{\Sigma}(B) - \mathcal{C}_{\Sigma}(A))^2}{2\nu_R^2}}\]
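As an empirical sanity check on the lemma, the sketch below draws Gaussian objectives for two hypothetical feasible sets (the sets, the Gaussian choice of \(\Sigma\), and the Lipschitz-based variance proxy are all illustrative assumptions) and confirms the Chernoff bound holds, if loosely:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two hypothetical feasible sets; A is a shrunk copy of B, so B is stronger.
A = np.array([[0.3, 0.0], [0.0, 0.3]])
B = np.array([[1.0, 0.0], [0.0, 1.0]])

sigma = rng.normal(size=(200000, 2))       # Sigma = standard Gaussian objectives
sup_A = (sigma @ A.T).max(axis=1)
sup_B = (sigma @ B.T).max(axis=1)

p_upset = (sup_A >= sup_B).mean()          # p_Sigma(A >= B)
gap = sup_B.mean() - sup_A.mean()          # C_Sigma(B) - C_Sigma(A) > 0
# sup_X is Lipschitz in sigma with constant max_x ||x||, so for Gaussian
# sigma the gap variable is sub-Gaussian with a conservative proxy:
nu_r = np.linalg.norm(A, axis=1).max() + np.linalg.norm(B, axis=1).max()
bound = np.exp(-gap**2 / (2 * nu_r**2))

assert gap > 0
assert p_upset <= bound                    # the Chernoff bound holds (loosely)
```

With these small sets the bound is very slack (the true upset probability is \(1/4\), since \(A\) wins only when both Gaussian coordinates are negative, while the bound is close to 1), which is consistent with the post's point: the bound is an envelope, not a sharp estimate.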

Written on February 22, 2026