Asplund Decompositions

We study the Asplund decomposition of maximal monotone operators into (i) a convex subdifferential part and (ii) an acyclic operator, i.e., a monotone operator from which only affine subdifferential components can be extracted.

(This is my very first blog post—hope you all enjoy reading it!)

Convex optimization has been widely popularized as a framework that encompasses numerous problems encountered in practice, most recently in the development of AI, as well as in control theory, economics, statistics, and many other fields. Meanwhile, the study of monotone operators has emerged as one of the foundations of modern optimization and variational analysis. Monotone operators include (but are not limited to) gradients of convex functions, thereby generalizing convex optimization to broader classes of problems, including variational inequalities and monotone inclusion problems. Monotone operator theory thus extends the reach of convex optimization, accommodating settings where a scalar-valued objective function may not exist or may not be explicitly defined. This generality is powerful in the sense that we can cover broader classes of problems, but it also means that some well-known techniques and intuitions from convex optimization no longer carry over directly to monotone operators.

In light of this relationship between the two, we will take a look at an interesting yet lesser-known result by the Swedish mathematician Edgar Asplund [Asp70], which shows that a maximal monotone operator admits an Asplund decomposition into a convex subdifferential and an acyclic monotone operator, i.e., one with no nontrivial subdifferential component. Formally, a maximal monotone operator \( \mathbb{T} \) acting on a Hilbert space \( \mathcal{H} \), whose domain has nonempty interior, can be written in the form

$$ \mathbb{T} = \partial f + \mathbb{A}, $$

where \( f \) is a convex, closed, proper function and \( \mathbb{A} \) is an acyclic operator, which means that \( \mathbb{A} \) is monotone and, whenever \( \mathbb{A} = \partial g + \mathbb{A}' \) with convex \( g \) and monotone \( \mathbb{A}' \), the function \( g \) must be affine. Thus, given any maximal monotone operator, it is theoretically possible to maximally extract and completely separate the convex subdifferential part, leaving something that is then minimally monotone! This type of decomposition could be useful, as it may let us analyze dynamics and/or algorithms related to monotone operators, including those arising in minimax optimization and reinforcement learning, through tools and ideas developed in convex optimization theory.

The proof of the existence of the Asplund decomposition is based on another interesting result by Rockafellar [Roc66] that characterizes convex subdifferentials through a completely different property called cyclical monotonicity. Also, while the proof of the main result is highly nonconstructive, we will see a special type of monotone operator for which this decomposition can be derived explicitly.

Monotone Operators

Suppose that \( \mathcal{H} \) is a Hilbert space equipped with an inner product \( \langle \cdot, \cdot \rangle \). A typical operator \( \mathbb{T} \) on \( \mathcal{H} \) maps each point \( x \in \mathcal{H} \) to a set of points \( \mathbb{T} (x) \subset \mathcal{H} \). (We sometimes call such operators set-valued operators.) We denote the domain and range of the operator by \( \mathrm{dom} \mathbb{T} \) and \( \mathrm{ran} \mathbb{T} \), respectively. We define the graph of the operator by

$$\mathrm{Gra} \mathbb{T} := \{ (x, u) : u \in \mathbb{T} (x) \}.$$

For the case when the set \( \mathbb{T} (x) \) is always either a singleton or an empty set, we say that \( \mathbb{T} \) is a single-valued operator. We can think of such operators as functions, and for these cases we often write \( \mathbb{T} (x) = u \) instead of \( \mathbb{T} (x) = \{ u \} \) with a slight abuse of notation.

One of the most famous examples of monotone operators would be the subdifferentials of convex functions.

Note that the subdifferential of a convex function is also a set-valued operator by definition, where each point is mapped to a set of subgradients. With a slight abuse of notation, we will often write inequalities as

$$ f(x) \ge f(y) + \langle \partial f(y), x-y \rangle $$

to indicate that for all \( g \in \partial f(y) \), the above inequality holds when we replace \( \partial f(y) \) with \( g \). In fact, given a convex function \( f \), it is easy to check that

$$ \begin{align} \begin{aligned} f(x) &\ge f(y) + \langle \partial f(y), x-y \rangle \\ f(y) &\ge f(x) + \langle \partial f(x), y-x \rangle \end{aligned}, \quad \ \ x, y \in \mathcal{H} \end{align} $$

and adding the two inequalities, we have

$$ \begin{align} \langle \partial f(x) {}-{} \partial f(y), x-y \rangle &\ge 0, \quad \forall x, y \in \mathcal{H}, \end{align} $$

and thus the operator \( \partial f \) is always a monotone operator.
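As a quick sanity check, here is a small numerical experiment (my own illustration, not from the post's references) verifying this monotonicity inequality for the gradient of a concrete convex function:

```python
import numpy as np

# A quick numerical sanity check (my own illustration): the gradient of the
# convex log-sum-exp function is the softmax map, and it should satisfy the
# monotonicity inequality <grad f(x) - grad f(y), x - y> >= 0 on every pair.

def grad_logsumexp(x):
    e = np.exp(x - x.max())   # shifted for numerical stability
    return e / e.sum()        # the softmax map

rng = np.random.default_rng(0)
min_inner = np.inf
for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    inner = np.dot(grad_logsumexp(x) - grad_logsumexp(y), x - y)
    min_inner = min(min_inner, inner)

print(min_inner >= -1e-12)  # True: monotonicity holds on all sampled pairs
```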

We will often focus on Convex functions that are Closed and Proper, which we will refer to as CCP functions following the terminology in [RY22].

While convex subdifferentials are monotone operators, the converse isn’t always true; there are many other examples of monotone operators that are not convex subdifferentials. In fact, there is a surprisingly elegant way to reformulate convex subdifferentials as a special subset of monotone operators through the language of monotone operator theory, which we will explore right below!

Cyclical Monotonicity

While monotonicity is defined by an inequality that holds for any choice of two points on \( \mathrm{Gra} \mathbb{T} \), we can define a stronger notion by requiring a similar inequality over more points chosen in a cyclic fashion: \( \mathbb{T} \) is \( N \)-monotone if for every choice of \( \{ (x_{i}, u_{i}) \}_{i \in [N]} \subset \mathrm{Gra} \mathbb{T} \), with the convention \( x_{N+1} = x_{1} \),

$$ \sum_{i=1}^{N} \langle x_{i+1} {}-{} x_{i}, u_{i} \rangle \le 0, $$

and cyclically monotone if it is \( N \)-monotone for every \( N \in \mathbb{N} \).

First of all, we can explicitly observe that cyclical monotonicity is stronger than the typical monotonicity condition, which is simply equivalent to \( 2 \)-monotonicity (i.e., \( N = 2 \)). In fact, there are ways to construct explicit examples of \( N \)-monotone operators that are not \( (N+1) \)-monotone [BW07].
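To make this concrete, the following toy example (my own construction, in the spirit of [BW07] but not taken from that paper) exhibits a monotone operator that fails \( 3 \)-monotonicity:

```python
import numpy as np

# A concrete toy example (my own): the rotation J(a, b) = (b, -a) is
# monotone, since <J z, z> = 0 for all z, but it is not 3-monotone: the
# 3-point cycle below has a strictly positive cyclic sum.

J = np.array([[0.0, 1.0], [-1.0, 0.0]])   # rotation by -90 degrees (skew)

def cycle_sum(points, op):
    """Sum of <x_{i+1} - x_i, op(x_i)> around the closed cycle of points;
    it must be <= 0 for every cycle if op is cyclically monotone."""
    n = len(points)
    return sum(np.dot(points[(i + 1) % n] - points[i], op(points[i]))
               for i in range(n))

cycle = [np.array([-1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.0])]
violation = cycle_sum(cycle, lambda z: J @ z)
print(violation)  # 2.0 > 0, so J is not 3-cyclically monotone
```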

While this seems to be quite a random way to generalize the idea of monotonicity at first glance, it surprisingly turns out that cyclical monotonicity is an equivalent notion to the operator being a convex subdifferential! More precisely (Theorem 1, due to Rockafellar [Roc66]): a maximal monotone operator is cyclically monotone if and only if it is the subdifferential of a CCP function.

Proof. \( (\Leftarrow) \) Suppose that \( \mathbb{T} = \partial f \). Note that \( \mathbb{T} \) is maximal monotone if \( f \) is CCP. For any choice of \( N \in \mathbb{N} \) and any \( N \) points \( \{ x_{i} \}_{i \in [N]} \), we have the following inequalities by convexity:

$$ \begin{align}
f(x_{2}) &\ge f(x_{1}) + \langle \partial f(x_{1}), x_{2} {}-{} x_{1} \rangle \\
f(x_{3}) &\ge f(x_{2}) + \langle \partial f(x_{2}), x_{3} {}-{} x_{2} \rangle \\
& \ \ \vdots \\
f(x_{N}) &\ge f(x_{N-1}) + \langle \partial f(x_{N-1}), x_{N} {}-{} x_{N-1} \rangle \\
f(x_{1}) &\ge f(x_{N}) + \langle \partial f(x_{N}), x_{1} {}-{} x_{N} \rangle
\end{align} $$

which sums up to the cyclical monotonicity inequality.

\( (\Rightarrow) \) Suppose that \( \mathbb{T} \) is a maximal cyclically monotone operator. Fixing an arbitrary point \( (x_{1}, u_{1}) \in \mathrm{Gra} \mathbb{T} \), we can define a function from \( \mathbb{T} \) as follows:

$$ f(x) = \sup_{ \substack{\{ (x_{i}, u_{i}) \}_{i=2}^{N} \subset \mathrm{Gra} \mathbb{T} \\ N \in \mathbb{N}} } \left\{ \langle x {}-{} x_{N}, u_{N} \rangle + \sum_{i=1}^{N-1} \langle x_{i+1} {}-{} x_{i}, u_{i} \rangle \right\}. $$

As \( f \) is the supremum of a collection of affine functions, \( f \) is convex and never takes the value \( -\infty \). Moreover, evaluating any chain at \( x = x_{1} \) closes it into a cycle, so cyclical monotonicity gives \( f(x_{1}) \le 0 \), while the one-point chain \( (x_{1}, u_{1}) \) gives \( f(x_{1}) \ge \langle x_{1} {}-{} x_{1}, u_{1} \rangle = 0 \); hence \( f(x_{1}) = 0 < \infty \) and \( f \) is proper. Finally, the supremum of continuous functions is always lower semicontinuous, so \( f \) is closed. Altogether, \( f \) is CCP.

Now, we are left to show that for any \( (\bar{x}, \bar{u}) \in \mathrm{Gra} \mathbb{T} \), we have \( \bar{u} \in \partial f(\bar{x}) \). We will equivalently show that for all \( \alpha < f(\bar{x}) \) and \( x \in \mathcal{H} \),

$$ f(x) > \alpha + \langle x {}-{} \bar{x}, \bar{u} \rangle. $$

By the definition of \( f \), there exists \( \{(x_{i}, u_{i}) \}_{i \in [N]} \subset \mathrm{Gra} (\mathbb{T}) \) such that

$$ \alpha < \langle \bar{x} {}-{} x_{N}, u_{N} \rangle + \sum_{i=1}^{N-1} \langle x_{i+1} {}-{} x_{i}, u_{i} \rangle. $$

Now, let us add \( (x_{N+1}, u_{N+1}) = (\bar{x}, \bar{u}) \) to the set of points. Then we have for all \( x \in \mathcal{H} \),

$$ f(x) \ge \langle x {}-{} \bar{x}, \bar{u} \rangle + \langle \bar{x} {}-{} x_{N}, u_{N} \rangle + \sum_{i=1}^{N-1} \langle x_{i+1} {}-{} x_{i}, u_{i} \rangle, $$

again by the definition of \( f \). This shows that \( f(x) > \alpha + \langle x {}-{} \bar{x}, \bar{u} \rangle \) for all \( \alpha < f(\bar{x}) \) and \( x \in \mathcal{H} \) as desired, which completes the proof.
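To see the construction at work, the following sketch (my own toy illustration; the operator and sample points are hypothetical choices, not code from [Roc66]) brute-forces the chain supremum for \( \mathbb{T}(x) = 2x \) over a three-point sample of its graph and checks the subgradient inequality numerically:

```python
import itertools

import numpy as np

# Hypothetical illustration (my own toy choices): brute-force the
# chain-supremum construction for T(x) = 2x (the derivative of f(x) = x^2),
# using three sample points on its graph and chains anchored at (0, 0).

xs = [0.0, 1.0, 2.0]          # sample points x_i on the graph of T
us = [2.0 * x for x in xs]    # u_i = T(x_i) = 2 x_i; base point is (0, 0)

def f_R(x):
    """sup over anchored chains of <x - x_N, u_N> + sum <x_{i+1} - x_i, u_i>."""
    best = -np.inf
    for n in range(len(xs)):
        for tail in itertools.permutations(range(1, len(xs)), n):
            chain = (0,) + tail               # every chain starts at x_1 = 0
            val = sum((xs[chain[i + 1]] - xs[chain[i]]) * us[chain[i]]
                      for i in range(len(chain) - 1))
            val += (x - xs[chain[-1]]) * us[chain[-1]]
            best = max(best, val)
    return best

grid = np.linspace(-1.0, 3.0, 81)
# f_R is a convex piecewise-linear lower bound of f(x) - f(0) = x^2, and each
# sampled pair (x_i, u_i) satisfies the subgradient inequality for f_R:
lower_bound_ok = all(f_R(x) <= x * x + 1e-9 for x in grid)
subgrad_ok = all(f_R(xi) + ui * (x - xi) <= f_R(x) + 1e-9
                 for xi, ui in zip(xs, us) for x in grid)
print(lower_bound_ok, subgrad_ok)  # True True
```

With only three samples, \( f_R \) is a strict under-approximation of \( x^2 \), but the sampled graph points still land in its subdifferential, exactly as the proof promises.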


In this post we will use this property of convex subdifferentials as the key component to the proof of the existence of Asplund decompositions of maximal monotone operators.

Asplund Decomposition

Let us assume \( 0 \in \mathbb{T} (0) \), where \( 0 \) is the zero vector of the Hilbert space \( \mathcal{H} \), without loss of generality. (This is just to make equations shorter by omitting terms, and we may adequately translate \( \mathrm{Gra} \mathbb{T} \) for any \( \mathbb{T} \) without affecting the monotonicity inequalities.)

With the normalization \( 0 \in \mathbb{T}(0) \), we say that \( \mathbb{T} \) is \( 3^{-} \)-monotone if

$$ \langle y, \mathbb{T}(x) \rangle \le \langle x, \mathbb{T}(x) \rangle + \langle y, \mathbb{T}(y) \rangle \quad \forall x, y \in \mathrm{dom} \mathbb{T}. $$

Note that this is the same inequality as in \( 3 \)-monotonicity, but with one of the points chosen from the graph fixed to \( (0, 0) \). An operator is \( 2 \)-monotone if it is \( 3^{-} \)-monotone, and \( 3^{-} \)-monotone if it is \( 3 \)-monotone.
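As a sanity check of this inequality, here is a small random test (my own toy operator, assuming \( \mathbb{T} = \nabla f \) with \( f(x) = \| x \|^{4} / 4 \), so that \( 0 \in \mathbb{T}(0) \)):

```python
import numpy as np

# Hypothetical sanity check (my own toy operator): T = grad f with
# f(x) = ||x||^4 / 4, so T(x) = ||x||^2 x and 0 in T(0).  The 3^- inequality
# <y, T(x)> <= <x, T(x)> + <y, T(y)> should then hold for all pairs.

def T(x):
    return np.dot(x, x) * x   # gradient of ||x||^4 / 4

rng = np.random.default_rng(2)
worst_gap = -np.inf
for _ in range(2000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    gap = np.dot(y, T(x)) - np.dot(x, T(x)) - np.dot(y, T(y))
    worst_gap = max(worst_gap, gap)

print(worst_gap <= 1e-12)  # True: no sampled pair violates the inequality
```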

Now we define a family of order relations among operators as follows: for monotone operators \( \mathbb{S}, \mathbb{T} \), we write \( \mathbb{S} \le_{N} \mathbb{T} \) if \( \mathbb{T} {}-{} \mathbb{S} \) is \( N \)-monotone, where \( N \in \{ 2, 3^{-}, 3, 4, \dots \} \) and \( N = \omega_{0} \) stands for cyclical monotonicity (i.e., \( N \)-monotonicity for all finite \( N \)).

We can observe that this relation defines a partial ordering among the family of monotone operators, and particularly for the ordering \( \le_{N} \) for \( N \in \{ 3^{-}, 3, \dots, \omega_{0} \} \), we can show that any ordered chain has an upper bound if \( 0 \in \mathrm{int} \mathrm{dom} \mathbb{T} \), i.e., \( 0 \) is in the interior of the domain of \( \mathbb{T} \).

Proof. For simplicity, let us denote \( \mathbb{T}_{\beta \alpha} := \mathbb{T}_{\beta} {}-{} \mathbb{T}_{\alpha} \). Then \( \mathbb{T}_{\alpha} \le_{N} \mathbb{T}_{\beta} \) is equivalent to \( \mathbb{T}_{\beta \alpha} \) being \( N \)-monotone. Since this implies \( 3^{-} \)-monotonicity as well, we have for all \( x, y \in \mathrm{dom} \mathbb{T} \):

$$ \langle y, \mathbb{T}_{\beta \alpha}(x) \rangle \le \langle x, \mathbb{T}_{\beta \alpha}(x) \rangle + \langle y, \mathbb{T}_{\beta \alpha}(y) \rangle. $$

Observe that for all \( x \in \mathrm{dom} \mathbb{T} \),

$$ 0 \le \langle x, \mathbb{T}_{\alpha}(x) \rangle \le \langle x, \mathbb{T}_{\beta}(x) \rangle \le \langle x, \mathbb{T}(x) \rangle $$

by \( 2 \)-monotonicity, and thus the net \( \{ \langle x, \mathbb{T}_{\alpha}(x) \rangle \}_{\alpha \in \mathcal{A}} \) is monotonically increasing and bounded above. Therefore \( \langle x, \mathbb{T}_{\alpha} (x) \rangle \) converges as \( \alpha \rightarrow \infty \), which implies that

$$ \begin{align}
\lim_{\alpha, \beta \rightarrow \infty} \langle x, \mathbb{T}_{\beta \alpha}(x) \rangle = 0 \quad \forall x \in \mathrm{dom} \mathbb{T}.
\end{align} $$

Let us denote the open ball of radius \( \epsilon \) around \( 0 \) by \( B_{\epsilon} (0) \), i.e.,

$$ B_{\epsilon} (0) := \{ x \in \mathcal{H} : \| x \| < \epsilon \}. $$

By Fact 1 (a maximal monotone operator is locally bounded on the interior of its domain) and \( 0 \in \mathrm{int} \, \mathrm{dom} \mathbb{T} \), there exist \( \epsilon > 0 \) and \( M > 0 \) such that

$$ B_{\epsilon} (0) \subseteq \mathrm{dom} \mathbb{T} \ \ \mathrm{and} \ \ \| u \| \le M \ \ \forall u \in \mathbb{T} (B_{\epsilon} (0)). $$

Then we also have \( -y \in B_{\epsilon} (0) \subseteq \mathrm{dom} \mathbb{T} \) whenever \( y \in B_{\epsilon} (0) \). Thus, by \( 3^{-} \)-monotonicity, we have

$$ \begin{align}
\langle y, \mathbb{T}_{\beta \alpha} (x) \rangle &\le \langle x, \mathbb{T}_{\beta \alpha} (x) \rangle + \langle y, \mathbb{T}_{\beta \alpha} (y) \rangle, \\
\langle -y, \mathbb{T}_{\beta \alpha} (x) \rangle &\le \langle x, \mathbb{T}_{\beta \alpha} (x) \rangle + \langle -y, \mathbb{T}_{\beta \alpha} (-y) \rangle.
\end{align} $$

The right-hand sides of both inequalities converge to \( 0 \) as \( \alpha, \beta \rightarrow \infty \), which yields

$$ \begin{align}
\langle y, \mathbb{T}_{\beta \alpha} (x) \rangle \rightarrow 0 \quad \forall x \in \mathrm{dom} \mathbb{T}, y \in B_{\epsilon} (0).
\end{align} $$

For all \( y \in B_{\epsilon} (0) \) we have

$$ 0 \le \langle y, \mathbb{T}_{\beta \alpha} (y) \rangle \le \langle y, \mathbb{T}_{\beta} (y) \rangle \le \langle y, \mathbb{T} (y) \rangle \le \epsilon M. $$

For each \( x \in \mathrm{dom} \mathbb{T} \), we can choose a large enough index \( \gamma(x) \in \mathcal{A} \) such that

$$ 0 \le \langle x, \mathbb{T}_{\beta \alpha} (x) \rangle \le \epsilon \quad \forall \gamma(x) < \alpha < \beta $$

because \( \langle x, \mathbb{T}_{\alpha} (x) \rangle \) converges as \( \alpha \rightarrow \infty \). Therefore we have

$$ \begin{align}
\langle y, \mathbb{T}_{\beta \alpha} (x) \rangle \le \langle x, \mathbb{T}_{\beta \alpha} (x) \rangle + \langle y, \mathbb{T}_{\beta \alpha} (y) \rangle \le \epsilon (M+1) \quad \forall x \in \mathrm{dom} \mathbb{T}, y \in B_{\epsilon} (0).
\end{align} $$

Taking the supremum over \( y \in B_{\epsilon} (0) \), this implies that \( \| \mathbb{T}_{\beta \alpha} (x) \| \le M+1 \) for all \( \alpha, \beta > \gamma(x) \).

The net \( \{ \mathbb{T}_{\alpha} (x) \}_{\alpha > \gamma(x)} \) is therefore a norm-bounded, weak-star Cauchy net, so it weak-star converges to a limit, which we denote by \( \mathbb{T}_{\mathcal{A}} (x) \). The resulting operator \( \mathbb{T}_{\mathcal{A}} \) is the desired \( N \)-monotone upper bound of the chain, which completes the proof.


Now that we have ensured the existence of an upper bound element for a given chain (increasing net) of operators in terms of the ordering \( \le_{N} \), we will specifically choose the case \( N = \omega_0 \) and prove our main result. Before we do so, let us formally define the notion of acyclic operators.

An acyclic operator \( \mathbb{A} \) has no nontrivial (i.e., nonaffine) subdifferential component that can be extracted from \( \mathbb{A} \) while preserving monotonicity of the residual \( \mathbb{A}' \).

Skew linear operators are among the simplest examples of acyclic operators.
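Here is a minimal numerical sketch (mine, not from the post's references) of why skew linear operators carry no gradient-like component:

```python
import numpy as np

# A minimal numerical sketch (my own): skew linear operators A x = S x with
# S^T = -S are monotone with equality, <A x - A y, x - y> = 0, and satisfy
# <x, A x> = 0, i.e., they carry no "potential" (gradient-like) component.

rng = np.random.default_rng(3)
B = rng.normal(size=(4, 4))
S = B - B.T                    # skew-symmetric by construction: S.T == -S

max_abs = 0.0
for _ in range(500):
    x, y = rng.normal(size=4), rng.normal(size=4)
    max_abs = max(max_abs,
                  abs(np.dot(S @ x - S @ y, x - y)),  # monotonicity gap
                  abs(np.dot(x, S @ x)))              # "potential" term

print(max_abs < 1e-10)  # True: both quantities vanish up to rounding
```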

Now we are ready to state and prove the main result.

Proof. First, as mentioned above, we may translate \( \mathrm{Gra} \mathbb{T} \) so that \( 0 \in \mathbb{T} (0) \). Define the set of operators

$$ \mathcal{C} := \{ \mathbb{S} : 0 \le_{\omega_{0}} \mathbb{S} \le_{2} \mathbb{T}, 0 \in \mathbb{S} (0) \}. $$

By Theorem 2, every chain in \( \mathcal{C} \) with respect to the ordering \( \le_{\omega_{0}} \) has an upper bound. Also, \( \mathcal{C} \) is nonempty as it contains the zero mapping. Thus, by Zorn's lemma, there exists a maximal element \( \hat{\mathbb{S}} \in \mathcal{C} \), i.e., one satisfying

$$ \hat{\mathbb{S}} \le_{\omega_{0}} \mathbb{S} \ \text{ for some } \mathbb{S} \in \mathcal{C} \implies \mathbb{S} \le_{\omega_{0}} \hat{\mathbb{S}}. $$

Since \( 0 \le_{\omega_{0}} \hat{\mathbb{S}} \), by Theorem 1 we may write \( \hat{\mathbb{S}} = \partial f \) for some CCP \( f \). Since \( \partial f = \hat{\mathbb{S}} \le_{2} \mathbb{T} \), we may write \( \mathbb{T} = \partial f + \mathbb{A} \) for some monotone \( \mathbb{A} \ge_{2} 0 \).

To show \( \mathbb{A} \) is acyclic, suppose that \( \mathbb{A} = \partial g + \mathbb{A}' \) for some convex \( g \) and monotone operator \( \mathbb{A}' \). Then we have

$$ \mathbb{T} = \partial f + \partial g + \mathbb{A}’ $$

which means that \( \mathbb{S}' := \partial f + \partial g {}-{} \partial g(0) \in \mathcal{C} \). Note that we subtract the constant \( \partial g(0) \) merely to ensure \( 0 \in \mathbb{S}' (0) \), so that \( \mathbb{S}' \in \mathcal{C} \).

Since \( \hat{\mathbb{S}} = \partial f \) is chosen as a maximal element and \( \partial f \le_{\omega_{0}} \mathbb{S}' \) (the difference \( \partial g {}-{} \partial g(0) \) is cyclically monotone), maximality gives

$$ \partial f + \partial g {}-{} \partial g(0) \le_{\omega_{0}} \partial f $$

and thus \( - \partial g + \partial g(0) \ge_{\omega_{0}} 0 \), which implies that \( -g \) is, up to an affine term, convex. Since \( g \) is convex as well, \( g \) must be affine. Therefore \( \mathbb{A} \) must be acyclic, which completes the proof.


While this theorem shows that an arbitrary maximal monotone operator admits an Asplund decomposition, it is still unknown how to explicitly compute this decomposition in general, because the proof is highly nonconstructive; it relies heavily on Zorn's lemma, so we obtain essentially no information about the components \( \partial f \) and \( \mathbb{A} \). In fact, there are only a few cases where we do have Asplund decompositions in explicit form; the easiest of them are convex subdifferentials themselves (with zero acyclic component), and probably the second easiest example would be saddle gradients of convex-concave functions with a certain structure called bilinear coupling.

Example: Saddle Operators

For now, let us restrict ourselves to the Euclidean space \( \mathcal{H} = \mathbb{R}^{d} \) with \( d = d_{x} + d_{y} \). More specifically, here we consider differentiable convex-concave functions \( f : \mathbb{R}^{d_{x} + d_{y}} \rightarrow \mathbb{R} \), i.e., functions \( f(x, y) \) that are convex in \( x \in \mathbb{R}^{d_{x}} \) for each fixed \( y \) and concave in \( y \in \mathbb{R}^{d_{y}} \) for each fixed \( x \), together with the associated saddle operator \( \mathbb{T} (x, y) = ( \nabla_{x} f(x, y), \, - \nabla_{y} f(x, y) ) \).

The saddle gradient of a convex-concave function was recognized by Rockafellar [Roc70] as a good example of a monotone operator that is not a convex gradient.

Now, let us restrict ourselves even further to convex-concave functions with bilinear coupling, i.e., functions of the form

$$ f(x, y) = g(x) + \langle x, M y \rangle {}-{} h(y) $$

for differentiable convex functions \( g, h \) and a matrix \( M \in \mathbb{R}^{d_{x} \times d_{y}} \). In this case, we can write

$$ \begin{align}
\mathbb{T} (x, y) &= \begin{bmatrix}
\nabla_x g(x) + M y \\ \nabla_y h(y) {}-{} M^{\top} x
\end{bmatrix} \\
&= \begin{bmatrix}
\nabla_x g(x) \\ \nabla_y h(y)
\end{bmatrix} + \begin{bmatrix}
0 & M \\ - M^{\top} & 0
\end{bmatrix} \begin{bmatrix}
x \\ y
\end{bmatrix}.
\end{align} $$

The first term is exactly the gradient (and thus the subdifferential) of the convex function \( g(x) + h(y) \), while the second term is skew linear and hence an acyclic operator by Fact 2. This shows that in this nice case, we can write the Asplund decomposition explicitly as above!
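This decomposition is easy to verify numerically; in the following sketch, the choices of \( g \), \( h \), and \( M \) are my own hypothetical ones:

```python
import numpy as np

# Numerical check of the decomposition above, with my own (hypothetical)
# choices g(x) = ||x||^2 / 2, h(y) = ||y||^2 / 2 and a random coupling M:
# T(x, y) = (grad g(x) + M y, grad h(y) - M^T x) equals the gradient of the
# convex function g(x) + h(y) plus a skew linear (hence acyclic) part.

rng = np.random.default_rng(4)
dx, dy = 3, 2
M = rng.normal(size=(dx, dy))

def T(x, y):
    return np.concatenate([x + M @ y, y - M.T @ x])

def grad_part(x, y):
    return np.concatenate([x, y])   # gradient of g(x) + h(y)

K = np.block([[np.zeros((dx, dx)), M],
              [-M.T, np.zeros((dy, dy))]])   # skew part: K.T == -K

x, y = rng.normal(size=dx), rng.normal(size=dy)
residual = np.linalg.norm(T(x, y) - (grad_part(x, y) + K @ np.concatenate([x, y])))
print(residual < 1e-12, np.allclose(K.T, -K))  # True True
```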

Open Problems

I would like to conclude with some interesting open questions, all closely related to each other, yet still unsolved and left to the community:

  1. What do acyclic operators look like, and what are their notable properties? Do they behave similarly to skew linear operators in some sense?
  2. Are there other cases where we can explicitly compute the Asplund decomposition of monotone operators? In particular, could we do something for saddle gradients of general convex-concave functions (without bilinear coupling)?
  3. Could the relationship between \( \mathbb{T} \) and the convex subdifferential part \( \partial f \) lead to a similar or unified analysis of the dynamics (such as the flow \( \dot{x} ={} - \mathbb{T} (x) \)) induced from each of the two?

References

[Asp70] E. Asplund. A monotone convergence theorem for sequences of nonlinear mappings. Proceedings of Symposia in Pure Mathematics 18 (1970), 1–9.

[Bor06] J. M. Borwein. Maximal monotonicity via convex analysis. Journal of Convex Analysis (2006).

[BW07] J. M. Borwein and H. Wiersma. Asplund decomposition of monotone operators. SIAM Journal on Optimization 18 (2007), 946–960.

[CP11] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision 40 (2011), 120–145.

[Roc66] R. T. Rockafellar. Characterization of the subdifferentials of convex functions. Pacific J. Math. 17 (1966), 497–510.

[Roc70] R. T. Rockafellar. Monotone operators associated with saddle functions and minimax problems. Proceedings of Symposia in Pure Mathematics 18 (1970), 241–250.

[RY22] E. K. Ryu and W. Yin. Large-Scale Convex Optimization: Algorithms & Analyses via Monotone Operators. Cambridge University Press (2022).

Jaewook

I am a first-year PhD student at Stanford EE. I am interested in various topics in optimization theory and algorithms, applications to ML/AI settings, and other random math-related stuff.
