(16 Feb 2021, operto)

FWI in Exploration Geophysics: a Tutorial

This is a tutorial on the principles of FWI (in perpetual construction). We develop the theory in the frequency domain for the sake of compact notation. However, we will periodically make the link with the time-domain formulation, since the frequency-domain matrix formulation requires a higher level of abstraction and hence may be difficult to conceptualize on a first reading.




Click on a section number to jump to the corresponding section.

\( \ref{I} \) Notation and problem statement

\( \ref{II} \) FWI with local gradient-based optimization methods

\( \ref{III} \) Generic expression of gradient and Hessian

\( \ref{IV} \) The gradient and Hessian of FWI: imaging principle

\( \ref{V} \) Efficient computation of the sensitivity matrix

\( \ref{VI} \) Efficient computation of the gradient

\( \ref{VII} \) Computing the gradient of the misfit function with the adjoint-state method

\( \ref{VIII} \) The adjoint-state method with Lagrangian functional

\( \ref{IX} \) Role of the Hessian

\( \ref{X} \) Resolution analysis

\( \ref{XI} \) The cycle skipping pathology

\( \ref{XII} \) Regularization of FWI

\( \ref{XIII} \) FWI and least-squares Reverse Time Migration

\( \ref{XIV} \) Reflection Waveform Inversion (RWI)

\( \ref{XV} \) Source estimation in FWI

\( \ref{XVI} \) Multi-parameter FWI

\( \ref{XVII} \) Appendix 1: A few words about the method of Lagrange multipliers



Notation and problem statement

Let's consider the time-harmonic (Helmholtz) equation written in matrix form as

$$\boldsymbol{A} \left( \omega,\boldsymbol{m}(\boldsymbol{x}) \right)\boldsymbol{u}(\omega,\boldsymbol{x}) = \boldsymbol{b}(\omega,\boldsymbol{x}), \tag{1.1}\label{1.1}$$

where \( \boldsymbol{A} = \omega^2 \text{diag}(\boldsymbol{m}) + \Delta \) denotes the Helmholtz operator, \( \omega \) is the angular frequency, \(\Delta = \partial_{xx}+\partial_{zz}\) is the Laplace operator, \( \boldsymbol{m} \in \Bbb{R}^{N \times 1} \) gathers the squared slownesses, \( \boldsymbol{u} \in \Bbb{C}^{N \times 1} \) is the monochromatic wavefield and \( \boldsymbol{b} \in \Bbb{C}^{N \times 1} \) is the monochromatic source.

\( \boldsymbol{u} \) and \( \boldsymbol{b} \) are vectors of dimension N, where N denotes the number of degrees of freedom used to discretize the subsurface medium \( \boldsymbol{m} \). Accordingly, \( \boldsymbol{A} \) is a full-rank N x N square matrix.

Unless explicitly stated otherwise, we will drop the dependency on ω in equation \( (\ref{1.1}) \) for the sake of compact notation:

$$\boldsymbol{A} (\boldsymbol{m}) \boldsymbol{u}= \boldsymbol{b}. \tag{1.2}\label{1.2}$$

This means that we consider a mono-frequency problem. However, multiple frequencies can be jointly processed by FWI through discrete summation in a similar manner to the time-domain formulation which proceeds by summation over time.
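To fix ideas, here is a minimal NumPy sketch of the state equation \( (\ref{1.2}) \). The 5-point Laplacian with Dirichlet edges, the grid size, the frequency and the homogeneous model are illustrative assumptions; a realistic code would use sparse matrices and absorbing boundary conditions.

```python
import numpy as np

# Assemble A(m) = omega^2 diag(m) + Delta on an n x n grid and solve A u = b.
n, h = 40, 10.0                       # grid points per side, grid step (m)
omega = 2.0 * np.pi * 4.0             # angular frequency (4 Hz)
m = np.full(n * n, 1.0 / 1500.0**2)   # squared slowness of a 1500 m/s medium

# 5-point finite-difference Laplacian with Dirichlet edges
T = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2
Delta = np.kron(np.eye(n), T) + np.kron(T, np.eye(n))

A = omega**2 * np.diag(m) + Delta     # Helmholtz operator
b = np.zeros(n * n)
b[(n // 2) * n + n // 2] = 1.0        # monochromatic point source
u = np.linalg.solve(A, b)             # monochromatic wavefield, u = A^-1 b
```

Solving with the right-hand side set to an impulse at any grid node yields the discrete Green function of that node.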

Equation \( (\ref{1.2}) \) is called the state equation. It can be written in generic form as

$$F(\boldsymbol{u},\boldsymbol{m}) =0, \tag{1.3}\label{1.3}$$

where u is the state variable and m the model parameters.

FWI aims to estimate the subsurface parameters m from parsimonious measurements of u.

Accordingly, FWI can be formulated as the following multivariate PDE-constrained optimization problem

$$\min_{\boldsymbol{u},\boldsymbol{m}} \| \boldsymbol{P u} - \boldsymbol{d}^*\|_2^2 ~~ \text{subject to} ~~ \boldsymbol{A}(\boldsymbol{m}) \boldsymbol{u}= \boldsymbol{b}, \tag{1.4a}\label{1.4a}$$

or equivalently, $$\min_{\boldsymbol{u},\boldsymbol{m}} \| \boldsymbol{P u} - \boldsymbol{d}^*\|_2^2 ~~ \text{subject to} ~~ F(\boldsymbol{u},\boldsymbol{m}) =0,\tag{1.4b}\label{1.4b}$$

where d* denotes the observations (the recorded data) and the linear observation operator P samples the wavefield \( \boldsymbol{ u} \) at the receiver positions. For the sake of compactness, we consider a single-source problem, but the extension to multiple sources follows easily through a summation over sources. P is an M x N matrix, where M << N denotes the number of measurements.

The equation relating the wavefield solution u (the state variable) to the simulated data \( \boldsymbol{d} \) through the observation operator P is called the observation equation

$$\boldsymbol{d} = \boldsymbol{P u}. \tag{1.5}\label{1.5}$$

From equation \( (\ref{1.2}) \), let's consider the closed-form expression of u as a nonlinear function of m

$$\boldsymbol{u}= \boldsymbol{A}^{-1}(\boldsymbol{m})\boldsymbol{b}, \tag{1.6}\label{1.6}$$

where the columns of \( \boldsymbol{A}^{-1} \) are the Green functions for point sources located at each grid point of the computational domain (indeed, \( \boldsymbol{A} \boldsymbol{A}^{-1} = \boldsymbol{I} \), where the identity matrix \( \boldsymbol{I} \) gathers the impulse sources).

In compact form,

$$\boldsymbol{u}= \boldsymbol{S}(\boldsymbol{m}). \tag{1.7}\label{1.7}$$

With this notation, the observation equation reads

$$\boldsymbol{d}= \boldsymbol{P} \boldsymbol{S}(\boldsymbol{m})=\phi(\boldsymbol{m}). \tag{1.8}\label{1.8}$$

In the framework of the so-called reduced-space formulation of FWI, we can recast the multivariate constrained problem \( (\ref{1.4a}) \) as a mono-variate unconstrained problem by enforcing the closed-form expression of u as a function of m, \( (\ref{1.6}) \), in the data-misfit function (the first term of \( (\ref{1.4a}) \)). Simply put, we force the constraint to be strictly satisfied. This shrinks the FWI search space for the benefit of computational efficiency. This variable elimination is called variable projection:

$$\min_{\boldsymbol{m}} \| \boldsymbol{P A}(\boldsymbol{m})^{-1} \boldsymbol{b} - \boldsymbol{d}^*\|_2^2, \tag{1.9}\label{1.9}$$

or, in compact form,

$$\min_{\boldsymbol{m}} \| \phi(\boldsymbol{m}) - \boldsymbol{d}^*\|_2^2. \tag{1.10}\label{1.10}$$
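As a minimal sketch of this reduced-space misfit evaluation by variable projection (the small random symmetric matrix standing in for the discretized Laplacian, the sizes and the receiver layout are illustrative assumptions):

```python
import numpy as np

# Reduced-space misfit C(m) = || P A(m)^-1 b - d* ||^2.
rng = np.random.default_rng(0)
N, M = 50, 5                                   # model size, number of receivers
omega = 2.0 * np.pi * 4.0
L = rng.standard_normal((N, N)); L = L + L.T   # symmetric stand-in for the Laplacian
b = np.zeros(N); b[0] = 1.0                    # point source

P = np.zeros((M, N))                           # observation operator:
P[np.arange(M), rng.choice(N, M, replace=False)] = 1.0  # samples u at M receivers

def A(m):                                      # Helmholtz-like operator
    return omega**2 * np.diag(m) + L

def misfit(m, d_obs):
    u = np.linalg.solve(A(m), b)               # variable projection: u = A(m)^-1 b
    return np.linalg.norm(P @ u - d_obs) ** 2

m_true = np.full(N, 1.0 / 1500.0**2)
d_obs = P @ np.linalg.solve(A(m_true), b)      # noise-free observations
print(misfit(m_true, d_obs))                   # 0.0 at the true model
```

Note that the wavefield u never appears as an unknown of the optimization: it is recomputed from m at each misfit evaluation, which is precisely the variable elimination described above.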

Back to \ref{Content}


FWI with local gradient-based optimization methods

Let's denote by \( C(\boldsymbol{m}) \) the nonlinear objective function associated with the minimization problem \( (\ref{1.10}) \),

$$C(\boldsymbol{m}) = \| \phi(\boldsymbol{m}) - \boldsymbol{d}^*\|_2^2. \tag{2.1}\label{2.1}$$

To solve \( (\ref{1.10}) \) with local optimization techniques, we consider a surrogate locally quadratic approximation of \( C(\boldsymbol{m}) \). This surrogate function \( \tilde{C} \) majorizes C (Figure 2.1), and Newton methods therefore belong to the category of Majorization-Minimization (MM) methods (Lange, 2004). To build the quadratic surrogate function around an initial guess \( {\boldsymbol{m}}_0 \), we apply a second-order Taylor-Lagrange expansion to \( C(\boldsymbol{m}) \)

$$\tilde{C}({\boldsymbol{m}}_0 + \Delta {\boldsymbol{m}},{\boldsymbol{m}}_0)=C({\boldsymbol{m}}_0)+\sum_{j=1}^{N} \frac{\partial C({\boldsymbol{m}}_0)}{\partial m_j} \Delta m_j
+ \frac{1}{2} \sum_{j=1}^{N} \sum_{k=1}^{N} \frac{\partial^2 C({\boldsymbol{m}}_0)}{\partial m_j \partial m_k} \Delta m_j \Delta m_k, \tag{2.2a}\label{2.22}$$

where \( \Delta {\boldsymbol{m}} \) is a small perturbation of \( {\boldsymbol{m}}_0 \). The second argument of \( \tilde{C} \) reminds us of the point around which we linearize the problem. This point changes at each FWI iteration (Figure 2.1).

In matrix form,

$$\tilde{C}({\boldsymbol{m}},{\boldsymbol{m}}_0)=C({\boldsymbol{m}}_0)+\left( \frac{\partial C({\boldsymbol{m}}_0)}{\partial \boldsymbol{m}} \right)^T (\boldsymbol{m}-\boldsymbol{m}_0)
+\frac{1}{2} (\boldsymbol{m}-{\boldsymbol{m}}_0)^T \left( \frac{\partial^2 C({\boldsymbol{m}}_0)}{\partial {\boldsymbol{m}}^2} \right) ( \boldsymbol{m} - \boldsymbol{m}_0). \tag{2.2b}\label{2.2}$$

Note that \( \tilde{C}({\boldsymbol{m}}_0,{\boldsymbol{m}}_0)=C({\boldsymbol{m}}_0) \), \( \nabla_{m} \tilde{C}({\boldsymbol{m}}_0,{\boldsymbol{m}}_0) = \nabla_{m} C({\boldsymbol{m}}_0) \) and \( \partial^2_m \tilde{C}({\boldsymbol{m}}_0,{\boldsymbol{m}}_0) = \partial^2_m C({\boldsymbol{m}}_0) \).


Figure 2.1: At each iterate, the misfit function \( C(m) \) is majorized by a quadratic function \( \tilde{C}(m,m_k) \) such that \( \tilde{C}(m_k,m_k)=C(m_k) \) and \( \nabla_m \tilde{C}(m_k,m_k) = \nabla_m C(m_k) \). Provided that \( \tilde{C}(m,m_k)\) is sufficiently close to \( C(m) \) in the vicinity of \( m_k \), \( \tilde{C}(m,m_k)\) majorizes \( C(m) \) and converges toward the minimizer of \( C(m) \) as k tends to \( \infty \) (see the textbook of K. Lange for more details).

Let's differentiate \( \tilde{C} \) with respect to the parameter \( m_l \)

$$\frac{\partial \tilde{C}(\boldsymbol{m},\boldsymbol{m}_0)}{\partial m_l}=\frac{\partial C({\boldsymbol{m}}_0)}{\partial m_l} +\sum_{j=1}^{N} \frac{\partial^2 C({\boldsymbol{m}}_0)}{\partial m_j \partial m_l} \Delta m_j. \tag{2.3}\label{2.3}$$

In matrix form,

$$\frac{\partial \tilde{C}({\boldsymbol{m}},\boldsymbol{m}_0)}{\partial {\boldsymbol{m}} }=\frac{\partial C({\boldsymbol{m}}_0)}{\partial \boldsymbol{m}}+\frac{\partial^2 C({\boldsymbol{m}}_0)}{\partial {\boldsymbol{m}}^2} \Delta {\boldsymbol{m}}. \tag{2.4}\label{2.4}$$

The minimum of \( \tilde{C} \) is reached when the first derivative of \( \tilde{C} \) vanishes.

$$\frac{\partial \tilde{C}(\boldsymbol{m},\boldsymbol{m}_0)}{\partial \boldsymbol{m}}=\boldsymbol{0}. \tag{2.5}\label{2.5}$$

This leads to the so-called Newton system:

$$\frac{\partial^2 C({\boldsymbol{m}}_0)}{\partial {\boldsymbol{m}}^2} \, \Delta {\boldsymbol{m}} = - \frac{\partial C({\boldsymbol{m}}_0)}{\partial \boldsymbol{m}}. \tag{2.6}\label{2.6}$$

The right-hand side is the opposite of the gradient of C (the steepest-descent direction). The operator on the left-hand side is the Hessian matrix, the second derivative of \( C \).

A Newton algorithm iteratively updates m through the recursion

$$\boldsymbol{m}^{k+1}= \boldsymbol{m}^k + \alpha \Delta \boldsymbol{m}^k,\tag{2.7}\label{2.7}$$

where the descent direction \( \Delta \boldsymbol{m}^k \) is given by

$$ \Delta {\boldsymbol{m}}^{k} = - \left( \frac{\partial^2 C({\boldsymbol{m}}^k)}{\partial {\boldsymbol{m}}^2} \right)^{-1} \, \frac{\partial C({\boldsymbol{m}}^k)}{\partial \boldsymbol{m}}, \tag{2.8}\label{2.8}$$

where \( k \) denotes the iteration count and \( \alpha \) the step length. The step length controls the amount of descent, as illustrated in Figure 2.2.


Figure 2.2: Sketch illustrating the impact of the step length on the amount of descent at a given iteration.

Remark: it is generally not possible to build the inverse of the Hessian in (\ref{2.6}) explicitly. Therefore, we generally resort to iterative methods to solve the Newton system (\ref{2.6}), such as the steepest-descent (along the parabola) or linear conjugate-gradient methods. This iterative algorithm introduces an inner loop within the outer nonlinear FWI loop, which requires Hessian-vector products. To see this, note that

$$\nabla_{\boldsymbol{m}} \tilde{C}({\boldsymbol{m}},{\boldsymbol{m}}_0) = \nabla_{\boldsymbol{m}} C({\boldsymbol{m}}_0)
+\left( \frac{\partial^2 C({\boldsymbol{m}}_0)}{\partial {\boldsymbol{m}}^2} \right) ( \boldsymbol{m} - \boldsymbol{m}_0). \tag{2.9}\label{2.9}$$

This Hessian-vector product can be computed with the second-order adjoint-state method (Metivier et al., 2017) or approximated by finite differences of gradients (Nocedal, 2006).

Alternatively, the steepest-descent method \( ( \partial^2 C({\boldsymbol{m}}^k) / \partial {\boldsymbol{m}}^2 \approx \boldsymbol{I} ) \), the nonlinear conjugate-gradient method or a limited-memory quasi-Newton method (l-BFGS) can be used to approximate the Newton search direction (\ref{2.8}) in the outer FWI loop.
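The recursion \( (\ref{2.7}) \)-\( (\ref{2.8}) \) can be sketched as follows. The 2-parameter Rosenbrock function stands in for the FWI misfit, and the backtracking rule for the step length is one simple choice among many; these are illustrative assumptions, not an FWI code.

```python
import numpy as np

# Damped Newton iteration on a stand-in misfit (Rosenbrock function).
def C(m):
    return (1.0 - m[0])**2 + 100.0 * (m[1] - m[0]**2)**2

def grad(m):
    return np.array([-2.0 * (1.0 - m[0]) - 400.0 * m[0] * (m[1] - m[0]**2),
                     200.0 * (m[1] - m[0]**2)])

def hess(m):
    return np.array([[2.0 - 400.0 * (m[1] - 3.0 * m[0]**2), -400.0 * m[0]],
                     [-400.0 * m[0], 200.0]])

m = np.array([-1.2, 1.0])                      # initial guess m_0
for k in range(100):
    g = grad(m)
    if np.linalg.norm(g) < 1e-10:
        break
    dm = np.linalg.solve(hess(m), -g)          # Newton system
    if g @ dm > 0:                             # fall back to steepest descent
        dm = -g                                # if dm is not a descent direction
    alpha = 1.0                                # step length with backtracking
    while C(m + alpha * dm) > C(m) + 1e-4 * alpha * (g @ dm):
        alpha *= 0.5
    m = m + alpha * dm                         # model update
print(m)  # close to the minimizer [1, 1]
```

The fallback to the steepest-descent direction mimics what is done in practice when the Hessian is indefinite far from the minimum.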

Back to \ref{Content}


Generic expression of the gradient and Hessian


Let's compute the \( k^{th} \) element of the gradient of the misfit function. We introduce the conventional factor \( 1/2 \), which rescales \( C \) without changing its minimizer:

$$\frac{\partial C({\boldsymbol{m}}) }{\partial m_k} = \frac{1}{2} \frac{ \partial \left( \left( \phi({\boldsymbol{m}}) - {\boldsymbol{d}}^* \right)^\dagger \left( \phi({\boldsymbol{m}}) - {\boldsymbol{d}}^* \right) \right) }{\partial m_k}.\tag{3.1}\label{3.1}$$

In discrete form,

$$\frac{\partial C(\boldsymbol{m})}{\partial m_k}=\frac{1}{2} \sum_{i=1}^{M} \frac{ \partial \left( \left(\phi_{i}-d_{i}^*\right) \left(\overline{\phi_{i}-{d}_{i}^*}\right) \right) }{\partial m_k},\tag{3.2}\label{3.2}$$

where \( \phi_{i} \) denotes the elements of the vector \( \phi(\boldsymbol{m}) \) (the synthetic data computed in the model \( {\boldsymbol{m}}) \). \( M \) denotes the number of observations. \( \bar{a} \) denotes the conjugate of \( a \). \( ^\dagger \) denotes the Hermitian (conjugate) transpose.

$$\frac{\partial C(\boldsymbol{m})}{\partial m_k}=\frac{1}{2} \sum_{i=1}^{M} \left\{ \frac{\partial \phi_{i}}{\partial m_k} \left(\overline{\phi_{i}-d_{i}^*}\right)+ \left(\phi_{i}-d_{i}^*\right) \left( \overline{\frac{\partial \phi_{i}}{\partial m_k} }\right) \right\}.\tag{3.3}\label{3.3}$$

$$\frac{\partial C(\boldsymbol{m})}{\partial m_k}= \sum_{i=1}^{M} \Re \left\{\left( \overline{\frac{\partial \phi_{i}}{\partial m_k}} \right) \left(\phi_{i}-d^*_{i}\right) \right\},\tag{3.4}\label{3.4}$$

where \( \Re\left\{a\right\} \) denotes the real part of \( a \) and we assume that \( m \) is real valued. Going back to matrix notation,

$$\frac{\partial C(\boldsymbol{m})}{\partial m_k}= \Re \left\{ \left( \frac{\partial \phi({\boldsymbol{m}}) }{\partial m_k} \right)^\dagger \left( \phi({\boldsymbol{m}}) - {\boldsymbol{d}}^* \right) \right\}.\tag{3.5}\label{3.5}$$

$$\frac{\partial C({\boldsymbol{m}})}{\partial \boldsymbol{m}}= \Re \left\{ \left( \frac{\partial \phi({\boldsymbol{m}}) }{\partial {\boldsymbol{m}} } \right)^\dagger \left( \phi({\boldsymbol{m}}) - {\boldsymbol{d}}^* \right) \right\}=-\Re \left\{ {\boldsymbol{J}}^\dagger ({\boldsymbol{m}}) \Delta {\boldsymbol{d}}({\boldsymbol{m}}) \right\},\tag{3.6}\label{3.6}$$

where \( \Delta {\boldsymbol{d}}({\boldsymbol{m}}) = {\boldsymbol{d}}^* - \phi({\boldsymbol{m}}) \).

The \( M \times N \) matrix \( \boldsymbol{J} = \partial \phi / \partial \boldsymbol{m} \) is called the sensitivity matrix, the Jacobian matrix or the Fréchet derivative matrix. The \( M \)-vector \( \Delta {\boldsymbol{d}}({\boldsymbol{m}}) \) contains the data residuals (the observed minus the simulated data). The gradient is formed by the zero-lag correlation between the partial derivatives of the data with respect to the parameters and the data residuals.

  • If we consider real-valued time-dependent data (as for seismograms), the gradient becomes
$$\frac{\partial C(\boldsymbol{m})}{\partial \boldsymbol{m}}= - {\boldsymbol{J}}^T ({\boldsymbol{m}}) \Delta {\boldsymbol{d}}({\boldsymbol{m}}), \tag{3.7}\label{3.7}$$

which should be interpreted as a zero-lag correlation between the partial derivatives of \( \phi \) and \( \Delta \boldsymbol{d} \).


$$\frac{\partial C(\boldsymbol{m})}{\partial m_k}= -\int_t \sum_r \frac{\partial \phi_r(t + \tau)}{\partial m_k} \Delta d_r(t) dt, \tag{3.8}\label{3.8}$$

with \( \tau \) = 0 and \( r \) labels the positions where the measurements are performed.

Behind correlation, there is the idea of comparing two signals, one of them being delayed by \( \tau \). When the signals are similar, the correlation function has a high amplitude. Here, we compare the signals \( \partial \phi / \partial \boldsymbol{m} \) and \( \Delta \boldsymbol{d} \) for the single lag \( \tau = 0 \) because the output signal (the gradient) is independent of time. Note that FWI approaches that extend the search space re-introduce a time dependency in the gradient in order to relax the requirement of an accurate velocity model (for example, see Biondi and Almomin (2014) and section XI).
  • In the frequency domain, if multiple frequencies are processed jointly, the vector \( \Delta \boldsymbol{d} \) gathers all the frequencies involved in the inversion and the gradient is computed by summation over frequencies (for the same reason as above, the gradient is independent of frequency for a full-bandwidth signal).
  • If \( m \) is complex-valued (for example, when attenuation is considered in frequency-domain FWI), we need to use Wirtinger derivatives.
$$\frac{\partial }{\partial z } = \frac{1}{2} \left( \frac{\partial }{\partial x } -i \frac{\partial }{\partial y } \right) ~~~~ \frac{\partial }{\partial \bar{z} } = \frac{1}{2} \left( \frac{\partial }{\partial x } +i \frac{\partial }{\partial y } \right), \tag{3.9}\label{3.9}$$

where \( z = x + i y \).

Let's now compute the element \( (k,l) \) of the second derivative of the misfit function.

$$\frac{\partial^2 C({\boldsymbol{m}})}{\partial m_k \partial m_l}=\sum_{i=1}^{M} \Re \left\{ \left( \overline{\frac{\partial^2 \phi_i }{\partial m_k \partial m_l} } \right) (\phi_i-d_i^*) \right\}
+\sum_{i=1}^{M} \Re \left\{ \left( \overline{ \frac{\partial \phi_i }{\partial m_k} } \right) \frac{\partial \phi_i }{\partial m_l} \right\}.\tag{3.10}\label{3.10}$$

Going back to matrix notation,

$$\frac{\partial^2 C({\boldsymbol{m}})}{\partial {\boldsymbol{m}}^2}=\Re \left\{ {\boldsymbol{J}}^\dagger {\boldsymbol{J}} \right\} - \Re \left\{ \frac{\partial {\boldsymbol{J}}^T}{\partial {\boldsymbol{m}}^T} \left(\overline{\Delta {\boldsymbol{d}}} ~ \cdots ~ \overline{\Delta {\boldsymbol{d}}} \right) \right\},\tag{3.11}\label{3.11}$$

where \( \Delta {\boldsymbol{d}} = {\boldsymbol{d}}^* - \phi({\boldsymbol{m}}) \).

The linear term of the Hessian, which is sometimes referred to as approximate Hessian or Gauss-Newton Hessian , is given by:

$${\boldsymbol{H}}_a = \Re \left\{ {\boldsymbol{J}}^\dagger {\boldsymbol{J}} \right\}.\tag{3.12}\label{3.12}$$

The linear term in the Hessian represents the sum of the zero-lag correlation between the partial derivative of the data with respect to two different
model parameters \( m_l \) and \( m_k \).

$$h_{a_{lk}} = \Re \left\{ \sum_{i=1}^{M} \left( \overline{ \frac{ \partial \phi_i }{\partial m_l} } \right) \, \frac{ \partial \phi_i }{\partial m_k} \right\}. \tag{3.13}\label{3.13}$$

The second term is nonlinear: it would vanish if the forward-problem equation were linear in \( \boldsymbol{m} \). It corrects the gradient for artifacts resulting from the erroneous interpretation of double-scattering effects as single-scattering ones.
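These expressions are easy to verify numerically on a toy problem. In the sketch below, the forward map \( \phi \) and its Jacobian are invented for illustration, and the residual convention \( \Delta \boldsymbol{d} = \boldsymbol{d}^* - \phi(\boldsymbol{m}) \) with the factor \( 1/2 \) in \( C \) is assumed.

```python
import numpy as np

# Toy nonlinear forward map phi: R^2 -> C^3 (hypothetical, for illustration).
def phi(m):
    return np.array([np.exp(1j * m[0]) * m[1],
                     m[0] * m[1]**2 + 0.0j,
                     np.cos(m[0]) + 1j * m[1]])

def jacobian(m):
    # Analytic Frechet derivatives, J[i, k] = d phi_i / d m_k
    J = np.zeros((3, 2), dtype=complex)
    J[:, 0] = [1j * np.exp(1j * m[0]) * m[1], m[1]**2, -np.sin(m[0])]
    J[:, 1] = [np.exp(1j * m[0]), 2.0 * m[0] * m[1], 1j]
    return J

m_true = np.array([0.3, 1.2])
d_obs = phi(m_true)                      # "observed" data d*
m0 = np.array([0.5, 1.0])                # initial model

C = lambda m: 0.5 * np.linalg.norm(phi(m) - d_obs) ** 2
J = jacobian(m0)
dd = d_obs - phi(m0)                     # residuals, observed minus simulated
grad = -np.real(J.conj().T @ dd)         # gradient: -Re{J^dagger dd}
H_a = np.real(J.conj().T @ J)            # Gauss-Newton Hessian

# Central finite-difference check of the gradient
eps = 1e-6
fd = np.array([(C(m0 + eps * np.eye(2)[k]) - C(m0 - eps * np.eye(2)[k])) / (2 * eps)
               for k in range(2)])
print(np.allclose(grad, fd, atol=1e-6))  # True
```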

Back to \ref{Content}


The gradient and Hessian of the FWI problem: imaging principle

This section is inspired by Pratt et al. (1998). To develop the formal expression of the FWI gradient, we can compute the partial derivatives of \( \boldsymbol{u} \) with respect to \( \boldsymbol{m} \), sample them at the receiver positions with \( \boldsymbol{P} \) to build the sensitivity matrix, and then perform the matrix-vector product between the adjoint of the sensitivity matrix and the data residuals according to (\ref{3.6}).

We start by differentiating the forward-problem equation with respect to \( \boldsymbol{m} \)

$$\boldsymbol{A}(\boldsymbol{m}) \boldsymbol{u} = \boldsymbol{b}.\tag{4.1}\label{4.1}$$

$$\frac{ \partial \boldsymbol{A}(\boldsymbol{m})}{\partial m_j} \boldsymbol{u} + \boldsymbol{A}(\boldsymbol{m}) \frac{\partial \boldsymbol{u}}{\partial m_j} = \boldsymbol{0}. \tag{4.2}\label{4.2}$$

We indeed use the fact that \( \boldsymbol{b} \) does not depend on \( \boldsymbol{m}\).

$$\boldsymbol{A}(\boldsymbol{m}) \frac{\partial \boldsymbol{u}}{\partial m_j} = - \frac{ \partial \boldsymbol{A}(\boldsymbol{m})}{\partial m_j} \boldsymbol{u}. \tag{4.3}\label{4.3}$$

The partial derivative of \( \boldsymbol{u} \) with respect to \( m_j \) is a wavefield that satisfies the wave equation for the virtual scattering source \( \boldsymbol{f}^{(j)} = - \left( \partial \boldsymbol{A}(\boldsymbol{m}) / \partial m_j \right) \boldsymbol{u} \). The spatial support of this source is centred on \( m_j \) and its temporal support is centred on the arrival time of \( \boldsymbol{u}\) at \(m_j\). The first statement follows from the fact that the radiation-pattern matrix \( \partial \boldsymbol{A}(\boldsymbol{m}) / \partial m_j \) is a sparse matrix whose nonzero coefficients sample \( \boldsymbol{u} \) in the close vicinity of \( m_j \) (Figure 4.1). The second follows from the fact that \( \partial \boldsymbol{A}(\boldsymbol{m}) / \partial m_j \) is applied to \( \boldsymbol{u} \).
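For the Helmholtz operator with a lumped mass matrix, \( \partial \boldsymbol{A} / \partial m_j = \omega^2 \boldsymbol{e}_j \boldsymbol{e}_j^T \), so the virtual source reduces to the point source \( \boldsymbol{f}^{(j)} = -\omega^2 u_j \boldsymbol{e}_j \). Here is a minimal sketch checking \( (\ref{4.3}) \) against a direct perturbation of \( m_j \) (small dense stand-in problem; sizes, values and Dirichlet boundaries are illustrative assumptions):

```python
import numpy as np

# Helmholtz stand-in on an n x n grid (Dirichlet edges)
n, h = 30, 10.0
omega = 2.0 * np.pi * 4.0
T = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2
Delta = np.kron(np.eye(n), T) + np.kron(T, np.eye(n))
m = np.full(n * n, 1.0 / 1500.0**2)

def solve(m, b):
    return np.linalg.solve(omega**2 * np.diag(m) + Delta, b)

b = np.zeros(n * n); b[3 * n + 3] = 1.0      # physical source
u = solve(m, b)                              # incident wavefield

j = (n // 2) * n + n // 2                    # parameter at the grid centre
f_j = np.zeros(n * n)
f_j[j] = -omega**2 * u[j]                    # virtual scattering source f^(j)
du_dmj = solve(m, f_j)                       # partial-derivative wavefield

# Central finite-difference check with respect to the model parameter m_j
eps = 1e-12
mp = m.copy(); mp[j] += eps
mm = m.copy(); mm[j] -= eps
fd = (solve(mp, b) - solve(mm, b)) / (2.0 * eps)
print(np.linalg.norm(du_dmj - fd) / np.linalg.norm(fd))  # small relative error
```

Note that the virtual source is indeed a point source at \( m_j \) whose strength is read off the incident wavefield, as stated above.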


Figure 4.1: Pattern of the impedance matrix when discretized by finite differences on a Cartesian grid with a compact 9-point stencil. If we take the partial derivative of the coefficients of the matrix with respect to \( m_i \), where \( m \) is the squared slowness, and if the mass matrix is lumped, then the resulting matrix has one nonzero coefficient at position \( (i,i) \), whose amplitude is \( \omega^2 \).

The partial-derivative wavefield has an obvious interpretation: it is the wavefield singly scattered by a point diffractor located at the position of the model parameter with respect to which we compute the partial derivative of the wavefield (Figure 4.2). The radiation pattern of the scattering source depends on the subsurface parametrization with which the wave equation is formulated (for example, wavespeed-density, wavespeed-impedance or density-impedance for the acoustic wave equation) and on the parameter type in this parametrization with respect to which the partial derivative of \( \boldsymbol{u} \) is computed (Figure 4.3).


Figure 4.2: Sketch of the partial-derivative wavefield. In a homogeneous background model for a surface acquisition, the partial derivative of the data with respect to one parameter shows up as a diffraction hyperbola (top panel). If the parameter is the wavespeed in a (velocity-density) parametrization, the scattering source has an isotropic radiation pattern (as a pressure source).


Figure 4.3: Snapshots of partial-derivative wavefields computed numerically for a source located at the vertex of the model parameter. (a): Vp parameter in the (Vp-rho) parametrization. (b) Rho parameter in the (Vp-rho) parametrization. (c) Vp parameter in the (Vp-Ip) parametrization. (d) Ip parameter in the (Vp-Ip) parametrization. Vp: wavespeed. Rho: density. Ip: impedance. The green curves are analytical radiation patterns computed in the framework of the ray+Born approximation. In (a), the virtual source associated with Vp is an isotropic pressure source. In (b), the virtual source associated with rho is a force pointing toward the source (in this particular case, a vertical force).

The FWI gradient can be interpreted in the framework of diffraction tomography, as reviewed below. The gradient of the misfit function is the zero-lag correlation between the Fréchet-derivative seismograms and the data-residual seismograms (see previous section):

$$\frac{\partial C(\boldsymbol{m}) }{\partial m_j} = -\Re \left[ \left( \frac{ \partial \phi }{\partial m_j} \right)^T \overline{ \Delta \boldsymbol{d} } \right], \tag{4.4}\label{4.4}$$


$$\frac{\partial \phi }{\partial m_j} = \boldsymbol{P} \frac{\partial \boldsymbol{u} }{\partial m_j} = - \boldsymbol{P} \boldsymbol{A}^{-1} \left( \frac{ \partial \boldsymbol{A} }{ \partial m_j } \right) \boldsymbol{u} = - \boldsymbol{P} \boldsymbol{A}^{-1} \boldsymbol{f}^{(j)},\tag{4.5}\label{4.5}$$

and \( \boldsymbol{P} \boldsymbol{u} = \boldsymbol{u}(\boldsymbol{x}_r) \) where \( \boldsymbol{x}_r \in {\mathcal{D}}_r \) denotes the receiver positions.

The data-residual seismograms (the difference between the observations and the data simulated in the initial model) can be interpreted as the recording at the receiver positions of the wavefield scattered by all the missing (unknown) heterogeneities in the initial model. The Fréchet-derivative seismograms can be interpreted as the recording at the receiver positions of the wavefield scattered by a small perturbation located at \( m_j \), all the other parameters being kept fixed. The zero-lag correlation aims to pick in the data residuals (assuming traveltime correspondence between the partial-derivative wavefields and the data residuals) the piece of information that must be transformed into a model perturbation at the position of \( m_j \). This interpretation is illustrated schematically in Figure 4.4.


Figure 4.4: Schematic interpretation of the FWI gradient. The true medium is homogeneous with three small circular inclusions (circles in (c)). The background model \( \boldsymbol{m}_0 \) lacks the three inclusions. The data-residual seismograms are therefore the three diffraction hyperbolae shown in (b). The partial derivative of the data with respect to the parameter located at the position of the yellow inclusion is the diffraction hyperbola shown in (a). The zero-lag correlation between the seismograms shown in (a) and (b) aims to extract from (b) the residuals that have been generated by a missing heterogeneity located at the position of the yellow circle: if we take the zero-lag correlation of (a) and (b) (sample-by-sample multiplication) and integrate over time, most of the contribution will come from the diffraction hyperbola in (b) that matches the one shown in (a). Indeed, this imaging principle works if the background model \( \boldsymbol{m}_0 \) is sufficiently kinematically accurate (see the remainder of the tutorial for the meaning of "sufficiently").

Back to \ref{Content}


Efficient computation of the sensitivity matrix

The sensitivity matrix can be efficiently computed capitalizing on the spatial reciprocity of the Green functions as shown below.

Let's plug the expression of the Fréchet derivatives, \( (\ref{4.5}) \), into the gradient:

$$\frac{ \partial C(\boldsymbol{m}) }{ \partial m_j } = -\Re \left\{ \left( \frac{ \partial \phi ({\boldsymbol{m}}) }{\partial m_j } \right)^T \overline{\Delta \boldsymbol{d}} \right\} = \Re \left\{ \left( \boldsymbol{P} \boldsymbol{A}^{-1} \left( \frac{ \partial \boldsymbol{A} }{ \partial m_j} \right) \boldsymbol{u} \right)^T \overline{\Delta \boldsymbol{d}} \right\} = \Re \left\{ \boldsymbol{u}^T \left( \frac{ \partial \boldsymbol{A} }{ \partial m_j} \right)^T \boldsymbol{A}^{-T} \boldsymbol{P}^T \overline{\Delta \boldsymbol{d}} \right\}.\tag{5.1}\label{5.1}$$

By identification between the first and last terms, the coefficient \( ij \) of the Fréchet derivative matrix is given by:

$$J_{ij} = -\boldsymbol{u}_s^T \left( \frac{ \partial \boldsymbol{A} }{ \partial m_j} \right)^T \boldsymbol{A}^{-T} \boldsymbol{\delta}_r = -\left( \frac{ \partial \boldsymbol{A} }{ \partial m_j} \boldsymbol{u}_s \right)^T \boldsymbol{A}^{-T} \boldsymbol{\delta}_r = \left( \boldsymbol{f}_s^{(j)} \right)^T \left( \boldsymbol{A}^{-T} \boldsymbol{\delta}_r \right), \tag{5.2a}\label{5.2a}$$

where the source-receiver pair \( (s,r) \) corresponds to datum \( i \), \( \boldsymbol{\delta}_r \) denotes an impulse source at receiver \( r \), and \( \boldsymbol{f}_s^{(j)} = -\left( \partial \boldsymbol{A} / \partial m_j \right) \boldsymbol{u}_s \) is the virtual source introduced in the previous section. One coefficient of the sensitivity matrix can thus be built as the inner product of the virtual source with a Green function computed from the receiver position.

In the particular case where \( \boldsymbol{m} \) is the squared slowness and \( \boldsymbol{A} \) is symmetric, as for the Helmholtz operator ( \( \boldsymbol{A}^T = \boldsymbol{A} \) ), the coefficient \( ij \) of the sensitivity matrix can be computed by multiplying the monochromatic Green functions generated from the source \( s \) and the receiver \( r \), taken at the position of the model parameter \( j \) and weighted by \( \omega^2 \) and the source signature \( s(\omega) \) of the source wavefield:

$$J_{i j} = -\omega^2 s(\omega) \left( \boldsymbol{A}^{-1} \boldsymbol{\delta}_s \right)_j \left( \boldsymbol{A}^{-1} \boldsymbol{\delta}_r \right)_j. \tag{5.2b}\label{5.2b}$$

The sensitivity matrix \( \boldsymbol{J} \) can thus be built explicitly by performing one forward problem per non-redundant shot and receiver position. This is far less expensive than computing \( N \) forward problems per shot, where \( N \) denotes the number of model parameters.
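Here is a sketch of this construction on a small stand-in problem (dense matrices, Dirichlet edges and a unit source signature are illustrative assumptions; with absorbing boundaries, plotting the row of \( \boldsymbol{J} \) would produce a wavepath such as the one in Figure 5.1):

```python
import numpy as np

# Helmholtz stand-in (symmetric: A.T == A), unit source signature s(omega) = 1
n, h = 30, 10.0
omega = 2.0 * np.pi * 4.0
T = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2
Delta = np.kron(np.eye(n), T) + np.kron(T, np.eye(n))
m = np.full(n * n, 1.0 / 1500.0**2)
A = omega**2 * np.diag(m) + Delta

s, r = 3 * n + 3, (n - 4) * n + (n - 4)        # source and receiver grid indices
e_s = np.zeros(n * n); e_s[s] = 1.0
e_r = np.zeros(n * n); e_r[r] = 1.0
G_s = np.linalg.solve(A, e_s)                  # Green function from the source
G_r = np.linalg.solve(A, e_r)                  # Green function from the receiver

J_row = -omega**2 * G_s * G_r                  # sensitivity of u(x_r) to each m_j
                                               # (minus sign from the virtual source)

# Check one coefficient against a finite difference with respect to m_j
j, eps = (n // 2) * n + n // 2, 1e-12
mp = m.copy(); mp[j] += eps
mm = m.copy(); mm[j] -= eps
fd = (np.linalg.solve(omega**2 * np.diag(mp) + Delta, e_s)[r]
      - np.linalg.solve(omega**2 * np.diag(mm) + Delta, e_s)[r]) / (2.0 * eps)
print(abs(J_row[j] - fd) / abs(fd))            # small relative error
```

Two solves (one from the source, one from the receiver) give the sensitivity of this source-receiver pair with respect to all the model parameters at once.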

We can now compute the sensitivity kernel (one row of the sensitivity matrix) of acoustic FWI in the frequency-space domain for the Vp parameter in the (Vp,rho) parametrization (Figure 5.1). This amounts to computing the Green functions for two point sources located at the positions of the shot and the receiver and multiplying the two Green functions. The product of the two monochromatic Green functions has a phase equal to the sum of their phases and an amplitude equal to the product of their amplitudes, consistently with the definition of the single-scattered wavefield computed in the framework of the Born approximation. Figure 5.1 shows an interference picture, called a wavepath (Woodward, 1992), which shows isophase surfaces of elliptical shape called isochrones (since we are in the frequency domain, all the traveltimes are synthesized in the figure). They represent Fresnel zones on which a residual of temporal support spanning t ± T/2, where t is the central traveltime and T is the period, is back-projected. All the diffractions occurring within a Fresnel zone have traveltimes that do not differ by more than half a period. This implies that the background velocity model should be accurate enough to predict the recorded traveltimes with an error lower than half a period (if this condition is not fulfilled, the simulated and recorded components of the residual will be back-projected onto two different isochrones). Anticipating the resolution analysis of FWI, one can note that the width of the isochrones decreases as the scattering angle Θ decreases.


Figure 5.1: Sensitivity kernel of frequency-domain acoustic FWI for the Vp parameter in the (Vp-rho) parametrization. The source is denoted by the star, the receiver by the triangle. The frequency is 5 Hz. The interference picture shows isophase surfaces (Fresnel zones) on which residuals are back-projected during FWI. All the diffractions occurring within one Fresnel zone have traveltimes that do not differ by more than half a period. Some diffraction paths are illustrated. Note how the width of the Fresnel zones decreases as the scattering angle θ decreases. The width of the Fresnel zone gives the resolution of FWI in the spatial direction pointed by the wavenumber vector k.

Back to \ref{Content}


Efficient computation of the gradient

We now show that the gradient of FWI can be computed without building the sensitivity matrix explicitly.

$$\frac{ \partial C(\boldsymbol{m}) }{ \partial m_j } = \Re \left\{ \boldsymbol{u}^T \left( \frac{ \partial \boldsymbol{A} }{ \partial m_j} \right)^T \left( \boldsymbol{A}^{-1} \right)^T {\boldsymbol{P}}^T \overline{\Delta \boldsymbol{d}} \right\} = \Re \left\{ \boldsymbol{u}^T \left( \frac{ \partial\boldsymbol{A} }{ \partial m_j} \right)^T \overline{\boldsymbol{v}} \right\} = \Re \left\{ \left( \frac{ \partial\boldsymbol{A} }{ \partial m_j} \boldsymbol{u} \right)^T \overline{\boldsymbol{v}} \right\} = - \Re \left\{ \left( \boldsymbol{f}^{(j)} \right)^T \overline{\boldsymbol{v}} \right\}, \tag{6.1}\label{6.1}$$

where the adjoint wavefield satisfies the wave equation

$$\boldsymbol{A}^T \, \overline{\boldsymbol{v}} = {\boldsymbol{P}}^T \overline{\Delta \boldsymbol{d}},\tag{6.2}\label{6.2}$$

with a source term corresponding to the assemblage of all the residuals (we exploit here the linearity of the wave equation). We see now that we can compute the gradient \( \partial C / \partial m_j \) by taking the zero-lag correlation between the virtual source \( \boldsymbol{f}^{(j)} = -\left( \partial\boldsymbol{A} / \partial m_j \right) \boldsymbol{u} \) and the back-propagated adjoint wavefield \( \boldsymbol{v} \), instead of taking the zero-lag correlation between the partial-derivative wavefields at the receiver positions and the data residuals as indicated by (\ref{4.4}).

To achieve this, we have used the following property of the adjoint of the wave-equation operator \( \boldsymbol{A} \)

$$ \left< \boldsymbol{P} \boldsymbol{A}^{-1} \boldsymbol{f} \, | \, \Delta \boldsymbol{d} \right>_{\partial \Omega} = \left< \boldsymbol{f} \, | \, \left( \boldsymbol{A}^{-1} \right)^{\dagger} \boldsymbol{P}^T \Delta \boldsymbol{d} \right>_\Omega ,$$

where \( \boldsymbol{f} \) stands for the virtual source, \(\Omega\) is the model domain spanned by \(\boldsymbol{m}\), and \(\partial \Omega\) is the acquisition domain.
The adjoint of the wave-equation operator back-propagates the data residuals in time (Figure 6.1), as revealed by the conjugation of the data residuals.

$$ITF ( \overline{u}(\omega)) = \int_{-\infty}^{+\infty} \overline{u}(\omega) \, \exp\left(i \omega t\right) d\omega = \int_{-\infty}^{+\infty} u(-\omega) \, \exp \left(i \omega t\right) d\omega = \int_{-\infty}^{+\infty} u(\omega) \, \exp \left(i \omega (-t)\right) d\omega = u(-t). \tag{6.3}\label{6.3}$$

Recall that, when \( u(t) \) is real valued, \( u(-\omega) = \bar{u}(\omega) \) since \( \bar{u}(\omega) = \overline{ \int_t u(t) e^{-i \omega t} dt } = \int_t u(t) e^{i \omega t} dt = \int_t u(t) e^{-i (-\omega) t} dt = u(-\omega) \).
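This conjugation/time-reversal duality is easy to check with a discrete Fourier transform. A minimal sketch with NumPy (the signal length and random seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(128)             # real-valued time series u(t)

X = np.fft.fft(x)                        # u(omega)
# Hermitian symmetry of the spectrum of a real signal: u(-omega) = conj(u(omega))
assert np.allclose(np.conj(X), X[np.mod(-np.arange(128), 128)])

# Conjugating the spectrum time-reverses the signal (up to the circular shift
# that keeps sample 0 in place): IFT(conj(u(omega))) = u(-t)
x_rev = np.fft.ifft(np.conj(X)).real
assert np.allclose(x_rev, np.roll(x[::-1], 1))
```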

Computing the gradient requires two forward problems per source: one to compute the incident wavefield \( \boldsymbol{u} \) and one to compute the adjoint wavefield \( \boldsymbol{v} \).
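As a sanity check, this two-solve recipe can be verified against finite differences on a small 1D Helmholtz problem. This is a minimal sketch with NumPy; the grid, frequency, damping and acquisition below are illustrative assumptions, not values taken from the text:

```python
import numpy as np

# 1D Helmholtz toy problem A(w, m) u = b with A = w^2 diag(m) + Laplacian and
# Dirichlet boundaries. All numerical choices below are illustrative only.
N, h = 60, 50.0                                  # grid points, spacing (m)
w2 = (2 * np.pi * 5.0 * (1 - 0.05j))**2          # 5 Hz with slight damping
lap = (np.diag(-2.0 * np.ones(N)) + np.diag(np.ones(N - 1), 1)
       + np.diag(np.ones(N - 1), -1)) / h**2

def helmholtz(m):
    return w2 * np.diag(m) + lap

rec = np.array([5, 20, 40, 55])                  # receiver indices (rows of P)
b = np.zeros(N); b[2] = 1.0                      # point source

m_true = np.full(N, 1.0 / 2000.0**2)             # squared slowness (s/m)^2
m_true[25:35] *= 1.2                             # anomaly to be imaged
m0 = np.full(N, 1.0 / 2000.0**2)                 # starting model
d_obs = np.linalg.solve(helmholtz(m_true), b)[rec]

def misfit_and_grad(m):
    A = helmholtz(m)
    u = np.linalg.solve(A, b)                    # solve 1: incident wavefield
    r = u[rec] - d_obs                           # data residuals P u - d*
    rhs = np.zeros(N, dtype=complex); rhs[rec] = r
    v = np.linalg.solve(A.conj().T, rhs)         # solve 2: adjoint wavefield, cf. (6.2)
    grad = -np.real(np.conj(w2) * np.conj(u) * v)   # zero-lag correlation, cf. (6.1)
    return 0.5 * np.vdot(r, r).real, grad

C0, g = misfit_and_grad(m0)
for j in (10, 30, 50):                           # finite-difference check
    e = np.zeros(N); e[j] = 1e-4 * m0[j]
    Cp, _ = misfit_and_grad(m0 + e)
    Cm, _ = misfit_and_grad(m0 - e)
    assert np.isclose((Cp - Cm) / (2 * e[j]), g[j], rtol=1e-3)
```

The slight complex damping keeps the discrete Helmholtz operator safely invertible and makes the conjugate transpose in the adjoint solve meaningful.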

fig_imagingprinciple_adjoint.png fig_adjoint_snap.png

Figure 6.1: Sketch illustration of the imaging principle of FWI based upon adjoint simulation. The gradient is built by the zero-lag correlation between the incident wavefield and the adjoint wavefield back propagated in time from the receiver positions. The source of the adjoint equation is illustrated in the top left panel. It represents the data residuals reversed in time. The incident wavefield and the back-propagated adjoint wavefield should be in phase at the position of the missing heterogeneity at T=0.775s (right figure). FWI can be viewed as a time reversal source localization method, where the targeted sources are the virtual scattering sources in depth.

Back to \ref{Content}


Computing the gradient of the misfit function with the adjoint-state method


We review what has been shown above in a more generic way using the so-called adjoint-state method as developed by G. Chavent and reviewed by Plessix (2006) and Kern (2016).

We consider the least-squares functional.

$$C(\boldsymbol{m}) = \| \phi(\boldsymbol{m}) - \boldsymbol{d}^* \|_2^2.\tag{7.1}\label{7.1}$$

The gradient of the functional as a function of the Fréchet derivatives is given by

$$\nabla_{\boldsymbol{m}} C(\boldsymbol{m}) = \left[ \left( \frac{\partial \phi(\boldsymbol{m}) }{\partial \boldsymbol{m} } \right)^\dagger \right]_{N \times M} \left( \phi(\boldsymbol{m}) - \boldsymbol{d}^* \right)_M.\tag{7.2}\label{7.2}$$

Let's differentiate the state equation \( F(\boldsymbol{u},\boldsymbol{m})=0 \).

$$\frac{\partial F(\boldsymbol{u},\boldsymbol{m})}{\partial \boldsymbol{u} } \delta \boldsymbol{u} = - \frac{\partial F(\boldsymbol{u},\boldsymbol{m})}{\partial \boldsymbol{m} } \delta \boldsymbol{m}.\tag{7.3}\label{7.3}$$

$$\left(\frac{ \partial F(\boldsymbol{u},\boldsymbol{m}) }{\partial \boldsymbol{u} } \right) \frac{ \partial \boldsymbol{u} }{ \partial \boldsymbol{m} } = - \frac{ \partial F(\boldsymbol{u},\boldsymbol{m}) }{\partial \boldsymbol{m} }.\tag{7.4}\label{7.4}$$

From the observation equation \( \phi(\boldsymbol{m}) = \boldsymbol{P} \boldsymbol{u}(\boldsymbol{m}) \rightarrow \partial_{\boldsymbol{m}} \phi(\boldsymbol{m}) =\boldsymbol{P} \partial_{\boldsymbol{m}} \boldsymbol{u}(\boldsymbol{m}) \).

$$\nabla_{\boldsymbol{m}} C(\boldsymbol{m}) = \left[ \left( \boldsymbol{P} \frac{\partial \boldsymbol{u}(\boldsymbol{m}) }{\partial \boldsymbol{m} } \right)^\dagger \right]_{N \times M} \left( \phi(\boldsymbol{m}) - \boldsymbol{d}^* \right)_M.\tag{7.5}\label{7.5}$$

Computing the gradient, written under the form of (\ref{7.5}), requires solving the linearized equation (\ref{7.4}). This linearized equation has \( N \) right-hand sides (one per model parameter) and hence is expensive to solve.

Let's plug the closed-form solution of equation (\ref{7.4}) into the expression of the gradient, equation (\ref{7.5}).

$$ \nabla_{\boldsymbol{m}} C(\boldsymbol{m}) = - \left[ \left( \boldsymbol{P} \left(\frac{ \partial F(\boldsymbol{u},\boldsymbol{m}) }{\partial \boldsymbol{u} } \right)^{-1} \frac{ \partial F(\boldsymbol{u},\boldsymbol{m}) }{\partial \boldsymbol{m} } \right)^\dagger \right]_{N \times M} \left( \boldsymbol{d} - \boldsymbol{d}^* \right)_M.\tag{7.6}\label{7.6}$$

$$\nabla_{\boldsymbol{m}} C(\boldsymbol{m}) = - \left[ \frac{ \partial F(\boldsymbol{u},\boldsymbol{m}) }{\partial \boldsymbol{m} }^\dagger \left( \frac{ \partial F(\boldsymbol{u},\boldsymbol{m}) }{\partial \boldsymbol{u} }^{-1} \right)^\dagger \boldsymbol{P}^T \right]_{N \times M} \left( \boldsymbol{d} - \boldsymbol{d}^* \right)_M.\tag{7.7}\label{7.7}$$

Let's parenthesize (\ref{7.7}) as follows

$$\nabla_{\boldsymbol{m}} C(\boldsymbol{m}) = - \frac{ \partial F(\boldsymbol{u},\boldsymbol{m}) }{\partial \boldsymbol{m} }^\dagger \left[ \left( \frac{ \partial F(\boldsymbol{u},\boldsymbol{m}) }{\partial \boldsymbol{u} }^\dagger \right)^{-1} \boldsymbol{P}^T \left( \boldsymbol{d} - \boldsymbol{d}^* \right) \right].\tag{7.8}\label{7.8}$$

With inner-product notations, we have used the identity

$$< \boldsymbol{P} \left(\frac{ \partial F(\boldsymbol{u},\boldsymbol{m}) }{\partial \boldsymbol{u} }\right)^{-1} \frac{ \partial F(\boldsymbol{u},\boldsymbol{m}) }{\partial \boldsymbol{m} }, \boldsymbol{d} - \boldsymbol{d}^* >=< \frac{ \partial F(\boldsymbol{u},\boldsymbol{m}) }{\partial \boldsymbol{m} } , \left( \frac{ \partial F(\boldsymbol{u},\boldsymbol{m}) }{\partial \boldsymbol{u} }^\dagger \right)^{-1} \boldsymbol{P}^T \left( \boldsymbol{d} - \boldsymbol{d}^* \right) >.\tag{7.9}\label{7.9}$$

Let's introduce the adjoint state variable \( \boldsymbol{v} \).

$$\frac{ \partial F(\boldsymbol{u},\boldsymbol{m}) }{\partial \boldsymbol{u} }^\dagger \boldsymbol{v}= \boldsymbol{P}^T \left( \boldsymbol{d} - \boldsymbol{d}^* \right).\tag{7.10}\label{7.10}$$

With this new variable, the gradient reads

$$\nabla_{\boldsymbol{m}} C(\boldsymbol{m}) = - \frac{ \partial F(\boldsymbol{u},\boldsymbol{m}) }{\partial \boldsymbol{m} }^\dagger \boldsymbol{v}. \tag{7.11}\label{7.11}$$

The gradient now requires solving one state equation (to compute \( \boldsymbol{u} \), from which \( \frac{ \partial F(\boldsymbol{u},\boldsymbol{m}) }{\partial \boldsymbol{m} } \) follows) and one adjoint-state equation (\ref{7.10}).

Back to \ref{Content}


The adjoint-state method with Lagrangian functional

Let's come back to the original full-space formulation of FWI as a constrained optimization where the state equation is the equality constraint (\ref{1.4b}).

$$ \min_{\boldsymbol{m},\boldsymbol{u}} \| \boldsymbol{P} \boldsymbol{u} - \boldsymbol{d}^* \|_2^2 ~~ \text{subject to } ~~ F(\boldsymbol{u},\boldsymbol{m})=0\tag{8.1}\label{8.1}.$$

We solve this constrained problem with the method of Lagrange multiplier. Let's introduce the following Lagrangian function

$$\mathcal{L}(\boldsymbol{m},\boldsymbol{u},\boldsymbol{v}) = \| \boldsymbol{P} \boldsymbol{u} - \boldsymbol{d}^* \|_2^2 + < \boldsymbol{v} , F(\boldsymbol{u},\boldsymbol{m}) >,\tag{8.2}\label{8.2}$$

where \( \boldsymbol{v} \) is the Lagrange multiplier (or adjoint-state variable), \( < > \) denotes inner product, and \( \boldsymbol{u} \), \( \boldsymbol{v} \) and \( \boldsymbol{m} \) are independent variables.

We solve the following min max (saddle point) problem

$$ \min_{\boldsymbol{m},\boldsymbol{u}} \max_{\boldsymbol{v}} \| \boldsymbol{P} \boldsymbol{u} - \boldsymbol{d}^* \|_2^2 + < \boldsymbol{v} , F(\boldsymbol{u},\boldsymbol{m}) >. \tag{8.3}\label{8.3}$$

The first-order optimality conditions state that the minimizer is reached at a saddle point of the Lagrangian function, namely,

$$ \partial \mathcal{L}(\boldsymbol{m},\boldsymbol{u},\boldsymbol{v}) / \partial \boldsymbol{v} = 0, \tag{8.4}\label{8.4}$$

$$ \partial \mathcal{L}(\boldsymbol{m},\boldsymbol{u},\boldsymbol{v}) / \partial \boldsymbol{u} = 0, \tag{8.5}\label{8.5}$$

$$ \partial \mathcal{L} (\boldsymbol{m},\boldsymbol{u},\boldsymbol{v})/ \partial \boldsymbol{m} = 0. \tag{8.6}\label{8.6}$$

Instead of jointly updating \( \boldsymbol{v} \), \( \boldsymbol{u} \), and \( \boldsymbol{m} \), we restrict the search space to wavefields \( \boldsymbol{u} \) that satisfy the state equation for any \( \boldsymbol{m} \). Let's denote by \( \tilde{\boldsymbol{u}} \) this realization.

In the full space spanned by the independent variables \( \boldsymbol{m} \), \( \boldsymbol{u} \), and \(\boldsymbol{v}\), the partial derivative of the Lagrangian with respect to \( \boldsymbol{v} \) is the state equation

$$\frac{\partial \mathcal{L}(\boldsymbol{m},\boldsymbol{u},\boldsymbol{v})}{\partial \boldsymbol{v}}=F(\boldsymbol{u},\boldsymbol{m}). \tag{8.7}\label{8.7}$$

When \( \boldsymbol{u} \) is a realization of the state equation, the first optimality condition \( \partial \mathcal{L}(\boldsymbol{m},\boldsymbol{u},\boldsymbol{v}) / \partial \boldsymbol{v} = 0 \) is satisfied since \( F(\tilde{\boldsymbol{u}},\boldsymbol{m}) = 0 \).

Let's now consider the second optimality condition. In the full space spanned by the independent variables \( \boldsymbol{m} \), \( \boldsymbol{u} \), and \(\boldsymbol{v}\), zeroing the Lagrangian with respect to the state variable gives the adjoint-state equation

$$\left( \frac{\partial F(\boldsymbol{u},\boldsymbol{m}) }{\partial \boldsymbol{u}} \right)^\dagger \boldsymbol{v} = \boldsymbol{P}^T (\boldsymbol{P} \boldsymbol{u} - \boldsymbol{d}^*). \tag{8.8}\label{8.8}$$

Restricting ourselves to the reduced space spanned by \( \tilde{\boldsymbol{u}} \), the associated adjoint-state variable, denoted by \( \tilde{\boldsymbol{v}} \), satisfies

$$\left( \frac{\partial F(\tilde{\boldsymbol{u}},\boldsymbol{m}) }{\partial \boldsymbol{u}} \right)^\dagger \tilde{\boldsymbol{v}} = \boldsymbol{P}^T (\boldsymbol{P} \tilde{\boldsymbol{u}} - \boldsymbol{d}^*). \tag{8.9}\label{8.9}$$

Moreover, when \( \boldsymbol{u} \) is a realization of the state equation, we have

$$C(\boldsymbol{m}) = \left[ \mathcal{L}(\boldsymbol{m},\boldsymbol{u},\boldsymbol{v})\right]_{\tilde{\boldsymbol{u}}}, \tag{8.10}\label{8.10}$$

where \( C(\boldsymbol{m}) \) is the misfit function of the unconstrained problem (\ref{2.1}).

Indeed, we also have

$$C(\boldsymbol{m}) = \left[ \mathcal{L}(\boldsymbol{m},\boldsymbol{u},\boldsymbol{v})\right]_{\tilde{\boldsymbol{u}}, \tilde{\boldsymbol{v}}}.\tag{8.11}\label{8.11}$$

Therefore, by virtue of the definition of the partial derivative, taking the partial derivative of \( \mathcal{L} \) with respect to \( \boldsymbol{m} \) and evaluating it at \( (\tilde{\boldsymbol{u}}, \tilde{\boldsymbol{v}}) \) gives the gradient of \( C \) with respect to \( \boldsymbol{m} \)

$$\nabla_\boldsymbol{m} C(\boldsymbol{m}) = \left[ \partial_\boldsymbol{m} \mathcal{L}(\boldsymbol{m},\boldsymbol{u},\boldsymbol{v})\right]_{\tilde{\boldsymbol{u}}, \tilde{\boldsymbol{v}}}.\tag{8.12}\label{8.12}$$

We find

$$\nabla_{\boldsymbol{m}} C(\boldsymbol{m}) = \left[ \partial_{\boldsymbol{m}} \mathcal{L}(\boldsymbol{m},\boldsymbol{u},\boldsymbol{v}) \right] _{\tilde{\boldsymbol{u}}, \tilde{\boldsymbol{v}}}= \left[ \left( \frac{\partial F(\boldsymbol{u},\boldsymbol{m})}{\partial \boldsymbol{m}} \right)^\dagger \boldsymbol{v} \right]_{\tilde{\boldsymbol{u}}, \tilde{\boldsymbol{v}}},\tag{8.13}\label{8.13}$$

which is identical to (\ref{7.11}). The above expression of the gradient can be used to iteratively update \( \boldsymbol{m} \) with any local optimization algorithm (steepest descent, conjugate gradient, quasi-Newton, Newton).

We can now apply this recipe to frequency-domain and time-domain FWI.
Computing the gradient of frequency-domain FWI with Lagrangian function

The Lagrangian function is given by

$${\mathcal{L}} \left(\boldsymbol{m},\boldsymbol{u},\boldsymbol{v}\right) = \| \boldsymbol{P} \boldsymbol{u} - \boldsymbol{d}^* \|_2^2 + \Re \left\{ < \boldsymbol{v} , \boldsymbol{A}(\boldsymbol{m}) \, \boldsymbol{u} - \boldsymbol{b} > \right\},\tag{8.14}\label{8.14}$$

where we recall that \( \boldsymbol{u} \) and \( \boldsymbol{v} \) are complex valued. Taking the derivative of a function with respect to a complex-valued parameter makes use of Wirtinger derivatives; see also the paper by Brandwood on computing the gradient of a function of complex-valued parameters.

The adjoint-state equation is

$$\partial_{\boldsymbol{u}} {\mathcal{L}} \left(\boldsymbol{m},\boldsymbol{u},\boldsymbol{v}\right) = 0\tag{8.15}\label{8.15}$$

Before taking the derivative with respect to \( \boldsymbol{u} \), we introduce the adjoint of \( \boldsymbol{A} \).

$${\mathcal{L}} \left(\boldsymbol{m},\boldsymbol{u},\boldsymbol{v}\right) = \| \boldsymbol{P} \boldsymbol{u} - \boldsymbol{d}^* \|_2^2 + \Re \left\{ < \boldsymbol{A}(\boldsymbol{m})^\dagger \boldsymbol{v} , \boldsymbol{u} > - < \boldsymbol{v} , \boldsymbol{b} > \right\}.\tag{8.16}\label{8.16}$$

The adjoint-state equation is given by

$$\partial_{\boldsymbol{u}} {\mathcal{L}} \left(\boldsymbol{m},\boldsymbol{u},\boldsymbol{v}\right) = 0 \rightarrow \boldsymbol{A}(\boldsymbol{m})^\dagger \boldsymbol{v} = \boldsymbol{P}^T \left( \boldsymbol{P} \boldsymbol{u} - \boldsymbol{d}^* \right) \rightarrow \boldsymbol{A}(\boldsymbol{m})^T \overline{\boldsymbol{v}} = \boldsymbol{P}^T \left( \overline{\boldsymbol{P} \boldsymbol{u} - \boldsymbol{d}^*} \right),\tag{8.17}\label{8.17}$$

which is identical to (\ref{8.8}).

The gradient is given by

$$\nabla_{\boldsymbol{m}} C(\boldsymbol{m})= \Re \left\{ < \boldsymbol{v} , \frac{\partial \boldsymbol{A}(\boldsymbol{m}) }{\partial \boldsymbol{m}} \, \boldsymbol{u} > \right\},\tag{8.18}\label{8.18}$$

which is identical to (\ref{8.13}).
Computing the gradient of time-domain FWI with Lagrangian function

We review how to compute the gradient of time-domain FWI with a Lagrangian function when the wave equation is formulated as a first-order hyperbolic velocity-stress system

$$\left\{ \begin{array}{l} \partial_t v(x,t) = b(x) \partial_x \sigma(x,t) + s(x,t) \\ \partial_t \sigma(x,t) = \kappa(x) \partial_x v(x,t), \end{array} \right.\tag{8.19}\label{8.19}$$

where \( v \) denotes the particle velocity, \( \sigma \) the stress, \( s \) the source, \( b = 1/\rho \) the buoyancy and \( \kappa \) the bulk modulus.

The wave equation can be written in matrix compact form as

$$\partial_t \boldsymbol{u}(x,t) = \boldsymbol{M}(m(x))\boldsymbol{D} \boldsymbol{u}(x,t) + \boldsymbol{b}(x,t), \tag{8.20}\label{8.20}$$


$$\boldsymbol{u}(x,t)=\left(\begin{array}{c} v(x,t) \\ \sigma(x,t)\end{array}\right), ~~ \boldsymbol{b}(x,t)=\left(\begin{array}{c} s(x,t) \\ 0 \end{array}\right), ~~ \boldsymbol{M}(x) = \left(\begin{array}{cc}b(x) & 0 \\ 0 & \kappa(x)\end{array}\right), ~~ \boldsymbol{D}= \left(\begin{array}{cc} 0 & \partial_x \\ \partial_x & 0 \end{array} \right).\tag{8.21}\label{8.21}$$

with initial conditions: \( \boldsymbol{u}(x,0)=\boldsymbol{0}; ~~ \partial_t \boldsymbol{u}(x,0)=\boldsymbol{0} \).

We rewrite the wave equation such that it is self-adjoint by moving the subsurface properties into the left-hand side

$$\boldsymbol{C}(m(x)) \partial_t \boldsymbol{u}(x,t) = \boldsymbol{D} \boldsymbol{u}(x,t) + \boldsymbol{C}(m(x)) \boldsymbol{b}(x,t) ~~~ \text{with} ~~~ \boldsymbol{C}(x) = \left(\begin{array}{cc}\rho(x) & 0 \\ 0 & 1/\kappa(x)\end{array}\right)\tag{8.22}\label{8.22}$$

If we write the observation equation as

$$d(x_r,t) = d_r(t) = \boldsymbol{P}_r \boldsymbol{u}(x,t)\tag{8.23}\label{8.23}$$

then, the Lagrangian function is given by

$$L(\boldsymbol{u}(x,t),\boldsymbol{v}(x,t),\boldsymbol{m}(x)) = \frac{1}{2} \int_0^T \sum_r \left( \boldsymbol{P}_r \boldsymbol{u}(x,t) - \boldsymbol{d}_r(t)^* \right)^2 dt + \int_{t=0}^T < \boldsymbol{v}(x,t), \underbrace{\boldsymbol{C}(\boldsymbol{m}(x)) \partial_t \boldsymbol{u}(x,t) - \boldsymbol{D} \boldsymbol{u}(x,t) - \boldsymbol{C}(m(x)) \boldsymbol{b}(x,t)}_{State ~ equation} > dt,\tag{8.24}\label{8.24}$$

where \( \boldsymbol{v} \) is the Lagrange multiplier.

We apply an integration by parts to the time and space derivatives.

  • Integration by parts for the time derivative

$$\int_0^T < \boldsymbol{v}(x,t), \partial_t \boldsymbol{u}(x,t) > dt = \left[ < \boldsymbol{v}(x,t) , \boldsymbol{u}(x,t) > \right]_0^T - \int_0^T < \partial_t \boldsymbol{v}(x,t), \boldsymbol{u}(x,t) > dt = < \boldsymbol{v}(x,T) , \boldsymbol{u}(x,T) > - \int_0^T < \partial_t \boldsymbol{v}(x,t), \boldsymbol{u}(x,t) > dt,\tag{8.25}\label{8.25}$$

where the second equality uses the initial condition \( \boldsymbol{u}(x,0)=\boldsymbol{0} \).

  • Integration by parts for the space derivative

$$\int \partial_x \boldsymbol{u}(x,t) \boldsymbol{v}(x,t) dx = - \int \boldsymbol{u}(x,t) \partial_x \boldsymbol{v}(x,t) dx. \tag{8.26}\label{8.26}$$

$$< \boldsymbol{D} \boldsymbol{u}(x,t), \boldsymbol{v}(x,t) > = < \boldsymbol{u}(x,t), \boldsymbol{D}^\dagger \boldsymbol{v}(x,t) > ~~ \text{with} ~~ \boldsymbol{D}^\dagger = - \boldsymbol{D}. \tag{8.27}\label{8.27}$$
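The discrete counterpart of (\ref{8.26})-(\ref{8.27}) is that a centered first-difference matrix with zero Dirichlet boundaries is antisymmetric, so its adjoint is its negative. A quick NumPy check (grid size and spacing are arbitrary):

```python
import numpy as np

n, h = 50, 1.0
# Centered first-difference operator with implicit zero Dirichlet boundaries
Dx = (np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / (2 * h)

assert np.allclose(Dx.T, -Dx)        # D^dagger = -D (antisymmetry)

# Hence the discrete integration-by-parts identity <D u, v> = -<u, D v>
rng = np.random.default_rng(1)
u, v = rng.standard_normal(n), rng.standard_normal(n)
assert np.isclose(np.dot(Dx @ u, v), -np.dot(u, Dx @ v))
```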

Using these results, the Lagrangian after integration by parts in time and space reads

$$\begin{array}{l}L(\boldsymbol{u}(x,t),\boldsymbol{v}(x,t),\boldsymbol{m}(x)) = \frac{1}{2} \int_0^T \sum_r \left( \boldsymbol{P}_r \boldsymbol{u}(x,t) - \boldsymbol{d}_r(t)^* \right)^2 dt + < \boldsymbol{C}(\boldsymbol{m}(x)) \boldsymbol{v}(x,T), \boldsymbol{u}(x,T) > \\ + \int_{t=0}^T \left[ - < \boldsymbol{C}(\boldsymbol{m}(x)) \partial_t \boldsymbol{v}(x,t), \boldsymbol{u}(x,t) > + < \boldsymbol{D} \boldsymbol{v}(x,t) , \boldsymbol{u}(x,t) > - < \boldsymbol{v}(x,t) , \boldsymbol{C}(\boldsymbol{m}(x)) \boldsymbol{b}(x,t) > \right] dt. \end{array} \tag{8.28}\label{8.28}$$

The adjoint-state equation is given by

$$\boldsymbol{C}(\boldsymbol{m}(x)) \partial_t \boldsymbol{v}(x,t) = \boldsymbol{D} \boldsymbol{v}(x,t) + \sum_r \boldsymbol{P}_r^T \left( \boldsymbol{P}_r \boldsymbol{u}(x,t) - \boldsymbol{d}_r(t)^* \right),\tag{8.29}\label{8.29}$$

or, equivalently, in a form suitable for explicit time stepping,

$$\partial_t \boldsymbol{v}(x,t) = \boldsymbol{M}(\boldsymbol{m}(x)) \boldsymbol{D} \boldsymbol{v}(x,t) + \boldsymbol{M}(\boldsymbol{m}(x)) \sum_r \boldsymbol{P}_r^T \left( \boldsymbol{P}_r \boldsymbol{u}(x,t) - \boldsymbol{d}_r(t)^* \right),\tag{8.30}\label{8.30}$$
with a final condition \( \boldsymbol{v}(x,T) = 0 \) (this final condition results from the second term of the Lagrangian).

We perform a change of variable to transform the final condition into an initial condition

$$\boldsymbol{w}(x,t)=\boldsymbol{v}(x,T-t)~~~ \text{such that} ~~~ \boldsymbol{w}(x,0) = \boldsymbol{v}(x,T).\tag{8.31}\label{8.31}$$

Substituting \( \boldsymbol{v} \) by \( \boldsymbol{w} \) in the adjoint-state equation gives

$$\partial_t \boldsymbol{w}(x,T-t) = \boldsymbol{M}(\boldsymbol{m}(x)) \boldsymbol{D} \boldsymbol{w}(x,T-t) + \boldsymbol{M}(\boldsymbol{m}(x)) \sum_r \boldsymbol{P}_r^T \left( \boldsymbol{P}_r \boldsymbol{u}(x,t) - \boldsymbol{d}_r(t)^* \right).\tag{8.32}\label{8.32}$$

Then, we apply the change of variable \( t' = T - t \).

$$-\partial_{t'} \boldsymbol{w}(x,t') = \boldsymbol{M}(\boldsymbol{m}(x)) \boldsymbol{D} \boldsymbol{w}(x,t') + \boldsymbol{M}(\boldsymbol{m}(x)) \sum_r \boldsymbol{P}_r^T \left( \boldsymbol{P}_r \boldsymbol{u}(x,T-t') - \boldsymbol{d}_r(T-t')^* \right).\tag{8.33}\label{8.33}$$

$$\partial_{t'} \boldsymbol{w}(x,t') = - \boldsymbol{M}(\boldsymbol{m}(x)) \boldsymbol{D} \boldsymbol{w}(x,t') - \boldsymbol{M}(\boldsymbol{m}(x)) \sum_r \boldsymbol{P}_r^T \left( \boldsymbol{P}_r \boldsymbol{u}(x,T-t') - \boldsymbol{d}_r(T-t')^* \right),\tag{8.34}\label{8.34}$$

with the initial condition \( \boldsymbol{w}(x,0)=\boldsymbol{0} \).

In summary, the practical procedure to implement the adjoint-state equation consists of

  • Reverse in time the data residuals.
  • Reverse the sign of \( dt\) in the wavefield simulation code.
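This two-step recipe can be checked on a linear toy system: stepping the adjoint equation backward from \( t=T \) generates exactly the same states as stepping the sign-flipped equation forward in \( t' = T - t \). A sketch with explicit Euler and a random matrix standing in for \( \boldsymbol{M}(\boldsymbol{m})\boldsymbol{D} \) (all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, nt, dt = 8, 200, 1e-3
L = rng.standard_normal((n, n))          # stands in for M(m) D
vT = rng.standard_normal(n)              # final condition v(x, T)

# Backward-in-time recursion for v: v(t - dt) ~ v(t) - dt * L v(t)
v = vT.copy()
v_hist = [v.copy()]
for _ in range(nt):
    v = v - dt * (L @ v)
    v_hist.append(v.copy())              # v_hist[k] approximates v(T - k dt)

# Forward recursion for w(t') = v(T - t'), obtained by flipping the sign of dt:
# w(t' + dt) ~ w(t') - dt * L w(t'), starting from w(0) = v(T)
w = vT.copy()
for k in range(nt):
    assert np.allclose(w, v_hist[k])     # same states, traversed forward
    w = w - dt * (L @ w)
assert np.allclose(w, v_hist[nt])
```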

Let's now compute the gradient. From (\ref{8.24}), we have

$$\nabla_{\boldsymbol{m}} C(\boldsymbol{m}) \approx - \int_{t=0}^T < \frac{ \partial \boldsymbol{C}(\boldsymbol{m}(x)) }{ \partial \boldsymbol{m} } \partial_t \boldsymbol{v}(x,t), \boldsymbol{u}(x,t) > dt, \tag{8.35}\label{8.35}$$

where the \( \boldsymbol{m} \)-dependence of the source term \( \boldsymbol{C}(\boldsymbol{m}(x)) \boldsymbol{b}(x,t) \) has been neglected, hence the approximation.

More explicitly,

$$\begin{array}{l}\nabla_{\rho} C(\boldsymbol{m}) \approx - \int_{t=0}^T < \partial_t \boldsymbol{v}(x,t), \boldsymbol{u}(x,t) > dt, \\ \nabla_{\kappa} C(\boldsymbol{m}) \approx \int_{t=0}^T < \frac{1}{\kappa^2(x)} \partial_t \boldsymbol{v}(x,t), \boldsymbol{u}(x,t) > dt, \end{array}\tag{8.36}\label{8.36}$$

where the first inner product involves the particle-velocity components of the wavefields and the second the stress components (since \( \partial_\rho \boldsymbol{C} = \text{diag}(1,0) \) and \( \partial_\kappa \boldsymbol{C} = \text{diag}(0,-1/\kappa^2) \)).

To compute the gradient with respect to other classes of parameters, apply the chain rule.
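For instance, to convert a squared-slowness gradient into a wavespeed gradient, use \( m = 1/c^2 \), hence \( \partial m / \partial c = -2/c^3 \). A minimal sketch with a toy misfit function (the misfit and numerical values are hypothetical):

```python
import numpy as np

c = np.array([1500.0, 2000.0, 3000.0])      # wavespeeds (m/s)
m = 1.0 / c**2                              # squared slownesses

# Toy misfit defined on squared slowness, and its analytic gradient
C = lambda m: 0.5 * np.sum((m - 2e-7)**2)
grad_m = m - 2e-7

# Chain rule: dC/dc = dC/dm * dm/dc with dm/dc = -2 / c^3
grad_c = grad_m * (-2.0 / c**3)

# Finite-difference check in c
for j in range(3):
    e = np.zeros(3); e[j] = 1e-3
    fd = (C(1.0 / (c + e)**2) - C(1.0 / (c - e)**2)) / (2e-3)
    assert np.isclose(fd, grad_c[j], rtol=1e-4, atol=0.0)
```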

Back to \ref{Content}


Role of the Hessian

Recall the Newton system whose solution provides the descent direction.

$$\frac{\partial^2 C({\boldsymbol{m}}_0)}{\partial {\boldsymbol{m}}^2} \, \Delta {\boldsymbol{m}}=- \frac{\partial C({\boldsymbol{m}}_0)}{\partial \boldsymbol{m}}. \tag{9.1}\label{9.1}$$

We have just seen the role of the gradient in FWI. We now analyze the role of the Hessian.

Let's now compute the \( (k,l) \) element of the second derivative of the misfit function.

$$\frac{\partial^2 C({\boldsymbol{m}})}{\partial m_k \partial m_l}=-\sum_{i=1}^{N} \Re \left\{ \left( \overline{ \frac{\partial^2 d_i }{\partial m_k \partial m_l} } \right) (d_i-d_i^*) \right\}
+\sum_{i=1}^{N} \Re \left\{ \left( \overline{ \frac{\partial d_i}{\partial m_k} } \right) \frac{\partial d_i}{\partial m_l} \right\}. \tag{9.2}\label{9.2}$$

Going back to matrix notation,

$$\frac{\partial^2 C({\boldsymbol{m}})}{\partial {\boldsymbol{m}}^2}=\Re \left\{ \boldsymbol{J}^\dagger \boldsymbol{J} \right\}+\Re \left\{ \frac{\partial \boldsymbol{J}^T}{\partial {\boldsymbol{m}}^T} (\overline{ \Delta \boldsymbol{d} } ~ ... ~ \overline{\Delta \boldsymbol{d}} ) \right\}. \tag{9.3}\label{9.3}$$

In particular, at the point \( \boldsymbol{m}_0 \) of the model space, we have

$$\frac{\partial^2 C({\boldsymbol{m}}_0)}{\partial \boldsymbol{m}^2}=\Re \left\{ \boldsymbol{J}_0^\dagger \boldsymbol{J}_0 \right\}+\Re \left\{ \frac{\partial \boldsymbol{J}_0^t}{\partial \boldsymbol{m}^t} ( \overline{\Delta \boldsymbol{d}} ~ ... ~ \overline{ \Delta \boldsymbol{d} }) \right\}. \tag{9.4}\label{9.4}$$

The linear term of the Hessian, which is sometimes referred to as the approximate or Gauss-Newton Hessian, is given by:

$${\boldsymbol{H}}_a = \Re \left\{ \boldsymbol{J}^\dagger \boldsymbol{J} \right\}. \tag{9.5}\label{9.5}$$

The linear term in the Hessian represents the sum of the zero-lag correlations between the partial derivative wavefields with respect to two different model parameters \( m_l \) and \( m_k \), taken at the receiver positions:

$$h_{a_{lk}} = \Re \left\{ \sum_{i=1}^{N} \overline{\frac{ \partial d_i }{\partial m_l}} \, \frac{ \partial d_i }{\partial m_k} \right\}, \tag{9.6}\label{9.6}$$

where \( N\) is the total number of data.

As seismic sources are band-limited, the partial derivative wavefields associated with closely spaced parameters are correlated. This limits the spatial resolution of the imaging.

Applying the inverse of the approximate Hessian to the gradient aims to correct for these limited-bandwidth effects, as in deconvolution.
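As an illustration, the Gauss-Newton Hessian can be assembled explicitly on a small 1D Helmholtz problem, building the sensitivity matrix column by column from the partial derivative wavefields. A sketch with NumPy (all numerical values are illustrative):

```python
import numpy as np

# Small 1D Helmholtz setup (Dirichlet boundaries); illustrative values only
N, h = 40, 50.0
w2 = (2 * np.pi * 5.0 * (1 - 0.05j))**2      # 5 Hz with slight damping
lap = (np.diag(-2.0 * np.ones(N)) + np.diag(np.ones(N - 1), 1)
       + np.diag(np.ones(N - 1), -1)) / h**2
m = np.full(N, 1.0 / 2000.0**2)              # homogeneous squared slowness
A = w2 * np.diag(m) + lap
rec = np.array([4, 15, 25, 35])              # receiver indices
b = np.zeros(N); b[3] = 1.0                  # point source

u = np.linalg.solve(A, b)
# Sensitivity matrix: column j is the partial derivative wavefield
# du/dm_j = -A^{-1} (dA/dm_j) u sampled at the receivers
Ainv = np.linalg.inv(A)
J = np.zeros((len(rec), N), dtype=complex)
for j in range(N):
    J[:, j] = (-Ainv[:, j] * (w2 * u[j]))[rec]

H_a = np.real(J.conj().T @ J)                # Gauss-Newton Hessian (9.5)

assert np.allclose(H_a, H_a.T)               # symmetric by construction
assert np.linalg.eigvalsh(H_a).min() > -1e-8 * np.abs(H_a).max()  # PSD
# Its diagonal collects the autocorrelations of the partial derivative
# wavefields and can serve as a cheap preconditioner
assert np.allclose(np.diag(H_a), np.sum(np.abs(J)**2, axis=0))
```

The last assertion checks that the diagonal of \( \boldsymbol{H}_a \) collects the autocorrelations of the partial derivative wavefields, the quantity discussed below as a preconditioner.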

For the sake of illustration, we consider a point diffractor in a homogeneous background model (Figure 9.1). Grid dimensions are \( 31 \times 31\). The grid interval is 50 m. The acquisition is circular and centered on the point diffractor. There are 41 sources and 41 receivers evenly distributed along the acquisition circle. We invert 15 frequencies within the frequency band \( \left[4.5;20\right] \) Hz.
We perform 15 iterations per frequency. The source wavelet is a Dirac delta function. We compare the convergence of the Gauss-Newton and steepest-descent methods in Figure 9.2.
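The flavor of this comparison can be reproduced in one dimension: a single damped Gauss-Newton step is pitted against a single steepest-descent step with an optimal linearized step length. A sketch with NumPy (this 1D setup is illustrative and much simpler than the 2D inclusion test):

```python
import numpy as np

# One Gauss-Newton step versus one steepest-descent step on a tiny 1D
# Helmholtz problem (Dirichlet boundaries); all numerical choices illustrative
N, h = 40, 50.0
w2 = (2 * np.pi * 5.0 * (1 - 0.05j))**2
lap = (np.diag(-2.0 * np.ones(N)) + np.diag(np.ones(N - 1), 1)
       + np.diag(np.ones(N - 1), -1)) / h**2
rec = np.arange(2, N, 4); b = np.zeros(N); b[3] = 1.0

def solve(m):
    return np.linalg.solve(w2 * np.diag(m) + lap, b)

m_true = np.full(N, 1.0 / 2000.0**2); m_true[18:22] *= 1.02  # weak anomaly
m = np.full(N, 1.0 / 2000.0**2)
d_obs = solve(m_true)[rec]

def misfit(m):
    r = solve(m)[rec] - d_obs
    return 0.5 * np.vdot(r, r).real

# Sensitivity matrix and gradient at the starting model
u = solve(m); r = u[rec] - d_obs
Ainv = np.linalg.inv(w2 * np.diag(m) + lap)
J = np.array([(-Ainv[:, j] * (w2 * u[j]))[rec] for j in range(N)]).T
g = np.real(J.conj().T @ r)

H_a = np.real(J.conj().T @ J)
damp = 1e-3 * np.trace(H_a) / N                  # small damping for stability
dm_gn = -np.linalg.solve(H_a + damp * np.eye(N), g)

# Steepest descent with the optimal step for the linearized problem
alpha = np.dot(g, g) / np.real(np.vdot(J @ g, J @ g))
dm_sd = -alpha * g

assert misfit(m + dm_gn) < misfit(m)             # both reduce the misfit...
assert misfit(m + dm_sd) < misfit(m)
assert misfit(m + dm_gn) < misfit(m + dm_sd)     # ...Gauss-Newton more so
```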

fig_inclusion_small_vtrue.png fig_hessian1.png

Figure 9.1: (Left) Inclusion velocity model. (Right) Gauss-Newton Hessian for the 4.5 Hz frequency and a full circular acquisition surrounding the inclusion.


Figure 9.2: Gauss-Newton versus steepest-descent reconstruction of the inclusion in the 4.5-20 Hz frequency band from the circular acquisition (Figure 7).

The diagonal terms of the approximate Hessian are the sums of the auto-correlations of the partial derivative wavefields (squared amplitudes) at the receiver positions:

$$h_{a_{ll}} = \Re \left\{ \sum_{i=1}^{N} \overline{\frac{ \partial d_i }{\partial m_l}} \, \frac{ \partial d_i }{\partial m_l} \right\}= \sum_{i=1}^{N} \left| \frac{ \partial d_i }{\partial m_l} \right|^2.$$

The diagonal terms of the Hessian aim to remove from the gradient the geometrical-amplitude effects of the partial derivative wavefields. They can be used as a very efficient preconditioner for FWI with surface acquisitions (Figure 9.3). They also mitigate the acquisition footprint (Figure 9.4).


Figure 9.3: (Top) FWI velocity model of the Baragiano thrust belt. (Bottom) Diagonal Hessian as a FWI preconditioner. (See Ravaut et al., 2004).


Figure 9.4: Diagonal Hessian for the inclusion test. The image shows the acquisition footprint.

Let's now have a look at the nonlinear term of the Hessian. Recall the expression of the Hessian:

$${\boldsymbol{H}} = \Re \left\{ \boldsymbol{J}^\dagger \boldsymbol{J} \right\}+\Re \left\{ \frac{\partial \boldsymbol{J}^T}{\partial {\boldsymbol{m}}^T} ( \overline{\Delta \boldsymbol{d}} ~ ... ~ \overline{\Delta \boldsymbol{d}} ) \right\} \tag{9.7}\label{9.7}$$

and of the partial derivative wavefield:

$$\boldsymbol{A} \left( \frac{ \partial \boldsymbol{u} }{\partial m_l} \right) = {\boldsymbol{f}}^{(l)} = - \left( \frac{ \partial \boldsymbol{A} }{ \partial m_l} \right) \boldsymbol{u}. \tag{9.8}\label{9.8}$$

Differentiation of the partial derivative wavefield with respect to \( m_k \) gives:

$$\boldsymbol{A} \frac{ \partial^2 \boldsymbol{u} }{\partial m_l \partial m_k} = - {\boldsymbol{f}}^{(lk)}. \tag{9.9}\label{9.9} $$


$${\boldsymbol{f}}^{(lk)} = \left( \frac{\partial \boldsymbol{A} }{\partial m_l} \right) \left( \frac{\partial \boldsymbol{u} }{\partial m_k} \right) + \left( \frac{\partial \boldsymbol{A} }{\partial m_k} \right) \left( \frac{\partial \boldsymbol{u} }{\partial m_l} \right) + \left( \frac{\partial^2 \boldsymbol{A} }{\partial m_l \partial m_k} \right) \boldsymbol{u}. \tag{9.10}\label{9.10}$$


The second partial derivatives of the wavefield describe the wavefield doubly scattered by the model perturbations \( m_l \) and \( m_k \) (Figure 9.5). These second derivatives are multiplied by the data residuals in the Hessian. This term refocuses the footprint of doubly-scattered events in the single-scattering gradient at the correct positions of the model parameters \( m_l \) and \( m_k \).

$$\Delta {\boldsymbol{m}} = \left[ \Re \left\{ \boldsymbol{J}_0^\dagger \boldsymbol{J}_0 \right\}+\Re \left\{ \frac{\partial \boldsymbol{J}_0^T}{\partial {\boldsymbol{m}}^T} \left( \overline{\Delta \boldsymbol{d}} ~ ... ~ \overline{\Delta \boldsymbol{d} } \right) \right\} \right]^{-1} \Re \left\{ \boldsymbol{J}_0^\dagger \Delta \boldsymbol{d} \right\}. \tag{9.13}\label{9.13}$$


Figure 9.5: Double scattering effects in FWI.

In terms of physical units, where \( \left[M\right] \) denotes model units and \( \left[D\right] \) data units, the Newton update reads

$$\left[ M \right] = \left( \frac{ \left[M\right] \ \left[M\right] }{ \left[D\right] \ \left[D\right] } \right) \left( \frac{ \left[D\right] }{ \left[M\right] } \left[D\right] \right). \tag{9.14}\label{9.14} $$

The gradient of the misfit function does not have the units of the model. Only the action of the Hessian on the gradient gives the physical units to the model perturbation. Therefore, accounting for the Hessian is particularly important in multiparameter FWI, where the gradients associated with each parameter class need to be scaled properly.

Back to \ref{Content}


Resolution analysis of FWI

To state the problem of seismic imaging, let's start with the equation of the reflection traveltime curve

$$t(o)= \frac{1}{c} \sqrt{4 \, z^2 + o^2}, \tag{10.1}\label{10.1}$$

where \( t \) is traveltime, \( o \) is offset, \( z \) is the depth of the reflector, \( c \) is the wavespeed of the half-space above the reflector (Figure 10.1).



Figure 10.1: Key attributes in seismic imaging.

For \( o \rightarrow 0 \), \(~~~~ t(o) \approx \frac{2 z }{c} \). This equation highlights the notorious velocity-depth ambiguity.
For \( o \rightarrow \infty \), \(~~~~ t(o) \approx \frac{o }{c} \). This equation highlights that waves recorded at long offsets (diving waves, refracted waves, post-critical reflections) carry information on the kinematic properties of the subsurface, represented by the wavespeed. The wavespeed here represents an average property of the overlying medium, namely its long wavelengths.
These equations provide the basic guidelines to design hierarchical multiscale FWI of wide-aperture data.
Step 1: invert the wide-aperture data to resolve the long wavelengths of the velocity model.
Step 2: invert the short-spread reflection data to image the reflectivity.
These two end-member tasks can be implemented in a single integrated seismic imaging workflow with FWI when wide-aperture data are available, while they are performed in an alternating mode with conventional reflection data through an explicit scale separation between the long and short wavelengths. The intermediate wavelengths belong to the null space of the seismic imaging problem, even if broadband wide-azimuth data contribute to shrinking this null space.
Resolution analysis in the framework of diffraction tomography

This section is inspired by Sirgue and Pratt (2004), who provided a tutorial on the principles of diffraction tomography (Devaney, 1984; Wu and Toksoz, 1987). Let's assume distant sources and receivers and a homogeneous background medium.
In this framework, the incident and scattered Green functions can be approximated by monochromatic plane waves:

$$\begin{array}{l} g_0\left(\omega,\boldsymbol{x};{\boldsymbol{s}}\right) \approx \exp \left( i k_0 \hat{\boldsymbol{s}} . \boldsymbol{x} \right), \\ g_0\left(\omega,\boldsymbol{x};{\boldsymbol{r}}\right) \approx \exp \left( i k_0 \hat{\boldsymbol{r}} . \boldsymbol{x} \right), \end{array}\tag{10.2}\label{10.2}$$

where \( k_0 = \omega / c_0 \), \( \hat{\boldsymbol{s}} \) is the unit vector pointing in the incident propagation direction (from source to scatterer), and \( \hat{\boldsymbol{r}} \) is the unit vector pointing in the reverse scattered direction (from receiver to scatterer) (Figure 10.2).


Figure 10.2: incident (left) and scattered (middle) plane waves. (Right) Corresponding wavenumber mapped in the image.

Recall the expression of the gradient of the misfit function:

$$\frac{ \partial C(\boldsymbol{m}) }{ \partial m_j } = - \Re \left\{ \left( \overbrace{ {\boldsymbol{A}}^{-1} \frac{ \partial{\boldsymbol{A}} }{ \partial m_j} \boldsymbol{u} }^{\text{partial derivative wavefields}} \right)^\dagger \overbrace{ {\boldsymbol{P}}^T \Delta \boldsymbol{d} }^{\text{data residuals}} \right\}. \tag{10.3}\label{10.3}$$

In the case of the scalar wave equation for squared-slowness update, this simplifies to:

$$\frac{ \partial C(\boldsymbol{m}) }{ \partial m_j } = \sum_\omega \omega^2 \sum_{s,r} \Re \left\{ s(\omega) \overline{g_0}\left(\omega,\boldsymbol{x}_j;{\boldsymbol{s}}\right) \Delta {\boldsymbol{d}}(\omega,{\boldsymbol{r}},{\boldsymbol{s}}) \overline{g_0}\left(\omega,\boldsymbol{x}_j;{\boldsymbol{r}}\right) \right\}. \tag{10.4}\label{10.4}$$

Substitution of the expression of the Green functions in the gradient gives:

$$\frac{ \partial C(\boldsymbol{m}) }{ \partial m_j } = \sum_\omega \omega^2 \sum_{s,r} \Re \left\{ s(\omega) \Delta {\boldsymbol{d}}(\omega,{\boldsymbol{r}},{\boldsymbol{s}}) \exp \left( - i k_0 \left( {\hat{\boldsymbol{s}}} + {\hat{\boldsymbol{r}}} \right) . \boldsymbol{x}_j \right) \right\}. \tag{10.5}\label{10.5}$$

The wavenumber vector \( k_0 \left( {\hat{\boldsymbol{s}}} + {\hat{\boldsymbol{r}}} \right) \) can be written in polar form as:

$${\boldsymbol{k}} = k \left( \cos (\phi), \sin (\phi) \right), \tag{10.6}\label{10.6}$$

where \( \phi = \phi_s + \phi_r \) (see Figure 10.3).


Figure 10.3: Scattering and dip angles.

For a given source-receiver pair and frequency, the modulus of the wavenumber injected in the FWI model can be related to the local wavelength \( \lambda_0 = c_0 / f \) and the scattering angle \( \theta \) by

$$k = \frac{ 4 \, \pi \, f }{c_0} \cos \left( \theta / 2 \right), \tag{10.7}\label{10.7}$$

since \( \left\| \hat{\boldsymbol{s}} + \hat{\boldsymbol{r}} \right\| = 2 \cos \left( \theta / 2 \right) \).

From (\ref{10.5}), we conclude that the gradient has the form of a truncated inverse Fourier series.
The argument of the basis functions is the local wavenumber vector

$$ {\boldsymbol{k}} = k_0 \left( {\hat{\boldsymbol{s}}} + {\hat{\boldsymbol{r}}} \right) = k \left( \cos (\phi), \sin (\phi) \right) \tag{10.8}\label{10.8}$$

injected in the FWI model at position \( \boldsymbol{x}_j\).
Truncation of the Fourier series is controlled by the source bandwidth and by the illumination of the scattering angle \( \theta \) and of the dip angle \( \phi \). The scattering-angle and dip-angle illuminations are themselves controlled by the source-receiver acquisition geometry (Figure 10.4).
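As a minimal numerical sketch of the relation \( \boldsymbol{k} = k_0 ( \hat{\boldsymbol{s}} + \hat{\boldsymbol{r}} ) \), whose modulus is \( 2 k_0 \cos(\theta/2) \), the following snippet maps a (frequency, scattering angle) pair to the injected wavenumber; the 5 Hz frequency and 4000 m/s wavespeed are illustrative values:

```python
import numpy as np

# Modulus of the local wavenumber mapped by one frequency and one scattering
# angle: |k0 (s_hat + r_hat)| = 2 k0 cos(theta/2), with k0 = 2*pi*f/c0.
def local_wavenumber(f_hz, c0, theta_rad):
    k0 = 2.0 * np.pi * f_hz / c0
    return 2.0 * k0 * np.cos(theta_rad / 2.0)

f, c0 = 5.0, 4000.0                            # illustrative values
k_forward = local_wavenumber(f, c0, np.pi)     # theta = 180 deg (transmission): k = 0
k_backscatter = local_wavenumber(f, c0, 0.0)   # theta = 0 deg (normal-incidence): k = 2*k0
# At normal incidence the resolved wavelength is 2*pi/k = c0/(2f),
# i.e. the classical lambda/2 resolution limit of migration.
```

Transmission regimes (wide \( \theta \)) thus map the long wavelengths, while short-spread reflections map the short ones.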


Figure 10.4: Wavenumber coverage versus source-receiver geometry and frequency. (left) Long-offset surface acquisition combining transmission and reflection (16km maximum offset) for four frequencies (3.5, 5, 7, 10 Hz). The velocity linearly increases with depth in the background velocity model. Vertical wavenumber coverage is fostered at the expense of small horizontal wavenumbers. (Right) Short offset acquisition. A flat reflector is injected in the background velocity model. It behaves as a secondary source. This secondary source allows for small horizontal wavenumber coverage along the transmission paths connecting the reflector and the sources/receivers. Migration of the short spread reflections maps high wavenumbers. There is a wavenumber gap between the small wavenumbers mapped by reflection tomography and the high wavenumbers mapped by migration of short-spread reflections. Classical seismic imaging workflow relies on this alternating update of the velocity macro model by tomography and the reflectivity by migration. Each of the two subproblems uses the output of the other subproblem as a passive quantity: the tomography uses the reflectivity as a secondary source, migration uses the velocity macromodel as the propagator.

In 3D, the azimuth angle should be added (Figure 10.5).

$$\boldsymbol{k} = k \left( \begin{array}{c}
\sin \left( \phi \right) \cos \left( \varphi \right) \\
\sin \left( \phi \right) \sin \left( \varphi \right) \\
\cos \left( \phi \right)
\end{array} \right) ~~ \text{with} ~~ k = \frac{2 \, \omega}{V_P} \cos \left( \theta / 2 \right). \tag{10.9}\label{10.9}$$


Figure 10.5: Angles controlling the FWI resolution in 3D: scattering, dip and azimuth angles.

Fundamental conclusions can be inferred from (\ref{10.7}) and (\ref{10.8}):

  • One frequency and one scattering angle map one wavenumber.
  • Low frequencies and wide-scattering angles map low wavenumbers. Conversely, high frequencies and small-scattering angles map high wavenumbers (Figure 10.6).


Figure 10.6: Low frequencies - wide angles map low wavenumbers. Conversely, high frequencies - short angles map high wavenumbers.

  • This leads to a redundant control of frequency and scattering angle on the wavenumber coverage for wide-aperture acquisitions, i.e., many (frequency, \( \theta \)) pairs map the same wavenumber (Figure 10.7).


Figure 10.7: Several pairs of frequency and scattering angles map the same wavenumber.

  • CPU-efficient FWI algorithms can be designed by decimating the redundancy of the wavenumber coverage. This amounts to limiting the inversion to a few discrete frequencies (Figure 10.8). Sirgue & Pratt (2004) provide some guidelines to choose the frequency sampling in FWI. Their criterion assumes that a dense array of point sources allows for a fine sampling of the scattering angles. Under this condition, one frequency maps one band of wavenumbers in the model space by virtue of the redundant control of frequency and scattering angle on the wavenumber coverage. Sirgue & Pratt (2004) considered the simple case of a reflection from a flat interface. In this framework, only the vertical wavenumbers of the target (the interface) are nonzero. Strictly speaking, the criterion of Sirgue & Pratt (2004) does not apply in complex heterogeneous media; yet, it provides useful guidelines to design efficient frequency-domain FWI algorithms.

    $$k_z(f) = \frac{2 f}{c} \cos \left( \theta / 2 \right). \tag{10.10}\label{10.10}$$

    The criterion of Sirgue & Pratt (2004) proposes that the maximum wavenumber mapped by a given frequency should be equal to the minimum wavenumber mapped by the next frequency, so as to remove the redundancy in the wavenumber coverage and minimize the number of frequencies involved in frequency-domain FWI.

    $$k_{max}(f)= k_{min}(f + \Delta f). \tag{10.11}\label{10.11}$$

    $$\frac{2 f}{c} \cos \left( \theta_{min} / 2 \right) = \frac{2 ( f + \Delta f )}{c} \cos \left( \theta_{max} / 2 \right). \tag{10.12}\label{10.12}$$

    From this equality, we easily infer the adaptive frequency interval that removes the wavenumber redundancy:

    $$\Delta f = w \, f \; \; \; \text{where} \; \; \; w = \frac{ \cos (\theta_{min}/2) }{ \cos (\theta_{max}/2) } - 1. \tag{10.13}\label{10.13}$$

    Basic conclusions:

    • The frequency interval increases with frequency.
    • The frequency interval increases as the wide-angle coverage increases.
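The adaptive frequency sampling of (\ref{10.13}) amounts to a geometric progression of inverted frequencies, \( f_{n+1} = f_n \, \cos(\theta_{min}/2)/\cos(\theta_{max}/2) \). A minimal sketch, with an illustrative 3-20 Hz band and illustrative aperture ranges:

```python
import numpy as np

# Sirgue & Pratt (2004)-type frequency selection (eq. 10.13): each next
# frequency is f_{n+1} = f_n * cos(theta_min/2) / cos(theta_max/2).
def fwi_frequencies(f0, f_max, theta_min_deg, theta_max_deg):
    ratio = np.cos(np.radians(theta_min_deg) / 2.0) / np.cos(np.radians(theta_max_deg) / 2.0)
    freqs = [f0]
    while freqs[-1] * ratio <= f_max:
        freqs.append(freqs[-1] * ratio)
    return freqs

# A wide-aperture survey (theta up to 160 deg) needs far fewer frequencies
# than a short-spread one (theta up to 60 deg) to cover the 3-20 Hz band:
wide = fwi_frequencies(3.0, 20.0, 0.0, 160.0)
short = fwi_frequencies(3.0, 20.0, 0.0, 60.0)
print(len(wide), len(short))
```

This makes quantitative the second bullet above: the wider the aperture coverage, the coarser the admissible frequency sampling.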


Figure 10.8: Vertical wavenumber bands mapped by a reflection from a flat reflector at the base of a homogeneous half-space and by a finite number of frequencies chosen such that the wavenumber bands do not overlap. According to (\ref{10.13}), fewer frequencies are necessary to continuously sample the wavenumber spectrum for long-offset acquisitions than for their short-offset counterpart.

The wavepath in the frequency domain is computed by multiplying the monochromatic Green functions emitted from the source and from the receiver (Figure 10.9). The real part of the complex-valued monochromatic wavepath is shown here. Note how the widths of the first Fresnel zone and of the isochrones decrease with source-receiver offset.


Figure 10.9: Sensitivity kernel of FWI based upon the single-scattering Born approximation. The top and middle panels show the source and receiver monochromatic Green functions in homogeneous media for two different offsets. The bottom panels show the corresponding wavepaths (Woodward, 1992). The width of the Fresnel zones correlates with the scattering angle.


Figure 10.10: Wavepath, isochrone, Fresnel zones.

The wavepath in Figure 10.10 can be described from two slightly different perspectives: in the framework of inverse scattering theory, or through the reverse-time (adjoint) imaging principle.

  • The monochromatic wavepath for a point source represents an interference picture that shows the equi-phase surfaces associated with the traveltimes of waves scattered by all the point diffractors in the model (i.e., the partial derivative wavefields). Cross-correlation of these partial derivative wavefields with the data residuals provides the contribution of the seismic trace to the gradient: a residual positioned at a given traveltime will be backprojected in space onto the corresponding equi-phase surface. The width of the equi-phase surfaces (i.e., the modulus of the wavenumber vector) gives the resolution in the direction perpendicular to the surface.
  • Let us assume that the data residual is a Dirac delta function in the frequency domain. The corresponding residual in the time domain has an infinite support. The monochromatic wavepath provides a complete picture over time of the spatial locations where residuals observed at different times will be backprojected.
The relationship between frequency, scattering angle, and wavenumber can be established in the framework of asymptotic ray+Born migration/inversion. Ray+Born migration/inversion is a migration method which has been recast in the theoretical framework of least-squares inverse problem theory by Lambare et al. (1992), Jin et al. (1992), Thierry et al. (1999a), Thierry et al. (1999b), Operto et al. (2000), Xu et al. (2001) and Lambare et al. (2003). It consists of a waveform inversion of the scattered wavefield where the Green's functions are computed with ray theory and the forward problem is linearized with the Born approximation. In the high-frequency limit, the relationship between frequency, scattering angle and wavenumber can be clearly established through a diagonalization of the Hessian operator (see also the original work of Beylkin (1985) and Miller et al. (1987)). The kernel of ray+Born migration/inversion can also be used to precondition reverse-time migration (Metivier et al., 2015).

We now seek to image a point diffractor located in the middle of a \( 101 \times 101 \) grid. The optimization algorithm is L-BFGS. The step length is estimated by line search. All the frequencies are inverted in one go. Thirty iterations are performed. The prewhitening damping factor of the diagonal Hessian is \( 0.01 \times \max \left( \text{diag} \left\{ \boldsymbol{H}_a \right\} \right) \). We first illustrate the band-pass filtering effects induced by a narrow illumination of the wavenumber spectrum.

$$ \begin{array}{|c|c|c|c|} \hline V_{P_0} (m/s) & \Delta V_P (m/s) & \text{Frequency band (Hz)} & \text{h (m)} \\ \hline 4000 & 5000 & 0.2 - 20 Hz & 50 \\ \hline \end{array} \tag{10.14}\label{10.14}$$

The source wavelet is a Ricker wavelet with a dominant frequency of 5 Hz (Figure 10.11). Frequencies between 0.2 and 20 Hz are inverted simultaneously with a frequency sampling of 0.2 Hz.


Figure 10.11: Spectrum of the Ricker wavelet for the Dirac test.

We compare the FWI results for a single-shot acquisition and for a circular acquisition centered on the point diffractor. In the second case, bandpass filtering effects only result from the limited bandwidth of the source (Figure 10.12).


Figure 10.12: Reconstruction of a scatterer. (Left) Single-source surface acquisition. (Right) Full circular acquisition.

The left panel in Figure 10.12 shows artifacts resulting from the sharp truncation of the FWI sensitivity kernel (the Fourier series) caused by the incomplete illumination of the dip and scattering angles. The radius of the first Fresnel zone in reflection mode is:

$$R = \sqrt{ \frac{ \lambda \, z }{ 2 } }\tag{10.15}\label{10.15}$$

This gives, for this particular case (\( f_{max} \) = 20 Hz; \( z \approx 2 \) km; \( V_P \) = 4 km/s): R = 447 m. This is consistent with the width of the migrated segment shown in Figure 10.13. The circular acquisition gives a low-pass reconstruction of the scatterer. Some mild artifacts are also generated by aliasing (see below).
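A one-line check of the numbers quoted above (\( f_{max} \) = 20 Hz, \( z \approx 2 \) km, \( V_P \) = 4 km/s):

```python
import math

# First Fresnel zone radius in reflection mode (eq. 10.15): R = sqrt(lambda*z/2),
# with the values of the scatterer test quoted in the text.
f_max, z, vp = 20.0, 2000.0, 4000.0
lam = vp / f_max                  # 200 m local wavelength
R = math.sqrt(lam * z / 2.0)      # radius of the first Fresnel zone (m)
print(round(R))                   # 447
```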


Figure 10.13: Size of the Fresnel zone associated with the reflection sampling the scatterer.

We now assess more precisely the resolution with which the scatterer has been reconstructed in Figure 10.12 (right). We look for the trapezoidal filter which allows us to roughly match the vertical log across the reconstructed scatterer with the log across the true scatterer after application of the trapezoidal filter (Figure 10.14). The filter shows that FWI reconstructs the low wavenumbers thanks to the full angular illumination. Moreover, the high-wavenumber cutoff is roughly consistent with the maximum frequency of the Ricker wavelet.


Figure 10.14: Resolution of the scatterer reconstruction (Ricker wavelet).

We repeat the experiment by replacing the Ricker wavelet with a Dirac wavelet. The results are shown in Figure 10.15. Clearly, the inversion manages to reconstruct higher wavenumbers compared to Figure 10.14. This is because the spectral components injected in FWI are no longer weighted by the spectrum of the Ricker wavelet (the amplitude spectrum of the wavelet can be seen as a weighting operator in the data misfit function).


Figure 10.15: Resolution of the scatterer reconstruction (Dirac wavelet).

Figure 10.16 compares the results of three experiments. In the first, multiple frequencies are simultaneously inverted with a Ricker wavelet, as in Figure 10.14. In the second, the Dirac wavelet replaces the Ricker wavelet, as in Figure 10.15. In the third, the frequencies are inverted sequentially (frequency continuation) with the Ricker wavelet. In this case, FWI is blind to the amplitude spectrum of the source signature. The best resolution is achieved in the latter case because the information carried by each frequency is optimally extracted.


Figure 10.16: Resolution of the scatterer reconstruction. (Left) Ricker wavelet. Simultaneous inversion of multiple frequencies. (Middle) Dirac wavelet. Simultaneous inversion of multiple frequencies. (Right) Ricker wavelet. Sequential inversion of multiple frequencies.

Aliasing issue in FWI

Beyond illumination, sparse acquisitions may raise sampling issues. In frequency-domain FWI, the gradient of the misfit function is sampled in the wavenumber domain. If the wavenumber spectrum (either \( k \) or \( \phi \)) is not sampled sufficiently finely in some directions, spatial aliasing (wraparound) will occur along these directions with a period of \( 1 / \Delta k \) or \( 1 / \Delta \phi \). This is illustrated in Figure 10.17 with the diffractor test. On the left, we subsample the sources of the circular acquisition. In this case, the dip angle is subsampled, leading to a complex wraparound pattern. On the right, we subsample the frequencies of the surface acquisition. In this case, we subsample the wavenumber modulus, generating wraparound of the reflector segment along vertical and subvertical directions. Aghamiry et al. (2019) have shown how to mitigate these artifacts with sparsity-promoting \( \ell_1 \) regularization.
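A back-of-the-envelope sketch of the wraparound period caused by frequency subsampling, assuming the cyclic-wavenumber convention of (\ref{10.10}) at normal incidence; the wavespeed and sampling intervals below are illustrative:

```python
# Wraparound period caused by frequency subsampling (sketch). With the
# cyclic-wavenumber convention of eq. 10.10 at normal incidence (theta = 0),
# a frequency step df samples the vertical wavenumber with dkz = 2*df/c,
# so spatial replicas appear every z_alias = 1/dkz along depth.
c = 4000.0                          # m/s, illustrative background wavespeed

def wraparound_period(df_hz, c):
    dkz = 2.0 * df_hz / c           # vertical-wavenumber sampling (1/m)
    return 1.0 / dkz                # aliasing period (m)

fine = wraparound_period(0.2, c)    # 10 km: replicas fall outside a few-km-deep model
coarse = wraparound_period(2.0, c)  # 1 km: visible replicas of the reflector segment
```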

Acquisition design in the framework of diffraction tomography, as reviewed above, is discussed in section 8 of Versteeg (2012).


Figure 10.17: Aliasing artifacts in FWI. (Left) Single-offset circular acquisition. Sources are downsampled, which translates into a downsampling of the dip angle. (Right) Single-source surface acquisition. Frequencies are downsampled, which translates into a subsampling of the wavenumber modulus and manifests itself as wraparound of the reflector segment in the space domain.

Back to \ref{Content}


The cycle skipping pathology

Definition of cycle skipping

FWI is a nonlinear and ill-posed problem, mainly due to the oscillating nature of seismic waves and the incomplete illumination of the subsurface from surface acquisitions. The misfit function has many local minima (Figure 11.1).




Figure 11.1: Multimodality of the \( \ell_2 \) data-difference based distance in FWI for a two-parameter problem. The targeted medium is a velocity-gradient model. The two parameters are the velocity \( v_0 \) at the top of the model and the vertical velocity gradient \( \alpha \) (Mulder and Plessix, 2008).

Therefore, a sufficiently accurate initial model is required to converge towards the global minimum. A criterion that needs to be satisfied is that, in the framework of the Born approximation, the initial model should allow one to match the traveltime of a given arrival with an error that does not exceed half a period. This condition results from the width of the Fresnel zones, on which the data residuals are backprojected: Fresnel zones define the surfaces over which scattered waves whose traveltimes differ by no more than half a period interfere constructively to build an interpretable arrival (Figure 10.10). If the kinematic error exceeds this criterion, the difference-based residuals will be inconsistently back-projected onto wrong isochrones. More precisely, the recorded and simulated components of a difference-based residual will be back-projected onto different Fresnel zones. Moreover, the residuals associated with a given phase recorded at different offsets will be back-projected inconsistently from one offset to the next.



Figure 11.2: Heuristic interpretation of cycle skipping from two different perspectives. (Left) If the recorded and simulated components of the data residuals (top right inset) differ by more than half a period, they will be back-projected onto two different isochrones. (Right) Phase ambiguity beyond \(\pi\). A time lag of half a period between recorded and simulated data means a phase shift of \(\pi\), beyond which the phase is ambiguous.

The previous criterion can be translated in terms of relative time error:

$$\frac{\Delta t}{T_L} < \frac{1}{2 \, N_\lambda}. \tag{11.1}\label{11.1}$$

Equation (\ref{11.1}) shows that the number of propagated wavelengths \( N_\lambda \) should be as small as possible during the early stages of FWI.
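The criterion (\ref{11.1}) is easy to evaluate in practice: with \( N_\lambda = f \, T_L \) propagated wavelengths, the tolerated traveltime error is simply half a period, \( 1/(2f) \). A minimal sketch with illustrative numbers:

```python
# Cycle-skipping tolerance (eq. 11.1): with N_lambda = f * T_L propagated
# wavelengths, the initial-model traveltime error must stay below half a
# period: dt_max = T_L / (2 N_lambda) = 1 / (2 f).
def max_traveltime_error(f_hz, traveltime_s):
    n_lambda = f_hz * traveltime_s          # number of propagated wavelengths
    return traveltime_s / (2.0 * n_lambda)  # equals 1/(2 f)

# Illustrative 5 s arrival: ~167 ms tolerance at 3 Hz but only 50 ms at
# 10 Hz, which motivates the low-to-high frequency continuation strategies.
tol_3hz = max_traveltime_error(3.0, 5.0)
tol_10hz = max_traveltime_error(10.0, 5.0)
```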

How to mitigate cycle skipping?

Data-driven hierarchical approaches

The most obvious way to decrease \( N_\lambda \) is to decrease the frequency. This has prompted many authors to design multiscale FWI approaches based on the hierarchical inversion of subdatasets of increasing high-frequency content (Figure 10.8). By virtue of the relationships between local wavenumber, frequency and scattering angle, the hierarchical inversion of increasing frequencies leads to a multiscale approach which proceeds from the long-wavelength to the short-wavelength reconstruction (Figure 11.3). A complementary approach consists of limiting the propagation distances. This can be achieved by jointly limiting source-receiver offsets and traveltimes. (We also need to limit traveltimes to avoid migrating a deep short-spread reflection: if the background velocity model is inaccurate, the migration will badly position the deep reflector and FWI may struggle to move this reflector later. On the other hand, this deep reflector can be used by FWI as a deep scattering source which enriches the low-wavenumber content of the sensitivity kernel along the so-called rabbit ears; see the Reflection Waveform Inversion (RWI) section.) This leads to a layer-stripping approach where the subsurface model is built from top to bottom.


Figure 11.3: Illustration of multiscale FWI by frequency continuation with the SEG/EAGE overthrust model. (Top left) Initial model. (Bottom right) True model. The other panels show FWI models reconstructed after successive processing of the 3 Hz, 5 Hz, 8 Hz and 20 Hz frequencies.

The misfit function can be rescaled to introduce a weighting operator \( {\bf{W}}_d \) and design a better-conditioned optimization problem:

$$C(\bf{m}) = \frac{1}{2} \Delta {\bf{d}}^\dagger \, {\bf{W}}_d \, \Delta {\bf{d}} = \frac{1}{2} \Delta {\bf{d}}^\dagger \, {\bf{S}}_d^\dagger \, {\bf{S}}_d \, \Delta {\bf{d}} = \frac{1}{2} \| {\bf{S}}_d \, \Delta {\bf{d}} \|_2^2.\tag{11.2}\label{11.2}$$
The weighting operator \( {\bf{W}}_d = {\bf{S}}_d^\dagger \, {\bf{S}}_d \) in the data domain is written as the auto-correlation of operator \( {\bf{S}}_d \). \( {\bf{S}}_d \) is an operator that is applied to the data misfit vector, with the aim to weight the contribution of each element of this vector.
Minimization of the weighted misfit function leads to:
$$\Delta \bf{m} = \gamma \,
\Re \left( \bf{J}^\dagger \bf{W}_{d} \bf{J} + \left( \frac{\partial \bf{J} }{\partial \bf{m}^T } \right)^\dagger \bf{W}_d \left( \Delta \bf{d} ~ \ldots ~ \Delta \bf{d} \right) \right)^{-1}
\Re \left( \bf{J}^\dagger \bf{W}_d \Delta \bf{d} \right). \tag{11.3}\label{11.3}$$
In the gradient, \(\bf{S}_d\) is applied both to the data residuals and to the partial derivative wavefields (in terms of implementation, \(\bf{W}_d\) can be applied directly to the source of the adjoint-state equation, i.e., the data residuals).
In the linear Hessian term, \(\bf{S}_d\) is applied to the partial derivative wavefields before taking their cross-correlation.
In the nonlinear Hessian term, \(\bf{W}_d\) is applied to the sequence of data residuals.

As an illustration, short-offset data can have much stronger amplitudes than long-offset data. In this case, the short-offset residuals (and partial derivative wavefields) dominate the misfit function and hence the model building. In particular, the Hessian will correct the gradient for illumination effects only for these dominant arrivals, because they dominate the computation of \( \bf{J}^\dagger \, \textbf{J} \) (each coefficient of this matrix is computed by a summation over the data space).
One possible data preconditioning is to correct the data for geometrical spreading through a diagonal weighting matrix \( {\bf{S}}_{d} \), which describes a linear gain with offset in 3D and a gain with the square root of offset in 2D. This is illustrated in Figure 11.5 with an ultra-long offset OBS gather from the eastern Nankai trough.



Figure 11.5: Ultra-long offset gather collected in the eastern Nankai subduction zone with two different amplitude balancing schemes. Top: no gain with offset. Bottom: linear gain with offset. In the latter case, more weight is given to the long-offset arrivals during FWI.
Example of Laplace-domain FWI

In the frequency domain, these data-driven continuation approaches lead to Laplace-Fourier domain FWI.

In frequency-domain FWI, where only a few frequencies are processed at a time, the trick to select subdatasets with respect to time is to manipulate complex-valued frequencies, which turns out to be equivalent to time damping by decaying exponential functions.
$$\mathcal{F} \left( p(t) \, e^{-(t-t_0)/\tau} \right) = \int_{-\infty}^{+\infty} p(t) \, e^{-i \left( \omega - i / \tau \right) t} \, e^{t_0 / \tau} \, dt.\tag{11.4}\label{11.4}$$
This transform was called Laplace-Fourier transform by Shin & Ha (2009). It was developed at the same time by Brossier et al. (2009) to process land data
with surface waves.
Accordingly, the recorded data in the time domain must be preprocessed by exponential damping before the Fourier transform.
If the time damping is applied from the first-arrival traveltime \(t_0\), then \({\bf{S}}_d\) is a diagonal matrix with coefficients \(e^{t_0/\tau}\).
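A minimal sketch of (\ref{11.4}): damp a trace from \( t_0 \) onward, then take a discrete Fourier transform. The function `damped_spectrum` and all numerical values are illustrative choices, and the damping is applied for \( t > t_0 \) only:

```python
import numpy as np

# Laplace-Fourier transform sketch (eq. 11.4): damping by exp(-(t - t0)/tau)
# before the Fourier transform is equivalent to evaluating the transform at a
# complex-valued frequency.
def damped_spectrum(trace, dt, freqs, tau, t0=0.0):
    t = np.arange(len(trace)) * dt
    damped = trace * np.exp(-np.maximum(t - t0, 0.0) / tau)   # damp from t0 onward
    return np.array([np.sum(damped * np.exp(-2j * np.pi * f * t)) * dt
                     for f in freqs])

dt, nt = 0.004, 1000
trace = np.zeros(nt)
trace[750] = 1.0                                     # a late arrival at 3.0 s
weak = damped_spectrum(trace, dt, [5.0], tau=0.5)    # aggressive damping: exp(-6)
strong = damped_spectrum(trace, dt, [5.0], tau=1e9)  # effectively no damping
# abs(weak) << abs(strong): aggressive damping mutes the late arrival.
```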
In Figure 11.6, we show wavepaths associated with source and receiver monochromatic wavefields computed with complex-valued frequencies of different imaginary parts. Indeed, the imaginary part of the complex-valued frequency damps the outer fringes of the wavepath, those associated with late-arriving phases. The effects of time damping are also illustrated in the time domain in Figure 11.7, where we see that aggressive time damping leaves only the early arrivals in the gather. Data preconditioning by frequency selection and time damping are combined in Laplace-Fourier FWI algorithms with two nested loops over frequencies and time dampings (see Gorszczyk et al., 2017 for a real-data application). This defines two levels of hierarchy, over frequency and scattering angles, in the FWI algorithm that control the range of wavenumbers injected in the subsurface model during one multiscale step of the FWI. Put simply, applying time dampings from the first arrival is close in spirit to selecting events as a function of the aperture angle.

Also, a suitable choice of this imaginary part allows for the extraction of the phase of the first arrival and hence makes frequency-domain waveform inversion algorithms amenable to first-arrival traveltime inversion or refraction tomography (Shin et al., 2002; Min et al., 2006; Pyun et al., 2005; Ellefsen, 2009).









Figure 11.6: Effect of time damping on Laplace-Fourier FWI. (Top) No time damping. (Middle) Weak time damping. (Bottom) Strong time damping. Each figure shows the source and receiver Green functions (real part) and the associated sensitivity kernel (real part). In the bottom figure, the aggressive time damping preserves mainly the first Fresnel zone.



Figure 11.7: Effects of time damping on time-domain gathers. Early (or wide-aperture) arrivals are selected to build long wavelengths. Relaxing the time damping progressively leads to a multi-resolution reconstruction. Damping values are 6 s, 2 s, 1 s and 0.5 s. The top panel is shown without time damping.
Beyond reduced-space FWI: Extending the linear regime of FWI by search space extension

  • Wavefield Reconstruction Inversion (WRI) or extended source waveform inversion
  • Migration-based velocity analysis with space or time lag
Beyond difference-based residuals: Designing more convex distance

  • Deconvolution-based FWI
  • FWI based upon optimal transport

Back to \ref{Content}


Regularization of FWI

  • Regularization with penalty method
  • Regularization by reparametrization

Back to \ref{Content}


FWI and Least-Squares Reverse Time Migration (RTM)

The goal of prestack depth migration is to build an image in depth of the reflectivity (the short wavelengths) of the subsurface from reflected waves. This reflectivity mainly represents lithological or tectonic (fault, thrust) discontinuities (Figure 13.1). Migration is a linear process which relies on the assumption of a scale separation between a background velocity model and the reflectivity (Figure 13.2). Accordingly, it relies on the linearization of the wave equation with the Born approximation. Least-squares migration recasts migration in the framework of least-squares quadratic waveform inversion. Quantitative characterization of the reflectivity is achieved, beyond a purely geometrical description. Least-squares reverse time migration computes Green functions with full-waveform modelling engines, as FWI does. An alternative approach relies on ray theory, such as ray+Born or ray+Kirchhoff migration/inversion, to compute Green functions. These approaches are much faster but can provide inaccurate images in complex media (for example, with salt bodies and sharp contrasts).

The goal of this section is to show the subtle differences between a linear and a nonlinear waveform inversion by comparing the FWI and RTM formalisms.


Figure 13.1: An application of least-squares ray+Born migration/inversion. Characterization of the decollement, offshore Ecuador (Ribodetti et al., 2011).



Figure 13.2: Scale separation between smooth background velocity model and reflectivity

For now, we assume that we know the velocity macromodel and we seek to estimate the reflectivity by solving a linear inverse problem (RTM). In the next section, we will assume that we know the reflectivity and we will address the more difficult velocity macromodel building problem by nonlinear waveform inversion (RWI).

Linearized forward problem equation

Here, we linearize the wave equation around a known smooth background velocity model \( \boldsymbol{m}_0 \). This will allow us to recast the migration problem as a quadratic optimization problem. The background wavefield \( \boldsymbol{u}_0 \) satisfies the time-harmonic wave equation $$\boldsymbol{A}(\boldsymbol{m}_0) \boldsymbol{u}_0 = \boldsymbol{b}, \tag{13.1}\label{13.1}$$ where $$\boldsymbol{A}(\boldsymbol{m}_0) = \omega^2 \text{diag}(\boldsymbol{m}_0) + \Delta. \tag{13.2}\label{13.2}$$

The true wavefield can be written as the sum of the background wavefield \( \boldsymbol{u}_0 \) and a scattered wavefield \( \delta \boldsymbol{u} \) $$ \boldsymbol{u} = \boldsymbol{u}_0 + \delta \boldsymbol{u} \tag{13.3}\label{13.3}$$.

Also, the subsurface model \( \boldsymbol{m} \) can be written as the sum of the background model \( \boldsymbol{m}_0 \) and a perturbation model \( \delta \boldsymbol{m} \) $$ \boldsymbol{m} = \boldsymbol{m}_0 + \delta \boldsymbol{m} \tag{13.4}\label{13.4}$$.

The true wavefield satisfies the wave equation$$\boldsymbol{A}(\boldsymbol{m}) \boldsymbol{u} = \boldsymbol{b}. \tag{13.5}\label{13.5}$$

Taking the difference between (\ref{13.1}) and (\ref{13.5}) gives the wave equation satisfied by the scattered wavefield \( \delta \boldsymbol{u} \) $$\boldsymbol{A}(\boldsymbol{m}_0) \delta \boldsymbol{u} = - ( \boldsymbol{A}(\boldsymbol{m}) - \boldsymbol{A}(\boldsymbol{m}_0)) \boldsymbol{u} \tag{13.6}\label{13.6}$$.

If we replace \( \boldsymbol{u} \) by \( \boldsymbol{u}_0 \) in the right-hand side, we obtain the single-scattered approximation of \( \delta \boldsymbol{u} \) in the framework of the Born approximation.$$\boldsymbol{A}(\boldsymbol{m}_0) \delta \boldsymbol{u} \approx - (\boldsymbol{A}(\boldsymbol{m}) - \boldsymbol{A}(\boldsymbol{m}_0) ) \boldsymbol{u}_0 \tag{13.7}\label{13.7}$$or equivalently$$\delta \boldsymbol{u} \approx - \boldsymbol{A}(\boldsymbol{m}_0)^{-1} (\boldsymbol{A}(\boldsymbol{m}) - \boldsymbol{A}(\boldsymbol{m}_0)) \boldsymbol{u}_0. \tag{13.8}\label{13.8}$$

For the scalar Helmholtz equation and considering the squared slowness as model parameters, we have$$\boldsymbol{A}(\boldsymbol{m}) - \boldsymbol{A}(\boldsymbol{m}_0) = \omega^2 \text{diag} (\delta \boldsymbol{m}). \tag{13.9}\label{13.9}$$.

Accordingly, the single-scattered wavefield is given by$$\delta \boldsymbol{u} = - \omega^2 \boldsymbol{A}(\boldsymbol{m}_0)^{-1} \text{diag} (\delta \boldsymbol{m}) \boldsymbol{u}_0. \tag{13.10}\label{13.10}$$Here, \( \omega^2 \text{diag} (\delta \boldsymbol{m}) \boldsymbol{u}_0 \) acts as a scattering source. If the difference between \( \boldsymbol{m} \) and \( \boldsymbol{m}_0 \) reduces to a velocity perturbation at a single diffractor point in the subsurface, then \( \delta \boldsymbol{u} / \delta \boldsymbol{m} \) is nothing other than the partial derivative wavefield (\ref{4.3}).

The single scattered data can be written as$$ \delta \boldsymbol{d} = \boldsymbol{P} \delta \boldsymbol{u} = -\omega^2 \boldsymbol{P} \boldsymbol{A}(\boldsymbol{m}_0)^{-1} \text{diag} (\delta \boldsymbol{m}) \boldsymbol{u}_0 = \boldsymbol{B}(\boldsymbol{m}_0) \delta \boldsymbol{m}, \tag{13.11}\label{13.11}$$where the observation operator \( \boldsymbol{P} \) samples the scattered wavefield at the receiver position.
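The chain (\ref{13.1})-(\ref{13.11}) can be sketched on a tiny 1D toy problem. The grid size, frequency and discretization below are illustrative, and the crude second-order Laplacian with Dirichlet ends is only meant to expose the algebra:

```python
import numpy as np

# Toy 1D sketch of eqs. 13.1-13.11: delta_d = -omega^2 P A(m0)^{-1} diag(u0) dm.
n, h = 50, 10.0                                     # grid size and spacing (m)
omega = 2.0 * np.pi * 5.0                           # 5 Hz angular frequency
m0 = np.full(n, 1.0 / 2000.0**2)                    # background squared slowness
lap = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
       + np.diag(np.ones(n - 1), -1)) / h**2        # 1D Laplacian, Dirichlet ends
A0 = omega**2 * np.diag(m0) + lap                   # Helmholtz operator (eq. 13.2)
b = np.zeros(n); b[0] = 1.0                         # point source at the first node
u0 = np.linalg.solve(A0, b)                         # background wavefield (eq. 13.1)
P = np.zeros((1, n)); P[0, -1] = 1.0                # one receiver at the last node
B = -omega**2 * P @ np.linalg.solve(A0, np.diag(u0))  # Born operator B(m0) (13.11)
delta_m = np.zeros(n); delta_m[n // 2] = 1e-8       # point diffractor perturbation
delta_d = B @ delta_m                               # single-scattered data
```

Note that `B` maps any model perturbation linearly to data, which is what makes the migration problem of the next subsection quadratic.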

Optimization problem

The least-squares migration problem is the quadratic problem$$\min_{\delta \boldsymbol{m}} \| \boldsymbol{B}(\boldsymbol{m}_0) \delta \boldsymbol{m} - \delta \boldsymbol{d}^* \|_2^2 + \epsilon \| \delta \boldsymbol{m} \|_2^2, $$where a zero-order Tikhonov regularization has been added and \( \epsilon \) is the penalty parameter.

The solution of the quadratic minimization problem at the first iteration is$$\delta \boldsymbol{m}^{(1)} = \left[ \boldsymbol{B}^\dagger \boldsymbol{B} + \epsilon \boldsymbol{I} \right]^{-1} \boldsymbol{B}^\dagger \delta \boldsymbol{d}^* = \boldsymbol{B}^{-g}\delta \boldsymbol{d}^*, \tag{13.12}\label{13.12}$$where \( \boldsymbol{B}^{-g} \) denotes the generalized (pseudo-) inverse of \( \boldsymbol{B} \).

The simulated data \( \delta \boldsymbol{d}^{(1)} \) from the first migrated image \( \delta \boldsymbol{m}^{(1)} \) are given by $$\delta \boldsymbol{d}^{(1)} = \boldsymbol{B} \delta \boldsymbol{m}^{(1)}. \tag{13.13}\label{13.13}$$

The data residuals to minimize are$$\Delta \delta \boldsymbol{d}^{(1)} = \delta \boldsymbol{d}^* - \delta \boldsymbol{d}^{(1)}. \tag{13.14}\label{13.14}$$

We seek the perturbation \( \Delta \delta \boldsymbol{m} \) of \( \delta \boldsymbol{m}^{(1)} \) which cancels out the residuals \( \Delta \delta \boldsymbol{d}^{(1)} \).

$$\min_{\Delta \delta \boldsymbol{m}} \| \boldsymbol{B} \Delta \delta \boldsymbol{m} - \Delta \delta \boldsymbol{d}^{(1)}\|. \tag{13.15}\label{13.15}$$

The minimizer is$$\Delta \delta \boldsymbol{m} = \boldsymbol{B}^{-g} \Delta \delta \boldsymbol{d}^{(1)}. \tag{13.16}\label{13.16}$$

At iteration 2, we get the second migrated image$$\delta \boldsymbol{m}^{(2)} = \delta \boldsymbol{m}^{(1)} + \Delta \delta \boldsymbol{m} = \boldsymbol{B}^{-g} \left( \delta \boldsymbol{d}^* + \Delta \delta \boldsymbol{d}^{(1)} \right). \tag{13.17}\label{13.17}$$
At iteration k,
$$\delta \boldsymbol{m}^{(k)} = \boldsymbol{B}^{-g} \left( \delta \boldsymbol{d}^* + \sum_{i=1}^{k-1} \Delta \delta \boldsymbol{d}^{(i)} \right). \tag{13.18}\label{13.18}$$
We have iteratively refined the accuracy of the reflectivity by updating the right-hand side of the quadratic misfit function (namely, the reflection data) with the running sum of the reflection residuals. This is a well-known iterative refinement procedure in linear algebra to improve the solution of ill-conditioned linear systems.
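This iterative refinement scheme can be sketched on a toy linear system, with the residual taken as observed minus simulated data; \( \boldsymbol{B} \), the data and the penalty below are illustrative:

```python
import numpy as np

# Toy sketch of the iterative refinement of eqs. 13.12-13.18: re-migrating the
# running sum of reflection residuals removes the bias introduced by the
# Tikhonov-regularized pseudo-inverse.
rng = np.random.default_rng(1)
B = rng.standard_normal((30, 10))                   # toy linearized operator
dm_true = rng.standard_normal(10)
d_obs = B @ dm_true                                 # noise-free "observed" data
eps = 1.0                                           # zero-order Tikhonov penalty
Bg = np.linalg.solve(B.T @ B + eps * np.eye(10), B.T)   # B^{-g} of eq. 13.12

dm = Bg @ d_obs                                     # first migrated image
err0 = np.linalg.norm(dm - dm_true)                 # regularization bias
for _ in range(50):                                 # eqs. 13.14-13.17
    residual = d_obs - B @ dm                       # observed minus simulated
    dm = dm + Bg @ residual                         # refined image
err = np.linalg.norm(dm - dm_true)                  # bias largely removed
```

Each pass contracts the error by the factors \( \epsilon / (\sigma_i^2 + \epsilon) < 1 \), where \( \sigma_i \) are the singular values of the toy operator, so the iterations converge towards the unregularized least-squares image.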

Application on Marmousi (ray+Born waveform/inversion): Migrated images

An illustration of iterative asymptotic least-squares migration is given in Figures 13.3 and 13.4 for the complex Marmousi model. In such a complex model, the ray fields are multi-valued (i.e., folded due to caustics). In this case, and for a single-arrival migration scheme, a significant number of iterations is necessary to recover the true amplitudes of the reflectivity, although a good approximation of the Hessian has been used.

Figure 13.3: (a) True reflectivity. (b) Migrated reflectivity (iteration 1). (c) Migrated reflectivity (iteration 9).


Figure 13.4: (a-c) Ray+Born seismograms computed in the true reflectivity (a), the migrated reflectivities at iteration 1 (b) and 9 (c).

Back to \ref{Content}


Reflection waveform inversion

We have seen how to compute reflectivity images by iterative least-squares RTM. However, if the velocity macromodel is not kinematically accurate, the migrated image is not optimally focused. We now seek to improve the velocity model from the reflection data and the reflectivity. These two tasks are performed iteratively in alternating mode (Figure 14.1). The FWI adaptation which achieves this goal is referred to as Reflection Waveform Inversion (RWI). For more details, see Xu et al. (2012), Brossier et al. (2015), Zhou et al. (2015) and Zhou et al. (2018).


Figure 14.1: Reflection workflow with alternating-direction update of velocity macromodel and reflectivity.

We have seen during the resolution analysis that long wavelengths can be built from large scattering angles and low frequencies. How can we build the long wavelengths of the subsurface from short-offset reflection data without very low frequencies? The fundamental idea of seismic reflection processing is to alternate the velocity macromodel building and the reflectivity imaging by migration (Figure 14.1). During the velocity macromodel building, the reflectivity is used as buried virtual sources to mimic a transmission experiment from depth to surface (akin to a horizontal cross-hole experiment), and the long wavelengths of the velocity model are updated along the two paths connecting the reflector to the source and the receiver (Figure 14.2).


Figure 14.2: Sketch of velocity macromodel building along the two one-way transmitted paths followed by a reflected wave. The position of the reflector is assumed known.

In the framework of waveform inversion, this approach however requires a slight reformulation of FWI, as explained in the following. A reflected wave contains coupled information on the reflector properties (geometry and impedance contrast) and on the smooth distribution of wavespeeds above the reflector. The issue is that the imprint of the former (related to the migration task) often overprints the latter in the FWI sensitivity kernel (Figure 14.3). Therefore, we need to explicitly separate the two components of the sensitivity kernel, as shown in Figure 14.3f where the transmitted wavepaths from the reflector to the source and the receiver have been extracted. We now show how to achieve this goal mathematically. The governing idea relies on the explicit separation between the velocity macromodel and the reflectivity, and on the exclusive simulation of reflected waves with the Born approximation.


Figure 14.3: RWI principle. Filtering out the migration isochrone from the FWI sensitivity kernel. (a) Source wavefield in homogeneous background. (b) Sensitivity kernel for a homogeneous background model. Outer fringes are migration isochrones on which reflected waves are back-projected during prestack depth migration and FWI. (c) Source wavefield for a homogeneous background model with a reflector (green line). (d) Sensitivity kernel for the homogeneous background model with a reflector. This kernel is dominated by migration isochrones such as those shown in (b). (e) Difference between (a) and (c). (f) Difference between (b) and (d). This shows the sensitivity kernel with a rabbit-ear shape that should be used to update the velocity macromodel by RWI.

We now assume that \( \delta \boldsymbol{m} \) is known and we seek to update \( \boldsymbol{m}_0 \). We model the reflected wavefield with the Born approximation according to (\ref{13.9}). Accordingly, we need to involve now three state equations.

State equation 1: The background wavefield \( \boldsymbol{u}_0 \) satisfies the time-harmonic wave equation
$$\boldsymbol{A}(\boldsymbol{m}_0) \boldsymbol{u}_0 = \boldsymbol{b}, \tag{14.1}\label{14.1}$$
where$$\boldsymbol{A}(\boldsymbol{m}_0) = \omega^2 \text{diag}(\boldsymbol{m}_0) + \Delta. \tag{14.2}\label{14.2}$$

State equation 2: The reflected wavefield \( \delta \boldsymbol{u} \) satisfies the linearized wave equation (\ref{13.9})

$$\boldsymbol{A}(\boldsymbol{m}_0) \delta \boldsymbol{u} \approx - \omega^2 \text{diag}(\delta \boldsymbol{m}) \boldsymbol{u}_0, \tag{14.3}\label{14.3} $$
where we have used $$\boldsymbol{A}(\boldsymbol{m}_0 + \delta \boldsymbol{m}) - \boldsymbol{A}(\boldsymbol{m}_0) = \omega^2 \text{diag}(\delta \boldsymbol{m}). \tag{14.4}\label{14.4}$$

State equation 3: Observation equation
$$\delta \boldsymbol{d} = \boldsymbol{P} \delta \boldsymbol{u}. \tag{14.5}\label{14.5}$$
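These three state equations can be sketched with a minimal 1D finite-difference Helmholtz solver. Grid, frequency, source, receiver and reflector positions below are hypothetical, and Dirichlet boundaries are used instead of absorbing ones, so this is illustrative only:

```python
import numpy as np

# Minimal 1D sketch of state equations (14.1), (14.3) and (14.5).
n, h = 201, 10.0                         # grid points, spacing (m)
omega = 2.0 * np.pi * 5.0                # angular frequency (5 Hz)

c0 = np.full(n, 2000.0)                  # background wavespeed (m/s)
m0 = 1.0 / c0**2                         # background squared slowness
dm = np.zeros(n)
dm[120] = 0.1 * m0[120]                  # reflectivity perturbation δm

# A(m0) = ω² diag(m0) + Δ, second-order finite-difference Laplacian,
# Dirichlet boundaries (no absorbing layers).
lap = (np.diag(np.ones(n - 1), -1) - 2.0 * np.eye(n)
       + np.diag(np.ones(n - 1), 1)) / h**2
A0 = omega**2 * np.diag(m0) + lap

b = np.zeros(n)
b[20] = 1.0                              # point source

u0 = np.linalg.solve(A0, b)                       # (14.1): background wavefield
du = np.linalg.solve(A0, -omega**2 * dm * u0)     # (14.3): Born reflected wavefield
dd = du[40]                                       # (14.5): sampling δu at a receiver (index 40, hypothetical)
```

With Dirichlet boundaries the wavefields are real-valued and contain spurious boundary reflections; a realistic implementation would use absorbing boundary conditions (e.g., PML), which make \( \boldsymbol{A} \) complex-valued.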

We seek to update \( \boldsymbol{m}_0 \) by minimizing the least-squares misfit between \( \delta \boldsymbol{d} \) and \( \delta \boldsymbol{d}^* \).
Let's compute the gradient of the misfit function with the adjoint-state method.

Lagrangian function

$$ {\mathcal{L}}(\boldsymbol{m}_0,\overbrace{\boldsymbol{u}_0,\delta \boldsymbol{u}, \delta \boldsymbol{d}}^{State ~ variables}, \overbrace{\boldsymbol{v}, \delta \boldsymbol{v}, \delta \boldsymbol{w}}^{Adjoint ~ variables} )=
\frac{1}{2}\| \delta \boldsymbol{d} - \delta \boldsymbol{d}^* \|_2^2 + \Re < \boldsymbol{v} , \boldsymbol{A}(\boldsymbol{m}_0) \boldsymbol{u}_0 - \boldsymbol{b} > + \Re < \delta \boldsymbol{v} , \boldsymbol{A}(\boldsymbol{m}_0) \delta \boldsymbol{u} + \omega^2 \text{diag}(\delta \boldsymbol{m}) \boldsymbol{u}_0 >
+ \Re < \delta \boldsymbol{w}, \delta \boldsymbol{d} - \boldsymbol{P} \delta \boldsymbol{u} >. \tag{14.6}\label{14.6}$$

Adjoint-state equation 1: \( \partial_{\delta \boldsymbol{d}} {\mathcal{L}} = 0 \)

$$\delta \boldsymbol{w} = -\left(\delta \boldsymbol{d} - \delta \boldsymbol{d}^* \right). \tag{14.7}\label{14.7}$$

Adjoint-state equation 2: \( \partial_{\delta \boldsymbol{u}} {\mathcal{L}} = 0 \)

$$\boldsymbol{A}(\boldsymbol{m}_0)^\dagger \delta \boldsymbol{v} = \boldsymbol{P}^T \delta \boldsymbol{w} = -\boldsymbol{P}^T \left( \delta \boldsymbol{d} - \delta \boldsymbol{d}^* \right). \tag{14.8}\label{14.8}$$

Adjoint-state equation 3: \( \partial_{\boldsymbol{u}_0} {\mathcal{L}} = 0 \)

$$\boldsymbol{A}(\boldsymbol{m}_0)^\dagger \boldsymbol{v} = -\omega^2 \text{diag}(\delta \boldsymbol{m}) \delta \boldsymbol{v}. \tag{14.9}\label{14.9} $$

Gradient: \( \partial_{\boldsymbol{m}_0} {\mathcal{L}} \)
$$\nabla_{\boldsymbol{m}_0} { {\mathcal{C}}(\boldsymbol{m}_0)} = \Re \left\{ \boldsymbol{u}_0^T \left( \frac{ \partial \boldsymbol{A}(\boldsymbol{m}_0)}{\partial \boldsymbol{m}_0} \right)^T \overline{\boldsymbol{v}} \right\} + \Re \left\{ \delta \boldsymbol{u}^T \left( \frac{ \partial \boldsymbol{A}(\boldsymbol{m}_0)}{\partial \boldsymbol{m}_0} \right)^T \overline{\delta \boldsymbol{v}} \right\}. \tag{14.10}\label{14.10}$$
The first term in the gradient is the zero-lag correlation between the downgoing incident wavefield \( \boldsymbol{u}_0 \) and the upgoing adjoint wavefield \( \boldsymbol{v} \), while the second one is the zero-lag correlation between the upgoing reflected wavefield \( \delta \boldsymbol{u} \) and the downgoing back-propagated residual wavefield \( \delta \boldsymbol{v} \) (Figure 14.4).
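As a sanity check of the whole chain (state equations, adjoint equations, gradient), the adjoint-based gradient can be verified against a finite difference of the misfit. The sketch below uses a tiny damped 1D Helmholtz problem with hypothetical parameters; the complex frequency mimics absorbing boundaries, and with a complex \( \omega \) the source term of the second adjoint equation takes \( \overline{\omega^2} \):

```python
import numpy as np

# Finite-difference check of the RWI gradient on a tiny damped 1D Helmholtz
# problem (hypothetical parameters; complex ω mimics absorbing boundaries).
n, h = 80, 10.0
omega = 2.0 * np.pi * 4.0 * (1.0 - 0.1j)          # 10% damping
lap = (np.diag(np.ones(n - 1), -1) - 2.0 * np.eye(n)
       + np.diag(np.ones(n - 1), 1)) / h**2

m0 = np.full(n, 1.0 / 2000.0**2)                  # background squared slowness
dm = np.zeros(n); dm[50] = 0.1 * m0[50]           # fixed reflectivity δm
b = np.zeros(n); b[10] = 1.0                      # source
P = np.zeros((1, n)); P[0, 15] = 1.0              # one receiver
dd_star = np.array([0.1 - 0.2j])                  # synthetic "observed" reflection

def misfit_and_gradient(m0):
    A = omega**2 * np.diag(m0) + lap
    u0 = np.linalg.solve(A, b)                            # background wavefield
    du = np.linalg.solve(A, -omega**2 * dm * u0)          # Born reflected wavefield
    dd = P @ du                                           # simulated reflection data
    res = dd - dd_star
    C = 0.5 * np.real(np.vdot(res, res))
    # Adjoint solves: A† δv = -Pᵀ(δd - δd*) and A† v = -conj(ω²) diag(δm) δv
    # (the conjugate on ω² matters only because ω is complex here).
    dv = np.linalg.solve(A.conj().T, -P.T @ res)
    v = np.linalg.solve(A.conj().T, -np.conj(omega**2) * dm * dv)
    grad = np.real(omega**2 * (u0 * np.conj(v) + du * np.conj(dv)))
    return C, grad

C0, grad = misfit_and_gradient(m0)
direction = np.random.default_rng(5).standard_normal(n) * m0
eps = 1e-6
Cp, _ = misfit_and_gradient(m0 + eps * direction)
Cm, _ = misfit_and_gradient(m0 - eps * direction)
fd = (Cp - Cm) / (2.0 * eps)
print(fd, grad @ direction)     # the two numbers should agree closely
```

This directional test is a cheap way to validate sign conventions in adjoint-state gradients before running a full inversion.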


Figure 14.4: Sensitivity wavepath generated by a reflector. Left: Correlation between the source wavefield and the reflection of the receiver wavefield, first term in (\ref{14.10}). Right: correlation of the receiver wavefield with the reflection of the source wavefield from the reflector, second term in (\ref{14.10}).

Back to \ref{Content}


Source estimation in FWI

The temporal source signature is often an unknown in FWI. The source wavelet is estimated by solving a linear inverse problem, assuming the subsurface model is known. We assume that we estimate one source signature \( s(\omega) \) per common-shot gather. The source-signature estimation is straightforwardly solved in the frequency domain, as shown hereafter. If the signature is to be estimated in the time domain, each spectral component is estimated as shown hereafter before applying an inverse Fourier transform.

Forward problem

$$\boldsymbol{A}(\omega,\boldsymbol{m}(\boldsymbol{x})) \boldsymbol{u}(\omega,\boldsymbol{x}) = s(\omega) \delta (\boldsymbol{x}-\boldsymbol{x}_s). \tag{15.1}\label{15.1} $$

Inverse problem

We seek to minimize the misfit function for the complex-valued unknown \( s\left(\omega\right) \):

$$\min_{s(\omega)} C(s(\omega)) = \min_{s(\omega)} \sum_{r} \lVert \boldsymbol{P}_r \boldsymbol{u}(\omega,\boldsymbol{x}) - d_r^*(\omega) \rVert^2 = \min_{s(\omega)} \sum_r \lVert \boldsymbol{P}_r \boldsymbol{A}^{-1}(\omega) \, s(\omega) \, \delta (\boldsymbol{x}-\boldsymbol{x}_s) - d_r^*(\omega) \rVert^2. \tag{15.2}\label{15.2}$$

Gradient of \( C \)
$$\nabla_{s(\omega)} C(s(\omega)) = \sum_{r} \overline{\left( s(\omega) \, \boldsymbol{P}_r \boldsymbol{A}^{-1}(\omega) \, \delta (\boldsymbol{x}-\boldsymbol{x}_s) - d^*_r(\omega) \right)} \; \boldsymbol{P}_r \boldsymbol{A}^{-1}(\omega) \, \delta \left(\boldsymbol{x}-\boldsymbol{x}_s\right). \tag{15.3}\label{15.3}$$

To derive the previous equation, I use the following property of differentiation with respect to complex variables:

$$\partial_z z = 1 \; \; \; \; \; \; \partial_{z} \bar{z} = 0. \tag{15.4}\label{15.4}$$

Zeroing the gradient gives for \( s \):

$$s(\omega) = \frac{\sum_r \overline{\boldsymbol{P}_r \boldsymbol{A}^{-1}(\omega) \, \delta (\boldsymbol{x}-\boldsymbol{x}_s)} \cdot d^*_r}{\sum_r \lVert \boldsymbol{P}_r \boldsymbol{A}^{-1}(\omega) \, \delta (\boldsymbol{x}-\boldsymbol{x}_s) \rVert^2} = \frac{\sum_r \bar{g}(\omega,\boldsymbol{x}_r;\boldsymbol{x}_s) \cdot d_r^* }{ \sum_r \bar{g}(\omega,\boldsymbol{x}_r;\boldsymbol{x}_s) \cdot g(\omega,\boldsymbol{x}_r;\boldsymbol{x}_s) }, \tag{15.5}\label{15.5} $$
where \( g(\omega,\boldsymbol{x}_r;\boldsymbol{x}_s) \) denotes the value of the Green function generated by the source and recorded at receiver \( r \).
Above, we assume the medium is known. In an FWI algorithm, the source wavelet estimation is often nested with the update of the subsurface medium according to the following iterative sequence:
  1. Compute incident Green functions by solving (\ref{15.1}).
  2. Estimate source wavelet for each shot gather with (\ref{15.5}).
  3. Compute the data residuals and the misfit function.
  4. Compute the adjoint wavefields.
  5. Compute the gradient for the subsurface model perturbations and estimate the step length.
  6. Update the subsurface medium.
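For a single frequency and one shot, the closed-form estimate (\ref{15.5}) is essentially a one-liner. In the sketch below, the Green functions are random stand-ins for \( \boldsymbol{P}_r \boldsymbol{A}^{-1}(\omega) \delta(\boldsymbol{x}-\boldsymbol{x}_s) \), not the output of a wave-equation solver:

```python
import numpy as np

rng = np.random.default_rng(1)
nr = 64                                   # receivers in the shot gather

# Random stand-ins for the monochromatic Green functions g(ω, x_r; x_s).
g = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)

s_true = 3.0 - 2.0j                       # the unknown source spectrum s(ω)
d_obs = s_true * g                        # noise-free observed data d*_r

# Closed-form estimate: s(ω) = Σ_r ḡ_r d*_r / Σ_r ḡ_r g_r
s_est = np.sum(np.conj(g) * d_obs) / np.sum(np.conj(g) * g)
print(s_est)                              # recovers s_true in the noise-free case
```

With noisy data, the estimate becomes the least-squares projection of the observed data onto the Green functions; looping over frequencies followed by an inverse FFT yields the time-domain wavelet.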

Plessix and Cao (2011) recast the source-estimation problem in the framework of the adjoint-state method with two possible implementations. We now introduce a weighting operator \( \boldsymbol{W}\) in the data misfit function.

Implementation 1: The source signature is processed as an optimization parameter.

  • Lagrangian function

$${\mathcal{L}}\left(\boldsymbol{m},s,\boldsymbol{g},\boldsymbol{d},\boldsymbol{v}_1,\boldsymbol{v}_2 \right) = \sum_r w_r \lVert d_r - d^*_r \rVert^2 + \Re <\boldsymbol{v}_1 , \boldsymbol{A} \boldsymbol{g} - \delta \left( \boldsymbol{x} - \boldsymbol{x}_s \right) > + \sum_r \overline{{v}_2}_r \cdot \left( d_r - s \,\boldsymbol{P}_r \boldsymbol{g} \right), \tag{15.6}\label{15.6}$$

where \( \boldsymbol{g} \) denotes the Green functions.
  • Adjoint-state equations

$$\boldsymbol{A}^T \overline{\boldsymbol{v}}_1 = s \, \overline{\boldsymbol{v}_2}, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
\boldsymbol{v}_2 = {\boldsymbol{W}}_d \, \left( \boldsymbol{d} - \boldsymbol{d}^* \right). \tag{15.7}\label{15.7}$$

  • Gradient of the misfit function with respect to \( s \)

$$\nabla C_{s}(\boldsymbol{m},s) = \left[ \boldsymbol{W} \, (\boldsymbol{d} - \boldsymbol{d}^*) \right]^\dagger \boldsymbol{P}_r \boldsymbol{g} = 0. \tag{15.8}\label{15.8}$$

This gives (remember that \( \boldsymbol{d} = s \,\boldsymbol{P}_r \boldsymbol{g} \) ):

$$s = \frac{ \left(\boldsymbol{P} \boldsymbol{g}\right)^\dagger \, \boldsymbol{W} \, \boldsymbol{d}^* }{ \left( \boldsymbol{P} \boldsymbol{g} \right)^\dagger \boldsymbol{W} \boldsymbol{P} \boldsymbol{g} }. \tag{15.9}\label{15.9}$$

  • Gradient of the misfit function with respect to \( \boldsymbol{m} \)

$$\nabla_\boldsymbol{m} C(\boldsymbol{m},s) = \Re \left\{ \boldsymbol{g}^T \left( \frac{ \partial \boldsymbol{A} }{\partial \boldsymbol{m} } \right)^T \bar{\boldsymbol{v}}_1 \right\}
= \Re \left\{ s \boldsymbol{g}^T \left( \frac{ \partial \boldsymbol{A} }{\partial \boldsymbol{m} } \right)^T \left( \boldsymbol{A}^{-1} \right)^T \boldsymbol{W} \, ( \overline{\boldsymbol{d} - \boldsymbol{d}^*} ) \right\}. \tag{15.10}\label{15.10}$$

Implementation 2: the source is processed as a state variable.

The previous implementation requires using the same data-space weighting operator \( \boldsymbol{W} \) to compute \( s \) and \( \nabla_{\boldsymbol{m}} C \). This can be limiting for specific applications (see, for example, Operto et al., 2006): data weighting may not be desirable when estimating \( s \) if the short-offset data are the most suitable for source estimation.

Plessix and Cao (2011) propose to use \( s \) as a state variable to gain some flexibility in the inversion design. The state equation satisfied by \( s \) is obtained by solving the linear inverse problem for \( s \), where we have the flexibility to choose a suitable weighting operator in the functional.

$$C_s(s)= \left( s \,\boldsymbol{P} \boldsymbol{g} - \boldsymbol{d}^* \right)^\dagger {\boldsymbol{W}}_s \left( s \,\boldsymbol{P} \boldsymbol{g} - \boldsymbol{d}^* \right). \tag{15.11}\label{15.11}$$

  • State equation satisfied by the state \( s \):

$${\boldsymbol{r}}^\dagger {\boldsymbol{W}}_s {\boldsymbol{r}} \, s - {\boldsymbol{r}}^\dagger {\boldsymbol{W}}_s \boldsymbol{d}^* = 0, \tag{15.12}\label{15.12} $$
where we introduce the state variable \( {\boldsymbol{r}} =\boldsymbol{P} \boldsymbol{g} \).

  • Lagrangian function

$${\mathcal{L}}\left( \boldsymbol{m},\boldsymbol{g},{\boldsymbol{r}},\boldsymbol{d},s,\boldsymbol{v}_1,\boldsymbol{v}_2,\boldsymbol{v}_3, v_4\right) = \left( \boldsymbol{d} - \boldsymbol{d}^* \right)^\dagger {\boldsymbol{W}}_d \left( \boldsymbol{d} - \boldsymbol{d}^* \right) + \Re < \boldsymbol{v}_1 , \boldsymbol{A} \boldsymbol{g} - \delta \left( \boldsymbol{x} - \boldsymbol{x}_s \right) >
+ < \boldsymbol{v}_2 , {\boldsymbol{r}} -\boldsymbol{P} \, \boldsymbol{g} > + < \boldsymbol{v}_3 , \boldsymbol{d} - s \, {\boldsymbol{r}} > + < v_4, {\boldsymbol{r}}^\dagger {\boldsymbol{W}}_s {\boldsymbol{r}} \, s - {\boldsymbol{r}}^\dagger {\boldsymbol{W}}_s \boldsymbol{d}^* >. \tag{15.13}\label{15.13}$$

  • Adjoint-state equations

$$\frac{\partial {\mathcal{L}} }{\partial \boldsymbol{g} } = 0 \Longrightarrow \boldsymbol{A}^T \, \boldsymbol{v}_1 =\boldsymbol{P}^T \boldsymbol{v}_2. \tag{15.14}\label{15.14}$$
$$\frac{\partial {\mathcal{L}} }{\partial \boldsymbol{r} } = 0 \Longrightarrow \boldsymbol{v}_2 = s \, \left( \boldsymbol{v}_3- \, v_4 {\boldsymbol{W}}_s \boldsymbol{r} \right). \tag{15.15}\label{15.15}$$
$$\frac{\partial {\mathcal{L}} }{\partial \boldsymbol{d} } = 0 \Longrightarrow \boldsymbol{v}_3 = - {\boldsymbol{W}}_d \left( \boldsymbol{d} - \boldsymbol{d}^* \right). \tag{15.16}\label{15.16}$$
$$\frac{\partial {\mathcal{L}} }{\partial s } = 0 \Longrightarrow
v_4 = \frac{ \boldsymbol{v}_3^\dagger \boldsymbol{r}}{ \boldsymbol{r}^\dagger {\boldsymbol{W}}_s \boldsymbol{r} } = - \bar{s} \frac{ \boldsymbol{r}^\dagger {\boldsymbol{W}}_d \boldsymbol{r} }{ \boldsymbol{r}^\dagger {\boldsymbol{W}}_s \boldsymbol{r} } +\frac{ {\boldsymbol{d}^*}^\dagger {\boldsymbol{W}}_d \boldsymbol{r} }{ \boldsymbol{r}^\dagger {\boldsymbol{W}}_s \boldsymbol{r}}. \tag{15.17}\label{15.17}$$

If \( {\boldsymbol{W}}_d = {\boldsymbol{W}}_s \), then \( v_4 = 0 \) and the adjoint wavefield has the same expression as in implementation 1.
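This cancellation is easy to verify numerically with random stand-ins for \( \boldsymbol{r} = \boldsymbol{P}\boldsymbol{g} \) and \( \boldsymbol{d}^* \) and a diagonal weighting matrix (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
nr = 32

r = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)        # stand-in for r = P g
d_star = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)   # stand-in for observed data
w = rng.uniform(0.5, 2.0, nr)             # diagonal of W_d = W_s

# Weighted closed form (15.9): s = (Pg)† W d* / (Pg)† W Pg  (np.vdot(a, b) = a† b)
s = np.vdot(r, w * d_star) / np.vdot(r, w * r).real

# v4 from (15.17) with W_d = W_s: the two terms cancel once s is optimal.
rWr = np.vdot(r, w * r)
v4 = (-np.conj(s) * rWr + np.vdot(d_star, w * r)) / rWr
print(abs(v4))                            # vanishes to rounding error
```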
  • Gradient

$$\nabla_m C = \Re < \boldsymbol{v}_1 , \frac{ \partial \boldsymbol{A} }{ \partial \boldsymbol{m} } \, \boldsymbol{g} >. \tag{15.18}\label{15.18}$$

Wavelet estimation can also be used as a quality control (QC) of FWI results.

Sequence: [1] compute \( s \) with (\ref{15.12}); [2] \( v_4 \) with (\ref{15.17}); [3] \( \boldsymbol{v}_3 \) with (\ref{15.16}); [4] \( \boldsymbol{v}_2 \) with (\ref{15.15}); and [5] \( \boldsymbol{v}_1 \) with (\ref{15.14}).

From Malinowski et al. (2011). Other examples are shown in Prieux et al. (2011) and Operto et al. (2015).

A remark: the adjoint-state equations can be checked numerically with the inner-product test (Kern, 2010, page 110).

$$\partial_u F(a,u) \, v = b$$
$$\partial_u F(a,u)^\dagger \, p = c $$

By definition of the adjoint,

$$<\partial_u F(a,u) \, v, p > = < \partial_u F(a,u)^\dagger p , v>$$

that is

$$<b,p> = <c,v>$$
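In code, the inner-product test draws random vectors \( v \) and \( p \), applies the forward and adjoint operators, and compares the two inner products. Below, a random real matrix stands in for the linearized operator \( \partial_u F(a,u) \):

```python
import numpy as np

rng = np.random.default_rng(3)

# Random stand-in for the linearized operator ∂_u F(a, u).
n_data, n_model = 40, 60
J = rng.standard_normal((n_data, n_model))

v = rng.standard_normal(n_model)          # random model-space vector
p = rng.standard_normal(n_data)           # random data-space vector

b = J @ v                                 # forward application
c = J.T @ p                               # adjoint application (transpose in the real case)

lhs = np.dot(b, p)                        # <b, p>
rhs = np.dot(c, v)                        # <c, v>
print(lhs - rhs)                          # zero up to rounding
```

For complex wavefields, replace the transpose by the conjugate transpose and `np.dot` by `np.vdot` (minding the conjugation of the first argument); an adjoint implemented by hand, such as a back-propagation code, is validated the same way.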

Back to \ref{Content}


Multi-parameter FWI


Appendix 1: Few words about the method of Lagrange multiplier
